Team:Valencia Biocampus/talking

From 2012.igem.org


Revision as of 09:39, 31 August 2012



Talking Interfaces


THE PROCESS


The main objective of our project is to achieve verbal communication with our microorganisms. To do that, we need to establish the following process:




  1. The basic life cycle of our biological agent is based on an input/output process through the use of interfaces.
  2. The input is a voice signal (a question), which is captured by our voice recognizer.
  3. The voice recognizer identifies the question and, through the program in charge of the communication, writes the corresponding identifier to the assigned port of the Arduino.
  4. The Arduino software reads the identifier and selects the corresponding port, telling the fluorimeter which wavelength to emit onto the culture. There are four possible questions (q), each associated with a different wavelength.
  5. The fluorimeter emits light (BioInput), exciting the compound through optical filters.
  6. In response to the excitation, the compound emits fluorescence (BioOutput), which the fluorimeter measures with a sensor.
  7. This fluorescence corresponds to one of the four possible answers (r: response).
  8. The Arduino program identifies the answer and writes its identifier to the corresponding port.
  9. The communication program reads the identifier of the answer from the port.
  10. "eSpeak" speaks the answer as a voice signal (Output).
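
The software side of this cycle (steps 3 and 8 to 10) can be sketched in Python as follows. This is only a minimal illustration, not the team's actual program: the identifiers `q1`..`q4` and `r1`..`r4`, the one-byte codes, the newline-terminated messages, and the `LoopbackPort` stand-in are all assumptions made so that the example runs without hardware.

```python
# Hypothetical identifiers and answer texts: the page does not list the real ones.
QUESTION_IDS = {"q1": b"1", "q2": b"2", "q3": b"3", "q4": b"4"}
ANSWER_TEXTS = {b"1": "r1", b"2": "r2", b"3": "r3", b"4": "r4"}


def ask_culture(question, port, speak):
    """Run one question/answer cycle (steps 3 and 8-10 of the list above)."""
    # Step 3: write the question identifier to the Arduino port.
    port.write(QUESTION_IDS[question] + b"\n")
    # Step 9: read back the answer identifier written by the Arduino.
    answer_id = port.readline().strip()
    answer = ANSWER_TEXTS.get(answer_id, "unknown")
    # Step 10: hand the answer text to the speech output.
    speak(answer)
    return answer


class LoopbackPort:
    """Stand-in for the serial port: it answers with the id it received."""

    def __init__(self):
        self._pending = b""

    def write(self, data):
        self._pending = data
        return len(data)

    def readline(self):
        data, self._pending = self._pending, b""
        return data


spoken = []
print(ask_culture("q2", LoopbackPort(), spoken.append))
```

With real hardware, `port` could be any object offering `write` and `readline` (for example a pySerial `serial.Serial` instance), and `speak` could be a small function that calls the `espeak` command with the answer text; both choices are assumptions rather than details taken from this page.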


In this section we analyse in detail the main elements used in the process:

  • Voice recognizer
  • Arduino
  • Fluorimeter


VOICE RECOGNIZER

Julius is a real-time continuous speech recognition engine based on hidden Markov models (HMMs). It is open source and distributed under a BSD-style licence. Its main platform is Linux and other Unix systems, but it also works on Windows. It has been developed since 1997 as part of a free software kit for research in large-vocabulary continuous speech recognition (LVCSR), and Kyoto University in Japan continued the work from 1999 to 2003.

In order to use Julius, it is necessary to create a language model and an acoustic model. Julius adopts the acoustic model and pronunciation dictionary formats of the HTK toolkit, which is not open source but can be downloaded and used to generate acoustic models.

ACOUSTIC MODEL


An acoustic model is a file that contains a statistical representation of each of the distinct sounds (phonemes) that make up a word. Julius supports voice recognition either through continuous dictation or through a previously defined grammar. Continuous dictation, however, poses a problem: it requires an acoustic model trained on a large number of voice recordings. The more recordings of different voices and texts the training set contains, the higher the recognition accuracy of the acoustic model. The training data must therefore cover several sources of variation, such as the pronunciation of the speaker training the model and the dialect used. At present, no acoustic model available for Julius is good enough for continuous dictation in English.

Because of the problems posed by this kind of recognition, we chose the second option: we designed a grammar and used it with the VoxForce acoustic model, which is based on hidden Markov models. To do this, we need the following file:

- .dict file: a list of all the words that we want our culture to recognize, each with its corresponding decomposition into phonemes.
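
To make the idea of a word list with phoneme decompositions concrete, here is a small Python sketch that formats such entries. It assumes the HTK-style layout `WORD [WORD] phonemes` used by the VoxForce lexicon; the words and phoneme strings are illustrative only and are not the vocabulary used in the project.

```python
def dict_line(word, phonemes):
    """Format one entry as: WORD [WORD] phoneme phoneme ..."""
    word = word.upper()
    return f"{word} [{word}] {' '.join(phonemes)}"


# Illustrative entries only (not taken from the project's dictionary).
entries = {
    "HELLO": ["hh", "ax", "l", "ow"],
    "YES": ["y", "eh", "s"],
}

dictionary = "\n".join(dict_line(w, p) for w, p in sorted(entries.items()))
print(dictionary)
```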


  • Acoustic Analysis
    Acoustic models capture the acoustic properties of the input signal. The recognizer extracts a sequence of feature vectors, compares them with a set of patterns that represent the symbols of a phonetic alphabet, and returns the symbols that resemble them most closely. This is the basis of the probabilistic mathematical framework known as the hidden Markov model. The acoustic analysis extracts a feature vector from the input acoustic signal so that pattern recognition theory can be applied. This vector is a parametric representation of the acoustic signal that contains its most relevant information, stored as compactly as possible. To obtain a good set of vectors, the signal is pre-processed to reduce background noise and correlation.

  • HMM, Hidden Markov Model
    <to be completed>
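
The acoustic analysis step described above (pre-processing followed by extraction of a compact parametric representation) can be illustrated with a deliberately simplified Python sketch. It is not the front end that Julius or HTK actually use: the pre-emphasis coefficient, the frame sizes, and the choice of a single log-energy value per frame are assumptions chosen to keep the example short.

```python
import math


def pre_emphasis(samples, coeff=0.97):
    """Reduce correlation between neighbouring samples: y[n] = x[n] - coeff * x[n-1]."""
    if not samples:
        return []
    return [samples[0]] + [
        samples[i] - coeff * samples[i - 1] for i in range(1, len(samples))
    ]


def split_frames(samples, size, step):
    """Cut the signal into overlapping frames of `size` samples, `step` apart."""
    return [
        samples[start:start + size]
        for start in range(0, len(samples) - size + 1, step)
    ]


def log_energy(frame):
    """One number per frame; the small constant avoids log(0) on silence."""
    return math.log(sum(x * x for x in frame) + 1e-10)


def feature_vector(samples, size=400, step=160):
    """Pre-process the signal, then describe each frame by its log energy."""
    return [log_energy(f) for f in split_frames(pre_emphasis(samples), size, step)]


# 0.1 s of a 440 Hz tone sampled at 16 kHz, used as a stand-in for speech.
signal = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
features = feature_vector(signal)
print(len(features))
```

A real recognizer keeps many coefficients per frame instead of one, but the overall shape is the same: a long waveform becomes a short sequence of vectors that can be compared against stored patterns.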


LANGUAGE MODEL


The language model refers to the grammar on which the recognizer works; it specifies the phrases that the recognizer will be able to identify.

In Julius, a recognition grammar consists of two separate files:

- .voca: the list of words that the grammar contains.
- .grammar: the grammar of the language to be recognised.

Both files must be converted to .dfa and .dict files using the grammar compiler "mkdfa.pl". The generated .dfa file represents a finite automaton. The .dict file contains the dictionary of words in Julius format.
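
As a rough illustration of the two grammar files, the Python sketch below generates a minimal `.voca` and `.grammar` pair. The category name `QUESTION`, the base name `question`, the example word, and its phonemes are hypothetical; the `NS_B`/`NS_E` silence categories and the `sil` model name follow the convention of the Julius sample grammars and are assumed to match the acoustic model in use.

```python
import os
import tempfile


def build_voca(category, words):
    """Return .voca text: silence entries plus one category of words."""
    lines = ["% NS_B", "<s> sil", "", "% NS_E", "</s> sil", "", f"% {category}"]
    lines += [f"{word} {' '.join(phonemes)}" for word, phonemes in words.items()]
    return "\n".join(lines) + "\n"


def build_grammar(category):
    """Return .grammar text: a sentence is silence, one word, silence."""
    return f"S : NS_B {category} NS_E\n"


# Hypothetical vocabulary, for illustration only.
words = {"HELLO": ["hh", "ax", "l", "ow"]}

workdir = tempfile.mkdtemp()
for extension, text in ((".voca", build_voca("QUESTION", words)),
                        (".grammar", build_grammar("QUESTION"))):
    with open(os.path.join(workdir, "question" + extension), "w") as handle:
        handle.write(text)

print(sorted(os.listdir(workdir)))
```

Running `mkdfa.pl question` in that directory is then expected to produce the compiled `question.dfa` and `question.dict` files, provided the Julius grammar tools are installed.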





Once the acoustic model and the language model have been defined, all we need is the implementation of the main program in Python.
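
A minimal sketch of how such a main program might drive Julius is shown below. It is an assumption-laden illustration rather than the team's code: it assumes Julius is started from Python with a configuration file (the path is a placeholder supplied by the caller) and that recognized phrases are taken from the `sentence1:` lines Julius prints on standard output.

```python
import subprocess


def parse_sentence(line):
    """Return the recognized words from a 'sentence1:' line, or None otherwise."""
    prefix = "sentence1:"
    if not line.startswith(prefix):
        return None
    words = line[len(prefix):].split()
    # Drop the silence markers that surround the recognized phrase.
    return [word for word in words if word not in ("<s>", "</s>")]


def recognized_sentences(jconf_path):
    """Start Julius on the microphone and yield each recognized word list."""
    process = subprocess.Popen(
        ["julius", "-C", jconf_path, "-input", "mic"],
        stdout=subprocess.PIPE,
        text=True,
    )
    for line in process.stdout:
        words = parse_sentence(line.strip())
        if words:
            yield words


# The parser can be tried without Julius installed:
print(parse_sentence("sentence1: <s> HELLO </s>"))
```

Each word list yielded by `recognized_sentences` could then be mapped to a question identifier and sent to the Arduino.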


ARDUINO

FLUORIMETER