Third-party:SpeechRecognition:Models:fr

From HemeraBook
Latest revision as of 18:03, 14 December 2011

LIUM models

Follow these instructions to set up the French models provided by LIUM.

Language model

  • download language model
  • untar trigram_LM.tar, then gunzip trigram_LM.DMP.gz
  • (only if you compiled sphinxbase yourself) temporarily update PATH and LD_LIBRARY_PATH so that your build of sphinx_lm_convert is used (adapt the paths if needed)
export LD_LIBRARY_PATH=HEMERA_TP_PATH/_fromSource/sphinxbase/src/libsphinxbase/.libs:$LD_LIBRARY_PATH
export PATH=HEMERA_TP_PATH/_fromSource/sphinxbase/src/sphinx_lmtools/.libs:$PATH 
  • convert the language model file to UTF-8 with sphinx_lm_convert, which supports UTF-8 encoding (sphinx3_lm_convert does NOT)
sphinx_lm_convert -i trigram_LM.DMP -ifmt DMP -ienc iso8859-1 -o trigram_LM.DMP.utf8 -oenc utf8 -ofmt DMP
  • move the language model file
 mkdir -p HEMERA_TP_PATH/speechRecognition/data/models/language/lium/3g/
 mv trigram_LM.DMP.utf8 HEMERA_TP_PATH/speechRecognition/data/models/language/lium/3g/
  • Check/update the following configuration elements:
 hemera.core.speechRecognition.sphinx3.languageModel.*
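The environment and conversion steps above can be wrapped in two shell functions. This is a minimal sketch, not part of Hemera: the function names use_local_sphinxbase and install_lium_lm are hypothetical, and HEMERA_TP_PATH is passed as an argument instead of being typed into each command. One deliberate difference from the export lines above: the ${LD_LIBRARY_PATH:+...} expansion avoids leaving a trailing ":" when LD_LIBRARY_PATH was empty, because the Linux dynamic loader treats an empty entry as the current directory.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers; HEMERA_TP_PATH is passed as an argument.

# Prepend a locally compiled sphinxbase to LD_LIBRARY_PATH and PATH.
# Source this file so the exports affect the current shell.
use_local_sphinxbase() {
    sb_root="$1"   # e.g. HEMERA_TP_PATH/_fromSource/sphinxbase
    lib_dir="$sb_root/src/libsphinxbase/.libs"
    bin_dir="$sb_root/src/sphinx_lmtools/.libs"
    for d in "$lib_dir" "$bin_dir"; do
        if [ ! -d "$d" ]; then
            echo "missing directory: $d" >&2
            return 1
        fi
    done
    # ${VAR:+...} adds ":old_value" only when the variable is non-empty,
    # so no empty (current directory) entry is introduced
    LD_LIBRARY_PATH="$lib_dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    PATH="$bin_dir:$PATH"
    export LD_LIBRARY_PATH PATH
}

# Convert the gunzipped trigram model (current directory) to UTF-8
# and move it into the language model directory.
install_lium_lm() {
    tp="$1"                        # value of HEMERA_TP_PATH
    src="${2:-trigram_LM.DMP}"     # model file produced by gunzip
    dest="$tp/speechRecognition/data/models/language/lium/3g"
    # sphinx_lm_convert handles UTF-8; sphinx3_lm_convert does not
    sphinx_lm_convert -i "$src" -ifmt DMP -ienc iso8859-1 \
        -o "$src.utf8" -oenc utf8 -ofmt DMP || return 1
    mkdir -p "$dest" && mv "$src.utf8" "$dest/"
}
```

For example, after sourcing the file, run use_local_sphinxbase HEMERA_TP_PATH/_fromSource/sphinxbase (only for a self-compiled sphinxbase), then install_lium_lm HEMERA_TP_PATH from the directory that contains trigram_LM.DMP.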


Lexical model

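  • download lexical fillers (http://liumtools.univ-lemans.fr/speechLIUMtools/Downloads/linguistic_resources/dictionary/fillers_dict.gz)
  • convert the lexical model files to UTF-8
iconv -f iso88591 -t utf8 fillers_dict > fillers_dict.utf8
iconv -f iso88591 -t utf8 words_dict > words_dict.utf8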
  • move the lexical model files
mkdir -p HEMERA_TP_PATH/speechRecognition/data/models/lexical/lium/
mv fillers_dict.utf8 words_dict.utf8 HEMERA_TP_PATH/speechRecognition/data/models/lexical/lium/
  • Check/update the following configuration elements:
hemera.core.speechRecognition.sphinx3.lexicalModel.*
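The lexical steps above can be scripted the same way. This minimal sketch assumes fillers_dict and words_dict have already been downloaded and uncompressed into the current directory; the function names are hypothetical. It adds one guard the manual commands lack: if a file already decodes as UTF-8, running the ISO-8859-1 conversion again would double-encode accented letters (é would become Ã©), so such a file is copied unchanged. This is a heuristic; a pure ASCII file also passes the check, which is harmless since its conversion would be a no-op.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers; HEMERA_TP_PATH is the first argument
# of install_lium_lexical.

# Write "$1.utf8" from the ISO-8859-1 dictionary "$1".
convert_dict() {
    src="$1"
    if [ ! -f "$src" ]; then
        echo "missing file: $src" >&2
        return 1
    fi
    if iconv -f UTF-8 -t UTF-8 "$src" > /dev/null 2>&1; then
        # Already valid UTF-8 (or pure ASCII): converting again would
        # double-encode accented characters
        echo "warning: $src already looks like UTF-8, copying it unchanged" >&2
        cp "$src" "$src.utf8"
    else
        iconv -f ISO-8859-1 -t UTF-8 "$src" > "$src.utf8"
    fi
}

# Convert both LIUM dictionaries and move them into the lexical model directory.
install_lium_lexical() {
    tp="$1"
    dest="$tp/speechRecognition/data/models/lexical/lium"
    for f in fillers_dict words_dict; do
        convert_dict "$f" || return 1
    done
    mkdir -p "$dest" && mv fillers_dict.utf8 words_dict.utf8 "$dest/"
}
```

Run install_lium_lexical HEMERA_TP_PATH from the directory that holds both dictionaries.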


Acoustic model

  • download acoustic model
  • uncompress it into HEMERA_TP_PATH/speechRecognition/data/models/acoustic
  • rename the extracted lium_acoustic_models directory to lium
  • Check/update the following configuration elements:
hemera.core.speechRecognition.sphinx3.acousticModel.*
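The acoustic steps above can be sketched as a single function. This assumes the downloaded archive is a gzip-compressed tarball containing a top-level lium_acoustic_models directory (the page only says "uncompress"); the function name and the archive argument are hypothetical. The guard matters because if a lium directory already exists, mv lium_acoustic_models lium moves the new directory inside it instead of renaming it.

```shell
#!/bin/sh
# Sketch only: hypothetical helper; the archive format (.tar.gz) is an assumption.
install_lium_acoustic() {
    tp="$1"        # value of HEMERA_TP_PATH
    archive="$2"   # path to the downloaded acoustic model archive
    dest="$tp/speechRecognition/data/models/acoustic"
    if [ ! -f "$archive" ]; then
        echo "missing archive: $archive" >&2
        return 1
    fi
    # An existing 'lium' directory would make mv nest the new one inside it
    if [ -e "$dest/lium" ]; then
        echo "$dest/lium already exists; remove it first" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    tar -xzf "$archive" -C "$dest" || return 1
    mv "$dest/lium_acoustic_models" "$dest/lium"
}
```

For a different archive format, only the tar line needs to change.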