From HemeraBook

LIUM models

Follow these instructions to set up the French models provided by LIUM.

Language model

  • download the language model
  • untar trigram_LM.tar and gunzip trigram_LM.DMP.gz
  • (only if you compiled sphinxbase from source) temporarily update PATH and LD_LIBRARY_PATH so that your own build of sphinx_lm_convert is used (adapt the paths if needed)
export LD_LIBRARY_PATH=HEMERA_TP_PATH/_fromSource/sphinxbase/src/libsphinxbase/.libs:$LD_LIBRARY_PATH
export PATH=HEMERA_TP_PATH/_fromSource/sphinxbase/src/sphinx_lmtools/.libs:$PATH 
  • convert the language model file to UTF-8 with sphinx_lm_convert, which supports UTF-8 encoding (sphinx3_lm_convert does NOT)
sphinx_lm_convert -i trigram_LM.DMP -ifmt DMP -ienc iso8859-1 -o trigram_LM.DMP.utf8 -oenc utf8 -ofmt DMP
  • move the converted language model file
 mkdir -p HEMERA_TP_PATH/speechRecognition/data/models/language/lium/3g/
 mv trigram_LM.DMP.utf8 HEMERA_TP_PATH/speechRecognition/data/models/language/lium/3g/
  • Check/update the following configuration elements:
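
The language-model steps above can be sketched as a single shell function. This is a minimal sketch under stated assumptions, not part of the original procedure: `setup_lium_lm` is a hypothetical helper name, it takes HEMERA_TP_PATH as its only argument, it assumes trigram_LM.tar has already been downloaded into the current directory, and it only adjusts PATH and LD_LIBRARY_PATH when a sphinxbase build exists under _fromSource. The sphinx_lm_convert call is the same command as above.

```shell
# Hypothetical helper (not from the original page): runs the language-model
# steps above. Assumes trigram_LM.tar is already in the current directory
# and that $1 is HEMERA_TP_PATH.
setup_lium_lm() {
    hemera_tp_path="$1"
    lm_dir="$hemera_tp_path/speechRecognition/data/models/language/lium/3g"
    sb="$hemera_tp_path/_fromSource/sphinxbase"

    # Optional step: prefer a locally compiled sphinxbase if that directory exists
    if [ -d "$sb" ]; then
        LD_LIBRARY_PATH="$sb/src/libsphinxbase/.libs${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
        PATH="$sb/src/sphinx_lmtools/.libs:$PATH"
        export LD_LIBRARY_PATH PATH
    fi

    # Unpack the archive, then decompress the DMP model
    tar -xf trigram_LM.tar || return 1
    gunzip -f trigram_LM.DMP.gz || return 1

    # Re-encode from ISO-8859-1 to UTF-8 (sphinx_lm_convert, not sphinx3_lm_convert)
    sphinx_lm_convert -i trigram_LM.DMP -ifmt DMP -ienc iso8859-1 \
        -o trigram_LM.DMP.utf8 -oenc utf8 -ofmt DMP || return 1

    # Move the converted model into the directory used by these instructions
    mkdir -p "$lm_dir" || return 1
    mv trigram_LM.DMP.utf8 "$lm_dir/"
}
```

For example, source the function and run `setup_lium_lm /path/to/hemera_tp` from the directory holding the downloaded archive (the path is a placeholder).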

Lexical model

  • convert the dictionary files from ISO-8859-1 to UTF-8
iconv -f iso88591 -t utf8 fillers_dict > fillers_dict.utf8
iconv -f iso88591 -t utf8 words_dict > words_dict.utf8
  • move the lexical model files
mkdir -p HEMERA_TP_PATH/speechRecognition/data/models/lexical/lium/
mv fillers_dict.utf8 words_dict.utf8 HEMERA_TP_PATH/speechRecognition/data/models/lexical/lium/
  • Check/update the following configuration elements:
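
The lexical-model steps can be sketched the same way. `setup_lium_lexical` is again a hypothetical helper; it assumes fillers_dict and words_dict are in the current directory and encoded in ISO-8859-1, and it spells the encodings with the canonical names ISO-8859-1 and UTF-8, which are widely accepted across iconv implementations.

```shell
# Hypothetical helper (not from the original page): converts both LIUM
# dictionaries to UTF-8 and moves them into place. Assumes fillers_dict and
# words_dict (ISO-8859-1) are in the current directory and $1 is HEMERA_TP_PATH.
setup_lium_lexical() {
    hemera_tp_path="$1"
    lex_dir="$hemera_tp_path/speechRecognition/data/models/lexical/lium"
    mkdir -p "$lex_dir" || return 1

    for dict in fillers_dict words_dict; do
        # Re-encode the dictionary, then move the UTF-8 copy into place
        iconv -f ISO-8859-1 -t UTF-8 "$dict" > "$dict.utf8" || return 1
        mv "$dict.utf8" "$lex_dir/" || return 1
    done
}
```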

Acoustic model

  • download the acoustic model
  • uncompress it in HEMERA_TP_PATH/speechRecognition/data/models/acoustic
  • rename lium_acoustic_models to lium
  • Check/update the following configuration elements:
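
The acoustic-model steps can also be sketched as a function. The archive name and format are not given above, so the second argument of the hypothetical `setup_lium_acoustic` helper is an assumption, as is the use of tar to unpack it (an archive in another format, such as zip, would need a different tool). The sketch also assumes the archive's top-level directory is lium_acoustic_models, matching the rename step, and refuses to run if a lium directory already exists, because mv would otherwise move the new directory inside the old one.

```shell
# Hypothetical helper (not from the original page): unpacks the acoustic model
# and renames it. Assumes $1 is HEMERA_TP_PATH and $2 is an archive that tar
# can read, whose top-level directory is lium_acoustic_models.
setup_lium_acoustic() {
    hemera_tp_path="$1"
    archive="$2"
    ac_dir="$hemera_tp_path/speechRecognition/data/models/acoustic"

    # Avoid nesting the new model inside an existing "lium" directory
    if [ -e "$ac_dir/lium" ]; then
        echo "error: $ac_dir/lium already exists" >&2
        return 1
    fi

    # Uncompress into the acoustic model directory, then rename
    mkdir -p "$ac_dir" || return 1
    tar -xf "$archive" -C "$ac_dir" || return 1
    mv "$ac_dir/lium_acoustic_models" "$ac_dir/lium"
}
```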