From HemeraBook


You can use SphinxTrain, provided by CMU Sphinx.
See the SphinxTrain documentation.

Hemera Speech Recognition Tool

The Hemera project provides a small speech recognition tool for creating lexical and language models.
It currently supports only French, but contributions adding other languages are welcome.

Get it

Check it out with your IDE's version control integration, or run the following command from HEMERA_ROOT_PATH:

svn co SpeechRecognitionTools

Important: it requires the HemeraThirdParty project.

Third-party tools


The following tools are required:

You must install them (or create symbolic links to them) in HEMERA_TP_PATH/_fromSource, a directory created to help you keep track of the third-party tools installed for Hemera.

The script checks that these tools are available.
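The availability check could look roughly like the following. This is a sketch, not the actual script: the function name `check_tools`, the use of HEMERA_TP_PATH as an environment variable and the `srilm` directory name are assumptions (only `lia_phon` appears in the paths on this page). Testing with `-e` means that a symbolic link to a tool installed elsewhere is accepted, matching the installation option described above.

```shell
#!/bin/sh
# Hypothetical sketch: check_tools, HEMERA_TP_PATH as an environment
# variable and the "srilm" directory name are assumptions.

# check_tools DIR NAME... : report every NAME missing under DIR,
# and return non-zero if at least one is missing
check_tools() {
    base=$1
    shift
    missing=0
    for tool in "$@"; do
        # -e follows symbolic links, so a link to an install elsewhere counts
        if [ ! -e "$base/$tool" ]; then
            echo "missing third-party tool: $base/$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example (placeholder paths):
# ln -s /opt/srilm "$HEMERA_TP_PATH/_fromSource/srilm"
# check_tools "$HEMERA_TP_PATH/_fromSource" srilm lia_phon
```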


To begin, you need to prepare your computer for compiling source code.



  • from the SRILM source directory, run the following command
make SRILM=$PWD World
  • for the 64-bit version, run the following command instead
make SRILM=$PWD MACHINE_TYPE=i686-m64 World
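The choice between the two SRILM commands can be derived from `uname -m`. This is a minimal sketch assuming that a 64-bit host reports `x86_64`; the helper name `srilm_machine_type` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints the MACHINE_TYPE override that selects the
# 64-bit SRILM build on x86_64, and prints nothing on other architectures.
srilm_machine_type() {
    case "$1" in
        x86_64) echo "MACHINE_TYPE=i686-m64" ;;
        *) ;;  # no override, as in the plain "make SRILM=$PWD World" command
    esac
}

# Usage, from the SRILM source directory. The expansion is deliberately
# unquoted so that an empty result adds no argument to make:
# make SRILM="$PWD" $(srilm_machine_type "$(uname -m)") World
```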


WARNING: this tool (lia_phon) does not support the x86_64 architecture; it must be compiled as ix86 (32-bit) code, even on a 64-bit OS.

If you are running a 64-bit OS, you need additional packages to compile 32-bit code.
Then apply the provided patch to the Makefile to force 32-bit compilation:

patch -N -p1 -s HEMERA_TP_PATH/_fromSource/lia_phon/Makefile < misc/lia_phon_32bits_compile.patch
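Applying the patch only when it is needed can be sketched as follows, assuming that a 64-bit host reports `x86_64` in `uname -m`. The function names are hypothetical, HEMERA_TP_PATH is passed in as an argument, and the patch file path is assumed to be relative to the directory you run it from.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; the patch invocation is the
# one given above, with HEMERA_TP_PATH passed in as an argument.

# needs_32bit_patch ARCH: succeed when lia_phon must be forced to 32 bits
needs_32bit_patch() {
    [ "$1" = x86_64 ]
}

# apply_32bit_patch_if_needed ARCH TP_PATH PATCH_FILE
apply_32bit_patch_if_needed() {
    if ! needs_32bit_patch "$1"; then
        return 0  # not x86_64: the Makefile is used unchanged
    fi
    # -N ignores a patch that seems already applied, -s keeps patch quiet
    patch -N -p1 -s "$2/_fromSource/lia_phon/Makefile" < "$3"
}

# Usage:
# apply_32bit_patch_if_needed "$(uname -m)" "$HEMERA_TP_PATH" \
#     misc/lia_phon_32bits_compile.patch
```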

  • run the following commands (they build the tools, the resources and the 80k-word lexicon)
cd HEMERA_TP_PATH/_fromSource/lia_phon
make LIA_PHON_REP=$PWD all ressource lex80k



Create your own corpus, updating the file to fit your needs:


Then, launch the script


Use the --copy option to copy the generated models automatically into the corresponding directory of HEMERA_TP_PATH.

If a tool is missing or an error occurs, a message is printed on standard output.
Otherwise, the lexical and language models are created in the data/ subdirectory.
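To check the outcome without reading through the whole output, a wrapper can test whether data/ received any files. This sketch assumes it runs from the directory that contains data/; because the model file names are not listed here, it only checks that the directory is non-empty. `models_created` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: succeed if DIR exists and contains at least one entry.
# Model file names are not documented here, so only non-emptiness is tested.
models_created() {
    [ -d "$1" ] && [ -n "$(ls -A "$1")" ]
}

if models_created data; then
    echo "models written to data/"
else
    echo "no models in data/: check the messages printed by the script"
fi
```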