Per-sentence grammars



cdec has efficient code for reading SCFGs in text format. As a result, it is practical to collect the grammar rules needed to translate each individual sentence into a separate file, and to load the appropriate file immediately before that sentence is translated. The computational overhead of doing this is usually minimal compared to the expense of decoding, and cdec processes become much more memory efficient, since they do not have to hold a full grammar in memory.

To tell cdec to load a grammar file for a particular sentence, wrap the input sentence in SGML markup and give the path to the grammar file in the grammar attribute. For example:

<seg id="864" grammar="/home/cdec/grammars/de-en/wmt11/grammar.864.gz"> Geld spielt bei der Jobsuche längst nicht mehr die Hauptrolle . </seg>

Per-sentence grammar files may be used in conjunction with other static grammars specified with the decoder's -g option.
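
The markup shown above can be generated mechanically once the per-sentence grammar files exist. The following Python code is a minimal sketch, not part of cdec: the function name wrap_with_grammar is hypothetical, and it assumes the grammar files follow the grammar.<id>.gz naming pattern from the example, with ids assigned consecutively. It copies the sentence text through unchanged, so any escaping the decoder might require is left to the caller.

```python
from pathlib import PurePosixPath


def wrap_with_grammar(sentences, grammar_dir, first_id=0):
    """Yield one <seg> line per sentence, each pointing at its own grammar file.

    Hypothetical helper: the grammar.<id>.gz naming scheme is an assumption
    taken from the example path, not something cdec requires.
    """
    for offset, sentence in enumerate(sentences):
        seg_id = first_id + offset
        # Build the assumed per-sentence grammar path for this id.
        grammar_path = PurePosixPath(grammar_dir) / f"grammar.{seg_id}.gz"
        yield f'<seg id="{seg_id}" grammar="{grammar_path}"> {sentence.strip()} </seg>'


if __name__ == "__main__":
    source = ["Geld spielt bei der Jobsuche längst nicht mehr die Hauptrolle ."]
    for line in wrap_with_grammar(source, "/home/cdec/grammars/de-en/wmt11", first_id=864):
        print(line)
```

The lines this produces have the same shape as the example above and would be given to the decoder as its input in place of the plain sentences.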

Drawbacks

  • Per-sentence grammars may be expensive to create, and are typically practical only in offline translation workflows, such as system training.
  • Large amounts of disk space may be necessary.