Feature functions for rescoring an SCFG translation forest
From cdec Decoder
Stateless Features
Stateless features do not change the size of the translation forest when they are applied.
ArityPenalty
- Counts the number of rules with arities 0, 1, 2, ... used in the hypothesis.
- Feature names: Arity_0, Arity_1, Arity_2, ... (2 is usually the maximum arity)
- Configuration spec:
feature_function=ArityPenalty
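To make the counting concrete, here is a minimal Python sketch of what the Arity_N features measure. It is not cdec's implementation. It assumes a derivation is given as a list of source-side rule strings in which nonterminals are written as bracketed tokens such as [X,1]; the names rule_arity and arity_features are hypothetical helpers introduced only for this illustration.

```python
import re
from collections import Counter

# Assumption: a nonterminal is a single bracketed token such as [X] or [X,1].
NONTERMINAL = re.compile(r"\[[^\[\]\s]+\]")


def rule_arity(source_rhs: str) -> int:
    """Return the number of nonterminal tokens on a rule's right-hand side."""
    return sum(1 for tok in source_rhs.split() if NONTERMINAL.fullmatch(tok))


def arity_features(derivation):
    """Map each feature name Arity_N to the number of rules with arity N."""
    return dict(Counter(f"Arity_{rule_arity(rhs)}" for rhs in derivation))


# Hypothetical derivation: one binary rule, one unary rule, two lexical rules.
derivation = ["[X,1] de [X,2]", "la maison", "[X,1] bleue", "chat"]
features = arity_features(derivation)
```

Because each count depends only on the rule used at an edge, a feature of this kind can be scored edge by edge without splitting any node in the forest.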
WordPenalty
LatticeCost
NonLatinCount
- Counts the number of words in the hypothesis containing Unicode characters 0x0080 and above.
- Configuration spec:
feature_function=NonLatinCount
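The description above can be sketched in a few lines of Python. This is only an illustration of the counting rule, assuming whitespace-separated words and decoded Unicode strings; non_latin_count is a hypothetical helper, not a cdec function.

```python
def non_latin_count(hypothesis: str) -> int:
    """Count words that contain at least one code point at or above 0x0080."""
    return sum(
        1
        for word in hypothesis.split()
        if any(ord(ch) >= 0x0080 for ch in word)
    )


# "café" and "Москва" each contain characters outside the ASCII range.
count = non_latin_count("the café is in Москва today")
```

Note that the threshold flags any non-ASCII character, so accented Latin letters such as "é" are counted as well.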
Stateful Features
Stateful features result in a larger translation forest when they are applied.
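The following Python sketch illustrates why state enlarges the forest. It is a simplified model, not cdec's data structures: hypotheses over the same span share one node when only stateless features are applied, but an n-gram language model needs the boundary words of each partial translation, so hypotheses with different boundary words must be kept in separate nodes. The function count_nodes and the (span, words) representation are assumptions made for this example.

```python
def count_nodes(hypotheses, lm_order=None):
    """Count distinct forest nodes needed for a set of partial hypotheses.

    hypotheses: iterable of (span, target_words) pairs.
    lm_order:   None for stateless scoring, or the n-gram order of a
                hypothetical language model whose state is the first and
                last (n - 1) target words.
    """
    keys = set()
    for span, words in hypotheses:
        if lm_order is None:
            # Stateless: the span alone identifies the node.
            keys.add(span)
        else:
            k = lm_order - 1
            left = tuple(words[:k])
            right = tuple(words[max(0, len(words) - k):])
            # Stateful: nodes are split by the language model state.
            keys.add((span, left, right))
    return len(keys)


hyps = [
    ((0, 2), ["the", "house"]),
    ((0, 2), ["a", "house"]),
    ((0, 2), ["the", "home"]),
    ((0, 2), ["the", "house"]),  # duplicate state, merges with the first
]
stateless_nodes = count_nodes(hyps)
bigram_nodes = count_nodes(hyps, lm_order=2)
```

In this toy case a single stateless node becomes three nodes under a bigram model, one per distinct pair of boundary words.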
LanguageModel
The use of LanguageModel is deprecated in favor of KLanguageModel.
KLanguageModel
- An n-gram target language model feature, using KenLM as the inference library.
- KenLM uses less memory and loads much more quickly than SRILM.
- Note that the configuration name includes the format used to create the LM (probing, trie, or sorted)
- Feature name: LanguageModel
- Configuration spec:
feature_function=KLanguageModel_Probing /path/to/LanguageModel.klm
feature_function=KLanguageModel_Trie /path/to/LanguageModel.klm
feature_function=KLanguageModel_Sorted /path/to/LanguageModel.klm
How do I use explicit beginning- and end-of-sentence markers with cdec?
See the discussion under Language model notes
