Feature functions for rescoring an SCFG translation forest

From cdec Decoder

Stateless Features

Stateless features do not change the size of the translation forest when they are applied.

ArityPenalty

  • Counts the number of rules with arities 0, 1, 2, ... used in the hypothesis.
  • Feature name: Arity_0, Arity_1, Arity_2, ... (2 is usually the maximum)
  • Configuration spec:
feature_function=ArityPenalty
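
As a rough illustration of what this feature counts, the following Python sketch tallies rule arities over a derivation. The rule representation (a list of right-hand-side tokens, with nonterminals written like `[X,1]`) and the function names `rule_arity` and `arity_features` are assumptions made for this example; they are not cdec's internal API.

```python
import re
from collections import Counter

# Assumption: a nonterminal token looks like "[X]" or "[X,1]".
NONTERMINAL = re.compile(r"^\[[^\[\]\s]+\]$")

def rule_arity(rhs_tokens):
    """Return the number of nonterminal tokens on a rule's right-hand side."""
    return sum(1 for tok in rhs_tokens if NONTERMINAL.match(tok))

def arity_features(derivation):
    """Map a derivation (a list of rule right-hand sides) to Arity_N counts."""
    counts = Counter(rule_arity(rhs) for rhs in derivation)
    return {f"Arity_{n}": c for n, c in sorted(counts.items())}

# Hypothetical derivation made of four rules.
derivation = [
    ["[X,1]", "de", "[X,2]"],  # arity 2
    ["la", "maison"],          # arity 0
    ["[X,1]", "bleue"],        # arity 1
    ["chat"],                  # arity 0
]
features = arity_features(derivation)
print(features)
```

Each rule contributes to exactly one Arity_N feature, so the feature values sum to the number of rules in the derivation.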

WordPenalty

LatticeCost

NonLatinCount

  • Counts the number of words in the hypothesis that contain Unicode characters with code points 0x0080 and above (that is, anything outside 7-bit ASCII).
  • Configuration spec:
feature_function=NonLatinCount
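
The counting rule described above can be sketched in a few lines of Python. Whitespace tokenization and the names `has_non_latin` and `non_latin_count` are assumptions of this sketch, not a description of cdec's implementation.

```python
def has_non_latin(word):
    """True if any character in the word has code point 0x0080 or above."""
    return any(ord(ch) >= 0x80 for ch in word)

def non_latin_count(hypothesis):
    """Count whitespace-separated words that contain such a character."""
    return sum(1 for word in hypothesis.split() if has_non_latin(word))

# "café" and "Москва" each contain characters at or above 0x0080.
count = non_latin_count("the café is in Москва today")
print(count)
```

A word is counted once regardless of how many such characters it contains.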

Stateful Features

Stateful features result in a larger translation forest when they are applied.
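
One way to picture why the forest grows: hypotheses covering the same span can share a node only if the feature sees them as interchangeable. The Python sketch below is a simplified model under that assumption, not cdec's actual data structures. It groups hypothetical partial translations by a state function, where a stateless feature keeps no state and an n-gram-like feature keeps the boundary words it would need for later scoring. The names `stateless_state`, `lm_state`, and `count_nodes` are invented for this illustration.

```python
from collections import defaultdict

def stateless_state(words):
    """A stateless feature keeps no context, so every hypothesis looks alike."""
    return ()

def lm_state(words, order=2):
    """Assumed n-gram-style state: the first and last (order - 1) words."""
    k = order - 1
    return (tuple(words[:k]), tuple(words[-k:]))

def count_nodes(hypotheses, state_fn):
    """Number of distinct nodes needed when hypotheses are merged by state."""
    groups = defaultdict(list)
    for words in hypotheses:
        groups[state_fn(words)].append(words)
    return len(groups)

# Three hypothetical translations of the same source span.
hyps = [
    ["the", "blue", "house"],
    ["the", "blue", "home"],
    ["a", "blue", "house"],
]
nodes_stateless = count_nodes(hyps, stateless_state)
nodes_stateful = count_nodes(hyps, lm_state)
print(nodes_stateless, nodes_stateful)
```

With no state, all three hypotheses collapse into a single node; with bigram-style boundary state, each has a different pair of boundary words and needs its own node.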

LanguageModel

The use of LanguageModel is deprecated in favor of KLanguageModel.

KLanguageModel

  • An n-gram target language model feature, using KenLM as the inference library.
  • KenLM uses less memory and loads much more quickly than SRILM.
  • Note that the feature function name in the configuration includes the format used to create the LM (probing, trie, or sorted), so it must match the format of the LM file.
  • Feature name: LanguageModel
  • Configuration spec:
feature_function=KLanguageModel_Probing /path/to/LanguageModel.klm
feature_function=KLanguageModel_Trie /path/to/LanguageModel.klm
feature_function=KLanguageModel_Sorted /path/to/LanguageModel.klm
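
Because the feature function name depends on the LM format, a small helper can keep the two in sync when configuration files are generated by script. The sketch below is only a convenience wrapper around the three configuration lines listed above; `KLM_VARIANTS` and `klm_config_line` are hypothetical names, and the path is the placeholder used on this page.

```python
# Mapping from LM format to the feature function names listed above.
KLM_VARIANTS = {
    "probing": "KLanguageModel_Probing",
    "trie": "KLanguageModel_Trie",
    "sorted": "KLanguageModel_Sorted",
}

def klm_config_line(lm_format, path):
    """Build a feature_function line for the given LM format and file path."""
    try:
        name = KLM_VARIANTS[lm_format.lower()]
    except KeyError:
        raise ValueError(f"unknown LM format: {lm_format!r}") from None
    return f"feature_function={name} {path}"

line = klm_config_line("trie", "/path/to/LanguageModel.klm")
print(line)
```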

How do I use explicit beginning- and end-of-sentence markers with cdec?

See the discussion under Language model notes.
