Cdec k-best framework

From cdec Decoder

Jump to: navigation, search

In the semiring literature, it is customary to view Viterbi and k-best derivation algorithms (I will refer to these as top-k algorithms) as inside algorithms where the as semiring used has an addition operation that is max(a,b). Since cdec supports a generic semiring framework, this is one implementation option for these algorithms. However, since one typically wants to compute a variety properties (e.g., the string yield, the length, the number of rules) over the top-k derivations as ordered by a single quantity (derivation probability, for example), cdec provides a set of top-k algorithms templatized on a semiring that provides the ordering (using operator<) and a traversal function, which is essentially a multiplication operation in another domain (such as "concatenate string").

Thus, to get the E-side yield of the best derivation in terms of probability, one can write:

vector<WordID> yield;
prob_t best = Viterbi<vector<WordID>, ESentenceTraversal, prob_t, EdgeProb>(hg, &yield);

To get the top 25 E-side yields, you can use the same traversal function with the KBestDerivations class.

KBest::KBestDerivations<vector<WordID>, ESentenceTraversal, KBest::NoFilter, prob_t, EdgeProb> kbest(forest, 25);
for (int i = 0; i < 25; ++i) {
  const KBest::KBestDerivations<vector<WordID>, ESentenceTraversal, KBest::NoFilter, prob_t, EdgeProb>::Derivation* d =
    kbest.LazyKthBest(i);
  if (!d) break;     // there were less than 25 derivations in the forest!
  cout << TD::GetString(d->yield) << " ||| " << d->feature_values << endl;
}

The k-best extractor uses the lazy extraction algorithm of Huang and Chiang (IWPT 2005). It supports filtering of derivations during extraction which can be used, for example, to efficiently extract a unique k-best list.

Personal tools