cdec sample grammar and test set

From cdec Decoder


If you just want to see cdec run, this page describes how to download a (smallish) translation model, language model, weights file, and test set, and how to decode the test set with cdec. To run this demo, you need to have cdec downloaded, installed, and built (see the build instructions). You also need at least 1.5GB of free memory on your system.

This page assumes that $CDEC refers to the root path of your cdec software tree.
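A minimal sketch of setting this up in a Bourne-style shell, assuming cdec was built in $HOME/cdec (a hypothetical location; substitute the path of your own source tree):

```shell
# Hypothetical location: point CDEC at the root of your cdec source tree.
export CDEC="$HOME/cdec"

# Warn if the decoder binary is not where the commands on this page expect it.
if [ ! -x "$CDEC/decoder/cdec" ]; then
  echo "warning: $CDEC/decoder/cdec not found or not executable; check CDEC" >&2
fi
```

Adding the export line to your shell startup file keeps CDEC set across sessions.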

Download the data

You can download the data with the following command on most Linux-like systems:

wget http://www.ark.cs.cmu.edu/cdyer/cdec-demo.tar.bz2

Alternatively, open the same URL in a web browser. The download is 229MB.

Unpack the data and run the decoder

tar xjf cdec-demo.tar.bz2
cd cdec-demo/
$CDEC/decoder/cdec -c cdec-mt03.ini -w weights.tuned -i mt03.src.txt > mt03.trans

Decoding takes a couple of minutes on most modern systems with enough free memory (more than 1.5GB).
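By default cdec writes one translation per input line, so a quick sanity check after decoding is to compare the line counts of the input and output files. A sketch, where check_line_counts is a hypothetical helper, not part of cdec:

```shell
# Hypothetical helper: compare the line counts of an input file and its translation.
# Note: wc -l counts newline characters, so a missing final newline shifts the count by one.
check_line_counts() {
  src_lines=$(wc -l < "$1")
  out_lines=$(wc -l < "$2")
  if [ "$src_lines" -eq "$out_lines" ]; then
    echo "OK: $out_lines translations for $src_lines input lines"
  else
    echo "MISMATCH: $src_lines input lines vs $out_lines output lines" >&2
    return 1
  fi
}
```

From inside cdec-demo/, run it as check_line_counts mt03.src.txt mt03.trans. A mismatch usually means decoding stopped early, for example because the system ran out of memory.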

Score the output

cdec includes a tool, fast_score, that scores translations against one or more reference files using a number of different metrics. By default, it uses the BLEU definition of Papineni et al. (2002):

$CDEC/mteval/fast_score -r mt03.ref.0 -r mt03.ref.1 -r mt03.ref.2 -r mt03.ref.3 -i mt03.trans

The following output is expected:

Loading references (4 files)
Loaded reference translations for 919 sentences.
Loaded 919 references for scoring with ibm_bleu
BLEU = 32.25, 76.5|43.1|24.3|13.9 (brev=0.993)
0.322502
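The BLEU line reports the score, then the modified 1- to 4-gram precisions, then (in parentheses) what this sketch reads as the brevity penalty BP. Assuming the standard Papineni et al. setup with uniform weights of 1/4, where c is the total length of the output and r the effective reference length, the summary can be checked by hand:

```latex
\mathrm{BLEU} = \mathrm{BP}\cdot\exp\Big(\frac{1}{4}\sum_{n=1}^{4}\log p_n\Big),
\qquad
\mathrm{BP} = \begin{cases} 1 & \text{if } c > r \\ e^{\,1 - r/c} & \text{if } c \le r \end{cases}

\mathrm{BLEU} \approx 0.993 \cdot (0.765 \times 0.431 \times 0.243 \times 0.139)^{1/4}
\approx 0.993 \cdot 0.3249 \approx 0.3226
```

The small gap from the reported 0.322502 comes from the precisions being printed rounded to one decimal place. A brevity penalty just below 1 indicates that the output is slightly shorter than the references.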