How to run MERT, PRO, or Rampion

cdec has cluster-ready implementations of "hypergraph" MERT (Kumar et al., 2009), called dpmert in cdec; PRO (Hopkins and May, 2011); and Rampion (Gimpel and Smith, 2012). These algorithms can run on a single machine, on a single machine using multiple processes (with job management done by the system's make command), or on a batch processing system running some flavor of qsub.

Running MERT

MERT lives in the dpmert/ top-level directory, and is run using the command dpmert.pl. The mandatory arguments are:

  • --source-file <sourcefile>
    • The dev-set source file to be translated. This will typically be in text format, but SGML markup can be added to specify the location of per-sentence grammars.
  • --ref-files <reffilespec>
    • Reference translations for the dev-set in text format
    • Multiple references should be in multiple files
    • <reffilespec> can contain file globs, e.g. /home/data/zh-en/mt02.en.lc.*
  • --weights <weightsfile>
    • A weights file containing the initial weight vector (i.e., those used in the first iteration)
    • Only the weights given non-zero values in <weightsfile> will be optimized
  • cdec.ini
    • The configuration file the decoder uses when translating <sourcefile>.
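
As a rough illustration, the mandatory arguments above might be assembled into a command line as follows. This is only a sketch in Python: the helper name build_dpmert_command, the script path, and the source, weights, and configuration file names are hypothetical placeholders, and the command is built and printed rather than executed. Quoting the glob so that it reaches the script unexpanded is also an assumption.

```python
import shlex


def build_dpmert_command(source_file, ref_spec, weights_file, cdec_ini,
                         script="dpmert/dpmert.pl"):
    """Return the argument list for a dpmert.pl run (hypothetical helper).

    The script path is an assumption; adjust it to your cdec checkout.
    """
    return [
        script,
        "--source-file", source_file,
        "--ref-files", ref_spec,
        "--weights", weights_file,
        cdec_ini,  # the decoder configuration file is given last
    ]


cmd = build_dpmert_command(
    source_file="mt02.zh.lc",                  # hypothetical dev-set source
    ref_spec="/home/data/zh-en/mt02.en.lc.*",  # glob from the text above
    weights_file="weights.init",               # hypothetical initial weights
    cdec_ini="cdec.ini",
)

# shlex.join quotes the glob so a shell would pass it through unexpanded.
print(shlex.join(cmd))
```

The argument list could be handed to subprocess.run(cmd) once the paths point at real files.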

By default MERT optimizes the BLEU score of the dev-set. The metric used can be changed using the --metric option.

MERT best practices

MERT does inference over hypergraphs produced during the decoding process. Their size is determined by many factors but primarily the LM order and cube pruning settings. By default, these are quite large, which means MERT inference can be quite slow (and excessive disk space may be used). To produce smaller hypergraphs so as to speed up MERT inference, it is a good idea to add the following to your cdec.ini file.

density_prune=100

This tells the decoder to run inside-outside pruning after decoding but before running MERT inference. (The parameter can be increased to make larger hypergraphs, or decreased to make them smaller.) Pruning does not harm the performance of the MERT algorithm, although it may occasionally require one or two extra iterations before convergence. Each iteration, however, should take far less time than it would without pruning.
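
If the configuration file is generated or patched by a script, the setting can be added programmatically. The following Python sketch works on the file contents as a string; the helper name with_density_prune is hypothetical, and the other lines in the example configuration are placeholders rather than recommended settings.

```python
def with_density_prune(ini_text, value=100):
    """Return ini_text with density_prune set to value.

    An existing density_prune line is replaced; otherwise the
    setting is appended at the end. (Hypothetical helper.)
    """
    setting = f"density_prune={value}"
    out = []
    replaced = False
    for line in ini_text.splitlines():
        # Compare only the key, i.e. the part before the first '='.
        if line.split("=", 1)[0].strip() == "density_prune":
            out.append(setting)
            replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(setting)
    return "\n".join(out) + "\n"


# Placeholder configuration contents, for illustration only.
example_ini = "formalism=scfg\nadd_pass_through_rules=true\n"
print(with_density_prune(example_ini))
```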

Running PRO

PRO lives in the pro-train/ top-level directory, and is run using the command dist-pro.pl. The mandatory arguments are:

  • --source-file <sourcefile>
    • The dev-set source file to be translated. This will typically be in text format, but SGML markup can be added to specify the location of per-sentence grammars.
  • --ref-files <reffilespec>
    • Reference translations for the dev-set in text format
    • Multiple references should be in multiple files
    • <reffilespec> can contain file globs, e.g. /home/data/zh-en/mt02.en.lc.*
  • --weights <weightsfile>
    • A weights file containing the initial weight vector (i.e., those used in the first iteration)
    • To start from zero, an empty file may be used.
  • cdec.ini
    • The configuration file the decoder uses when translating <sourcefile>.
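
To check which files a reference spec such as the one above would select, the glob can be tested against candidate names. The Python sketch below uses made-up file names (only the pattern comes from the text) and says nothing about how the training scripts themselves expand the glob.

```python
import fnmatch

# Hypothetical directory listing: three reference files and one source file.
candidates = [
    "/home/data/zh-en/mt02.en.lc.0",
    "/home/data/zh-en/mt02.en.lc.1",
    "/home/data/zh-en/mt02.en.lc.2",
    "/home/data/zh-en/mt02.zh.lc",
]

pattern = "/home/data/zh-en/mt02.en.lc.*"

# fnmatchcase applies shell-style wildcard matching, case-sensitively.
refs = sorted(c for c in candidates if fnmatch.fnmatchcase(c, pattern))

for ref in refs:
    print(ref)
```

Each matched file would hold one reference translation per dev-set sentence; the source file is not matched.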

PRO best practices

It is a very good idea to spend some time tuning the regularization parameters used during learning, especially if you are using more than a very small number (10 or so) of coarse features. There are two such parameters: the L2 regularization strength (--reg) and the L2 penalty for weight changes across iterations (--reg-previous). Of the two, --reg is by far the more important, since its setting can prevent or encourage overfitting to the development data.

In general, regularization strength should be changed on a log scale, so start with some low regularizer (e.g., 1) and multiply by a growth factor k (>1) until you reach some upper bound or see performance decreasing as the model becomes over-regularized. To determine whether your model is over- or under-regularized, you should use a held-out dev-test set.
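
The log-scale search described above can be sketched in Python as follows. The starting value, growth factor, and upper bound are arbitrary illustrative choices, the file names are placeholders, and passing the strength as a value after --reg is an assumption about the option syntax. The commands are printed, not run.

```python
def reg_schedule(start=1.0, k=10.0, upper=1000.0):
    """Return regularization strengths start, start*k, ... up to upper."""
    if start <= 0 or k <= 1:
        raise ValueError("start must be positive and k must exceed 1")
    values = []
    reg = start
    while reg <= upper:
        values.append(reg)
        reg *= k  # multiply by the growth factor: a log-scale step
    return values


for reg in reg_schedule():
    cmd = [
        "pro-train/dist-pro.pl",          # script path is an assumption
        "--source-file", "dev.src",       # hypothetical file names
        "--ref-files", "dev.ref.*",
        "--weights", "weights.init",
        "--reg", str(reg),                # assumed: value follows the flag
        "cdec.ini",
    ]
    print(" ".join(cmd))
```

Each run's weights would then be scored on the held-out dev-test set, and the sweep stopped once scores start to drop.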

Running Rampion

Rampion lives in the rampion/ directory and is run using the command rampion.pl.

Cluster operation with qsub

Requirements:

  • some qsub flavor (Torque, Sun/Oracle Grid Engine, etc.)
  • a shared file system for all worker nodes and the head node

Instructions:

  • Configure your machine environment in environment/LocalEnvironment.pm
  • Add --qsub to the command. That's it!
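
For scripted runs, adding the flag might look like the Python sketch below. The helper name with_qsub and the example command are hypothetical, and inserting the flag before the final cdec.ini argument is an assumption about where options are expected.

```python
def with_qsub(cmd):
    """Return a copy of cmd with --qsub added before the last argument.

    The last argument is assumed to be the positional cdec.ini file.
    (Hypothetical helper.)
    """
    if "--qsub" in cmd:
        return list(cmd)  # already present; leave the command unchanged
    return cmd[:-1] + ["--qsub"] + cmd[-1:]


local_cmd = [
    "dpmert/dpmert.pl",               # script path is an assumption
    "--source-file", "dev.src",       # hypothetical file names
    "--ref-files", "dev.ref.*",
    "--weights", "weights.init",
    "cdec.ini",
]
print(" ".join(with_qsub(local_cmd)))
```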