Cdec Development Guide

From cdec Decoder

If you plan on contributing to cdec, this document will help you understand the development process. If you want to know what to work on, see the list of suggested projects.

Development workflow

cdec is hosted on github.com. Developers wishing to contribute should follow roughly these steps:

Coding style

  • cdec uses a derivative of the Google C++ coding style. Read it!
  • cdec's differences from the official Google style:
    • Use of the boost libraries is encouraged
    • Feel free to go over 80 characters per line, but not horribly over
    • Commented-out code is permitted
    • Don't write -inl.h files. If the header definition becomes difficult to read, put the inline function implementation at the end of the file.
    • Virtual functions should be protected and called from (public) non-virtual functions (template method pattern)
    • Do not write inline implementations of virtual destructors or virtual functions
  • Really, go read the style guide. All of it.
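
The two rules about virtual functions in the list above can be sketched as follows. This is a minimal illustration, not code from cdec: the class names Scorer and LengthScorer and the method names Score and ScoreImpl are hypothetical. The public entry point is non-virtual and forwards to a protected virtual function, and the virtual destructor and virtual function bodies are defined outside the class rather than inline. It uses the C++11 override keyword, which is an assumption about the compiler in use.

```cpp
#include <cassert>
#include <string>

// Hypothetical interface, named for illustration only; not a cdec class.
class Scorer {
 public:
  virtual ~Scorer();  // declared here, defined out of line below

  // Public non-virtual entry point: the single place to check preconditions
  // before dispatching to the subclass.
  double Score(const std::string& hyp) const {
    assert(!hyp.empty());
    return ScoreImpl(hyp);
  }

 protected:
  // Customization point that subclasses override.
  virtual double ScoreImpl(const std::string& hyp) const = 0;
};

// In a real project these definitions would live in the .cc file.
Scorer::~Scorer() {}

// Hypothetical subclass that scores a hypothesis by its length.
class LengthScorer : public Scorer {
 public:
  ~LengthScorer() override;

 protected:
  double ScoreImpl(const std::string& hyp) const override;
};

LengthScorer::~LengthScorer() {}

double LengthScorer::ScoreImpl(const std::string& hyp) const {
  return static_cast<double>(hyp.size());
}
```

Callers only ever see Score, so shared checks or logging can later be added in one place without touching any subclass.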

Design tips

  • Don't over-engineer things: it's better to refactor later than to make things overly abstract up front
  • Make something abstract if you know for sure there will be more than two of them
  • Duplicated code is bad, but code with unintuitive hierarchies / call graphs is worse
  • Do refactor
  • Use object hierarchies and polymorphism sparingly
  • Do use templates (generics) in places where speed matters
  • Minimize #includes in header files
  • Minimize heap memory allocations (calls to new, either explicit or implicit) in code that will be called often
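
Two of the tips above, templates where speed matters and few heap allocations in frequently called code, can be illustrated together. The sketch below is an assumption-laden example rather than cdec code: SquaredLength and SumPrefixScores are hypothetical names. The feature function is passed as a template parameter, so the call is resolved at compile time, and the scratch vector is allocated once before the loop instead of being rebuilt on every iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical feature function. Because its type is a template parameter
// of SumPrefixScores, the call is bound statically and can be inlined.
struct SquaredLength {
  double operator()(const std::vector<int>& words) const {
    return static_cast<double>(words.size() * words.size());
  }
};

// Applies fn to every prefix of `sentence` and sums the results,
// reusing a single scratch buffer for all prefixes.
template <class FeatureFn>
double SumPrefixScores(const std::vector<int>& sentence, const FeatureFn& fn) {
  std::vector<int> prefix;
  prefix.reserve(sentence.size());  // one allocation up front
  assert(prefix.capacity() >= sentence.size());
  double total = 0.0;
  for (std::size_t i = 0; i < sentence.size(); ++i) {
    prefix.push_back(sentence[i]);  // capacity suffices, so no reallocation
    total += fn(prefix);
  }
  return total;
}
```

A version that constructed a fresh vector for each prefix inside the loop would compute the same sum but allocate once per iteration, which is the pattern the last tip warns against.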

Unit tests and system tests

  • Unit tests are small binaries that exercise small amounts of code in a very controlled way
  • System tests run cdec with particular inputs to verify that the decoder works as a whole
  • Unit tests (mostly) require the Google Test framework to run
  • System tests have no extra dependencies: if you have a cdec binary, you can run the system tests
  • Write both kinds and run them often!
  • Every model (translation model or rescoring model) should be exercised by at least one system test.
  • Run system tests with the following command:
./tests/run-system-tests.pl
  • To add a new system test:
    • Create a new directory under tests/system_tests, named after the test you wish to run
    • In this directory, create a configuration file, a feature weights file, some sample input, and any supporting data files that are necessary (refer to an existing test for the formats)
    • Also create the expected outputs (refer to an existing test for the formats)

Debugging SEGVs

If you're new to C++, debugging SEGVs can be a bit challenging. Here are some tips.

  • Use gdb (the GNU debugger) to help pinpoint the location of the crash
    • The where command shows you where the program crashed
    • print var can be used to inspect the value of a variable
  • Turning off compiler optimization in the Makefile (grep for the string O2 in the generated Makefile and change it to O0) and rebuilding can make this easier to do.
  • Heap corruption is a particularly nasty problem to track down. It happens when you write to unallocated memory or to memory that has already been freed. However, crashes often don't occur until much later (for example, the next time memory is allocated), making the source of the problem hard to identify. Fortunately, there are some very good memory debuggers that can help you identify the locus of the original problem. I've had very good luck using Electric Fence.

Dependencies on third-party software

  • In general, introducing dependencies on other software packages (i.e., beyond SRILM and boost) should be avoided.
  • If a dependency is necessary, it should be configured using autoconf and made optional if at all possible.