Projects that should be done
- Edges currently duplicate the feature values from their rules. It would be better if edges stored only the feature values specific to that edge, and rule features were stored only on the rule object.
- On-disk representation for grammar rules
- Come up with a universal hypergraph representation format (get rid of the slow, crappy JSON format)
  - Should have a finite-state parser generated using flex (or similar)
  - Should be similar (identical?) to the SLF lattice format
  - Should coordinate with the Moses and Joshua people
- "Hypergraph MBR" implementation
- Use a profiler: cdec hasn't been profiled yet. (The open-source Google CPU profiler is excellent for C++.)
- Kill SRILM
  - Write a home-grown (but fast, small, and good) implementation of an n-gram LM
    - Bonus points if you put it in shared memory
    - Or maybe it would be better to put the whole thing in a header file?
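For the item about edges duplicating rule features, one possible layout is sketched below. This is only an illustration under stated assumptions, not cdec's actual classes: `Rule`, `Edge`, `FeatureMap`, and `TotalFeatures` are hypothetical names, and a real implementation would presumably use the decoder's existing sparse vector type rather than `std::map`. The idea is that each edge holds a shared pointer to its rule and stores only the features computed for that edge; the combined vector is built on demand.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical sparse feature vector: feature name -> value.
using FeatureMap = std::map<std::string, double>;

// Hypothetical rule object: owns the features that depend only on the rule.
struct Rule {
  std::string lhs;
  FeatureMap features;
};

// Hypothetical edge: shares its rule with other edges and stores only
// the feature values that are specific to this edge.
struct Edge {
  std::shared_ptr<const Rule> rule;
  FeatureMap edge_features;
};

// Combine rule and edge features on demand instead of storing a merged copy
// on every edge. Values for a feature present in both maps are summed.
FeatureMap TotalFeatures(const Edge& e) {
  assert(e.rule && "every edge is assumed to reference a rule");
  FeatureMap total = e.rule->features;
  for (const auto& kv : e.edge_features) total[kv.first] += kv.second;
  return total;
}
```

With this layout, two edges built from the same rule share one copy of the rule's features, at the cost of a merge whenever the full feature vector is needed. Whether that trade-off pays off in practice is something profiling would have to confirm.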
Visit the Cdec Development Guide and get started!
