How cdec uses UTF-8



For the most part, cdec does not care which character encoding is used, as long as words are separated by spaces (character 0x20). However, some functionality (compound splitting, features that detect non-ASCII characters) requires interpreting the character contents, and in these places cdec assumes that its input is encoded in UTF-8. If this limited functionality is not used, other encodings will likely work.
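
The distinction can be illustrated with a short Python sketch. This is not cdec code, and the function names `split_on_spaces` and `has_non_ascii` are hypothetical. It only shows why splitting on byte 0x20 does not need to know the encoding, while a check for non-ASCII characters has to decode the bytes, here assuming UTF-8.

```python
from typing import List


def split_on_spaces(line: bytes) -> List[bytes]:
    """Split on the space byte 0x20 only; the bytes are never decoded."""
    return [tok for tok in line.split(b" ") if tok]


def has_non_ascii(token: bytes) -> bool:
    """Decode the token as UTF-8 and report whether any character is non-ASCII.

    Raises UnicodeDecodeError if the token is not valid UTF-8.
    """
    return any(ord(ch) > 0x7F for ch in token.decode("utf-8"))


# The same text in two ASCII-compatible encodings.
utf8_line = "das Mädchen".encode("utf-8")
latin1_line = "das Mädchen".encode("latin-1")

# Tokenization behaves the same for both encodings.
utf8_tokens = split_on_spaces(utf8_line)
latin1_tokens = split_on_spaces(latin1_line)

# Character-level inspection succeeds only on the UTF-8 input.
utf8_flags = [has_non_ascii(tok) for tok in utf8_tokens]
try:
    has_non_ascii(latin1_tokens[1])
    latin1_decodes = True
except UnicodeDecodeError:
    latin1_decodes = False
```

In this sketch both inputs split into two tokens, but only the UTF-8 input can be inspected character by character; the Latin-1 bytes fail to decode, which mirrors the kind of functionality where the UTF-8 assumption matters.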
