Diffstat (limited to 'textproc/py-tokenizer/pkg-descr')
-rw-r--r-- | textproc/py-tokenizer/pkg-descr | 5 |
1 files changed, 5 insertions, 0 deletions
diff --git a/textproc/py-tokenizer/pkg-descr b/textproc/py-tokenizer/pkg-descr
new file mode 100644
index 000000000000..c1f700edffe5
--- /dev/null
+++ b/textproc/py-tokenizer/pkg-descr
@@ -0,0 +1,5 @@
+Tokenizer: A tokenizer for Icelandic text
+
+Tokenization is a necessary first step in many natural language processing
+tasks, such as word counting, parsing, spell checking, corpus generation, and
+statistical analysis of text.