treat

Louis Mullie 10e6612a06 Bump version		2013-11-23 16:54:41 -05:00
bin	Manifest baby!	2012-06-17 01:52:01 -04:00
files	Update files and manifests.	2012-06-29 16:44:37 -04:00
lib	Bump version	2013-11-23 16:54:41 -05:00
models	Manifest baby!	2012-06-17 01:52:01 -04:00
spec	Merge branch 'master' of github.com:louismullie/treat	2013-06-02 21:48:25 -04:00
tmp	Added manifests.	2012-06-16 01:50:26 -04:00
.gitignore	Ignore benchmark results.	2012-10-21 16:59:51 -04:00
.rspec	Added tests for time extractors and indexers/searchers.	2012-03-11 23:40:15 -04:00
.travis.yml	Add trace for debug.	2012-09-29 02:56:29 -04:00
.treat	Add tentative configuration options.	2013-01-02 20:25:43 -05:00
Gemfile	Fix #61	2013-11-23 16:29:17 -05:00
LICENSE	Bump version to 2.0.0.	2012-12-05 15:24:06 -05:00
README.md	Update README for v2.0.5	2013-06-02 21:51:59 -04:00
RELEASE	Begin release notes for 2.0.0rc1.	2012-12-04 02:04:39 -05:00
Rakefile	Put all specs under the Treat::Specs module.	2013-01-06 19:26:36 -05:00
treat.gemspec	Add a general document reader class using Yomu.	2013-05-01 13:03:51 +08:00

README.md

New in v2.0.5: OpenNLP integration and Yomu support

Treat is a toolkit for natural language processing and computational linguistics in Ruby. The Treat project aims to build a language- and algorithm-agnostic NLP framework for Ruby, with support for tasks such as document retrieval, text chunking, segmentation and tokenization, natural language parsing, part-of-speech tagging, keyword extraction and named entity recognition. Learn more by taking a quick tour or by reading the manual.

Features

Text extractors for PDF, HTML, XML, Word, AbiWord, OpenOffice and image formats (Ocropus).
Text chunkers, sentence segmenters, tokenizers, and parsers (Stanford & Enju).
Lexical resources (WordNet interface, several POS taggers for English).
Language, date/time, topic words (LDA) and keyword (TF*IDF) extraction.
Word inflectors, including stemmers, conjugators, declensors, and number inflection.
Serialization of annotated entities to YAML, XML or to MongoDB.
Visualization in ASCII tree, directed graph (DOT) and tag-bracketed (standoff) formats.
Linguistic resources, including language detection and tag alignments for several treebanks.
Machine learning (decision tree, multilayer perceptron, LIBLINEAR, LIBSVM).
Text retrieval with indexing and full-text search (Ferret).
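
To give a feel for what TF*IDF keyword extraction computes, here is a minimal plain-Ruby sketch. It does not use Treat or any other gem, and it is not Treat's implementation: `tokenize` and `tf_idf` are hypothetical helper names chosen for this illustration, and the tokenizer is a naive regex split that assumes lowercase ASCII English text.

```ruby
# Plain-Ruby TF*IDF sketch. Illustrative only; not Treat's API.
# `tokenize` and `tf_idf` are hypothetical helpers for this example.

# Lowercase the text and split it into alphabetic word tokens.
def tokenize(text)
  text.downcase.scan(/[a-z]+/)
end

# Score every term of every document.
# Returns one Hash (term => score) per input document.
def tf_idf(documents)
  tokenized = documents.map { |doc| tokenize(doc) }

  # Document frequency: how many documents contain each term.
  doc_freq = Hash.new(0)
  tokenized.each { |tokens| tokens.uniq.each { |term| doc_freq[term] += 1 } }

  n = documents.size.to_f
  tokenized.map do |tokens|
    # Raw term counts within this document.
    counts = tokens.each_with_object(Hash.new(0)) { |term, h| h[term] += 1 }
    counts.each_with_object({}) do |(term, count), scores|
      tf  = count / tokens.size.to_f        # relative frequency in this document
      idf = Math.log(n / doc_freq[term])    # rarity across the collection
      scores[term] = tf * idf
    end
  end
end

docs = [
  "the cat sat on the mat",
  "the dog chased the cat",
  "the bird sang"
]

scores = tf_idf(docs)

# Candidate keywords for the first document, highest score first.
top = scores[0].sort_by { |_, score| -score }.first(3).map(&:first)
puts top.inspect
```

A term that occurs in every document (here, "the") gets an IDF of zero and drops out, while terms unique to one document score highest. That is the property that makes TF*IDF usable as a simple keyword ranker.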

Contributing

I am actively seeking developers who can help maintain and expand this project. You can find a list of ideas for contributing to the project here.

Authors

Lead developer: @louismullie [Twitter]

Contributors:

@bdigital
@automatedtendencies
@LeFnord
@darkphantum
@whistlerbrk
@smileart
@erol

License

This software is released under the GPL and includes software released under the GPL, Ruby, Apache 2.0 and MIT licenses.