Louis Mullie d39198d3b9 Forgot to rename helper method here. 2012-10-20 21:49:16 -04:00
bin Manifest baby! 2012-06-17 01:52:01 -04:00
files Update files and manifests. 2012-06-29 16:44:37 -04:00
lib Forgot to rename helper method here. 2012-10-20 21:49:16 -04:00
models Manifest baby! 2012-06-17 01:52:01 -04:00
spec Include the DSL for benchmark transformers/preprocessors. 2012-10-20 16:08:16 -04:00
tmp Added manifests. 2012-06-16 01:50:26 -04:00
.gitignore Add /coverage to gitignore. 2012-08-02 15:30:53 -04:00
.rspec Added tests for time extractors and indexers/searchers. 2012-03-11 23:40:15 -04:00
.travis.yml Add trace for debug. 2012-09-29 02:56:29 -04:00
Gemfile Add RSpec version and terminal table. 2012-09-28 15:28:45 -04:00
LICENSE Bum version number in LICENSe. 2012-07-23 20:36:35 -04:00
README.md Slightly modify description. 2012-09-21 01:03:41 -04:00
RELEASE Added release notes for 1.2.0. 2012-08-03 18:17:58 -04:00
Rakefile Reflect moving Installer to Core. 2012-10-15 00:18:10 -04:00
treat.gemspec Add simplecov and terminal-table as development dependencies. 2012-09-28 15:29:16 -04:00

README.md


Treat is a toolkit for natural language processing and computational linguistics in Ruby. The Treat project aims to build a language- and algorithm-agnostic NLP framework for Ruby with support for tasks such as document retrieval, text chunking, segmentation and tokenization, natural language parsing, part-of-speech tagging, keyword extraction and named entity recognition.

Current features

  • Text extractors for PDF, HTML, XML, Word, AbiWord, OpenOffice and image formats (Ocropus).
  • Text retrieval with indexation and full-text search (Ferret).
  • Text chunkers, sentence segmenters, tokenizers, and parsers (Stanford & Enju).
  • Word inflectors, including stemmers, conjugators, declensors, and number inflection.
  • Lexical resources (WordNet interface, several POS taggers for English).
  • Language, date/time, topic words (LDA) and keyword (TF*IDF) extraction.
  • Serialization of annotated entities to YAML, XML or to MongoDB.
  • Visualization in ASCII tree, directed graph (DOT) and tag-bracketed (standoff) formats.
  • Linguistic resources, including language detection and tag alignments for several treebanks.
  • Machine learning (decision trees, multilayer perceptrons, linear classifiers, support vector machines).
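
As an illustration of the keyword (TF*IDF) extraction mentioned in the list above, here is a minimal plain-Ruby sketch of the scoring idea. It does not use Treat's API: the method names `tokenize`, `tf_idf` and `keywords` are hypothetical, the tokenizer is deliberately naive, and the weighting shown (relative term frequency times the natural log of inverse document frequency) is only one common variant, not necessarily the one Treat implements.

```ruby
# Plain-Ruby TF*IDF sketch. Hypothetical helper names; not Treat's API.

# Split text into lowercase word tokens (naive: ASCII letters and apostrophes).
def tokenize(text)
  text.downcase.scan(/[a-z']+/)
end

# Score every term of every document by TF*IDF.
# Returns one hash per document, mapping term => score.
def tf_idf(documents)
  tokenized = documents.map { |doc| tokenize(doc) }

  # Document frequency: how many documents contain each term.
  doc_freq = Hash.new(0)
  tokenized.each { |tokens| tokens.uniq.each { |term| doc_freq[term] += 1 } }

  total_docs = documents.size.to_f
  tokenized.map do |tokens|
    # Raw term counts within this document.
    counts = tokens.each_with_object(Hash.new(0)) { |term, h| h[term] += 1 }

    counts.each_with_object({}) do |(term, count), scores|
      tf  = count / tokens.size.to_f               # relative term frequency
      idf = Math.log(total_docs / doc_freq[term])  # 0.0 if the term is in every document
      scores[term] = tf * idf
    end
  end
end

# Top-k terms per document, highest score first (ties broken alphabetically).
def keywords(documents, k = 3)
  tf_idf(documents).map do |scores|
    scores.sort_by { |term, score| [-score, term] }.first(k).map(&:first)
  end
end
```

For example, with the three documents `"the cat sat on the mat"`, `"the dog sat on the log"` and `"the cat chased the dog"`, calling `keywords(docs, 1)` returns `[["mat"], ["log"], ["chased"]]`: each of those terms occurs in only one document, while "the" occurs in all three and therefore scores zero.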

Resources


License

This software is released under the GPL and includes software released under the GPL, Ruby, Apache 2.0 and MIT licenses.