Compare commits

...

567 Commits

Author SHA1 Message Date
Louis Mullie 8c6450a0d7 Merge pull request #120 from indentlabs/master
Use S3's static-public-assets bucket instead of louismullie.com
2017-05-05 18:31:43 -04:00
Louis Mullie d6a163f8f7 Make nokogiri a required dependency, #121 2017-04-22 16:15:35 -04:00
Andrew Brown 58911ce969 Add \n after post-install output 2016-05-24 14:04:48 -05:00
Andrew Brown 7b6e8fe6a8 Unnecessary whitespace removal 2016-05-24 13:20:17 -05:00
Andrew Brown f1f8c010a6 Point treat assets to mirror on s3 2016-05-24 13:19:51 -05:00
Louis Mullie a86c3036cc Merge pull request #110 from agarie/update-travis
Update .travis.yml for Ruby 2+
2015-12-07 21:17:40 -05:00
Louis Mullie e6c1f7c3a4 Merge pull request #114 from ojak/103_parsing_issues
Fixed typo in phrase tag that causes Adverbial Phrases to parse improperly
2015-12-07 21:17:12 -05:00
Oliver Jakubiec 4dfaf83a14 Fixed typo in phrase tag that causes Adverbial Phrases to parse improperly 2015-12-07 17:36:11 -08:00
Carlos Agarie eed75d3ba7 Update .travis.yml for Ruby 2+ 2015-05-28 18:42:37 -03:00
Louis Mullie 3dc184277d Merge pull request #92 from jbranchaud/update-post-install-note
Put quotes around gem name in post-install message.
2015-04-10 15:21:51 -04:00
Louis Mullie e218791c88 Merge pull request #106 from trosborn/master
abbreviates adjective and adverb to work properly with rwordnet
2015-04-07 18:12:35 -04:00
Thomas Osborn 2d6891ab9c abbreviates adjective and adverb to work properly with rwordnet 2015-04-07 15:00:38 -07:00
Louis Mullie f61adc8187 Merge pull request #105 from trosborn/master
fixes errors caused by wordnet 1.0.0
2015-04-06 16:34:48 -04:00
Thomas Osborn cb7ae61cc5 fixes errors caused by wordnet 1.0.0 2015-04-05 21:27:08 -07:00
Louis Mullie 394edb8fc4 Merge pull request #101 from ojak/master
Added enclitic for 'd and fixed a tag typo.
2015-03-21 16:43:02 -04:00
Oliver Jakubiec 721215b64a Added enclitic for 'd and enclitic examples for clarity 2015-03-20 13:12:21 -07:00
Oliver Jakubiec 90fdcd8c9e Fixed typo in comma punctuation tag 2015-02-27 17:08:15 -08:00
Louis Mullie 3992f2c177 Merge pull request #95 from ojak/master
Fixed typo that caused n't to be tagged as a Word rather than an Enclitic
2015-02-26 14:12:17 -05:00
Oliver Jakubiec 159f3b266e Fixed typo that caused n't to be tagged as a Word rather than an Enclitic 2015-02-26 10:56:03 -08:00
jbranchaud b7c6ae452b Put quotes around gem name in post-install message.
The post-install message for this gem instructs the user to run `require
treat` from irb, however, the actual, functioning line of code involves
quotes, namely, `require 'treat'`.
2015-02-14 00:21:39 -06:00
Louis Mullie e753389911 Merge pull request #85 from nicklink483/rspec_file_fix
rspec file fix
2014-08-30 19:48:52 -04:00
Dominic Muller 1b783faf77 fixed line in .rspec file to invoke correct formatter and not cause a TravisCI bug 2014-08-25 23:13:29 -07:00
Louis Mullie febc47a905 Bump version 2014-07-17 14:18:12 -04:00
Louis Mullie 3969478f7f Fix https://github.com/louismullie/bind-it/issues/4 2014-07-17 14:17:29 -04:00
Louis Mullie fc77587390 Bump version for next release 2014-05-28 13:29:36 -04:00
Louis Mullie eb0ec83053 Merge pull request #79 from smileart/master
Fixed file path detection bug in build method polymorphism
2014-05-20 18:10:04 -04:00
Smile @rT f8e5f8392a Fixed file path detection bug in build method polymorphism
(when string was recognised as path cause of  "xml", "yml", etc. content in it)
Refactored file detection duplication in few places
Fixed xml file format detection (f variable assigned but never used)
Deleted some trailing spaces in the refactored file
2014-05-20 14:20:24 +03:00
Louis Mullie c813e72c2c Merge pull request #78 from dalton/master
Revert "Considerably simplify all time annotators."
2014-05-17 22:59:06 -04:00
Adam Dalton 316a6cec04 Revert "Considerably simplify all time annotators."
This reverts commit 01c3e50ef0.
2014-05-16 12:08:54 -04:00
Louis Mullie d45d82b3c4 Merge pull request #75 from adammcarthur/master
update code climate badge in readme
2014-05-07 09:37:39 -04:00
Adam McArthur 78c13c280f update code climate badge in readme 2014-05-07 18:18:57 +10:00
Louis Mullie 690157af8b Merge pull request #70 from chris-at-thewebfellas/master
possible fix for issue #68 and #66 and #63
2014-02-18 19:26:51 -05:00
Chris Anderton a8ce6b2f18 change newlines to try and allow rdoc to run without errors 2014-02-18 23:40:09 +00:00
Rob Anderton f7bd0c35d7 tweak multi-line should expectations to be consistent with other specs 2014-01-28 12:08:44 +00:00
Rob Anderton 65235fb935 fix language detection and tweak tests 2014-01-28 11:56:03 +00:00
Rob Anderton 0ed8a37761 add enclitics test, fix multi-line should expectations 2014-01-28 11:38:34 +00:00
Rob Anderton 8f82086bb1 fix collection test 2014-01-28 10:55:02 +00:00
Chris Anderton d9b912f24d naive attempt to fix the implode method on stringable 2014-01-23 00:57:32 +00:00
Louis Mullie 10e6612a06 Bump version 2013-11-23 16:54:41 -05:00
Louis Mullie 5fb11131c7 Fix require path on Ruby 2.0 2013-11-23 16:54:29 -05:00
Louis Mullie 281934d572 Fix #61 2013-11-23 16:29:17 -05:00
Louis Mullie c9bf91562b Bump 2013-06-29 14:23:25 -04:00
Louis Mullie 643937b231 Cheap fix for #45 2013-06-29 14:22:50 -04:00
Louis Mullie fff57a4526 Fix issues #54 and #57 2013-06-27 17:59:35 -04:00
Louis Mullie 49e99ccd0d Merge pull request #58 from geoffharcourt/fix_rubygems_source_declaration
Fix Rubygems source in Gemfile
2013-06-22 11:45:52 -07:00
Geoff Harcourt 136a7cdbb3 Fix Rubygems source in Gemfile
The use of source :rubygems has been deprecated, and causes the system
to issue a warning when bundle is used with the gem.
2013-06-20 15:59:41 -04:00
Louis Mullie a8448f8103 Merge pull request #55 from wstrinz/master
Update Ocropus calls to work with v 0.7
2013-06-05 13:13:28 -07:00
= 91e3571699 fix spelling for doc 2013-06-05 10:02:38 -05:00
= 6464bb64a0 updated old ocropus commands 2013-06-05 01:10:03 -05:00
= 5c984a1dbb updated old ocropus commands 2013-06-05 00:59:06 -05:00
Louis Mullie 20e3aa1bb7 Update README for v2.0.5 2013-06-02 21:51:59 -04:00
Louis Mullie e483b764e4 Merge branch 'master' of github.com:louismullie/treat
Conflicts:
	Gemfile
2013-06-02 21:48:25 -04:00
Louis Mullie 3a367f7c56 Add list of development gems. 2013-06-02 21:44:52 -04:00
Louis Mullie c99dfac530 Bump version for next release. 2013-06-02 21:44:25 -04:00
Louis Mullie ee98b19960 Fix "humongous" bug (closes #45). 2013-06-02 21:44:18 -04:00
Louis Mullie bf6d951cf7 Fix XML deserialization bug (closes #43). 2013-06-02 21:44:01 -04:00
Louis Mullie 1b28d3640d Comment out development stuff. 2013-06-02 21:43:30 -04:00
Louis Mullie 727a307af0 Add basic support for OpenNLP. 2013-06-02 21:42:56 -04:00
Louis Mullie cf8acd2b64 Merge pull request #51 from Erol/general-document-reader-using-yomu
Add a general document reader class using Yomu
2013-05-01 18:05:12 -07:00
Erol Fornoles 1aa4df9566 Add a general document reader class using Yomu. 2013-05-01 13:03:51 +08:00
Louis Mullie 89cf494cba Merge pull request #47 from smileart/master
Little patch to german.rb to fix issue #39
2013-04-05 04:23:15 -07:00
Serge Bedzhik e2ef8ea748 Little patch to german.rb to fix issue
It fixes issue — https://github.com/louismullie/treat/issues/39 with invalid multibyte char (US-ASCII)
2013-04-04 11:03:39 +03:00
Louis Mullie 038d62b12b Add logo to README. 2013-01-26 14:40:40 -05:00
Louis Mullie 767e1f7f45 Version 2.0.4. 2013-01-23 10:48:33 -05:00
Louis Mullie 4cbcf03d43 Fix parser spec. 2013-01-09 12:53:09 -05:00
Louis Mullie 5e4189218c A few fixes to Stanford parser. 2013-01-06 21:58:57 -05:00
Louis Mullie 482ed30d76 Update spec to reflect that sentences now need to be tokenized before parsing. 2013-01-06 21:58:30 -05:00
Louis Mullie b9ee19e276 Fix spec due to change in gem dependency behaviour. 2013-01-06 21:58:13 -05:00
Louis Mullie fe7f97afd7 Completely refactor parser without using pipeline. 2013-01-06 21:50:45 -05:00
Louis Mullie e135ffa466 Bind Stanford Core NLP on load. 2013-01-06 21:50:27 -05:00
Louis Mullie a755272934 Forgot some devel stuff in there. 2013-01-06 19:31:39 -05:00
Louis Mullie 8b8a769d74 Put all specs under the Treat::Specs module. 2013-01-06 19:26:36 -05:00
Louis Mullie e2b813ca21 Add support for kronic time extractor. 2013-01-06 19:17:51 -05:00
Louis Mullie f72c8aec54 Add support for kronic time extractor. 2013-01-06 19:17:27 -05:00
Louis Mullie 6a025ca4d6 Add support for kronic time extractor. 2013-01-06 19:17:20 -05:00
Louis Mullie 1f69c3f089 Add post-install message. 2013-01-06 19:17:11 -05:00
Louis Mullie 4b74913bcd Cosmetic changes. 2013-01-06 19:16:57 -05:00
Louis Mullie a34714cb99 Fix bug and make sure to load Maxent class. 2013-01-06 19:16:46 -05:00
Louis Mullie b37ecdb695 Make sure that the class is loaded. 2013-01-06 19:14:41 -05:00
Louis Mullie 01c3e50ef0 Considerably simplify all time annotators. 2013-01-06 19:14:20 -05:00
Louis Mullie ba5a2d78ad Update specs for time annotators. 2013-01-06 19:12:00 -05:00
Louis Mullie c6ceac22d6 Bump version number. 2013-01-05 16:31:03 -05:00
Louis Mullie c18d0073f7 Add Levenshtein and Jaro-Winkler string distance algorithms. 2013-01-05 16:30:55 -05:00
Louis Mullie 056ef129d6 Use instance_exec pattern for builder. 2013-01-04 08:45:29 -05:00
Louis Mullie 6e644db50a Fix variable scoping. 2013-01-03 22:04:28 -05:00
Louis Mullie d351108ad0 Switch back to ruby-readability now that JRuby support was added. 2013-01-03 22:04:14 -05:00
Louis Mullie cbca3d8d37 Add builder. 2013-01-03 22:03:45 -05:00
Louis Mullie ebf8a16c2a Move topic_words spec to agnostic language, where it belongs. 2013-01-02 21:49:18 -05:00
Louis Mullie 683861dfd1 Refactorings and bug fixes in Stanford tools. 2013-01-02 21:25:40 -05:00
Louis Mullie 2cded62abf Add FIXME note. 2013-01-02 21:06:57 -05:00
Louis Mullie e32f44f0a0 Update Linguistics loader to reflect change in API. 2013-01-02 21:06:42 -05:00
Louis Mullie e87574c03d Add loggability dependency for linguistics. 2013-01-02 21:06:30 -05:00
Louis Mullie 6cf37a2926 Remove unused dependency. 2013-01-02 21:05:57 -05:00
Louis Mullie ff80d8ab2c Coding style. 2013-01-02 21:05:32 -05:00
Louis Mullie fdb5abb748 Suppress warning with new gem API. 2013-01-02 21:05:26 -05:00
Louis Mullie 931a66ec40 Fixes for change in Ruby Linguistics API. 2013-01-02 21:05:06 -05:00
Louis Mullie 644248504d Update gem to reflect new Linguistics API. 2013-01-02 20:37:09 -05:00
Louis Mullie 0731d33024 Update to reflect new API in gem. 2013-01-02 20:34:49 -05:00
Louis Mullie e8b12c5d03 Change development paths in comment. 2013-01-02 20:27:09 -05:00
Louis Mullie cc2b8d6679 Tentative wrapper for OpenNLP MaxEnt tokenizer. 2013-01-02 20:26:08 -05:00
Louis Mullie e284aee7eb Add tentative configuration options. 2013-01-02 20:25:43 -05:00
Louis Mullie 8209447625 Include DSL in agnostic spec. 2013-01-02 20:25:15 -05:00
Louis Mullie 201ac69775 Coding style. 2013-01-02 20:24:57 -05:00
Louis Mullie c42ee58bcd Clarify condition. 2013-01-02 20:24:47 -05:00
Louis Mullie 970a6248d5 Refactor and simplify some methods. 2013-01-02 20:24:28 -05:00
Louis Mullie 7fb10de5cc Refactor stanford loader. 2013-01-02 20:24:16 -05:00
Louis Mullie 577a063165 Fix typo. 2013-01-02 20:23:37 -05:00
Louis Mullie 9fa4ee9c1e Add note on implementation. 2013-01-02 20:23:27 -05:00
Louis Mullie fd0733113a Add 'nt as an enclitic. 2012-12-18 02:52:20 -05:00
Louis Mullie 50ac0f67e0 Fix user-specified option precedence in mongo unserialization. 2012-12-18 02:52:13 -05:00
Louis Mullie c2532d6cd2 Bump version. 2012-12-15 06:02:48 -05:00
Louis Mullie 0abeb6ba14 Fix typo causing syntax error. 2012-12-15 06:02:42 -05:00
Louis Mullie d38387265b Add in FastImage for readability. 2012-12-15 06:02:35 -05:00
Louis Mullie abf16b2b3e Update agnostic specs. 2012-12-15 05:21:17 -05:00
Louis Mullie 20b49cc471 Update the specs. 2012-12-15 05:21:02 -05:00
Louis Mullie 3295529e97 Change ruby-readability to jruby-* for compat. 2012-12-15 05:20:38 -05:00
Louis Mullie 5087b53623 Use jruby-readability, add images support. 2012-12-15 05:20:28 -05:00
Louis Mullie e149c46bde Directly call from class. 2012-12-15 05:20:16 -05:00
Louis Mullie 9836dabe87 Bump version number. 2012-12-13 21:23:15 -05:00
Louis Mullie 02ccf9942f Make nym option strings. 2012-12-13 03:28:22 -05:00
Louis Mullie c2a7a33fae Add in rest of specs. 2012-12-12 09:42:02 -05:00
Louis Mullie e9dd1a9298 Move language spec to agnostic. 2012-12-12 09:41:52 -05:00
Louis Mullie 3484baa904 Fix typo causing syntax error. 2012-12-12 09:41:37 -05:00
Louis Mullie 2b8e269380 Fix spec helper. 2012-12-12 02:54:31 -05:00
Louis Mullie 28669ec6c6 Make specs simpler and sweeter. 2012-12-12 02:50:15 -05:00
Louis Mullie 5daa79c2f0 Reflect change in folder from core to learning. 2012-12-12 02:35:00 -05:00
Louis Mullie 72eb214650 Delete duplicate file. 2012-12-12 02:34:48 -05:00
Louis Mullie 9f89a006a4 Add a list of keywords for the learning DSL. 2012-12-12 02:34:43 -05:00
Louis Mullie ec3465d24d Considerably simplify DSL code. 2012-12-12 02:34:33 -05:00
Louis Mullie b76a5c2c15 Rename folder to make consistent with /lib. 2012-12-12 02:34:17 -05:00
Louis Mullie 3e2a898155 Support groups and fragments. 2012-12-12 02:33:54 -05:00
Louis Mullie 7ef79deaa5 Test the :nym option for validity. 2012-12-12 02:33:43 -05:00
Louis Mullie e13ff4dcab Support fragments. 2012-12-12 02:33:30 -05:00
Louis Mullie 796ae59d77 Make consistent with other serializers by returning location of stored file. 2012-12-12 02:33:15 -05:00
Louis Mullie 9f54b6daba Fix bug due to change in language variable type. 2012-12-12 02:33:00 -05:00
Louis Mullie 66f0cadc57 Improve language spec. 2012-12-11 19:23:19 -05:00
Louis Mullie a7b4a7eea8 Fix specs in accordance with 6c5f31fd5e19d45f6e7f5f6393a02459b01eabb9. 2012-12-11 19:22:59 -05:00
Louis Mullie 2d9ef825c2 Cosmetic change. 2012-12-11 19:20:04 -05:00
Louis Mullie 0e06750ede Bump version to 2.0.0. 2012-12-05 15:24:06 -05:00
Louis Mullie 2c6c4ec964 Add contributors. 2012-12-04 17:49:04 -05:00
Louis Mullie 4d60ceb6ea Remove examples; moved to quick tour. 2012-12-04 02:17:45 -05:00
Louis Mullie b57d047ea8 Fix typo in README. 2012-12-04 02:16:07 -05:00
Louis Mullie 10abcb7e20 Simplify README even more. 2012-12-04 02:13:08 -05:00
Louis Mullie 1ab18f42c3 Format README. 2012-12-04 02:06:05 -05:00
Louis Mullie b6be6e7e1d Begin release notes for 2.0.0rc1. 2012-12-04 02:04:39 -05:00
Louis Mullie 4577430e84 Bump version number. 2012-12-04 02:04:29 -05:00
Louis Mullie 6c46ebf707 Update README. 2012-12-04 02:02:32 -05:00
Louis Mullie 01125b8a38 Cheating is bad. 2012-12-04 01:57:16 -05:00
Louis Mullie a7991bd009 Update spec examples after fixing worker for underscores. 2012-12-04 01:57:02 -05:00
Louis Mullie 6cc94ca6c5 Manually run specs to comply with RSpec update. 2012-12-04 01:56:44 -05:00
Louis Mullie 6f443b3087 Fix typo raising SyntaxError. 2012-12-04 01:56:18 -05:00
Louis Mullie 5b22b0d3b8 Input option may be a symbol. 2012-12-04 01:56:06 -05:00
Louis Mullie 7c1b597619 Fix YAML require issue. 2012-12-04 01:55:09 -05:00
Louis Mullie 2394de746d Images are in HTML format after conversion. 2012-12-03 23:58:59 -05:00
Louis Mullie 7f83e78a04 Ensure correctness when entity is smaller than minimum detection threshold. 2012-12-03 23:58:59 -05:00
Louis Mullie 1c69d6558e Allow to pass in a mix of entities and strings. 2012-12-03 23:58:59 -05:00
Louis Mullie da096bd15e Refactor and improve doc. 2012-12-03 23:58:59 -05:00
Louis Mullie 1e47f41510 Re-work some formatting and option issues. 2012-12-03 23:58:59 -05:00
Louis Mullie d3e52fb36a Add Learning DSL in a safer way. 2012-12-03 23:58:59 -05:00
Louis Mullie b5528a34f1 Make taggers recursive within token groups. 2012-12-03 23:58:59 -05:00
Louis Mullie e4664586fe Add suitable plugins for French. 2012-12-03 23:58:59 -05:00
Louis Mullie a49d1d97a0 Rename phrase to group, add group in entity list. 2012-12-03 23:58:58 -05:00
Louis Mullie fbd5e48fae Merge pull request #34 from LeFnord/master
removed Psych
2012-11-26 08:07:27 -08:00
LeFnord ee3ed723bc changed french and german encoding to utf-8 (should be standard) 2012-11-26 16:30:21 +01:00
LeFnord 476962e126 removed psych dependencies, cause in 1.9.x default 2012-11-26 14:48:04 +01:00
Louis Mullie 57e3c88d9e Default format to txt. 2012-11-25 03:23:19 -05:00
Louis Mullie bb4910c43d Fix problem with order of function calls. 2012-11-25 03:23:10 -05:00
Louis Mullie a1d09cbb99 Remove underscores in composite words. 2012-11-25 03:22:57 -05:00
Louis Mullie c8020c90e6 Put first_but_warn along with code that uses it. 2012-11-25 03:22:43 -05:00
Louis Mullie 5ac537db5a Downcase value for expected behaviour. 2012-11-25 03:22:15 -05:00
Louis Mullie 2162339419 Fix mess with DSL. 2012-11-25 03:21:51 -05:00
Louis Mullie be96c54064 Make scalpel default segmenter. 2012-11-25 03:21:38 -05:00
Louis Mullie a116cc8a5b Don't skip linear and svm as they are now working. 2012-10-31 23:26:09 -04:00
Louis Mullie 12cac5c769 Remove now unused labels feature. 2012-10-31 23:26:08 -04:00
Louis Mullie 1eb9341f77 Call class method, not instance method. 2012-10-31 23:26:08 -04:00
Louis Mullie 22b2afdddf Generalize sentence/phrase to group. 2012-10-31 23:26:08 -04:00
Louis Mullie 01d8f656b2 Remove labels from everywhere. 2012-10-31 23:26:08 -04:00
Louis Mullie 9af52ec798 Allow transparent casting when adding strings/numbers. 2012-10-31 23:26:08 -04:00
Louis Mullie 30196bb699 Generalize taggers and categorizers to target groups. 2012-10-31 23:26:08 -04:00
Louis Mullie f2edcf8118 Merge pull request #31 from LeFnord/master
Added stop words for several languages.
2012-10-29 05:34:22 -07:00
LeFnord b20836bc04 added umlaute to german stop words 2012-10-29 10:57:43 +01:00
LeFnord 3fa0e8593b corrected typo 2012-10-29 10:36:55 +01:00
LeFnord cd044f1c59 added stop words 2012-10-29 10:28:19 +01:00
Louis Mullie 5144f43c63 Update to reflect removal of labels option. 2012-10-28 02:18:43 -04:00
Louis Mullie f0d63fea20 Silence library output. 2012-10-28 02:18:31 -04:00
Louis Mullie 7f93decac7 Make sure silence_stdout returns the result of yield. 2012-10-28 02:18:25 -04:00
Louis Mullie f6862357e6 Completely remove need for labels when creating question. 2012-10-28 02:05:20 -04:00
Louis Mullie 19813f6a14 Remove unnecessary check. 2012-10-28 02:01:04 -04:00
Louis Mullie 295e22417d Almost finalize liblinear adapter. 2012-10-28 02:00:56 -04:00
Louis Mullie 9fb2c9a127 Stylistic conventions. 2012-10-28 01:52:42 -04:00
Louis Mullie e5f8da4201 #17: Modify the including class, not Object. 2012-10-28 01:52:29 -04:00
Louis Mullie 386d56de70 Fix failing specs. 2012-10-28 01:42:07 -04:00
Louis Mullie 24e02802b9 Merge branch 'master' of github.com:louismullie/treat 2012-10-28 01:35:21 -04:00
Louis Mullie 18fce305ed Add documentation on config options. 2012-10-28 01:35:08 -04:00
Louis Mullie 49cfd6a6f7 Add tf-idf-similarity requirement. 2012-10-28 01:34:38 -04:00
Louis Mullie 5c1ca9d9f1 Move Levenshtein to a similarity annotator. 2012-10-28 01:34:26 -04:00
Louis Mullie f7f53b0f7e Add precision/recall example. 2012-10-28 01:34:07 -04:00
Louis Mullie a6b42ca20e Add SVM classification example. 2012-10-28 01:33:59 -04:00
Louis Mullie d3f6df6266 Add in similarity annotators. 2012-10-28 01:33:32 -04:00
Louis Mullie b20a5dd412 Merge pull request #30 from darkphantum/master
Fix bug in tf_idf worker
2012-10-27 08:48:58 -07:00
darkphantum d09dd391b7 fix bug in tf_idf worker 2012-10-26 19:08:35 -07:00
Louis Mullie a8170397ff Add message for deprecation of old DSL syntax. 2012-10-26 13:36:37 -04:00
Louis Mullie c182ca8a42 Merge pull request #29 from darkphantum/master
Allow Document entity creation from Raw String
2012-10-25 19:05:41 -07:00
darkphantum 8b8a1c93f3 edit err message related to prev commit 2012-10-25 18:53:33 -07:00
darkphantum ba7d0995d6 Allow Document Entity creation from string 2012-10-25 15:16:57 -07:00
Louis Mullie c7aa48e130 Take care of external output silencing issues. 2012-10-24 01:47:21 -04:00
Louis Mullie 22f17a5b4c Don't use DSL in specs. 2012-10-24 01:47:07 -04:00
Louis Mullie 61d683603c Finalize the implementation of MLPs with FANN. 2012-10-24 01:28:00 -04:00
Louis Mullie 3c925e579f Switch to ruby-fann for neural networks. 2012-10-24 01:27:51 -04:00
Louis Mullie f01f050f9d Add example of ID3 classification. 2012-10-24 01:07:03 -04:00
Louis Mullie afefa10e87 Fix and finish SVM implementation. 2012-10-24 01:06:02 -04:00
Louis Mullie d2a41c0dea Add example of classification with new DSL. 2012-10-24 00:42:44 -04:00
Louis Mullie 651c8490f2 Add proxy for calling Treat methods on lists. 2012-10-24 00:36:40 -04:00
Louis Mullie 1355122e2f Cleanup; pretty final implementation. 2012-10-24 00:35:10 -04:00
Louis Mullie 65311ceda9 Fix bugs occurring due to new return type. 2012-10-24 00:34:42 -04:00
Louis Mullie 0240847ce1 Cleanup. 2012-10-24 00:34:30 -04:00
Louis Mullie 3ae63233c2 Add to_a and to_ary methods (should move elsewhere eventually). 2012-10-24 00:34:20 -04:00
Louis Mullie 86db70f1e8 Cleanup and fix bug. 2012-10-24 00:33:12 -04:00
Louis Mullie 2c7aff0323 Allow to build entities from lists of entities. 2012-10-24 00:33:01 -04:00
Louis Mullie 98b39ef541 A List is going to be a section for now. 2012-10-24 00:32:43 -04:00
Louis Mullie f4975270b2 #17: Use lowercase builder method names. 2012-10-24 00:32:24 -04:00
Louis Mullie e73463c0d0 Format list of dependencies. 2012-10-24 00:30:51 -04:00
Louis Mullie e995da0e71 Reduce complexity in configure! 2012-10-21 19:24:02 -04:00
Louis Mullie aa99e71058 Remove dependency status.
Need to rely on a legacy version of rspec for now.
2012-10-21 19:17:15 -04:00
Louis Mullie 8baa98930a Fix typo. 2012-10-21 19:12:27 -04:00
Louis Mullie 7997a5d6b7 Remove sourcify gem. 2012-10-21 19:11:48 -04:00
Louis Mullie c0e98efb76 Add models to gemspec. 2012-10-21 19:10:01 -04:00
Louis Mullie be72e2499a Attempt to make temp dir if it doesn't exist. 2012-10-21 19:09:46 -04:00
Louis Mullie 2495d9cf05 Punctuation doesn't really have a language (?) 2012-10-21 19:09:25 -04:00
Louis Mullie e588051558 Remove trailing slash before downloading. 2012-10-21 19:09:00 -04:00
Louis Mullie 25689e5ef1 Don't crash if a language pack gem can't be installed. 2012-10-21 19:08:36 -04:00
Louis Mullie decb377e55 Add SQ tag for phrases. 2012-10-21 19:08:13 -04:00
Louis Mullie 5535c03141 Put this before loading library in case that fails. 2012-10-21 17:20:12 -04:00
Louis Mullie 7a902900b2 Improve documentation. 2012-10-21 17:00:17 -04:00
Louis Mullie 69d7c9ff8a Prettify a bit. 2012-10-21 17:00:07 -04:00
Louis Mullie 56902944c1 Ignore benchmark results. 2012-10-21 16:59:51 -04:00
Louis Mullie a713441546 Explicitly call categorization for clarity. 2012-10-21 16:43:35 -04:00
Louis Mullie d43e6be5c8 Isolate import behaviour in module. 2012-10-21 16:43:25 -04:00
Louis Mullie e8baae7513 Use -able convention for all mixins. 2012-10-21 16:43:12 -04:00
Louis Mullie f2c22b42f4 Delete empty class. 2012-10-21 16:42:25 -04:00
Louis Mullie 5a750834ae Add is_mixin? functionality on String (does it end with -able?) 2012-10-21 16:42:17 -04:00
Louis Mullie 363eaca919 Make things cleaner. 2012-10-21 16:42:01 -04:00
Louis Mullie 06913afa8a Add to_struct method to Hash helper. 2012-10-21 16:41:14 -04:00
Louis Mullie 9e755535c6 Global lookup method makes more sense on Workers. 2012-10-21 16:40:44 -04:00
Louis Mullie 4dd56750fb Add a fucking space. 2012-10-21 16:40:00 -04:00
Louis Mullie 544a2a115e Paths configuration does not need super. 2012-10-21 16:39:54 -04:00
Louis Mullie 19a59eaae8 Add some documentation and error checking to feel safer at night. 2012-10-21 16:39:41 -04:00
Louis Mullie 233894a5d8 Complete cleanup of the main config module. 2012-10-21 16:39:29 -04:00
Louis Mullie a34fca4e88 Don't reuse for border cases. 2012-10-20 22:18:05 -04:00
Louis Mullie a36bd07777 Improve code clarity in this file. 2012-10-20 22:17:50 -04:00
Louis Mullie 13175e8f25 Add proper determiner. 2012-10-20 22:17:37 -04:00
Louis Mullie 7b30c26686 Merge pull request #27 from louismullie/config-refactoring
Forgot to rename helper method here.
2012-10-20 18:47:32 -07:00
Louis Mullie d39198d3b9 Forgot to rename helper method here. 2012-10-20 21:49:16 -04:00
Louis Mullie 502b21797f Merge pull request #26 from louismullie/config-refactoring
Config refactoring
2012-10-20 18:46:11 -07:00
Louis Mullie c40c91a47c Include the DSL for benchmark transformers/preprocessors. 2012-10-20 16:08:16 -04:00
Louis Mullie b0e04c4f98 Allow to call verbosity methods from anywhere. 2012-10-20 16:07:41 -04:00
Louis Mullie 7bbabbfd4d Don't use DSL by default. 2012-10-20 16:07:31 -04:00
Louis Mullie 91e52f2f29 #17: make DSL into includable module. 2012-10-20 16:07:14 -04:00
Louis Mullie deda96e7cc Isolate config logic inside classes. 2012-10-20 16:06:07 -04:00
Louis Mullie fa5922ce93 Separate data from config logic. 2012-10-20 16:05:39 -04:00
Louis Mullie f04fd28d8d Remove sweeten and get_path method, more refactorings. 2012-10-20 16:04:47 -04:00
Louis Mullie b20d077fe6 Add configurable mixin. 2012-10-20 16:04:13 -04:00
Louis Mullie 45eedadaf8 Remove newline at end of file. 2012-10-20 16:03:59 -04:00
Louis Mullie e79c2dbdb8 Update README.md 2012-10-15 19:05:41 -03:00
Louis Mullie 8bc84a56ab Require all files in /treat in treat.rb! 2012-10-15 11:53:04 -04:00
Louis Mullie 6a3afafb45 Remove old feature that doesn't scale to other data formats. 2012-10-15 00:27:06 -04:00
Louis Mullie 65d49f8019 Cosmetics. 2012-10-15 00:26:47 -04:00
Louis Mullie 1f928d2d6e Make helpers as methods on standard objects. 2012-10-15 00:26:18 -04:00
Louis Mullie 3d75ffe035 Update helper method syntax. 2012-10-15 00:26:06 -04:00
Louis Mullie 52cb1ad1fa Add instructions for configuration here to make clearer. 2012-10-15 00:25:54 -04:00
Louis Mullie 8dea20c913 BSOD 2012-10-15 00:25:41 -04:00
Louis Mullie ea3c6a66fc Added this here from helpers, used only once. 2012-10-15 00:25:27 -04:00
Louis Mullie 301c8692df Refactor the Helper methods into methods on standard objects. 2012-10-15 00:25:07 -04:00
Louis Mullie 847ea450f6 Move to Core. 2012-10-15 00:21:36 -04:00
Louis Mullie 497e944adf Move to Core; rename helper. 2012-10-15 00:21:31 -04:00
Louis Mullie a4509af637 Scatter this code to the respective config classes. 2012-10-15 00:20:47 -04:00
Louis Mullie 422cacdad5 Consolidate config into more simple classes; make each class responsible for config set up. 2012-10-15 00:20:35 -04:00
Louis Mullie 041a490b67 Make treat.rb clean as hell. 2012-10-15 00:18:51 -04:00
Louis Mullie 6007b7acb9 Reflect moving Installer to Core. 2012-10-15 00:18:10 -04:00
Louis Mullie 200a5ae945 Add :apply alias for :do. 2012-10-13 01:22:56 -04:00
Louis Mullie 38c4b91d04 Remove platform detection helper. 2012-10-13 01:22:47 -04:00
Louis Mullie 71230b33d8 Remove reflection module helper. 2012-10-13 01:22:38 -04:00
Louis Mullie f12a3a6821 Move installer and server to core/ 2012-10-13 01:21:52 -04:00
Louis Mullie 7ede824392 Cosmetics. 2012-10-13 01:21:32 -04:00
Louis Mullie c136483ef9 Fix paths. 2012-10-10 23:06:24 -04:00
Louis Mullie bc18294b1a Only load required gems when calling #load 2012-09-30 12:33:46 -04:00
Louis Mullie 73c5d7baa5 Pass parameter as string. 2012-09-30 12:33:30 -04:00
Louis Mullie 007b5003d9 Raise appropriate error if collection is not stored on disk. 2012-09-30 02:30:30 -04:00
Louis Mullie e789035bb9 Use strings for the options. 2012-09-30 02:30:18 -04:00
Louis Mullie fd45b853f2 Small refactorings for stylistic consistency. 2012-09-30 02:30:04 -04:00
Louis Mullie e00a47b36c Update usage information. 2012-09-30 02:29:46 -04:00
Louis Mullie 31c10f5b28 Rename method to more accurate name (group_from_string). 2012-09-30 02:29:38 -04:00
Louis Mullie 52764910cb Cardinalizers and ordinalizers only available on numbers. 2012-09-30 02:29:24 -04:00
Louis Mullie 682f53c841 Remove UnimplementedException. 2012-09-30 01:35:48 -04:00
Louis Mullie 2f3fc670b0 Making things cleaner and adding some docs. 2012-09-30 01:35:03 -04:00
Louis Mullie da623ece80 Remove automatic copying of files into collections. 2012-09-30 01:17:59 -04:00
Louis Mullie a55cff3460 Got this backwards, fixing it. 2012-09-30 01:17:50 -04:00
Louis Mullie 07c4cdb625 Add documentation. 2012-09-30 01:17:36 -04:00
Louis Mullie fed8177ae9 Update documentation for included modules. 2012-09-30 01:17:27 -04:00
Louis Mullie d8b834e7c3 Create folder if collection name does not exist. 2012-09-30 01:17:05 -04:00
Louis Mullie dd15d2f68a Add fixme tag. 2012-09-30 01:16:26 -04:00
Louis Mullie 72bbbfe224 Edit documentation. 2012-09-30 01:16:17 -04:00
Louis Mullie ccaf52ce43 Rename Chainable to Applicable. 2012-09-30 01:16:06 -04:00
Louis Mullie 240955f21d Consolidate all entities in one file. 2012-09-30 01:15:50 -04:00
Louis Mullie c588218b2e Big refactorings, liking the signal to noise ratio in this class now. 2012-09-30 00:31:26 -04:00
Louis Mullie 78f674ba64 Small doc edit. 2012-09-29 23:51:11 -04:00
Louis Mullie 00544f8e1a Format and add documentation; remove UnimplementedException. 2012-09-29 23:50:13 -04:00
Louis Mullie d77f305e5f Add documentation. 2012-09-29 23:50:00 -04:00
Louis Mullie 05d5b85010 Add description of toolkit, author & license information. 2012-09-29 23:49:51 -04:00
Louis Mullie bef10ff3ea Cosmetic change. 2012-09-29 23:27:38 -04:00
Louis Mullie b2994282cd Require the autoload mixin. 2012-09-29 23:27:24 -04:00
Louis Mullie 95171673c4 Put this here until find a better place. 2012-09-29 23:27:15 -04:00
Louis Mullie fb66851258 Cannot enable DSL here. 2012-09-29 23:27:05 -04:00
Louis Mullie 3b90a394a8 Slight refactoring to make more concise; added comments. 2012-09-29 23:26:59 -04:00
Louis Mullie b44d407ccb Stylistic consistency. 2012-09-29 23:10:35 -04:00
Louis Mullie 7b06d82645 Enable DSL in config file. 2012-09-29 23:10:28 -04:00
Louis Mullie 6a75a44e71 Move autoload to its file. 2012-09-29 23:10:17 -04:00
Louis Mullie 6663c3e767 Remove Treat.install -> keep Treat::Installer.install 2012-09-29 23:10:07 -04:00
Louis Mullie f879a7e23a Reflect name change in autoload. 2012-09-29 22:51:59 -04:00
Louis Mullie 88698360e6 Consolidate autoload functionality. 2012-09-29 22:51:51 -04:00
Louis Mullie ad7cb7fdc5 Delete unnecessary modules. 2012-09-29 22:51:34 -04:00
Louis Mullie db038f64e8 Put proxies in their own folder in separate files. 2012-09-29 22:28:25 -04:00
Louis Mullie c77d19fa04 Lookup should be implemented on Category. 2012-09-29 22:28:12 -04:00
Louis Mullie 9b2d100d4b Move module files to their respective folders. 2012-09-29 22:27:32 -04:00
Louis Mullie 019c2ce863 Simplify paths. 2012-09-29 21:36:48 -04:00
Louis Mullie 8880ff49b7 Put module abilities under Modules::Module. 2012-09-29 21:36:36 -04:00
Louis Mullie 69156b4133 Only require JSON and Rack if starting server. 2012-09-29 21:36:24 -04:00
Louis Mullie be1e7f6f69 Only require dependency installer if actually running. 2012-09-29 21:36:07 -04:00
Louis Mullie e694f22cfe Move abilities under the Entity module. 2012-09-29 21:35:56 -04:00
Louis Mullie 98270e689b Proxies becomes its own main module. 2012-09-29 21:35:39 -04:00
Louis Mullie 9ef01650d9 Move group under the workers module. 2012-09-29 21:35:18 -04:00
Louis Mullie f5a8a805ba Delete files in wrong place. 2012-09-29 21:35:08 -04:00
Louis Mullie 3ff2244349 Move module-specific behavior (autoloaded, autoloadable) to modules/module. 2012-09-29 21:34:57 -04:00
Louis Mullie ea3218b9d0 Moved all abilities under Entity class. 2012-09-29 21:34:22 -04:00
Louis Mullie e71d4cd288 Rename Core to Learning. 2012-09-29 20:57:28 -04:00
Louis Mullie 34ba8b3fc6 Rename Core to Learning. 2012-09-29 20:56:18 -04:00
Louis Mullie ee21e9a573 Move installer and server to helpers. 2012-09-29 20:56:06 -04:00
Louis Mullie 3c8d479311 Move modules to separate folder. 2012-09-29 20:55:56 -04:00
Louis Mullie bb2f955a49 Rename /core to /learning. 2012-09-29 20:55:33 -04:00
Louis Mullie 79ed258809 Put exceptions in a separate file. 2012-09-29 20:51:09 -04:00
Louis Mullie 74e661ac81 Make Entity responsible for requiring abilities. 2012-09-29 20:51:02 -04:00
Louis Mullie edadb8b76f Delete unnecessary helper. 2012-09-29 20:49:46 -04:00
Louis Mullie db02468065 Add configuration as a module. 2012-09-29 20:49:25 -04:00
Louis Mullie f6860b7e1a Don't depend on configuration for these functions. 2012-09-29 20:49:16 -04:00
Louis Mullie dba5f527dc Simplify the main script file as much as possible. 2012-09-29 20:49:00 -04:00
Louis Mullie af4c43d6d3 Consolidate all modules in a single file; reduce redundancy in loading code. 2012-09-29 20:29:32 -04:00
Louis Mullie 6e8efc94ad Stylistic consistency. 2012-09-29 20:11:29 -04:00
Louis Mullie bf222d8487 Stylistic consistency. 2012-09-29 20:10:40 -04:00
Louis Mullie f1f8cbcc94 Remove stupid files. 2012-09-29 20:10:33 -04:00
Louis Mullie 5cabbd5480 Big cleaning up in Rakefile. 2012-09-29 20:08:22 -04:00
Louis Mullie b391d5d8d1 Refactorings for HTML benchmark output. 2012-09-29 20:05:19 -04:00
Louis Mullie 33680614b9 Move Ruby version check to version.rb 2012-09-29 20:04:58 -04:00
Louis Mullie 7bf00c8d92 Consolidate duplicated code from Rakefile into a few helper methods. 2012-09-29 19:49:15 -04:00
Louis Mullie 02d03fa27f Require library before running specs. 2012-09-29 03:00:27 -04:00
Louis Mullie 1f67be4278 Use require_relative for spec helper. 2012-09-29 02:57:06 -04:00
Louis Mullie 7a742b9b7c Add trace for debug. 2012-09-29 02:56:29 -04:00
Louis Mullie 09889ef8bf Update JAVA HOME to new Travis config settings. 2012-09-29 02:52:56 -04:00
Louis Mullie fadc25c2df Debugging Travis. 2012-09-29 02:51:40 -04:00
Louis Mullie 6f4860b709 List should be a Zone, not Section object. 2012-09-29 02:49:01 -04:00
Louis Mullie 0ad14e848e Add require_all helper. 2012-09-29 02:47:38 -04:00
Louis Mullie 54c8437053 Rename UnsupportedException to UnimplementedException. 2012-09-29 02:47:21 -04:00
Louis Mullie 7fa273cfbf Make all core modules extend Treat::Module. 2012-09-29 02:46:53 -04:00
Louis Mullie 4630d2e0f2 Autoload these classes and remove require. 2012-09-28 15:48:21 -04:00
Louis Mullie a03666b77c Adjust paths for require_relative. 2012-09-28 15:47:54 -04:00
Louis Mullie d4beeb62bd Use require_relative across gem for 1.9.3. 2012-09-28 15:33:44 -04:00
Louis Mullie 0051050278 Add simplecov and terminal-table as development dependencies. 2012-09-28 15:29:16 -04:00
Louis Mullie 3d12716e6a Try require_relative for Travis. 2012-09-28 15:28:54 -04:00
Louis Mullie 8f56542eef Add RSpec version and terminal table. 2012-09-28 15:28:45 -04:00
Louis Mullie 5a4f70292b Cosmetics. 2012-09-28 00:17:06 -04:00
Louis Mullie 070f183f47 Update spec todos. 2012-09-28 00:17:02 -04:00
Louis Mullie 19bd2f010c Try to go live with the new spec testing (yay!) 2012-09-28 00:15:36 -04:00
Louis Mullie 0cf4a487e6 Don't set levels on titles. 2012-09-28 00:15:17 -04:00
Louis Mullie bad04ab137 MLP returns probability. 2012-09-28 00:14:58 -04:00
Louis Mullie 638f4cd696 Cosmetics. 2012-09-28 00:14:43 -04:00
Louis Mullie 79a9bbec0f Decided to serialize collections just like other objects. 2012-09-28 00:14:36 -04:00
Louis Mullie 0ae0a5d131 Raise Treat::Exception instead of Exception. 2012-09-28 00:14:20 -04:00
Louis Mullie 8854510c8d Add lazy options. 2012-09-24 01:05:05 -04:00
Louis Mullie cd2aa7db5e Add specs for classify and keywords. 2012-09-24 01:04:50 -04:00
Louis Mullie eebf537275 Remove debug output. 2012-09-24 01:04:20 -04:00
Louis Mullie 41f6777630 Change :q option to :query for consistency. 2012-09-24 01:04:12 -04:00
Louis Mullie fb0f3cb6a8 Cosmetic change. 2012-09-24 01:04:02 -04:00
Louis Mullie 375337c131 Updated documentation. 2012-09-24 01:03:53 -04:00
Louis Mullie 8fea3b31bc Fixed bug. 2012-09-24 01:03:43 -04:00
Louis Mullie 021f0521c1 Add learners and retrievers to config. 2012-09-24 01:03:20 -04:00
Louis Mullie 325eec48b7 Delete files committed by error. 2012-09-24 01:03:11 -04:00
Louis Mullie d1104145ed Allow defining examples per worker. 2012-09-23 03:29:19 -04:00
Louis Mullie ff7c64b6c1 Add visualize and keyword specs. 2012-09-23 03:28:58 -04:00
Louis Mullie 5e4cdfa29e Remove tf_idf spec (moved to agnostic spec). 2012-09-23 03:28:47 -04:00
Louis Mullie 26e8feef45 Make sure language has stop words. 2012-09-23 03:28:23 -04:00
Louis Mullie 36ada7677d tf_idf could be nil. 2012-09-23 03:28:13 -04:00
Louis Mullie 5755db44ee Do not try to copy the file if it's already in the collection. 2012-09-23 03:27:49 -04:00
Louis Mullie 3f997bbc84 Get apply as an alias for do. 2012-09-23 03:27:35 -04:00
Louis Mullie 61e2a919c5 Add a few stop words. 2012-09-23 03:27:14 -04:00
Louis Mullie b6259ae5f0 Add entry for tf_idf. 2012-09-23 03:27:05 -04:00
Louis Mullie b34b006da5 Fix #24: misspelled gem name crashes German install. 2012-09-23 03:26:54 -04:00
Louis Mullie 66b16b78e2 Add debug message. 2012-09-21 09:45:52 -04:00
Louis Mullie d70acc89e3 Add an XML example phrase. 2012-09-21 01:05:57 -04:00
Louis Mullie a3ebca574a More refactorings and bug fixes. 2012-09-21 01:05:41 -04:00
Louis Mullie 82c8285f5b Change example. 2012-09-21 01:05:28 -04:00
Louis Mullie e8a163a9ae Remodel declension behaviour based on category. 2012-09-21 01:05:06 -04:00
Louis Mullie eb0858ac56 Standardize documentation format. 2012-09-21 01:04:46 -04:00
Louis Mullie 336b9cdb7e Remove workers that are not truly agnostic; add visualizers. 2012-09-21 01:04:15 -04:00
Louis Mullie a2ddbbb6c1 Add a custom unsupported exception (temporary). 2012-09-21 01:03:54 -04:00
Louis Mullie 7d977c38ae Slightly modify description. 2012-09-21 01:03:41 -04:00
Louis Mullie dbd9f4e1ec Made autoselect unnecessary by moving functions to build able. 2012-09-21 01:03:28 -04:00
Louis Mullie d9f68099a0 Bug fix in language class. 2012-09-18 00:19:20 -04:00
Louis Mullie a66de0b2f0 Merge pull request #23 from louismullie/new-specs
New specs
2012-09-17 21:03:15 -07:00
Louis Mullie 2c2ce34271 Make mode an argument to the constructor, and then run. Add sandbox task. 2012-09-18 00:01:53 -04:00
Louis Mullie 5097a05af5 Continue major refactorings. 2012-09-18 00:01:39 -04:00
Louis Mullie 397f21765e Pass headings as a parameter. 2012-09-18 00:01:24 -04:00
Louis Mullie f0cc41e283 Update to reflect new directory structure. 2012-09-18 00:01:13 -04:00
Louis Mullie b495014016 Update to reflect new directory structure. 2012-09-18 00:00:53 -04:00
Louis Mullie 6f107ada1e Made consistent with @file and @folder no longer being instance variables. 2012-09-18 00:00:44 -04:00
Louis Mullie 20191bd41e Change naming from Examples to Scenarios. 2012-09-17 21:31:23 -04:00
Louis Mullie 2a97f15103 Slightly change the syntax to run language scenarios. 2012-09-17 21:31:11 -04:00
Louis Mullie 41522a677b Move in print_table and save_html helpers from language.rb 2012-09-17 21:30:54 -04:00
Louis Mullie e26ace7a31 Doing some major refactorings in base scenario class. 2012-09-17 21:30:29 -04:00
Louis Mullie 021fa53065 Remove useless file. 2012-09-17 02:25:22 -04:00
Louis Mullie 09e45d8423 Rename benchmark to example; move descriptions to workers. 2012-09-17 02:18:11 -04:00
Louis Mullie c6f882c736 Delete file from dir. 2012-09-17 02:17:47 -04:00
Louis Mullie 84834366a4 Rename Benchmarks to Examples 2012-09-17 02:17:25 -04:00
Louis Mullie 999457f2a7 Require treat and spec workers in helper. 2012-09-17 02:17:14 -04:00
Louis Mullie 428d85f106 Make a few adjustments to run specs. 2012-09-17 02:16:44 -04:00
Louis Mullie c59c12e112 Add a Spec helper. 2012-09-17 02:00:41 -04:00
Louis Mullie 23f3264c05 Refactor the Rake file, add some comments. 2012-09-17 02:00:32 -04:00
Louis Mullie 0b4dcd5416 General refactorings. 2012-09-17 02:00:12 -04:00
Louis Mullie 8d904262d2 Remove initialize method. 2012-09-17 01:59:58 -04:00
Louis Mullie 860bdb26a3 Change naming conventions in Specs. 2012-09-17 01:19:24 -04:00
Louis Mullie e1aa90ca5f Change some naming conventions in folders and classes. 2012-09-17 01:19:03 -04:00
Louis Mullie ec39450feb Make all serializers return the location of the stored file. 2012-09-17 01:18:48 -04:00
Louis Mullie 975ce9a8b6 Separate examples into language folders. 2012-09-17 01:18:08 -04:00
Louis Mullie 31f2e2cd41 Updated README. 2012-09-09 12:51:44 -04:00
Louis Mullie ae98b6df9f Comment out worker specs. 2012-09-09 12:43:40 -04:00
Louis Mullie 698dc53720 Replace directional quotes by single quotes. 2012-09-09 12:43:03 -04:00
Louis Mullie 2cf4fbae9b Add to_string method, make sure to dup() value. 2012-09-09 12:42:45 -04:00
Louis Mullie 01e512d4e5 Renamed languages folder to workers folder for consistency. 2012-09-09 12:42:28 -04:00
Louis Mullie 42658e0c6c Make sure to_s returns a dup() of the string value. 2012-08-18 17:59:29 -04:00
Louis Mullie b435e800c4 Removed some info. 2012-08-18 17:59:12 -04:00
Louis Mullie f45218f176 Added TODO. 2012-08-18 17:59:05 -04:00
Louis Mullie d9bf776d7e Remove duplicate method call. 2012-08-18 17:59:00 -04:00
Louis Mullie 8ba162ca24 Use new hash syntax. 2012-08-18 17:41:06 -04:00
Louis Mullie c56ca32b01 Prevent error from occurring when not enough words are present. 2012-08-18 17:40:58 -04:00
Louis Mullie bb4c19f62c Remove tactful tokenizer, which has some odd quirks. 2012-08-16 12:19:52 -04:00
Louis Mullie 9fc955467f Remove the tactful tokenizer script, which is not accurate enough to my liking. 2012-08-16 12:19:07 -04:00
Louis Mullie 5271abd319 Begin adding more serious examples for English. 2012-08-16 12:18:49 -04:00
Louis Mullie ce5b01e209 Cosmetic change. 2012-08-16 12:18:23 -04:00
Louis Mullie 90de0725fc Add support for turning directional quotes on & off. 2012-08-16 12:17:59 -04:00
Louis Mullie 4e378ffab2 Let :tactful_tokenizer only be a segmenter (tokenizer not accurate enough). 2012-08-16 12:17:38 -04:00
Louis Mullie 9f69a0ee74 Add support for SRX segmenter. 2012-08-16 12:17:11 -04:00
Louis Mullie fc092b8670 Add support for SRX gems for other languages. 2012-08-16 12:16:55 -04:00
Louis Mullie c1c68171b7 Improve documentation consistency. 2012-08-15 23:44:35 -04:00
Louis Mullie fdd6c8352e Add test sample files. 2012-08-15 23:44:21 -04:00
Louis Mullie b7d7181d84 Remove samples from specs (moved to /language). 2012-08-15 23:44:10 -04:00
Louis Mullie dfb9b94f4e Remove "helper" requirement from header. 2012-08-15 23:43:28 -04:00
Louis Mullie 3862abd44c Split Core spec into several files. 2012-08-15 23:43:01 -04:00
Louis Mullie 7a02ddd501 Run RSpec directly from Rake. 2012-08-15 23:42:43 -04:00
Louis Mullie e7efc4da64 Get rid of the helper, move all stuff to Rakefile. 2012-08-15 23:42:21 -04:00
Louis Mullie 9b9edba5c2 Move all performance stuff to specs. 2012-08-15 23:04:06 -04:00
Louis Mullie ccfaa0e890 Move all performance stuff to specs. 2012-08-15 23:03:52 -04:00
Louis Mullie 6cfd4a2da0 Move core specs to their own folder. 2012-08-15 23:03:30 -04:00
Louis Mullie c44b1ea4b6 Move entity specs to their own folder. 2012-08-15 23:03:21 -04:00
Louis Mullie b8befa7574 Added note to self. 2012-08-15 23:02:55 -04:00
Louis Mullie e7ce896235 Run automated specs. 2012-08-15 23:01:34 -04:00
Louis Mullie 807ac327c2 Merge benchmarks with tests. 2012-08-15 23:01:17 -04:00
Louis Mullie ce4caa6ece Improve documentation consistency, format references. 2012-08-15 23:00:48 -04:00
Louis Mullie 7f009a3826 Temporarily remove Enju for speed of tests. Need to undo. 2012-08-15 23:00:30 -04:00
Louis Mullie b6bafd611e Add 'zip' dependency, add :lda as a worker for :topic_words. 2012-08-15 23:00:15 -04:00
Louis Mullie 10cf4bff24 Add :benchmark task to Rakefile. 2012-08-15 22:59:49 -04:00
Louis Mullie 3897cad0ab Allow to chunk sections as well as documents. 2012-08-15 18:27:32 -04:00
Louis Mullie e503bbfdb4 Make List a type of Zone, not Section. 2012-08-15 18:27:11 -04:00
Louis Mullie 8db9997c0b Remove ActiveSupport inflection support. 2012-08-15 18:26:57 -04:00
Louis Mullie ae00fc69c0 Use rb-libsvm (new maintained version of libsvm-ruby-swig). 2012-08-15 18:26:46 -04:00
Louis Mullie e4803efe3a Documentation consistency improvements. 2012-08-15 18:26:26 -04:00
Louis Mullie f4ffcfffe2 Remove non-standard URL. 2012-08-15 18:26:11 -04:00
Louis Mullie 1dbdff01a3 Enhanced references and moved to :from_tag categorizer. 2012-08-15 18:26:01 -04:00
Louis Mullie 884d367917 Remove :active_support worker and add :from_tag categorizer. 2012-08-15 18:25:36 -04:00
Louis Mullie ca9ed78286 Add rb-libsvm dependency. 2012-08-15 18:25:23 -04:00
Louis Mullie 28a767e9ff More work on automated tests. 2012-08-15 18:25:12 -04:00
Louis Mullie 8406744c18 Add scalpel gem dependency, make SRX default segmenter. 2012-08-15 01:42:39 -04:00
Louis Mullie 6a9ea1f161 Basic benchmark script (work in progress). 2012-08-15 01:41:42 -04:00
Louis Mullie 5d57457392 Documentation consistency improvements. 2012-08-15 01:41:14 -04:00
Louis Mullie 4b95ace6ac Fix quirks in punkt and tactful segmenters. 2012-08-15 01:40:53 -04:00
Louis Mullie 8eb8e044d0 Allow #cardinal and #ordinal to be called on words as well as numbers. 2012-08-15 01:39:20 -04:00
Louis Mullie 2a6c8b0017 Added FIXME note. 2012-08-15 01:39:02 -04:00
Louis Mullie 0349e515cf Added support for Scalpel sentence segmentation tool. 2012-08-15 01:38:52 -04:00
Louis Mullie 2b5f81d7fb Improve consistency by having all workers be classes. 2012-08-14 21:51:51 -04:00
Louis Mullie e8797d0d79 Consistency improvement in documentation. 2012-08-14 19:54:15 -04:00
Louis Mullie 69174213f2 Strip whitespace from the sentences before returning. 2012-08-13 06:32:14 -04:00
Louis Mullie 2a0125463f Simplify target to one abstract type. 2012-08-13 06:31:55 -04:00
Louis Mullie 475c7eb50f Consistency improvements and added benchmark task. 2012-08-13 06:31:42 -04:00
Louis Mullie 38883ac2f5 Coreferences currently not implemented, so remove them. 2012-08-13 06:31:26 -04:00
Louis Mullie 73f2106241 Consistency improvement; all Workers should be classes. 2012-08-13 06:31:00 -04:00
Louis Mullie 91e2f84800 Fix include name for SRX sentence segmenter. 2012-08-12 19:33:32 -04:00
Louis Mullie 9f674cc9e1 Add support for srx-english sentence segmenter. 2012-08-12 19:30:28 -04:00
Louis Mullie 46a204f74c Raise a Treat::Exception when download fails. 2012-08-12 19:15:13 -04:00
Louis Mullie 002eb198c9 Merge branch 'master' of github.com:louismullie/treat 2012-08-05 00:39:53 -04:00
Louis Mullie 5eeb5f894b Consolidating extractor configuration. 2012-08-05 00:39:34 -04:00
Louis Mullie adfd33c4ac Added CodeClimate badge. 2012-08-04 22:59:22 -03:00
Louis Mullie bf3d3feec8 Update README.md
Add new classifier algorithms in v. 1.2.0.
2012-08-04 20:50:55 -03:00
Louis Mullie b5065c9a2e Dynamically get the list of workers from the config. 2012-08-04 13:48:52 -04:00
Louis Mullie c7947aa92c Renamed Doable module to Chainable. 2012-08-04 13:46:27 -04:00
Louis Mullie 82638ebf48 Deleted useless list. 2012-08-04 13:46:01 -04:00
Louis Mullie 8f7a55e1a3 Version 1.2.0 - Final. 2012-08-03 18:30:20 -04:00
Louis Mullie 8bbf993c95 Hide local data used for testing. 2012-08-03 18:25:30 -04:00
Louis Mullie 7c15f91af3 Do not publish tests in verbose mode. 2012-08-03 18:25:22 -04:00
Louis Mullie 64199d2700 Added release notes for 1.2.0. 2012-08-03 18:17:58 -04:00
Louis Mullie 2c0482a959 Added simplecov dependency for testing. 2012-08-03 18:17:48 -04:00
Louis Mullie 18000b65b0 Add a helper function for the specs. 2012-08-02 20:33:44 -04:00
Louis Mullie 1a63b0aa63 Allow configuration of paths for Stanford and Punkt. 2012-08-02 20:33:32 -04:00
Louis Mullie 268b951f30 Make .to_hash a procedural function, not a module. 2012-08-02 20:32:45 -04:00
Louis Mullie eebe1343e9 Allow configuration of paths for Punkt and Reuters models. 2012-08-02 20:32:24 -04:00
Louis Mullie da303637ce Moved hashable function to helpers. 2012-08-02 20:32:10 -04:00
Louis Mullie 39fefb7548 Require helper file instead of directly requiring Treat. 2012-08-02 20:31:51 -04:00
Louis Mullie d87ecabf36 Remove test.dump after testing serialization. 2012-08-02 19:39:29 -04:00
Louis Mullie 7cd749ed7b More work on specs. 2012-08-02 19:38:35 -04:00
Louis Mullie 23cf58667a Added comment/question to self. 2012-08-02 19:38:25 -04:00
Louis Mullie e8d607edf2 Added comparison method for data sets. 2012-08-02 19:38:14 -04:00
Louis Mullie f0e1741e65 Added some argument checks, made stuff more robust. 2012-08-02 17:31:20 -04:00
Louis Mullie ccd6d95761 Fixed argument checking. 2012-08-02 17:30:58 -04:00
Louis Mullie 900c0fe668 More work on Core specs. 2012-08-02 17:30:48 -04:00
Louis Mullie 90fc756ddb Argument checking, beginning to work on #self.build 2012-08-02 17:30:29 -04:00
Louis Mullie 069a2fae9a Reflect change in Treat::Core::Problem. 2012-08-02 15:33:15 -04:00
Louis Mullie 8dab7fe183 50% coverage of the core library. 2012-08-02 15:33:03 -04:00
Louis Mullie f1e2ebfeea Added argument checks. 2012-08-02 15:32:40 -04:00
Louis Mullie a599c15870 Added argument checks, fixed proc equality. 2012-08-02 15:31:47 -04:00
Louis Mullie a2ec1b0673 Remove default Rake task. 2012-08-02 15:31:02 -04:00
Louis Mullie b7f15533c6 Add /coverage to gitignore. 2012-08-02 15:30:53 -04:00
Louis Mullie 11d1ed3b7d Allow underscores in spec file name. 2012-08-01 17:55:24 -04:00
Louis Mullie eed0726754 Finalized API for num_children_with_feature. 2012-08-01 16:07:53 -04:00
Louis Mullie e44bcb051a Merge pull request #20 from automatedtendencies/master
Added value checking & recursion to num_children_with_feature
2012-08-01 12:15:40 -07:00
Robert Adler c03a24df47 Recursion & tag checks per value, cleaned up & reordered the vars 2012-08-01 14:04:12 -04:00
Robert Adler a4b2d4e011 Changes... oh so changes. (I was dumb and did c.method) 2012-08-01 13:27:08 -04:00
Robert Adler 44b59614b2 A little cleaner & put into loop 2012-08-01 13:21:04 -04:00
Robert Adler 1d385747a6 Recursive num_children_with_feature 2012-08-01 13:16:46 -04:00
Robert Adler b28df77fee values_at, not value_at 2012-08-01 13:03:16 -04:00
Robert Adler a93e4b3baa Value as 2nd var in num_children_with_feature 2012-08-01 12:50:33 -04:00
Louis Mullie 88beea24fc Added adapter for LIBLINEAR classification. 2012-08-01 03:04:58 -04:00
Louis Mullie 4216a8ac28 Edited variable names for clarity. 2012-08-01 03:04:40 -04:00
Louis Mullie 8946a5837b Merge branch 'mongo-data-sets' 2012-07-31 23:08:29 -04:00
Louis Mullie aed0938857 Added SVM to uppercase acronyms. 2012-07-31 23:07:25 -04:00
Louis Mullie b96e4be4da Added an optional set of labels for SVM. 2012-07-31 23:07:14 -04:00
Louis Mullie 7172fe466e Added LibSVM adapter. 2012-07-31 23:07:01 -04:00
Louis Mullie a289d7eb94 Add support for tags. 2012-07-30 00:55:33 -04:00
Louis Mullie b46dceedac Make sure to set index only if a folder is provided. 2012-07-30 00:55:24 -04:00
Louis Mullie 621c73d126 Allow to select multiple documents and make collection out of it. 2012-07-30 00:55:11 -04:00
Louis Mullie 7a95f70968 Reflect updates in the data set model. 2012-07-30 00:54:02 -04:00
Louis Mullie 2222882156 Renamed feature to more general "export", also encompassing tags. 2012-07-30 00:53:42 -04:00
Louis Mullie 9bd2d5e246 Add tag feature. 2012-07-30 00:53:27 -04:00
Louis Mullie c854e03c72 Convert objects to hashes. 2012-07-29 18:32:53 -04:00
Louis Mullie ec6239e976 Working on Mongo data sets. 2012-07-29 18:32:45 -04:00
Louis Mullie cd6eb8212b Make question instances convertible to a hash. 2012-07-29 18:32:28 -04:00
Louis Mullie 407a123f63 Add parameter checks; work on serialization. 2012-07-29 18:32:12 -04:00
Louis Mullie bc1b5afb55 Fix stray character. 2012-07-29 18:31:49 -04:00
Louis Mullie c54f110e19 Cleaning up code; use string to load Proc. 2012-07-29 18:31:32 -04:00
Louis Mullie ee6ae2b705 Merge branch 'birch-tree' 2012-07-28 20:59:51 -04:00
Louis Mullie 858356e343 Update option precedence. 2012-07-28 20:58:26 -04:00
Louis Mullie 05e79fe526 Fix paths in Stanford loader. 2012-07-28 20:58:07 -04:00
Louis Mullie ac2cac8a1c Cosmetic change. 2012-07-28 20:57:59 -04:00
Louis Mullie 0b062faaeb Working on Mongo serialization. 2012-07-28 20:57:53 -04:00
Louis Mullie 6a005680c3 Remove this spec; move it to birch gem. 2012-07-28 20:53:46 -04:00
Louis Mullie 6c71fa6831 Merge pull request #19 from louismullie/birch-tree
Use birch gem as base class of tree model for all textual entities.
2012-07-28 17:48:50 -07:00
Louis Mullie 306e0673ec Update spec to conform to new birch gem API. 2012-07-28 20:46:51 -04:00
Louis Mullie a95657409d Fix inconsistency in #size method. 2012-07-28 20:46:34 -04:00
Louis Mullie ada3222d0f Update documentation. 2012-07-28 20:46:12 -04:00
Louis Mullie e8266f4ddf Use Birch::Tree as a base class for Entity. 2012-07-28 20:46:05 -04:00
Louis Mullie c08c836fec Almost finish integration of birch gem. 2012-07-28 20:45:32 -04:00
Louis Mullie e32f4d4a7e Generally renamed "dependencies" to "edges" in preparation for Birch integration. 2012-07-28 17:13:12 -04:00
235 changed files with 6721 additions and 4349 deletions

.gitignore

@@ -12,4 +12,6 @@
*.html
*.yaml
spec/sandbox.rb
coverage/*
benchmark/*
TODO

.rspec

@@ -1,2 +1,2 @@
--format s -c
--format d -c
--order rand


@@ -1,11 +1,18 @@
language: ruby
rvm:
- 1.9.2
- 1.9.3
- 2.0
- 2.1
- 2.2
before_install:
- export "JAVA_HOME=/usr/lib/jvm/java-6-openjdk/"
before_script:
- export "JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386/"
before_script:
- sudo apt-get install antiword
- sudo apt-get install poppler-utils
- rake treat:install[travis]
script: rake treat:spec
- rake treat:install[travis] --trace
script: rake treat:spec --trace

.treat

@@ -0,0 +1,35 @@
# A boolean value indicating whether to silence
# the output of external libraries (e.g. Stanford
# tools, Enju, LDA, Ruby-FANN, Schiphol).
Treat.core.verbosity.silence = false
# A boolean value indicating whether to explain
# the steps that Treat is performing.
Treat.core.verbosity.debug = true
# A boolean value indicating whether Treat should
# try to detect the language of newly input text.
Treat.core.language.detect = false
# A string representing the language to default
# to when detection is off.
Treat.core.language.default = 'english'
# A symbol representing the finest level at which
# language detection should be performed if language
# detection is turned on.
Treat.core.language.detect_at = :document
# The directory containing executables and JAR files.
Treat.paths.bin = '##_INSTALLER_BIN_PATH_##'
# The directory containing trained models
Treat.paths.models = '##_INSTALLER_MODELS_PATH_##'
# Mongo database configuration.
Treat.databases.mongo.db = 'your_database'
Treat.databases.mongo.host = 'localhost'
Treat.databases.mongo.port = '27017'
# Include the DSL by default.
include Treat::Core::DSL
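The `.treat` file sets options through dotted chains such as `Treat.core.language.default = 'english'`. This chunk does not show how the new version resolves those chains (the removed `Treat::Config` module further down relies on `method_missing`), so the following is only a hypothetical sketch of one way such chains can work. `ConfigNode` is an invented name, not part of Treat.

```ruby
# Hypothetical sketch, not Treat's configuration code: a node that
# creates nested nodes on first access and stores values on assignment,
# so chains like settings.core.language.default = 'english' work.
class ConfigNode
  def initialize
    @values = {}
  end

  def method_missing(name, *args)
    key = name.to_s
    # Leave conversion hooks (to_ary, to_str, ...) to the default
    # behaviour so puts, Array() and interpolation are unaffected.
    return super if key.start_with?('to_')

    if key.end_with?('=')
      # Setter: "default=" stores the value under :default.
      @values[key.chomp('=').to_sym] = args.first
    else
      # Reader: return a stored value (including false) or
      # create an empty nested node on first access.
      @values.fetch(name) { @values[name] = ConfigNode.new }
    end
  end

  def respond_to_missing?(name, include_private = false)
    !name.to_s.start_with?('to_') || super
  end
end

settings = ConfigNode.new
settings.core.language.default = 'english'
settings.core.verbosity.debug = true
settings.core.verbosity.silence = false
```

`fetch` with a block is used instead of `||=` so that an option explicitly set to `false` (such as `silence`) is not replaced by a new node.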

Gemfile

@@ -1,12 +1,45 @@
source :rubygems
source 'https://rubygems.org'
gemspec
gem 'birch'
gem 'schiphol'
gem 'sourcify'
gem 'yomu'
gem 'ruby-readability'
gem 'nokogiri'
group :test do
gem 'rspec'
gem 'rake'
end
gem 'terminal-table'
gem 'simplecov'
end
=begin
gem 'linguistics'
gem 'engtagger'
gem 'open-nlp'
gem 'stanford-core-nlp'
gem 'rwordnet'
gem 'scalpel'
gem 'fastimage'
gem 'decisiontree'
gem 'whatlanguage'
gem 'zip'
gem 'nickel'
gem 'tactful_tokenizer'
gem 'srx-english'
gem 'punkt-segmenter'
gem 'chronic'
gem 'uea-stemmer'
gem 'rbtagger'
gem 'ruby-stemmer'
gem 'activesupport'
gem 'rb-libsvm'
gem 'tomz-liblinear-ruby-swig'
gem 'ruby-fann'
gem 'fuzzy-string-match'
gem 'levenshtein-ffi'
gem 'tf-idf-similarity'
gem 'kronic'
=end


@@ -1,4 +1,4 @@
Treat - Text Retrieval, Extraction and Annotation Toolkit, v. 1.1.2
Treat - Text Retrieval, Extraction and Annotation Toolkit, v. 2.0.0
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -15,7 +15,7 @@ along with this program. If not, see <http://www.gnu.org/licenses/>.
Author: Louis-Antoine Mullie (louis.mullie@gmail.com). Copyright 2011-12.
Non-trivial amount of code has been incorporated and modified from other libraries:
A non-trivial amount of code has been incorporated and modified from other libraries:
- formatters/readers/odt.rb - Mark Watson (GPL license)
- processors/tokenizers/tactful.rb - Matthew Bunday (GPL license)


@@ -1,34 +1,43 @@
[![Build Status](https://secure.travis-ci.org/louismullie/treat.png)](http://travis-ci.org/#!/louismullie/treat)
[![Dependency Status](https://gemnasium.com/louismullie/treat.png)](https://gemnasium.com/louismullie/treat)
Treat is a framework for natural language processing and computational linguistics in Ruby. It provides a common API for a number of gems and external libraries for document retrieval, parsing, annotation, and information extraction.
[![Code Climate](https://codeclimate.com/github/louismullie/treat.png)](https://codeclimate.com/github/louismullie/treat)
**Current features**
![Treat Logo](http://www.louismullie.com/treat/treat-logo.jpg)
**New in v2.0.5: [OpenNLP integration](https://github.com/louismullie/treat/commit/727a307af0c64747619531c3aa355535edbf4632) and [Yomu support](https://github.com/louismullie/treat/commit/e483b764e4847e48b39e91a77af8a8baa1a1d056)**
Treat is a toolkit for natural language processing and computational linguistics in Ruby. The Treat project aims to build a language- and algorithm- agnostic NLP framework for Ruby with support for tasks such as document retrieval, text chunking, segmentation and tokenization, natural language parsing, part-of-speech tagging, keyword extraction and named entity recognition. Learn more by taking a [quick tour](https://github.com/louismullie/treat/wiki/Quick-Tour) or by reading the [manual](https://github.com/louismullie/treat/wiki/Manual).
**Features**
* Text extractors for PDF, HTML, XML, Word, AbiWord, OpenOffice and image formats (Ocropus).
* Text retrieval with indexation and full-text search (Ferret).
* Text chunkers, sentence segmenters, tokenizers, and parsers for several languages (Stanford & Enju).
* Word inflectors, including stemmers, conjugators, declensors, and number inflection.
* Lexical resources (WordNet interface, several POS taggers for English, Stanford taggers for several languages).
* Text chunkers, sentence segmenters, tokenizers, and parsers (Stanford & Enju).
* Lexical resources (WordNet interface, several POS taggers for English).
* Language, date/time, topic words (LDA) and keyword (TF*IDF) extraction.
* Serialization of annotated entities to YAML, XML formats or to MongoDB.
* Word inflectors, including stemmers, conjugators, declensors, and number inflection.
* Serialization of annotated entities to YAML, XML or to MongoDB.
* Visualization in ASCII tree, directed graph (DOT) and tag-bracketed (standoff) formats.
* Linguistic resources, including language detection and tag alignments for several treebanks.
* Decision tree and multilayer perceptron classification (liblinear coming soon!)
* Machine learning (decision tree, multilayer perceptron, LIBLINEAR, LIBSVM).
* Text retrieval with indexation and full-text search (Ferret).
<br>
**Contributing**
**Resources**
I am actively seeking developers that can help maintain and expand this project. You can find a list of ideas for contributing to the project [here](https://github.com/louismullie/treat/wiki/Contributing).
* Read the [latest documentation](http://rubydoc.info/github/louismullie/treat/frames).
* See how to [install Treat](https://github.com/louismullie/treat/wiki/Installation).
* Learn how to [use Treat](https://github.com/louismullie/treat/wiki/Manual).
* Help out by [contributing to the project](https://github.com/louismullie/treat/wiki/Contributing).
* View a list of [papers](https://github.com/louismullie/treat/wiki/Papers) about tools included in this toolkit.
* Open an [issue](https://github.com/louismullie/treat/issues).
<br>
**Authors**
Lead developer: @louismullie [[Twitter](https://twitter.com/LouisMullie)]
Contributors:
- @bdigital
- @automatedtendencies
- @LeFnord
- @darkphantum
- @whistlerbrk
- @smileart
- @erol
**License**
This software is released under the [GPL License](https://github.com/louismullie/treat/wiki/License-Information) and includes software released under the GPL, Ruby, Apache 2.0 and MIT licenses.
This software is released under the [GPL License](https://github.com/louismullie/treat/wiki/License-Information) and includes software released under the GPL, Ruby, Apache 2.0 and MIT licenses.

RELEASE

@@ -41,4 +41,16 @@ Treat - Text Retrieval, Extraction and Annotation Toolkit
1.1.0
* Complete refactoring of the core of the library.
* Separated all configuration stuff from dynamic stuff.
* Separated all configuration stuff from dynamic stuff.
1.2.0
* Added LIBSVM and LIBLINEAR classifier support.
* Added support for serialization of documents and data sets to MongoDB.
* Added specs for most of the core classes.
* Several bug fixes.
2.0.0rc1
* MAJOR CHANGE: the old DSL is no longer supported. A new DSL style using
lowercase keywords is now used and must be required explicitly.


@@ -1,29 +1,47 @@
require 'date'
require 'rspec/core/rake_task'
task :default => :spec
# All commands are prefixed with "treat:".
namespace :treat do
RSpec::Core::RakeTask.new do |t|
task = ARGV[0].scan(/\[([a-z]*)\]/)
if task && task.size == 0
t.pattern = "./spec/*.rb"
else
t.pattern = "./spec/#{task[0][0]}.rb"
end
# Require the Treat library.
require_relative 'lib/treat'
# Sandbox a script, for development.
# Syntax: rake treat:sandbox
task :sandbox do
require_relative 'spec/sandbox'
end
# Prints the current version of Treat.
# Syntax: rake treat:version
task :version do
vpath = '../lib/treat/version.rb'
vfile = File.expand_path(vpath, __FILE__)
contents = File.read(vfile)
puts contents[/VERSION = "([^"]+)"/, 1]
puts Treat::VERSION
end
# Installs a language pack (default to english).
# A language pack is a set of gems, binaries and
# model files that support the various workers
# that are available for that particular language.
# Syntax: rake treat:install (installs english)
# - OR - rake treat:install[some_language]
task :install, [:language] do |t, args|
require './lib/treat'
Treat.install(args.language || 'english')
language = args.language || 'english'
Treat::Core::Installer.install(language)
end
# Runs 1) the core library specs and 2) the
# worker specs for a) all languages (default)
# or b) a specific language (if specified).
# Also outputs the coverage for the whole
# library to treat/coverage (using SimpleCov).
# N.B. the worker specs are dynamically defined
# following the examples found in spec/workers.
# (see /spec/language/workers for more info)
# Syntax: rake treat:spec (core + all langs)
# - OR - rake treat:spec[some_language]
task :spec, [:language] do |t, args|
require_relative 'spec/helper'
Treat::Specs::Helper.start_coverage
Treat::Specs::Helper.run_library_specs
Treat::Specs::Helper.run_language_specs(args.language)
end
end
end
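The new `install` and `spec` tasks declare an optional bracketed argument (`rake treat:install[some_language]`) and fall back to a default when it is omitted. A minimal, self-contained sketch of that Rake argument mechanism, using a hypothetical `demo:install` task that only records the language instead of calling Treat's installer:

```ruby
require 'rake'

# Records which language each invocation received (demo only).
INSTALLED = []

# Make task/namespace available on the top-level object.
extend Rake::DSL

namespace :demo do
  # [:language] declares a task argument, so
  # `rake demo:install[french]` yields args.language == 'french'.
  task :install, [:language] do |_t, args|
    INSTALLED << (args.language || 'english')
  end
end

# Invoke programmatically: once without an argument, once with one.
# A Rake task runs only once per process unless it is re-enabled.
install = Rake::Task['demo:install']
install.invoke
install.reenable
install.invoke('french')
```

After both invocations `INSTALLED` holds the default followed by the explicit argument, mirroring `args.language || 'english'` in the Rakefile.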


@@ -1,36 +1,23 @@
# Treat is a toolkit for natural language
# processing and computational linguistics
# in Ruby. The Treat project aims to build
# a language- and algorithm- agnostic NLP
# framework for Ruby with support for tasks
# such as document retrieval, text chunking,
# segmentation and tokenization, natural
# language parsing, part-of-speech tagging,
# keyword mining and named entity recognition.
#
# Author: Louis-Antoine Mullie (c) 2010-12.
#
# Released under the General Public License.
module Treat
# Treat requires Ruby >= 1.9.2
if RUBY_VERSION < '1.9.2'
raise "Treat requires Ruby version 1.9.2 " +
"or higher, but current is #{RUBY_VERSION}."
end
# Custom exception class.
class Exception < ::Exception; end
# Load configuration options.
require 'treat/config'
# Load all workers.
require 'treat/helpers'
# Require library loaders.
require 'treat/loaders'
# Require all core classes.
require 'treat/core'
# Require all entity classes.
require 'treat/entities'
# Lazy load worker classes.
require 'treat/workers'
# Require proxies last.
require 'treat/proxies'
# Turn sugar on.
Treat::Config.sweeten!
# Install packages for a given language.
def self.install(language = :english)
require 'treat/installer'
Treat::Installer.install(language)
end
# * Load all the core classes. * #
require_relative 'treat/version'
require_relative 'treat/exception'
require_relative 'treat/autoload'
require_relative 'treat/modules'
require_relative 'treat/builder'
end

lib/treat/autoload.rb

@ -0,0 +1,44 @@
# Basic mixin for all the main modules;
# takes care of requiring the right files
# in the right order for each one.
#
# If a module's folder (e.g. /entities)
# contains a file with a corresponding
# singular name (e.g. /entity), that
# base class is required first. Then,
# all the files that are found directly
# under that folder are required (but
# not those found in sub-folders).
module Treat::Autoload
# Loads all the files for the base
# module in the appropriate order.
def self.included(base)
m = self.get_module_name(base)
d = self.get_module_path(m)
n = self.singularize(m) + '.rb'
f, p = File.join(d, n), "#{d}/*.rb"
require f if File.readable?(f)
Dir.glob(p).each { |f| require f }
end
# Returns the path to a module's dir.
def self.get_module_path(name)
file = File.expand_path(__FILE__)
dirs = File.dirname(file).split('/')
File.join(*dirs[0..-1], name)
end
# Return the downcased form of the
# module's last name (e.g. "entities").
def self.get_module_name(mod)
mod.to_s.split('::')[-1].downcase
end
# Helper method to singularize words.
def self.singularize(w)
if w[-3..-1] == 'ies'; w[0..-4] + 'y'
else; (w[-1] == 's' ? w[0..-2] : w); end
end
end

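The Autoload mixin above derives everything it loads from the module name. The sketch below reproduces its two naming helpers so the resulting load order can be checked in isolation. It assumes a `lib/treat` layout. `AutoloadNaming` and `load_plan` are hypothetical names introduced for illustration and are not part of Treat.

```ruby
# Standalone copies of the naming rules used by Treat::Autoload
# (hypothetical module name; not part of the Treat gem).
module AutoloadNaming
  # Last segment of the module name, downcased:
  # "Treat::Entities" -> "entities".
  def self.module_name(mod)
    mod.to_s.split('::')[-1].downcase
  end

  # Naive singularizer, same rules as the mixin:
  # "-ies" -> "-y", otherwise drop one trailing "s".
  def self.singularize(w)
    if w[-3..-1] == 'ies' then w[0..-4] + 'y'
    else (w[-1] == 's' ? w[0..-2] : w)
    end
  end

  # Hypothetical helper: returns the base file that is required first,
  # followed by the glob for the files directly under the folder.
  def self.load_plan(lib_dir, mod)
    name = module_name(mod)
    dir = File.join(lib_dir, name)
    [File.join(dir, singularize(name) + '.rb'), "#{dir}/*.rb"]
  end
end

AutoloadNaming.singularize('entities')  # => "entity"
AutoloadNaming.singularize('workers')   # => "worker"
AutoloadNaming.load_plan('lib/treat', 'Treat::Entities')
# => ["lib/treat/entities/entity.rb", "lib/treat/entities/*.rb"]
```

Because the singularizer is rule-based, a folder name such as `news` would map to `new.rb`. In practice this only matters if a module's folder name is irregular.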
lib/treat/builder.rb

@ -0,0 +1,6 @@
class Treat::Builder
include Treat::Core::DSL
def initialize(&block)
instance_exec(&block)
end
end

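`Treat::Builder` relies on `instance_exec`. The block passed to `new` runs with `self` set to the builder, so the methods mixed in from `Treat::Core::DSL` can be called without a receiver. The contents of that DSL module are not shown in this diff. The sketch below therefore substitutes a hypothetical `MiniDSL` with a single `document` method to illustrate the pattern.

```ruby
# Hypothetical stand-in for Treat::Core::DSL; the real module's
# methods are not part of this diff.
module MiniDSL
  # Record each document string passed in the block.
  def document(text)
    (@documents ||= []) << text
  end

  def documents
    @documents || []
  end
end

class MiniBuilder
  include MiniDSL

  # Evaluate the block with self bound to this builder, so DSL
  # methods resolve against it (same technique as Treat::Builder).
  def initialize(&block)
    instance_exec(&block)
  end
end

b = MiniBuilder.new do
  document 'Hello world.'
  document 'Another text.'
end
b.documents  # => ["Hello world.", "Another text."]
```

One trade-off of `instance_exec` over `yield(self)` is that the block loses access to the caller's own instance methods and instance variables. In exchange, the DSL reads without a receiver.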
@ -1,135 +0,0 @@
module Treat::Config
Paths = [ :tmp, :lib, :bin,
:files, :data, :models, :spec ]
class << self
attr_accessor :config
end
Treat.module_eval do
# Handle all missing methods as conf options.
def self.method_missing(sym, *args, &block)
super(sym, *args, &block) if sym == :to_ary
Treat::Config.config[sym]
end
end
def self.configure
# Temporary configuration hash.
config = { paths: {} }
confdir = get_full_path(:lib) + 'treat/config'
# Iterate over each directory in the config.
Dir[confdir + '/*'].each do |dir|
name = File.basename(dir, '.*').intern
config[name] = {}
# Iterate over each file in the directory.
Dir[confdir + "/#{name}/*.rb"].each do |file|
key = File.basename(file, '.*').intern
config[name][key] = eval(File.read(file))
end
end
# Get the path config.
Paths.each do |path|
config[:paths][path] = get_full_path(path)
end
# Get the tag alignments.
configure_tags!(config[:tags][:aligned])
# Convert hash to structs.
self.config = self.hash_to_struct(config)
end
def self.get_full_path(dir)
File.dirname(__FILE__) +
'/../../' + dir.to_s + "/"
end
def self.configure_tags!(config)
ts = config[:tag_sets]
config[:word_tags_to_category] =
align_tags(config[:word_tags], ts)
config[:phrase_tags_to_category] =
align_tags(config[:phrase_tags], ts)
end
# Align tag configuration.
def self.align_tags(tags, tag_sets)
wttc = {}
tags.each_slice(2) do |desc, tags|
category = desc.gsub(',', ' ,').
split(' ')[0].downcase
tag_sets.each_with_index do |tag_set, i|
next unless tags[i]
wttc[tags[i]] ||= {}
wttc[tags[i]][tag_set] = category
end
end
wttc
end
def self.hash_to_struct(hash)
return hash if hash.keys.
select { |k| !k.is_a?(Symbol) }.size > 0
struct = Struct.new(
*hash.keys).new(*hash.values)
hash.each do |key, value|
if value.is_a?(Hash)
struct[key] =
self.hash_to_struct(value)
end
end
struct
end
# Turn on syntactic sugar.
def self.sweeten!
# Undo this in unsweeten! - # Fix
Treat::Entities.module_eval do
self.constants.each do |type|
define_singleton_method(type) do |value='', id=nil|
const_get(type).build(value, id)
end
end
end
return if Treat.core.syntax.sweetened
Treat.core.syntax.sweetened = true
Treat.core.entities.list.each do |type|
next if type == :Symbol
kname = cc(type).intern
klass = Treat::Entities.const_get(kname)
Object.class_eval do
define_method(kname) do |val, opts={}|
klass.build(val, opts)
end
end
end
Treat::Core.constants.each do |kname|
Object.class_eval do
klass = Treat::Core.const_get(kname)
define_method(kname) do |*args|
klass.new(*args)
end
end
end
end
# Turn off syntactic sugar.
def self.unsweeten!
return unless Treat.core.syntax.sweetened
Treat.core.syntax.sweetened = false
Treat.core.entities.list.each do |type|
name = cc(type).intern
next if type == :Symbol
Object.class_eval { remove_method(name) }
end
end
# Run all configuration.
self.configure
end

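The removed `align_tags` method turns the flat description/tag list into a lookup from each tag to a `{tag_set => category}` hash. The category is the first word of the description, downcased. The copy below runs it on a two-row excerpt of the word-tag table. Its only change is to rename the block parameter `tags` to `set_tags` so that it no longer shadows the method argument.

```ruby
# Copy of the old Treat::Config.align_tags logic, as a plain method.
def align_tags(tags, tag_sets)
  wttc = {}
  tags.each_slice(2) do |desc, set_tags|
    # "Adverb, negative" -> "adverb": first word only, downcased.
    category = desc.gsub(',', ' ,').split(' ')[0].downcase
    tag_sets.each_with_index do |tag_set, i|
      next unless set_tags[i]   # skips nil only; '' is truthy in Ruby
      wttc[set_tags[i]] ||= {}
      wttc[set_tags[i]][tag_set] = category
    end
  end
  wttc
end

tag_sets = [:claws_c5, :brown, :penn, :stutgart, :chinese, :paris7]
word_tags = [
  'Adverb',           ['AV0', 'RB', 'RB', 'ADV', 'AD', 'ADV'],
  'Adverb, negative', ['XX0', '*', 'RB', 'PTKNEG', '', 'ADV']
]
aligned = align_tags(word_tags, tag_sets)
aligned['PTKNEG']  # => {stutgart: "adverb"} (inspect format varies by Ruby version)
```

Two consequences follow from this logic. A misspelled first word in a description produces a misspelled category. Placeholder `''` entries are not skipped by `next unless`, so the empty string itself ends up as a key in the result.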
@ -0,0 +1,38 @@
# This module uses structs to represent the
# configuration options that are stored in
# the /config folder.
module Treat::Config
# Require the Importable mixin.
require_relative 'importable'
# Make all configuration importable.
extend Treat::Config::Importable
# Core configuration options for entities.
class Treat::Config::Entities; end
# Configuration for paths to models, binaries,
# temporary storage and file downloads.
class Treat::Config::Paths; end
# Configuration for all Treat workers.
class Treat::Config::Workers; end
# Helpful linguistic options.
class Treat::Config::Linguistics; end
# Supported workers for each language.
class Treat::Config::Languages; end
# Configuration options for external libraries.
class Treat::Config::Libraries; end
# Configuration options for database
# connectivity (host, port, etc.)
class Treat::Config::Databases; end
# Configuration options for Treat core.
class Treat::Config::Core; end
end

@ -0,0 +1,51 @@
# Provide default functionality to load configuration
# options from flat files into their respective modules.
module Treat::Config::Configurable
# When extended, add the .config property to
# the class that is being operated on.
def self.extended(base)
class << base; attr_accessor :config; end
base.class_eval { self.config = {} }
end
# Provide base functionality to configure
# all modules. The behaviour is as follows:
#
# 1 - Check if a file named data/$CLASS$.rb
# exists; if so, load that file as the base
# configuration, i.e. "Treat.$CLASS$"; e.g.
# "Treat.core"
#
# 2 - Check if a folder named data/$CLASS$
# exists; if so, load each file in that folder
# as a suboption of the main configuration,
# i.e. "Treat.$CLASS$.$FILE$"; e.g. "Treat.workers"
#
# (where $CLASS$ is the lowercase name of
# the class or module extending this mixin.)
def configure!
path = File.dirname(File.expand_path( # FIXME
__FILE__)).split('/')[0..-4].join('/') + '/'
main_dir = path + 'lib/treat/config/data/'
mod_name = self.name.split('::')[-1].downcase
conf_dir = main_dir + mod_name
base_file = main_dir + mod_name + '.rb'
if File.readable?(base_file)
self.config = eval(File.read(base_file))
elsif FileTest.directory?(conf_dir)
self.config = self.from_dir(conf_dir)
else; raise Treat::Exception,
"No config file found for #{mod_name}."
end
end
# * Helper methods for configuration * #
def from_dir(conf_dir)
Hash[Dir[conf_dir + '/*'].map do |path|
name = File.basename(path, '.*').intern
[name, eval(File.read(path))]
end]
end
end

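`configure!` prefers a single `data/<name>.rb` file and falls back to a `data/<name>/` directory. In the directory case, each file becomes one sub-key. The sketch below reproduces that precedence against a temporary directory. `load_config` is a hypothetical helper, and it raises `ArgumentError` where the original raises `Treat::Exception`. As in the original, it `eval`s file contents, so it should only be pointed at trusted files.

```ruby
require 'tmpdir'

# Same idea as Configurable#from_dir: each file holds a Ruby literal,
# stored under its basename as a symbol.
def from_dir(conf_dir)
  Dir[File.join(conf_dir, '*')].map { |path|
    [File.basename(path, '.*').to_sym, eval(File.read(path))]
  }.to_h
end

# Hypothetical helper mirroring the if/elsif order in configure!.
def load_config(main_dir, mod_name)
  base_file = File.join(main_dir, mod_name + '.rb')
  conf_dir = File.join(main_dir, mod_name)
  if File.readable?(base_file)
    eval(File.read(base_file))      # single base file wins
  elsif File.directory?(conf_dir)
    from_dir(conf_dir)              # otherwise one key per file
  else
    raise ArgumentError, "No config file found for #{mod_name}."
  end
end

# Dir.mktmpdir returns the block's value and removes the directory.
config = Dir.mktmpdir do |main_dir|
  File.write(File.join(main_dir, 'core.rb'), '{ syntax: { sweetened: false } }')
  Dir.mkdir(File.join(main_dir, 'workers'))
  File.write(File.join(main_dir, 'workers', 'learners.rb'), '{ classifiers: [:id3] }')
  { core: load_config(main_dir, 'core'),
    workers: load_config(main_dir, 'workers') }
end
```

Here `config[:core]` comes straight from `core.rb`. `config[:workers]` is assembled from the files in `workers/`, which matches the "Treat.core" versus "Treat.workers" distinction described in the comment.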
@ -1,4 +0,0 @@
['xml', 'html', 'txt', 'odt',
'abw', 'doc', 'yaml', 'uea',
'lda', 'pdf', 'ptb', 'dot',
'ai', 'id3', 'svo', 'mlp' ]

@ -1,8 +0,0 @@
{language_to_code: {
arabic: 'UTF-8',
chinese: 'GB18030',
english: 'UTF-8',
french: 'ISO_8859-1',
german: 'ISO_8859-1',
hebrew: 'UTF-8'
}}

@ -1,2 +0,0 @@
{list: [:entity, :unknown, :email, :url, :symbol, :sentence, :punctuation, :number, :enclitic, :word, :token, :fragment, :phrase, :paragraph, :title, :zone, :list, :block, :page, :section, :collection, :document],
order: [:token, :fragment, :phrase, :sentence, :zone, :section, :document, :collection]}

@ -1,3 +0,0 @@
{default: :english,
detect: false,
detect_at: :document}

@ -1,8 +0,0 @@
{description: {
:tmp => 'temporary files',
:lib => 'class and module definitions',
:bin => 'binary files',
:files => 'user-saved files',
:models => 'model files',
:spec => 'spec test files'
}}

@ -1 +0,0 @@
{sweetened: false}

@ -1 +0,0 @@
{debug: false, silence: true}

@ -0,0 +1,54 @@
{
acronyms:
['xml', 'html', 'txt', 'odt',
'abw', 'doc', 'yaml', 'uea',
'lda', 'pdf', 'ptb', 'dot',
'ai', 'id3', 'svo', 'mlp',
'svm', 'srx', 'nlp'],
encodings:
{language_to_code: {
arabic: 'UTF-8',
chinese: 'GB18030',
english: 'UTF-8',
french: 'ISO_8859-1',
german: 'ISO_8859-1',
hebrew: 'UTF-8'
}},
entities:
{list:
[:entity, :unknown, :email,
:url, :symbol, :sentence,
:punctuation, :number,
:enclitic, :word, :token, :group,
:fragment, :phrase, :paragraph,
:title, :zone, :list, :block,
:page, :section, :collection,
:document],
order:
[:token, :fragment, :group,
:sentence, :zone, :section,
:document, :collection]},
language: {
default: :english,
detect: false,
detect_at: :document
},
paths: {
description: {
tmp: 'temporary files',
lib: 'class and module definitions',
bin: 'binary files',
files: 'user-saved files',
models: 'model files',
spec: 'spec test files'
}
},
learning: {
list: [:data_set, :export, :feature, :tag, :problem, :question]
},
syntax: { sweetened: false },
verbosity: { debug: false, silence: true}
}

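The `order` array in the core configuration above appears to rank entity types from the smallest unit (`:token`) to the largest (`:collection`). The sketch below uses that reading to compare two types. `smaller_than?` is a hypothetical helper and not part of the Treat API. Several types in `list`, such as `:word` or `:paragraph`, do not appear in `order`, so the helper raises for them rather than guessing.

```ruby
# The `order` list from the core configuration, smallest unit first.
ENTITY_ORDER = [:token, :fragment, :group, :sentence, :zone,
                :section, :document, :collection].freeze

# Hypothetical helper: true when type `a` ranks below type `b`.
def smaller_than?(a, b)
  ia = ENTITY_ORDER.index(a)
  ib = ENTITY_ORDER.index(b)
  # Report whichever of the two types is missing from the order list.
  raise ArgumentError, "unranked type: #{ia ? b : a}" unless ia && ib
  ia < ib
end

smaller_than?(:token, :sentence)  # => true
smaller_than?(:document, :zone)   # => false
```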
@ -0,0 +1,10 @@
{
default: {
adapter: :mongo
},
mongo: {
host: 'localhost',
port: '27017',
db: nil
}
}

@ -0,0 +1,15 @@
{
list:
[:entity, :unknown, :email,
:url, :symbol, :sentence,
:punctuation, :number,
:enclitic, :word, :token,
:fragment, :phrase, :paragraph,
:title, :zone, :list, :block,
:page, :section, :collection,
:document],
order:
[:token, :fragment, :phrase,
:sentence, :zone, :section,
:document, :collection]
}

@ -0,0 +1,33 @@
{
dependencies: [
'ferret', 'bson_ext', 'mongo', 'lda-ruby',
'stanford-core-nlp', 'linguistics',
'ruby-readability', 'whatlanguage',
'chronic', 'kronic', 'nickel', 'decisiontree',
'rb-libsvm', 'ruby-fann', 'zip', 'loggability',
'tf-idf-similarity', 'narray', 'fastimage',
'fuzzy-string-match', 'levenshtein-ffi'
],
workers: {
learners: {
classifiers: [:id3, :linear, :mlp, :svm]
},
extractors: {
keywords: [:tf_idf],
language: [:what_language],
topic_words: [:lda],
tf_idf: [:native],
distance: [:levenshtein],
similarity: [:jaro_winkler, :tf_idf]
},
formatters: {
serializers: [:xml, :yaml, :mongo],
unserializers: [:xml, :yaml, :mongo],
visualizers: [:dot, :standoff, :tree]
},
retrievers: {
searchers: [:ferret],
indexers: [:ferret]
}
}
}

@ -6,7 +6,7 @@
workers: {
processors: {
segmenters: [:punkt],
tokenizers: [:tactful]
tokenizers: []
}
}
}

@ -0,0 +1,95 @@
{
dependencies: [
'rbtagger',
'ruby-stemmer',
'punkt-segmenter',
'tactful_tokenizer',
'nickel',
'rwordnet',
'uea-stemmer',
'engtagger',
'activesupport',
'srx-english',
'scalpel'
],
workers: {
extractors: {
time: [:chronic, :kronic, :ruby, :nickel],
topics: [:reuters],
name_tag: [:stanford]
},
inflectors: {
conjugators: [:linguistics],
declensors: [:english, :linguistics],
stemmers: [:porter, :porter_c, :uea],
ordinalizers: [:linguistics],
cardinalizers: [:linguistics]
},
lexicalizers: {
taggers: [:lingua, :brill, :stanford],
sensers: [:wordnet],
categorizers: [:from_tag]
},
processors: {
parsers: [:stanford],
segmenters: [:scalpel, :srx, :tactful, :punkt, :stanford],
tokenizers: [:ptb, :stanford, :punkt, :open_nlp]
}
},
stop_words:
[
"about",
"also",
"are",
"away",
"because",
"been",
"beside",
"besides",
"between",
"but",
"cannot",
"could",
"did",
"etc",
"even",
"ever",
"every",
"for",
"had",
"have",
"how",
"into",
"isn",
"maybe",
"non",
"nor",
"now",
"should",
"such",
"than",
"that",
"then",
"these",
"this",
"those",
"though",
"too",
"was",
"wasn",
"were",
"what",
"when",
"where",
"which",
"while",
"who",
"whom",
"whose",
"will",
"with",
"would",
"wouldn",
"yes"
]
}

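Each language file pairs its worker list with a flat `stop_words` array. The sketch below shows one way such a list could be applied to tokens. It copies a small excerpt of the English list into a `Set` and matches case-insensitively. `remove_stop_words` is a hypothetical helper, and Treat's own use of these lists is not shown in this diff.

```ruby
require 'set'

# A few entries copied from the English stop_words list above.
STOP_WORDS = %w[about also are because been but could have this that
                which while who will with would].freeze

# Hypothetical helper: drop tokens whose lowercase form is a stop word.
def remove_stop_words(tokens, stop_words = STOP_WORDS)
  stops = Set.new(stop_words)
  tokens.reject { |t| stops.include?(t.downcase) }
end

remove_stop_words(%w[This parser would tag each token])
# => ["parser", "tag", "each", "token"]
```

A `Set` keeps each lookup constant-time. That matters more for the longer lists further down, such as the Spanish/Catalan one with roughly 275 entries.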
@ -0,0 +1,148 @@
{
dependencies: [
'punkt-segmenter',
'tactful_tokenizer',
'stanford-core-nlp'
],
workers: {
processors: {
segmenters: [:scalpel],
tokenizers: [:ptb,:stanford],
parsers: [:stanford]
},
lexicalizers: {
taggers: [:stanford],
categorizers: [:from_tag]
}
},
stop_words:
[
"ailleurs",
"ainsi",
"alors",
"aucun",
"aucune",
"auquel",
"aurai",
"auras",
"aurez",
"aurons",
"auront",
"aussi",
"autre",
"autres",
"aux",
"auxquelles",
"auxquels",
"avaient",
"avais",
"avait",
"avec",
"avez",
"aviez",
"avoir",
"avons",
"celui",
"cependant",
"certaine",
"certaines",
"certains",
"ces",
"cet",
"cette",
"ceux",
"chacun",
"chacune",
"chaque",
"comme",
"constamment",
"davantage",
"depuis",
"des",
"desquelles",
"desquels",
"dessous",
"dessus",
"donc",
"dont",
"duquel",
"egalement",
"elles",
"encore",
"enfin",
"ensuite",
"etaient",
"etais",
"etait",
"etes",
"etiez",
"etions",
"etre",
"eux",
"guere",
"ici",
"ils",
"jamais",
"jusqu",
"laquelle",
"legerement",
"lequel",
"les",
"lesquelles",
"lesquels",
"leur",
"leurs",
"lors",
"lui",
"maintenant",
"mais",
"malgre",
"moi",
"moins",
"notamment",
"parce",
"plupart",
"pourtant",
"presentement",
"presque",
"puis",
"puisque",
"quand",
"quant",
"que",
"quel",
"quelqu",
"quelque",
"quelques",
"qui",
"quoi",
"quoique",
"rien",
"selon",
"serai",
"seras",
"serez",
"serons",
"seront",
"soient",
"soit",
"sommes",
"sont",
"sous",
"suis",
"telle",
"telles",
"tels",
"toi",
"toujours",
"tout",
"toutes",
"tres",
"trop",
"une",
"vos",
"votre",
"vous"
]
}

@ -0,0 +1,137 @@
#encoding: UTF-8
{
dependencies: [
'punkt-segmenter',
'tactful_tokenizer',
'stanford-core-nlp'
],
workers: {
processors: {
segmenters: [:tactful, :punkt, :stanford, :scalpel],
tokenizers: [:stanford, :punkt],
parsers: [:stanford]
},
lexicalizers: {
taggers: [:stanford],
categorizers: [:from_tag]
}
},
stop_words:
[
"alle",
"allem",
"alles",
"andere",
"anderem",
"anderen",
"anderer",
"anderes",
"auf",
"bei",
"beim",
"bist",
"dadurch",
"dein",
"deine",
"deiner",
"deines",
"deins",
"dem",
"denen",
"der",
"deren",
"des",
"deshalb",
"dessen",
"diese",
"diesem",
"diesen",
"dieser",
"dieses",
"ein",
"eine",
"einem",
"einen",
"einer",
"eines",
"euer",
"euere",
"eueren",
"eueres",
"für",
"haben",
"habt",
"hatte",
"hatten",
"hattest",
"hattet",
"hierzu",
"hinter",
"ich",
"ihr",
"ihre",
"ihren",
"ihrer",
"ihres",
"indem",
"ist",
"jede",
"jedem",
"jeden",
"jeder",
"jedes",
"kann",
"kannst",
"können",
"könnt",
"konnte",
"konnten",
"konntest",
"konntet",
"mehr",
"mein",
"meine",
"meiner",
"meines",
"meins",
"nach",
"neben",
"nicht",
"nichts",
"seid",
"sein",
"seine",
"seiner",
"seines",
"seins",
"sie",
"sind",
"über",
"und",
"uns",
"unser",
"unsere",
"unter",
"vor",
"warst",
"weil",
"wenn",
"werde",
"werden",
"werdet",
"willst",
"wir",
"wird",
"wirst",
"wollen",
"wollt",
"wollte",
"wollten",
"wolltest",
"wolltet",
"zum",
"zur"
]
}

@ -6,7 +6,7 @@
workers: {
processors: {
segmenters: [:punkt],
tokenizers: [:tactful]
tokenizers: []
}
}
}

@ -0,0 +1,162 @@
{
dependencies: [
'punkt-segmenter',
'tactful_tokenizer'
],
workers: {
processors: {
segmenters: [:punkt],
tokenizers: []
}
},
stop_words:
[
"affinche",
"alcun",
"alcuna",
"alcune",
"alcuni",
"alcuno",
"allora",
"altra",
"altre",
"altri",
"altro",
"anziche",
"certa",
"certe",
"certi",
"certo",
"che",
"chi",
"chiunque",
"comunque",
"con",
"cosa",
"cose",
"cui",
"dagli",
"dai",
"dall",
"dalla",
"dalle",
"darsi",
"degli",
"del",
"dell",
"della",
"delle",
"dello",
"dunque",
"egli",
"eppure",
"esse",
"essi",
"forse",
"gia",
"infatti",
"inoltre",
"invece",
"lui",
"malgrado",
"mediante",
"meno",
"mentre",
"mie",
"miei",
"mio",
"modo",
"molta",
"molte",
"molti",
"molto",
"negli",
"nel",
"nella",
"nelle",
"nessun",
"nessuna",
"nessuno",
"niente",
"noi",
"nostra",
"nostre",
"nostri",
"nostro",
"nulla",
"occorre",
"ogni",
"ognuno",
"oltre",
"oltretutto",
"oppure",
"ovunque",
"ovvio",
"percio",
"pertanto",
"piu",
"piuttosto",
"poca",
"poco",
"poiche",
"propri",
"proprie",
"proprio",
"puo",
"qua",
"qual",
"qualche",
"qualcuna",
"qualcuno",
"quale",
"quali",
"qualunque",
"quando",
"quant",
"quante",
"quanti",
"quanto",
"quantunque",
"quegli",
"quei",
"quest",
"questa",
"queste",
"questi",
"questo",
"qui",
"quindi",
"sebbene",
"sembra",
"sempre",
"senza",
"soltanto",
"stessa",
"stesse",
"stessi",
"stesso",
"sugli",
"sui",
"sul",
"sull",
"sulla",
"sulle",
"suo",
"suoi",
"taluni",
"taluno",
"tanta",
"tanti",
"tanto",
"tra",
"tuo",
"tuoi",
"tutt",
"tutta",
"tutte",
"tutto",
"una",
"uno",
"voi"
]
}

@ -0,0 +1,11 @@
{
dependencies: [
'punkt-segmenter',
'srx-polish'
],
workers: {
processors: {
segmenters: [:srx, :punkt]
}
}
}

@ -6,7 +6,7 @@
workers: {
processors: {
segmenters: [:punkt],
tokenizers: [:tactful]
tokenizers: []
}
}
}

@ -6,7 +6,7 @@
workers: {
processors: {
segmenters: [:punkt],
tokenizers: [:tactful]
tokenizers: []
}
}
}

@ -0,0 +1,291 @@
{
dependencies: [
'punkt-segmenter',
'tactful_tokenizer'
],
workers: {
processors: {
segmenters: [:punkt],
tokenizers: []
}
},
stop_words:
[
"abans",
"aca",
"acerca",
"ahora",
"aixo",
"algo",
"algu",
"alguien",
"algun",
"alguna",
"algunas",
"algunes",
"alguno",
"algunos",
"alguns",
"alla",
"alli",
"allo",
"altra",
"altre",
"altres",
"amb",
"amunt",
"antes",
"aquel",
"aquell",
"aquella",
"aquellas",
"aquelles",
"aquellos",
"aquells",
"aquest",
"aquesta",
"aquestes",
"aquests",
"aqui",
"asimismo",
"aun",
"aunque",
"avall",
"cada",
"casi",
"com",
"como",
"con",
"cosas",
"coses",
"cual",
"cuales",
"cualquier",
"cuando",
"damunt",
"darrera",
"davant",
"debe",
"deben",
"deber",
"debia",
"debian",
"decia",
"decian",
"decir",
"deia",
"deien",
"del",
"demasiado",
"des",
"desde",
"despues",
"dicen",
"diciendo",
"dins",
"dir",
"diu",
"diuen",
"doncs",
"ell",
"ellas",
"elles",
"ells",
"els",
"encara",
"entonces",
"ese",
"esos",
"esser",
"esta",
"estan",
"estando",
"estant",
"estar",
"estaria",
"estarian",
"estarien",
"estas",
"estos",
"farien",
"feia",
"feien",
"fent",
"fue",
"fueron",
"gaire",
"gairebe",
"hace",
"hacia",
"hacian",
"haciendo",
"haran",
"hauria",
"haurien",
"hemos",
"hola",
"junto",
"lejos",
"les",
"lloc",
"los",
"menos",
"menys",
"meva",
"mias",
"mio",
"misma",
"mismas",
"mismo",
"mismos",
"molt",
"molta",
"moltes",
"mon",
"mucha",
"mucho",
"muy",
"nadie",
"ningu",
"nomes",
"nosaltres",
"nosotros",
"nostra",
"nostre",
"nuestra",
"nuestras",
"nuestro",
"nuestros",
"nunca",
"otra",
"pasa",
"pasan",
"pasara",
"pasaria",
"passara",
"passaria",
"passen",
"perque",
"poc",
"pocas",
"pocos",
"podem",
"poden",
"podeu",
"podria",
"podrian",
"podrien",
"poques",
"porque",
"potser",
"puc",
"pudieron",
"pudo",
"puede",
"pueden",
"puesto",
"qualsevol",
"quan",
"que",
"queria",
"querian",
"qui",
"quien",
"quienes",
"quiere",
"quieren",
"quin",
"quina",
"quines",
"quins",
"quizas",
"segueent",
"segun",
"sempre",
"seran",
"seria",
"serian",
"seu",
"seva",
"sido",
"siempre",
"siendo",
"siguiente",
"sino",
"sobretodo",
"solamente",
"sovint",
"suya",
"suyas",
"suyo",
"suyos",
"tambe",
"tambien",
"tanmateix",
"tanta",
"tanto",
"tendran",
"tendria",
"tendrian",
"tenen",
"teu",
"teva",
"tiene",
"tienen",
"tindran",
"tindria",
"tindrien",
"toda",
"todavia",
"todo",
"tota",
"totes",
"tras",
"traves",
"tuvieron",
"tuvo",
"tuya",
"tuyas",
"tuyo",
"tuyos",
"unas",
"unes",
"unos",
"uns",
"usaba",
"usaban",
"usada",
"usades",
"usado",
"usan",
"usando",
"usant",
"usar",
"usat",
"usava",
"usaven",
"usen",
"vaig",
"varem",
"varen",
"vareu",
"vegada",
"vegades",
"vez",
"volem",
"volen",
"voleu",
"vora",
"vos",
"vosaltres",
"vosotros",
"vostra",
"vostre",
"voy",
"vuestra",
"vuestras",
"vuestro",
"vuestros",
"vull"
]
}

@ -0,0 +1,289 @@
{
dependencies: [
'punkt-segmenter',
'tactful_tokenizer'
],
workers: {
processors: {
segmenters: [:punkt],
tokenizers: []
}
},
stop_words:
[
"atminstone",
"an",
"anda",
"aven",
"aldrig",
"alla",
"alls",
"allt",
"alltid",
"allting",
"alltsa",
"andra",
"annan",
"annars",
"antingen",
"att",
"bakom",
"bland",
"blev",
"bli",
"bliva",
"blivit",
"bort",
"bortom",
"bredvid",
"dar",
"darav",
"darefter",
"darfor",
"dari",
"darigenom",
"darvid",
"dedar",
"definitivt",
"del",
"den",
"dendar",
"denhar",
"denna",
"deras",
"dessa",
"dessutom",
"desto",
"det",
"detta",
"dylik",
"efterat",
"efter",
"eftersom",
"eller",
"emellertid",
"enbart",
"endast",
"enligt",
"ens",
"ensam",
"envar",
"eran",
"etc",
"ett",
"exakt",
"fatt",
"fastan",
"fick",
"fler",
"flera",
"foljande",
"foljde",
"foljer",
"for",
"fore",
"forhoppningsvis",
"formodligen",
"forr",
"forra",
"forutom",
"forvisso",
"fran",
"framfor",
"fullstandigt",
"gang",
"gar",
"gatt",
"ganska",
"gav",
"genom",
"genomgaende",
"ger",
"gick",
"gjorde",
"gjort",
"gor",
"hade",
"har",
"harav",
"har",
"hej",
"hela",
"helst",
"helt",
"hitta",
"hon",
"honom",
"hur",
"huruvida",
"huvudsakligen",
"ibland",
"icke",
"ickedestomindre",
"igen",
"ihop",
"inat",
"ingen",
"ingenstans",
"inget",
"innan",
"innehalla",
"inre",
"inte",
"inuti",
"istaellet",
"kanske",
"klart",
"knappast",
"knappt",
"kom",
"komma",
"kommer",
"kraver",
"kunde",
"kunna",
"lata",
"later",
"lagga",
"langre",
"laet",
"lagd",
"leta",
"letar",
"manga",
"maste",
"med",
"medan",
"medans",
"mellan",
"mest",
"min",
"mindre",
"minst",
"mittemellan",
"motsvarande",
"mycket",
"nagon",
"nagongang",
"nagonsin",
"nagonstans",
"nagonting",
"nagorlunda",
"nagot",
"namligen",
"nar",
"nara",
"nasta",
"nastan",
"nedat",
"nedanfor",
"nerat",
"ner",
"nog",
"normalt",
"nummer",
"nuvarande",
"nytt",
"oavsett",
"och",
"ocksa",
"oppna",
"over",
"overallt",
"ofta",
"okej",
"olika",
"ovanfor",
"ratt",
"redan",
"relativt",
"respektive",
"rimlig",
"rimligen",
"rimligt",
"salunda",
"savida",
"saga",
"sager",
"sakert",
"sand",
"sarskilt",
"satt",
"sak",
"samma",
"samtliga",
"sedd",
"senare",
"senaste",
"ser",
"sig",
"sista",
"sjaelv",
"ska",
"skall",
"skickad",
"skriva",
"skulle",
"snabb",
"snarare",
"snart",
"som",
"somliga",
"speciellt",
"stalla",
"stallet",
"starta",
"strax",
"stundom",
"tackar",
"tanka",
"taga",
"tagen",
"tala",
"tanke",
"tidigare",
"tills",
"tog",
"totalt",
"trolig",
"troligen",
"tvaers",
"tvars",
"tycka",
"tyckte",
"tyvarr",
"understundom",
"upp",
"uppenbarligen",
"uppenbart",
"utan",
"utanfor",
"uteslutande",
"utom",
"var",
"varan",
"vad",
"val",
"varde",
"vanlig",
"vanligen",
"var",
"vare",
"varenda",
"varfor",
"varifran",
"varit",
"varje",
"varken",
"vars",
"vart",
"vem",
"verkligen",
"vidare",
"vilken",
"vill",
"visar",
"visst",
"visste"
]
}

@ -0,0 +1,16 @@
{
punkt: {
model_path: nil
},
reuters: {
model_path: nil
},
stanford: {
jar_path: nil,
model_path: nil
},
open_nlp: {
jar_path: nil,
model_path: nil
}
}

@ -0,0 +1,44 @@
{
categories:
['adjective', 'adverb', 'noun',
'verb', 'interjection', 'clitic',
'coverb', 'conjunction', 'determiner',
'particle', 'preposition', 'pronoun',
'number', 'symbol', 'punctuation',
'complementizer'],
punctuation: {
punct_to_category: {
'.' => 'period',
',' => 'comma',
';' => 'semicolon',
':' => 'colon',
'?' => 'interrogation',
'!' => 'exclamation',
'"' => 'double_quote',
"'" => 'single_quote',
'$' => 'dollar',
'%' => 'percent',
'#' => 'hash',
'*' => 'asterisk',
'&' => 'ampersand',
'+' => 'plus',
'-' => 'dash',
'/' => 'slash',
'\\' => 'backslash',
'^' => 'caret',
'_' => 'underscore',
'`' => 'tick',
'|' => 'pipe',
'~' => 'tilde',
'@' => 'at',
'[' => 'bracket',
']' => 'bracket',
'{' => 'brace',
'}' => 'brace',
'(' => 'parenthesis',
')' => 'parenthesis',
'<' => 'tag',
'>' => 'tag'
}}
}

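`punct_to_category` is a plain character-to-name table. The sketch below copies part of it and labels the punctuation in a string, skipping characters without an entry. `punctuation_categories` is a hypothetical helper for illustration. Note that the table maps opening and closing characters to the same name (`'('` and `')'` are both `parenthesis`), so direction is not recoverable from the category alone.

```ruby
# Excerpt of punct_to_category from the configuration above.
PUNCT_TO_CATEGORY = {
  '.' => 'period', ',' => 'comma', '?' => 'interrogation',
  '!' => 'exclamation', '(' => 'parenthesis', ')' => 'parenthesis'
}.freeze

# Hypothetical helper: category names for each mapped character, in order.
def punctuation_categories(text)
  text.each_char.map { |c| PUNCT_TO_CATEGORY[c] }.compact
end

punctuation_categories('Really (yes)?')
# => ["parenthesis", "parenthesis", "interrogation"]
```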
@ -0,0 +1,328 @@
{
aligned: {
tag_sets: [
:claws_c5, :brown, :penn,
:stutgart, :chinese, :paris7
],
phrase_tags: [
'Adjectival phrase', ['', '', 'ADJP', '', '', 'AP'],
'Adverbial phrase', ['', '', 'ADVP', '', '', 'AdP'],
'Conjunction phrase', ['', '', 'CONJP', '', '', 'Ssub'],
'Fragment', ['', '', 'FRAG', '', '', ''],
'Interjectional phrase', ['', '', 'INTJ', '', '', ''],
'List marker', ['', '', 'LST', '', '', ''],
'Not a phrase', ['', '', 'NAC', '', '', ''],
'Noun phrase', ['', '', 'NP', '', '', 'NP'],
'Verbal nucleus', ['', '', '', '', '', 'VN'],
'Head of noun phrase', ['', '', 'NX', '', '', ''],
'Prepositional phrase', ['', '', 'PP', '', '', 'PP'],
'Parenthetical', ['', '', 'PRN', '', '', ''],
'Particle', ['', '', 'PRT', '', '', ''],
'Participial phrase', ['', '', '', '', '', 'VPart'],
'Quantifier phrase', ['', '', 'QP', '', '', ''],
'Relative clause', ['', '', 'RRC', '', '', 'Srel'],
'Coordinated phrase', ['', '', 'UCP', '', '', 'COORD'],
'Infinitival phrase', ['', '', '', '', '', 'VPinf'],
'Verb phrase', ['', '', 'VP', '', '', ''],
'Inverted yes/no question', ['', '', 'SQ', '', '', ''],
'Wh adjective phrase', ['', '', 'WHADJP', '', '', ''],
'Wh adverb phrase', ['', '', 'WHADVP', '', '', ''],
'Wh noun phrase', ['', '', 'WHNP', '', '', ''],
'Wh prepositional phrase', ['', '', 'WHPP', '', '', ''],
'Unknown', ['', '', 'X', '', '', ''],
'Phrase', ['', '', 'P', '', '', 'Sint'],
'Sentence', ['', '', 'S', '', '', 'SENT'],
'Phrase', ['', '', 'SBAR', '', '', ''] # Fix
],
word_tags: [
# Aligned tags for the Claws C5, Brown, Penn, Stuttgart, Chinese and Paris 7 tag sets.
'Adjective', ['AJ0', 'JJ', 'JJ', '', 'JJ', 'A'],
'Adjective', ['AJ0', 'JJ', 'JJ', '', 'JJ', 'ADJ'],
'Adjective, adverbial or predicative', ['', '', '', 'ADJD', '', 'ADJ'],
'Adjective, attribute', ['', '', '', 'ADJA', 'VA', 'ADJ'],
'Adjective, ordinal number', ['ORD', 'OD', 'JJ', '', 'OD', 'ADJ'],
'Adjective, comparative', ['AJC', 'JJR', 'JJR', 'KOKOM', '', 'ADJ'],
'Adjective, superlative', ['AJS', 'JJT', 'JJS', '', 'JJ', 'ADJ'],
'Adjective, superlative, semantically', ['AJ0', 'JJS', 'JJ', '', '', 'ADJ'],
'Adjective, cardinal number', ['CRD', 'CD', 'CD', 'CARD', 'CD', 'ADJ'],
'Adjective, cardinal number, one', ['PNI', 'CD', 'CD', 'CARD', 'CD', 'ADJ'],
'Adverb', ['AV0', 'RB', 'RB', 'ADV', 'AD', 'ADV'],
'Adverb, negative', ['XX0', '*', 'RB', 'PTKNEG', '', 'ADV'],
'Adverb, comparative', ['AV0', 'RBR', 'RBR', '', 'AD', 'ADV'],
'Adverb, superlative', ['AV0', 'RBT', 'RBS', '', 'AD', 'ADV'],
'Adverb, particle', ['AVP', 'RP', 'RP', '', '', 'ADV'],
'Adverb, question', ['AVQ', 'WRB', 'WRB', '', 'AD', 'ADV'],
'Adverb, degree & question', ['AVQ', 'WQL', 'WRB', '', 'ADV'],
'Adverb, degree', ['AV0', 'QL', 'RB', '', '', 'ADV'],
'Adverb, degree, postposed', ['AV0', 'QLP', 'RB', '', '', 'ADV'],
'Adverb, nominal', ['AV0', 'RN', 'RB', 'PROP', '', 'ADV'],
'Adverb, pronominal', ['', '', '', '', 'PROP', '', 'ADV'],
'Conjunction, coordination', ['CJC', 'CC', 'CC', 'KON', 'CC', 'COOD'],
'Conjunction, coordination, and', ['CJC', 'CC', 'CC', 'KON', 'CC', 'ET'],
'Conjunction, subordination', ['CJS', 'CS', 'IN', 'KOUS', 'CS', 'CONJ'],
'Conjunction, subordination with to and infinitive', ['', '', '', 'KOUI', '', ''],
'Conjunction, complementizer, that', ['CJT', 'CS', 'IN', '', '', 'C'],
'Determiner', ['DT0', 'DT', 'DT', '', 'DT', 'D'],
'Determiner, pronoun', ['DT0', 'DTI', 'DT', '', '', 'D'],
'Determiner, pronoun, plural', ['DT0', 'DTS', 'DT', '', '', 'D'],
'Determiner, prequalifier', ['DT0', 'ABL', 'DT', '', '', 'D'],
'Determiner, prequantifier', ['DT0', 'ABN', 'PDT', '', 'DT', 'D'],
'Determiner, pronoun or double conjunction', ['DT0', 'ABX', 'PDT', '', '', 'D'],
'Determiner, pronoun or double conjunction', ['DT0', 'DTX', 'DT', '', '', 'D'],
'Determiner, article', ['AT0', 'AT', 'DT', 'ART', '', 'D'],
'Determiner, postdeterminer', ['DT0', 'AP', 'DT', '', '', 'D'],
'Determiner, possessive', ['DPS', 'PP$', 'PRP$', '', '', 'D'],
'Determiner, possessive, second', ['DPS', 'PP$', 'PRPS', '', '', 'D'],
'Determiner, question', ['DTQ', 'WDT', 'WDT', '', 'DT', 'D'],
'Determiner, possessive & question', ['DTQ', 'WP$', 'WP$', '', '', 'D'],
'Interjection', ['', '', '', '', '', 'I'],
'Localizer', ['', '', '', '', 'LC'],
'Measure word', ['', '', '', '', 'M'],
'Noun, common', ['NN0', 'NN', 'NN', 'N', 'NN', 'NN'],
'Noun, singular', ['NN1', 'NN', 'NN', 'NN', 'NN', 'N'],
'Noun, plural', ['NN2', 'NNS', 'NNS', 'NN', 'NN', 'N'],
'Noun, proper, singular', ['NP0', 'NP', 'NNP', 'NE', 'NR', 'N'],
'Noun, proper, plural', ['NP0', 'NPS', 'NNPS', 'NE', 'NR', 'N'],
'Noun, adverbial', ['NN0', 'NR', 'NN', 'NE', '', 'N'],
'Noun, adverbial, plural', ['NN2', 'NRS', 'NNS', '', 'N'],
'Noun, temporal', ['', '', '', '', 'NT', 'N'],
'Noun, verbal', ['', '', '', '', 'NN', 'N'],
'Pronoun, nominal (indefinite)', ['PNI', 'PN', 'PRP', '', 'PN', 'CL'],
'Pronoun, personal, subject', ['PNP', 'PPSS', 'PRP', 'PPER'],
'Pronoun, personal, subject, 3SG', ['PNP', 'PPS', 'PRP', 'PPER'],
'Pronoun, personal, object', ['PNP', 'PPO', 'PRP', 'PPER'],
'Pronoun, reflexive', ['PNX', 'PPL', 'PRP', 'PRF'],
'Pronoun, reflexive, plural', ['PNX', 'PPLS', 'PRP', 'PRF'],
'Pronoun, question, subject', ['PNQ', 'WPS', 'WP', 'PWAV'],
'Pronoun, question, subject', ['PNQ', 'WPS', 'WPS', 'PWAV'], # FIXME
'Pronoun, question, object', ['PNQ', 'WPO', 'WP', 'PWAV', 'PWAT'],
'Pronoun, existential there', ['EX0', 'EX', 'EX'],
'Pronoun, attributive demonstrative', ['', '', '', 'PDAT'],
'Pronoun, attributive indefinite without determiner', ['', '', '', 'PIAT'],
'Pronoun, attributive possessive', ['', '', '', 'PPOSAT', ''],
'Pronoun, substituting demonstrative', ['', '', '', 'PDS'],
'Pronoun, substituting possessive', ['', '', '', 'PPOSS', ''],
'Pronoun, substituting indefinite', ['', '', '', 'PIS'],
'Pronoun, attributive relative', ['', '', '', 'PRELAT', ''],
'Pronoun, substituting relative', ['', '', '', 'PRELS', ''],
'Pronoun, attributive interrogative', ['', '', '', 'PWAT'],
'Pronoun, adverbial interrogative', ['', '', '', 'PWAV'],
'Pronoun, substituting interrogative', ['', '', '', 'PWS'],
'Verb, main, finite', ['', '', '', 'VVFIN', '', 'V'],
'Verb, main, infinitive', ['', '', '', 'VVINF', '', 'V'],
'Verb, main, imperative', ['', '', '', 'VVIMP', '', 'V'],
'Verb, base present form (not infinitive)', ['VVB', 'VB', 'VBP', '', '', 'V'],
'Verb, infinitive', ['VVI', 'VB', 'VB', 'V', '', 'V'],
'Verb, past tense', ['VVD', 'VBD', 'VBD', '', '', 'V'],
'Verb, present participle', ['VVG', 'VBG', 'VBG', 'VAPP', '', 'V'],
'Verb, past/passive participle', ['VVN', 'VBN', 'VBN', 'VVPP', '', 'V'],
'Verb, present, 3SG, -s form', ['VVZ', 'VBZ', 'VBZ', '', '', 'V'],
'Verb, auxiliary', ['', '', '', 'VAFIN', '', 'V'],
'Verb, imperative', ['', '', '', 'VAIMP', '', 'V'],
'Verb, imperative infinitive', ['', '', '', 'VAINF', '', 'V'],
'Verb, auxiliary do, base', ['VDB', 'DO', 'VBP', '', '', 'V'],
'Verb, auxiliary do, infinitive', ['VDB', 'DO', 'VB', '', '', 'V'],
'Verb, auxiliary do, past', ['VDD', 'DOD', 'VBD', '', '', 'V'],
'Verb, auxiliary do, present participle', ['VDG', 'VBG', 'VBG', '', '', 'V'],
'Verb, auxiliary do, past participle', ['VDN', 'VBN', 'VBN', '', '', 'V'],
'Verb, auxiliary do, present 3SG', ['VDZ', 'DOZ', 'VBZ', '', '', 'V'],
'Verb, auxiliary have, base', ['VHB', 'HV', 'VBP', 'VA', '', 'V'],
'Verb, auxiliary have, infinitive', ['VHI', 'HV', 'VB', 'VAINF', '', 'V'],
'Verb, auxiliary have, past', ['VHD', 'HVD', 'VBD', 'VA', '', 'V'],
'Verb, auxiliary have, present participle', ['VHG', 'HVG', 'VBG', 'VA', '', 'V'],
'Verb, auxiliary have, past participle', ['VHN', 'HVN', 'VBN', 'VAPP', '', 'V'],
'Verb, auxiliary have, present 3SG', ['VHZ', 'HVZ', 'VBZ', 'VA', '', 'V'],
'Verb, auxiliary be, infinitive', ['VBI', 'BE', 'VB', '', '', 'V'],
'Verb, auxiliary be, past', ['VBD', 'BED', 'VBD', '', '', 'V'],
'Verb, auxiliary be, past, 3SG', ['VBD', 'BEDZ', 'VBD', '', '', 'V'],
'Verb, auxiliary be, present participle', ['VBG', 'BEG', 'VBG', '', '', 'V'],
'Verb, auxiliary be, past participle', ['VBN', 'BEN', 'VBN', '', '', 'V'],
'Verb, auxiliary be, present, 3SG', ['VBZ', 'BEZ', 'VBZ', '', '', 'V'],
'Verb, auxiliary be, present, 1SG', ['VBB', 'BEM', 'VBP', '', '', 'V'],
'Verb, auxiliary be, present', ['VBB', 'BER', 'VBP', '', '', 'V'],
'Verb, modal', ['VM0', 'MD', 'MD', 'VMFIN', 'VV', 'V'],
'Verb, modal', ['VM0', 'MD', 'MD', 'VMINF', 'VV', 'V'],
'Verb, modal, finite', ['', '', '', '', 'VMFIN', 'V'],
'Verb, modal, infinite', ['', '', '', '', 'VMINF', 'V'],
'Verb, modal, past participle', ['', '', '', '', 'VMPP', 'V'],
'Particle', ['', '', '', '', '', 'PRT'],
'Particle, with adverb', ['', '', '', 'PTKA', '', 'PRT'],
'Particle, answer', ['', '', '', 'PTKANT', '', 'PRT'],
'Particle, negation', ['', '', '', 'PTKNEG', '', 'PRT'],
'Particle, separated verb', ['', '', '', 'PTKVZ', '', 'PRT'],
'Particle, to as infinitive marker', ['TO0', 'TO', 'TO', 'PTKZU', '', 'PRT'],
'Preposition, comparative', ['', '', '', 'KOKOM', '', 'P'],
'Preposition, to', ['PRP', 'IN', 'TO', '', '', 'P'],
'Preposition', ['PRP', 'IN', 'IN', 'APPR', 'P', 'P'],
'Preposition, with article', ['', '', '', 'APPART', '', 'P'],
'Preposition, of', ['PRF', 'IN', 'IN', '', '', 'P'],
'Possessive', ['POS', '$', 'POS'],
'Postposition', ['', '', '', 'APPO'],
'Circumposition, right', ['', '', '', 'APZR', ''],
'Interjection, onomatopoeia or other isolate', ['ITJ', 'UH', 'UH', 'ITJ', 'IJ'],
'Onomatopoeia', ['', '', '', '', 'ON'],
'Punctuation', ['', '', '', '', 'PU', 'PN'],
'Punctuation, sentence ender', ['PUN', '.', '.', '', '', 'PN'],
'Punctuation, semicolon', ['PUN', '.', '.', '', '', 'PN'],
'Punctuation, colon or ellipsis', ['PUN', ':', ':'],
'Punctuation, comma', ['PUN', ',', ',', '$,'],
'Punctuation, dash', ['PUN', '-', '-'],
'Punctuation, dollar sign', ['PUN', '', '$'],
'Punctuation, left bracket', ['PUL', '(', '(', '$('],
'Punctuation, right bracket', ['PUR', ')', ')'],
'Punctuation, quotation mark, left', ['PUQ', '', '``'],
'Punctuation, quotation mark, right', ['PUQ', '', '"'],
'Punctuation, left bracket', ['PUL', '(', 'PPL'],
'Punctuation, right bracket', ['PUR', ')', 'PPR'],
'Punctuation, left square bracket', ['PUL', '(', 'LSB'],
'Punctuation, right square bracket', ['PUR', ')', 'RSB'],
'Punctuation, left curly bracket', ['PUL', '(', 'LCB'],
'Punctuation, right curly bracket', ['PUR', ')', 'RCB'],
'Unknown, foreign words (not in lexicon)', ['UNZ', '(FW-)', 'FW', '', 'FW'],
'Symbol', ['', '', 'SYM', 'XY'],
'Symbol, alphabetical', ['ZZ0', '', ''],
'Symbol, list item', ['', '', 'LS'],
# Not sure about these tags from the Chinese PTB.
'Aspect marker', ['', '', '', '', 'AS'], # ?
'Ba-construction', ['', '', '', '', 'BA'], # ?
'In relative', ['', '', '', '', 'DEC'], # ?
'Associative', ['', '', '', '', 'DEG'], # ?
'In V-de or V-de-R construct', ['', '', '', '', 'DER'], # ?
'For words ? ', ['', '', '', '', 'ETC'], # ?
'In long bei-construct', ['', '', '', '', 'LB'], # ?
'In short bei-construct', ['', '', '', '', 'SB'], # ?
'Sentence-final particle', ['', '', '', '', 'SP'], # ?
'Particle, other', ['', '', '', '', 'MSP'], # ?
'Before VP', ['', '', '', '', 'DEV'], # ?
'Verb, ? as main verb', ['', '', '', '', 'VE'], # ?
'Verb, ????', ['', '', '', '', 'VC'] # ?
]},
enju: {
cat_to_category: {
'ADJ' => 'adjective',
'ADV' => 'adverb',
'CONJ' => 'conjunction',
'COOD' => 'conjunction',
'C' => 'complementizer',
'D' => 'determiner',
'N' => 'noun',
'P' => 'preposition',
'PN' => 'punctuation',
'SC' => 'conjunction',
'V' => 'verb',
'PRT' => 'particle'
},
cat_to_description: [
['ADJ', 'Adjective'],
['ADV', 'Adverb'],
['CONJ', 'Coordination conjunction'],
['C', 'Complementizer'],
['D', 'Determiner'],
['N', 'Noun'],
['P', 'Preposition'],
['SC', 'Subordination conjunction'],
['V', 'Verb'],
['COOD', 'Part of coordination'],
['PN', 'Punctuation'],
['PRT', 'Particle'],
['S', 'Sentence']
],
xcat_to_description: [
['COOD', 'Coordinated phrase/clause'],
['IMP', 'Imperative sentence'],
['INV', 'Subject-verb inversion'],
['Q', 'Interrogative sentence with subject-verb inversion'],
['REL', 'A relativizer included'],
['FREL', 'A free relative included'],
['TRACE', 'A trace included'],
['WH', 'A wh-question word included']
],
xcat_to_ptb: [
['ADJP', '', 'ADJP'],
['ADJP', 'REL', 'WHADJP'],
['ADJP', 'FREL', 'WHADJP'],
['ADJP', 'WH', 'WHADJP'],
['ADVP', '', 'ADVP'],
['ADVP', 'REL', 'WHADVP'],
['ADVP', 'FREL', 'WHADVP'],
['ADVP', 'WH', 'WHADVP'],
['CONJP', '', 'CONJP'],
['CP', '', 'SBAR'],
['DP', '', 'NP'],
['NP', '', 'NP'],
['NX', 'NX', 'NAC'],
['NP', 'REL', 'WHNP'],
['NP', 'FREL', 'WHNP'],
['NP', 'WH', 'WHNP'],
['PP', '', 'PP'],
['PP', 'REL', 'WHPP'],
['PP', 'WH', 'WHPP'],
['PRT', '', 'PRT'],
['S', '', 'S'],
['S', 'INV', 'SINV'],
['S', 'Q', 'SQ'],
['S', 'REL', 'SBAR'],
['S', 'FREL', 'SBAR'],
['S', 'WH', 'SBARQ'],
['SCP', '', 'SBAR'],
['VP', '', 'VP'],
['VP', '', 'VP'],
['', '', 'UK']
]},
paris7: {
tag_to_category: {
'C' => :complementizer,
'PN' => :punctuation,
'SC' => :conjunction
}
# Paris7 Treebank functional tags
=begin
SUJ (subject)
OBJ (direct object)
ATS (predicative complement of a subject)
ATO (predicative complement of a direct object)
MOD (modifier or adjunct)
A-OBJ (indirect complement introduced by à)
DE-OBJ (indirect complement introduced by de)
P-OBJ (indirect complement introduced by another preposition)
=end
},
ptb: {
escape_characters: {
'(' => '-LRB-',
')' => '-RRB-',
'[' => '-LSB-',
']' => '-RSB-',
'{' => '-LCB-',
'}' => '-RCB-'
},
phrase_tag_to_description: [
['S', 'Simple declarative clause'],
['SBAR', 'Clause introduced by a (possibly empty) subordinating conjunction'],
['SBARQ', 'Direct question introduced by a wh-word or a wh-phrase'],
['SINV', 'Inverted declarative sentence'],
['SQ', 'Inverted yes/no question']
]
}
}
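The `xcat_to_ptb` rows above pair an Enju category and extra category (`xcat`) with a Penn Treebank phrase tag. The sketch below shows one way such a table could be turned into a lookup. `enju_to_ptb` is a hypothetical helper, and its fallback order (exact pair, then the bare category, then `'UK'`) is an assumption rather than Treat's documented behavior.

```ruby
# Subset of the Enju xcat_to_ptb rows: [cat, xcat, ptb_tag].
XCAT_TO_PTB = [
  ['ADJP', '', 'ADJP'],
  ['ADJP', 'WH', 'WHADJP'],
  ['NP', '', 'NP'],
  ['NP', 'REL', 'WHNP'],
  ['NP', 'WH', 'WHNP'],
  ['S', '', 'S'],
  ['S', 'Q', 'SQ'],
  ['S', 'WH', 'SBARQ'],
  ['', '', 'UK']
].freeze

# Index rows by [cat, xcat]; the first row wins if a pair repeats.
XCAT_INDEX = {}
XCAT_TO_PTB.each { |cat, xcat, ptb| XCAT_INDEX[[cat, xcat]] ||= ptb }
XCAT_INDEX.freeze

# Hypothetical helper: exact (cat, xcat) match first, then the bare
# category, then the 'UK' (unknown) tag.
def enju_to_ptb(cat, xcat = '')
  XCAT_INDEX.fetch([cat, xcat]) do
    XCAT_INDEX.fetch([cat, ''], 'UK')
  end
end

puts enju_to_ptb('NP', 'WH')   # WHNP
puts enju_to_ptb('S', 'TRACE') # S (falls back to the bare category)
```

Indexing once into a hash keeps each lookup constant-time instead of scanning the row list on every node.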

@ -6,7 +6,7 @@
},
time: {
type: :annotator,
targets: [:phrase]
targets: [:group]
},
topics: {
type: :annotator,
@ -22,18 +22,18 @@
},
name_tag: {
type: :annotator,
targets: [:phrase, :word]
},
coreferences: {
type: :annotator,
targets: [:zone]
targets: [:group]
},
tf_idf: {
type: :annotator,
targets: [:word]
},
summary: {
type: :annotator,
targets: [:document]
similarity: {
type: :computer,
targets: [:entity]
},
distance: {
type: :computer,
targets: [:entity]
}
}

@ -1,11 +1,12 @@
{
taggers: {
type: :annotator,
targets: [:phrase, :token]
targets: [:group, :token],
recursive: true
},
categorizers: {
type: :annotator,
targets: [:phrase, :token],
targets: [:group, :token],
recursive: true
},
sensers: {
@ -14,5 +15,5 @@
preset_option: :nym,
presets: [:synonyms, :antonyms,
:hyponyms, :hypernyms],
}
}
}

@ -1,7 +1,7 @@
{
chunkers: {
type: :transformer,
targets: [:document],
targets: [:document, :section],
default: :autoselect
},
segmenters: {
@ -10,10 +10,10 @@
},
tokenizers: {
type: :transformer,
targets: [:sentence, :phrase]
targets: [:group]
},
parsers: {
type: :transformer,
targets: [:sentence, :phrase]
targets: [:group]
}
}

@ -1 +0,0 @@
{adapter: :mongo}

@ -1 +0,0 @@
{host: 'localhost', port: '27017', db: nil }

@ -0,0 +1,31 @@
# Mixin extended by Treat::Config to provide
# a single access point for triggering the
# import of all configuration options.
module Treat::Config::Importable
# Each configuration class is extended
# with the Configurable mixin.
require_relative 'configurable'
# Store all the configuration in self.config
def self.extended(base)
class << base; attr_accessor :config; end
end
# Main function; loads all configuration options.
def import!
config, c = {}, Treat::Config::Configurable
definition = :define_singleton_method
Treat::Config.constants.each do |const|
next if const.to_s.downcase.is_mixin?
klass = Treat::Config.const_get(const)
klass.class_eval { extend c }.configure!
name = const.to_s.downcase.intern
config[name] = klass.config
Treat.send(definition, name) do
Treat::Config.config[name]
end
end
self.config = config.to_struct
end
end
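The import pattern above (walk the constants of `Treat::Config`, extend each class with `Configurable`, run `configure!`, and define a reader on `Treat`) can be sketched in isolation. `DemoConfig`, `DemoConfigurable` and `DemoApp` are hypothetical stand-ins; skipping non-class constants replaces the original `is_mixin?` check, which is a Treat string extension not shown here, and the `to_struct` conversion is omitted.

```ruby
# Hypothetical stand-in for Treat::Config::Configurable.
module DemoConfigurable
  # Each config class stores its own settings in `config`.
  def configure!
    self.config = { source: name.split('::').last.downcase }
  end
end

module DemoConfig
  class << self; attr_accessor :config; end

  class Paths
    class << self; attr_accessor :config; end
  end

  class Tags
    class << self; attr_accessor :config; end
  end

  # Configure every nested class and expose each result through a
  # singleton reader on `root` (Treat defines these on Treat itself).
  def self.import!(root)
    all = {}
    constants.each do |const|
      klass = const_get(const)
      next unless klass.is_a?(Class) # skip mixins and other constants
      klass.extend(DemoConfigurable)
      klass.configure!
      key = const.to_s.downcase.to_sym
      all[key] = klass.config
      root.define_singleton_method(key) { DemoConfig.config[key] }
    end
    self.config = all
  end
end

module DemoApp; end
DemoConfig.import!(DemoApp)
puts DemoApp.paths[:source] # paths
```

The readers close over `key` and read from `DemoConfig.config` at call time, so they see the final merged configuration rather than a snapshot taken mid-loop.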

@ -1,34 +0,0 @@
{
dependencies: [
'psych',
'nokogiri',
'ferret',
'bson_ext',
'mongo',
'lda-ruby',
'stanford-core-nlp',
'linguistics',
'ruby-readability',
'whatlanguage',
'chronic',
'nickel',
'decisiontree',
'ai4r'
],
workers: {
extractors: {
keywords: [:tf_idf],
language: [:what_language]
},
formatters: {
serializers: [:xml, :yaml, :mongo]
},
lexicalizers: {
categorizers: [:from_tag]
},
inflectors: {
ordinalizers: [:linguistics],
cardinalizers: [:linguistics]
}
}
}

@ -1,60 +0,0 @@
{
dependencies: [
'rbtagger',
'ruby-stemmer',
'punkt-segmenter',
'tactful_tokenizer',
'nickel',
'rwordnet',
'uea-stemmer',
'engtagger',
'activesupport',
'english'
],
workers: {
extractors: {
time: [:chronic, :ruby, :nickel],
topics: [:reuters],
keywords: [:tf_idf],
name_tag: [:stanford],
coreferences: [:stanford]
},
inflectors: {
conjugators: [:linguistics],
declensors: [:english, :linguistics, :active_support],
stemmers: [:porter, :porter_c, :uea],
ordinalizers: [:linguistics],
cardinalizers: [:linguistics]
},
lexicalizers: {
taggers: [:lingua, :brill, :stanford],
sensers: [:wordnet]
},
processors: {
parsers: [:stanford, :enju],
segmenters: [:tactful, :punkt, :stanford],
tokenizers: [:ptb, :stanford, :tactful, :punkt]
}
},
info: {
stopwords:
['the', 'of', 'and', 'a', 'to', 'in', 'is',
'you', 'that', 'it', 'he', 'was', 'for', 'on',
'are', 'as', 'with', 'his', 'they', 'I', 'at',
'be', 'this', 'have', 'from', 'or', 'one', 'had',
'by', 'word', 'but', 'not', 'what', 'all', 'were',
'we', 'when', 'your', 'can', 'said', 'there', 'use',
'an', 'each', 'which', 'she', 'do', 'how', 'their',
'if', 'will', 'up', 'other', 'about', 'out', 'many',
'then', 'them', 'these', 'so', 'some', 'her', 'would',
'make', 'like', 'him', 'into', 'time', 'has', 'look',
'two', 'more', 'write', 'go', 'see', 'number', 'no',
'way', 'could', 'people', 'my', 'than', 'first', 'been',
'call', 'who', 'its', 'now', 'find', 'long', 'down',
'day', 'did', 'get', 'come', 'made', 'may', 'part',
'say', 'also', 'new', 'much', 'should', 'still',
'such', 'before', 'after', 'other', 'then', 'over',
'under', 'therefore', 'nonetheless', 'thereafter',
'afterwards', 'here', 'huh', 'hah', "n't", "'t", 'here']
}
}
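The `stopwords` list above is a flat array with a few repeats (`'other'`, `'then'`, `'here'`) and one capitalized entry (`'I'`). A minimal filtering sketch follows. It assumes tokens are compared case-insensitively, and `content_words` is a hypothetical helper. Loading the list into a `Set` both drops the repeats and makes each lookup constant-time.

```ruby
require 'set'

# Subset of the stopword list, normalized to lowercase so entries
# such as 'I' match regardless of token case. Set drops repeats.
STOPWORDS = Set.new(%w[the of and a to in is on I].map(&:downcase)).freeze

# Hypothetical helper: keep tokens that are not stopwords.
def content_words(tokens)
  tokens.reject { |token| STOPWORDS.include?(token.downcase) }
end

p content_words(%w[The cat sat on the mat]) # ["cat", "sat", "mat"]
```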

@ -1,18 +0,0 @@
{
dependencies: [
'punkt-segmenter',
'tactful_tokenizer',
'stanford-core-nlp'
],
workers: {
processors: {
segmenters: [:punkt],
tokenizers: [:tactful],
parsers: [:stanford]
},
lexicalizers: {
taggers: [:stanford],
categorizers: [:from_tag]
}
}
}

@ -1,18 +0,0 @@
{
dependencies: [
'punkt-segmenter',
'tactful_tokenizer',
'stanford'
],
workers: {
processors: {
segmenters: [:punkt],
tokenizers: [:tactful],
parsers: [:stanford]
},
lexicalizers: {
taggers: [:stanford],
categorizers: [:from_tag]
}
}
}

@ -1,12 +0,0 @@
{
dependencies: [
'punkt-segmenter',
'tactful_tokenizer'
],
workers: {
processors: {
segmenters: [:punkt],
tokenizers: [:tactful]
}
}
}

@ -1,12 +0,0 @@
{
dependencies: [
'punkt-segmenter',
'tactful_tokenizer'
],
workers: {
processors: {
segmenters: [:punkt],
tokenizers: [:tactful]
}
}
}

@ -1,12 +0,0 @@
{
dependencies: [
'punkt-segmenter',
'tactful_tokenizer'
],
workers: {
processors: {
segmenters: [:punkt],
tokenizers: [:tactful]
}
}
}

@ -1,12 +0,0 @@
{
dependencies: [
'punkt-segmenter',
'tactful_tokenizer'
],
workers: {
processors: {
segmenters: [:punkt],
tokenizers: [:tactful]
}
}
}

@ -1 +0,0 @@
{jar_path: nil, model_path: nil}

@ -1,4 +0,0 @@
['adjective', 'adverb', 'noun', 'verb', 'interjection',
'clitic', 'coverb', 'conjunction', 'determiner', 'particle',
'preposition', 'pronoun', 'number', 'symbol', 'punctuation',
'complementizer']

@ -1,33 +0,0 @@
{punct_to_category: {
'.' => 'period',
',' => 'comma',
';' => 'semicolon',
':' => 'colon',
'?' => 'interrogation',
'!' => 'exclamation',
'"' => 'double_quote',
"'" => 'single_quote',
'$' => 'dollar',
'%' => 'percent',
'#' => 'hash',
'*' => 'asterisk',
'&' => 'ampersand',
'+' => 'plus',
'-' => 'dash',
'/' => 'slash',
'\\' => 'backslash',
'^' => 'caret',
'_' => 'underscore',
'`' => 'tick',
'|' => 'pipe',
'~' => 'tilde',
'@' => 'at',
'[' => 'bracket',
']' => 'bracket',
'{' => 'brace',
'}' => 'brace',
'(' => 'parenthesis',
')' => 'parenthesis',
'<' => 'tag',
'>' => 'tag'
}}
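A lookup over `punct_to_category` is a plain hash fetch. The sketch below uses a subset of the table. `punct_category`, `punct_histogram` and the `'unknown'` fallback are hypothetical choices, not part of Treat.

```ruby
# Subset of the punct_to_category table above.
PUNCT_TO_CATEGORY = {
  '.' => 'period', ',' => 'comma', '?' => 'interrogation',
  '!' => 'exclamation', '(' => 'parenthesis', ')' => 'parenthesis'
}.freeze

# Hypothetical helper: category for one punctuation token,
# or 'unknown' when the table has no entry for it.
def punct_category(token)
  PUNCT_TO_CATEGORY.fetch(token, 'unknown')
end

# Count categories across a token stream.
def punct_histogram(tokens)
  tokens.each_with_object(Hash.new(0)) do |token, counts|
    counts[punct_category(token)] += 1
  end
end

puts punct_category('?') # interrogation
```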

lib/treat/config/paths.rb

@ -0,0 +1,23 @@
# Generates the following path config options:
# Treat.paths.tmp, Treat.paths.bin, Treat.paths.lib,
# Treat.paths.models, Treat.paths.files, Treat.paths.spec.
class Treat::Config::Paths
# Get the path configuration based on the
# directory structure loaded into Paths.
# Note that this doesn't call super, as
# there are no external config files to load.
def self.configure!
root = File.dirname(File.expand_path( # FIXME
__FILE__)).split('/')[0..-4].join('/') + '/'
self.config = Hash[
# Get a list of directories in treat/
Dir.glob(root + '*').select do |path|
FileTest.directory?(path)
# Map to pairs of [:name, path]
end.map do |path|
[File.basename(path).intern, path + '/']
end]
end
end
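`Paths.configure!` derives the gem root by dropping the last three path components of the file's directory, then maps each top-level subdirectory to a symbol. A sketch of the same idea follows. It assumes the file lives three levels below the root (`lib/treat/config`), shows `File.expand_path('../../..', dir)` as one possible replacement for the split/join marked `FIXME`, and uses the hypothetical names `gem_root` and `directory_map`.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical: root three levels above the directory holding `file`,
# e.g. '/gems/treat/lib/treat/config/paths.rb' -> '/gems/treat'.
def gem_root(file)
  File.expand_path('../../..', File.dirname(file))
end

# Hypothetical: map each immediate subdirectory of `root` to
# [:name, 'path/'], as Paths.configure! does.
def directory_map(root)
  base = File.join(File.expand_path(root), '') # ensure trailing slash
  Dir.glob(base + '*')
     .select { |path| File.directory?(path) }
     .map { |path| [File.basename(path).to_sym, path + '/'] }
     .to_h
end

puts gem_root('/gems/treat/lib/treat/config/paths.rb') # /gems/treat
```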

lib/treat/config/tags.rb

@ -0,0 +1,37 @@
# Handles all configuration related
# to understanding of part of speech
# and phrasal tags.
class Treat::Config::Tags
# Generate a map of word and phrase tags
# to their syntactic category, keyed by
# tag set.
def self.configure!
super
config = self.config[:aligned].dup
word_tags, phrase_tags, tag_sets =
config[:word_tags], config[:phrase_tags],
config[:tag_sets]
config[:word_tags_to_category] =
align_tags(word_tags, tag_sets)
config[:phrase_tags_to_category] =
align_tags(phrase_tags, tag_sets)
self.config[:aligned] = config
end
# Helper methods for tag set config.
# Align the tags of each tag set with the
# category named by the first word of the
# tag's description (e.g. 'Noun, plural'
# gives 'noun'). Empty tags are skipped.
def self.align_tags(tags, tag_sets)
wttc = {}
tags.each_slice(2) do |desc, set_tags|
category = desc.gsub(',', ' ,').
split(' ')[0].downcase
tag_sets.each_with_index do |tag_set, i|
tag = set_tags[i]
next if tag.nil? || tag.empty?
wttc[tag] ||= {}
wttc[tag][tag_set] = category
end
end
wttc
end
end
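`align_tags` walks the flat description/tags array two items at a time and records, for every tag, which category it has in each tag set. A standalone sketch on a three-set subset of the table follows. The data is abbreviated. Skipping empty strings is a choice made here: in Ruby `''` is truthy, so a bare `next unless tag` would keep them.

```ruby
TAG_SETS = [:claws_c5, :brown, :penn].freeze

# Abbreviated [description, [tags per set]] pairs from word_tags.
WORD_TAGS = [
  'Adjective, comparative', ['AJC', 'JJR', 'JJR'],
  'Noun, plural',           ['NN2', 'NNS', 'NNS'],
  'Symbol, list item',      ['', '', 'LS']
].freeze

# The category is the first word of the description; every non-empty
# tag maps to { tag_set => category }.
def align_tags(tags, tag_sets)
  aligned = {}
  tags.each_slice(2) do |desc, set_tags|
    category = desc.split(/[\s,]+/).first.downcase
    tag_sets.each_with_index do |tag_set, i|
      tag = set_tags[i]
      next if tag.nil? || tag.empty?
      (aligned[tag] ||= {})[tag_set] = category
    end
  end
  aligned
end

aligned = align_tags(WORD_TAGS, TAG_SETS)
puts aligned['LS'][:penn] # symbol
```

Because a tag such as `JJR` appears in several sets, each tag maps to a small hash keyed by tag set rather than to a single category.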

@ -1,221 +0,0 @@
{tag_sets: [
:claws_c5, :brown, :penn, :stutgart, :chinese, :paris7
],
phrase_tags: [
'Adjectival phrase', ['', '', 'ADJP', '', '', 'AP'],
'Adverbial phrase', ['', '', 'ADVP', '', '', 'AdP'],
'Conjunction phrase', ['', '', 'CONJP', '', '', 'Ssub'],
'Fragment', ['', '', 'FRAG', '', '', ''],
'Interjectional phrase', ['', '', 'INTJ', '', '', ''],
'List marker', ['', '', 'LST', '', '', ''],
'Not a phrase', ['', '', 'NAC', '', '', ''],
'Noun phrase', ['', '', 'NP', '', '', 'NP'],
'Verbal nucleus', ['', '', '', '', '', 'VN'],
'Head of noun phrase', ['', '', 'NX', '', '', ''],
'Prepositional phrase', ['', '', 'PP', '', '', 'PP'],
'Parenthetical', ['', '', 'PRN', '', '', ''],
'Particle', ['', '', 'PRT', '', '', ''],
'Participial phrase', ['', '', '', '', '', 'VPart'],
'Quantifier phrase', ['', '', 'QP', '', '', ''],
'Relative clause', ['', '', 'RRC', '', '', 'Srel'],
'Coordinated phrase', ['', '', 'UCP', '', '', 'COORD'],
'Infinitival phrase', ['', '', '', '', '', 'VPinf'],
'Verb phrase', ['', '', 'VP', '', '', ''],
'Wh adjective phrase', ['', '', 'WHADJP', '', '', ''],
'Wh adverb phrase', ['', '', 'WHADVP', '', '', ''],
'Wh noun phrase', ['', '', 'WHNP', '', '', ''],
'Wh prepositional phrase', ['', '', 'WHPP', '', '', ''],
'Unknown', ['', '', 'X', '', '', ''],
'Phrase', ['', '', 'P', '', '', 'Sint'],
'Sentence', ['', '', 'S', '', '', 'SENT'],
'Phrase', ['', '', 'SBAR', '', '', ''] # Fix
],
word_tags: [
# Aligned tags for the Claws C5, Brown and Penn tag sets.
# Adapted from Manning, Christopher and Schütze, Hinrich,
# 1999. Foundations of Statistical Natural Language
# Processing. MIT Press, p. 141-142;
# http://www.isocat.org/rest/dcs/376;
'Adjective', ['AJ0', 'JJ', 'JJ', '', 'JJ', 'A'],
'Adjective', ['AJ0', 'JJ', 'JJ', '', 'JJ', 'ADJ'],
'Adjective, adverbial or predicative', ['', '', '', 'ADJD', '', 'ADJ'],
'Adjective, attribute', ['', '', '', 'ADJA', 'VA', 'ADJ'],
'Adjective, ordinal number', ['ORD', 'OD', 'JJ', '', 'OD', 'ADJ'],
'Adjective, comparative', ['AJC', 'JJR', 'JJR', 'KOKOM', '', 'ADJ'],
'Adjective, superlative', ['AJS', 'JJT', 'JJS', '', 'JJ', 'ADJ'],
'Adjective, superlative, semantically', ['AJ0', 'JJS', 'JJ', '', '', 'ADJ'],
'Adjective, cardinal number', ['CRD', 'CD', 'CD', 'CARD', 'CD', 'ADJ'],
'Adjective, cardinal number, one', ['PNI', 'CD', 'CD', 'CARD', 'CD', 'ADJ'],
'Adverb', ['AV0', 'RB', 'RB', 'ADV', 'AD', 'ADV'],
'Adverb, negative', ['XX0', '*', 'RB', 'PTKNEG', '', 'ADV'],
'Adverb, comparative', ['AV0', 'RBR', 'RBR', '', 'AD', 'ADV'],
'Adverb, superlative', ['AV0', 'RBT', 'RBS', '', 'AD', 'ADV'],
'Adverb, particle', ['AVP', 'RP', 'RP', '', '', 'ADV'],
'Adverb, question', ['AVQ', 'WRB', 'WRB', '', 'AD', 'ADV'],
'Adverb, degree & question', ['AVQ', 'WQL', 'WRB', '', 'ADV'],
'Adverb, degree', ['AV0', 'QL', 'RB', '', '', 'ADV'],
'Adverb, degree, postposed', ['AV0', 'QLP', 'RB', '', '', 'ADV'],
'Adverb, nominal', ['AV0', 'RN', 'RB', 'PROP', '', 'ADV'],
'Adverb, pronominal', ['', '', '', '', 'PROP', '', 'ADV'],
'Conjunction, coordination', ['CJC', 'CC', 'CC', 'KON', 'CC', 'COOD'],
'Conjunction, coordination, and', ['CJC', 'CC', 'CC', 'KON', 'CC', 'ET'],
'Conjunction, subordination', ['CJS', 'CS', 'IN', 'KOUS', 'CS', 'CONJ'],
'Conjunction, subordination with to and infinitive', ['', '', '', 'KOUI', '', ''],
'Conjunction, complementizer, that', ['CJT', 'CS', 'IN', '', '', 'C'],
'Determiner', ['DT0', 'DT', 'DT', '', 'DT', 'D'],
'Determiner, pronoun', ['DT0', 'DTI', 'DT', '', '', 'D'],
'Determiner, pronoun, plural', ['DT0', 'DTS', 'DT', '', '', 'D'],
'Determiner, prequalifier', ['DT0', 'ABL', 'DT', '', '', 'D'],
'Determiner, prequantifier', ['DT0', 'ABN', 'PDT', '', 'DT', 'D'],
'Determiner, pronoun or double conjunction', ['DT0', 'ABX', 'PDT', '', '', 'D'],
'Determiner, pronoun or double conjunction', ['DT0', 'DTX', 'DT', '', '', 'D'],
'Determiner, article', ['AT0', 'AT', 'DT', 'ART', '', 'D'],
'Determiner, postdeterminer', ['DT0', 'AP', 'DT', '', '', 'D'],
'Determiner, possessive', ['DPS', 'PP$', 'PRP$', '', '', 'D'],
'Determiner, possessive, second', ['DPS', 'PP$', 'PRPS', '', '', 'D'],
'Determiner, question', ['DTQ', 'WDT', 'WDT', '', 'DT', 'D'],
'Determiner, possessive & question', ['DTQ', 'WP$', 'WP$', '', '', 'D'],
'Interjection', ['', '', '', '', '', 'I'],
'Localizer', ['', '', '', '', 'LC'],
'Measure word', ['', '', '', '', 'M'],
'Noun, common', ['NN0', 'NN', 'NN', 'N', 'NN', 'NN'],
'Noun, singular', ['NN1', 'NN', 'NN', 'NN', 'NN', 'N'],
'Noun, plural', ['NN2', 'NNS', 'NNS', 'NN', 'NN', 'N'],
'Noun, proper, singular', ['NP0', 'NP', 'NNP', 'NE', 'NR', 'N'],
'Noun, proper, plural', ['NP0', 'NPS', 'NNPS', 'NE', 'NR', 'N'],
'Noun, adverbial', ['NN0', 'NR', 'NN', 'NE', '', 'N'],
'Noun, adverbial, plural', ['NN2', 'NRS', 'NNS', '', 'N'],
'Noun, temporal', ['', '', '', '', 'NT', 'N'],
'Noun, verbal', ['', '', '', '', 'NN', 'N'],
'Pronoun, nominal (indefinite)', ['PNI', 'PN', 'PRP', '', 'PN', 'CL'],
'Pronoun, personal, subject', ['PNP', 'PPSS', 'PRP', 'PPER'],
'Pronoun, personal, subject, 3SG', ['PNP', 'PPS', 'PRP', 'PPER'],
'Pronoun, personal, object', ['PNP', 'PPO', 'PRP', 'PPER'],
'Pronoun, reflexive', ['PNX', 'PPL', 'PRP', 'PRF'],
'Pronoun, reflexive, plural', ['PNX', 'PPLS', 'PRP', 'PRF'],
'Pronoun, question, subject', ['PNQ', 'WPS', 'WP', 'PWAV'],
'Pronoun, question, subject', ['PNQ', 'WPS', 'WPS', 'PWAV'], # Hack
'Pronoun, question, object', ['PNQ', 'WPO', 'WP', 'PWAV', 'PWAT'],
'Pronoun, existential there', ['EX0', 'EX', 'EX'],
'Pronoun, attributive demonstrative', ['', '', '', 'PDAT'],
'Pronoun, attributive indefinite without determiner', ['', '', '', 'PIAT'],
'Pronoun, attributive possessive', ['', '', '', 'PPOSAT', ''],
'Pronoun, substituting demonstrative', ['', '', '', 'PDS'],
'Pronoun, substituting possessive', ['', '', '', 'PPOSS', ''],
'Pronoun, substituting indefinite', ['', '', '', 'PIS'],
'Pronoun, attributive relative', ['', '', '', 'PRELAT', ''],
'Pronoun, substituting relative', ['', '', '', 'PRELS', ''],
'Pronoun, attributive interrogative', ['', '', '', 'PWAT'],
'Pronoun, adverbial interrogative', ['', '', '', 'PWAV'],
'Pronoun, substituting interrogative', ['', '', '', 'PWS'],
'Verb, main, finite', ['', '', '', 'VVFIN', '', 'V'],
'Verb, main, infinitive', ['', '', '', 'VVINF', '', 'V'],
'Verb, main, imperative', ['', '', '', 'VVIMP', '', 'V'],
'Verb, base present form (not infinitive)', ['VVB', 'VB', 'VBP', '', '', 'V'],
'Verb, infinitive', ['VVI', 'VB', 'VB', 'V', '', 'V'],
'Verb, past tense', ['VVD', 'VBD', 'VBD', '', '', 'V'],
'Verb, present participle', ['VVG', 'VBG', 'VBG', 'VAPP', '', 'V'],
'Verb, past/passive participle', ['VVN', 'VBN', 'VBN', 'VVPP', '', 'V'],
'Verb, present, 3SG, -s form', ['VVZ', 'VBZ', 'VBZ', '', '', 'V'],
'Verb, auxiliary', ['', '', '', 'VAFIN', '', 'V'],
'Verb, imperative', ['', '', '', 'VAIMP', '', 'V'],
'Verb, imperative infinitive', ['', '', '', 'VAINF', '', 'V'],
'Verb, auxiliary do, base', ['VDB', 'DO', 'VBP', '', '', 'V'],
'Verb, auxiliary do, infinitive', ['VDB', 'DO', 'VB', '', '', 'V'],
'Verb, auxiliary do, past', ['VDD', 'DOD', 'VBD', '', '', 'V'],
'Verb, auxiliary do, present participle', ['VDG', 'VBG', 'VBG', '', '', 'V'],
'Verb, auxiliary do, past participle', ['VDN', 'VBN', 'VBN', '', '', 'V'],
'Verb, auxiliary do, present 3SG', ['VDZ', 'DOZ', 'VBZ', '', '', 'V'],
'Verb, auxiliary have, base', ['VHB', 'HV', 'VBP', 'VA', '', 'V'],
'Verb, auxiliary have, infinitive', ['VHI', 'HV', 'VB', 'VAINF', '', 'V'],
'Verb, auxiliary have, past', ['VHD', 'HVD', 'VBD', 'VA', '', 'V'],
'Verb, auxiliary have, present participle', ['VHG', 'HVG', 'VBG', 'VA', '', 'V'],
'Verb, auxiliary have, past participle', ['VHN', 'HVN', 'VBN', 'VAPP', '', 'V'],
'Verb, auxiliary have, present 3SG', ['VHZ', 'HVZ', 'VBZ', 'VA', '', 'V'],
'Verb, auxiliary be, infinitive', ['VBI', 'BE', 'VB', '', '', 'V'],
'Verb, auxiliary be, past', ['VBD', 'BED', 'VBD', '', '', 'V'],
'Verb, auxiliary be, past, 3SG', ['VBD', 'BEDZ', 'VBD', '', '', 'V'],
'Verb, auxiliary be, present participle', ['VBG', 'BEG', 'VBG', '', '', 'V'],
'Verb, auxiliary be, past participle', ['VBN', 'BEN', 'VBN', '', '', 'V'],
'Verb, auxiliary be, present, 3SG', ['VBZ', 'BEZ', 'VBZ', '', '', 'V'],
'Verb, auxiliary be, present, 1SG', ['VBB', 'BEM', 'VBP', '', '', 'V'],
'Verb, auxiliary be, present', ['VBB', 'BER', 'VBP', '', '', 'V'],
'Verb, modal', ['VM0', 'MD', 'MD', 'VMFIN', 'VV', 'V'],
'Verb, modal', ['VM0', 'MD', 'MD', 'VMINF', 'VV', 'V'],
'Verb, modal, finite', ['', '', '', '', 'VMFIN', 'V'],
'Verb, modal, infinite', ['', '', '', '', 'VMINF', 'V'],
'Verb, modal, past participle', ['', '', '', '', 'VMPP', 'V'],
'Particle', ['', '', '', '', '', 'PRT'],
'Particle, with adverb', ['', '', '', 'PTKA', '', 'PRT'],
'Particle, answer', ['', '', '', 'PTKANT', '', 'PRT'],
'Particle, negation', ['', '', '', 'PTKNEG', '', 'PRT'],
'Particle, separated verb', ['', '', '', 'PTKVZ', '', 'PRT'],
'Particle, to as infinitive marker', ['TO0', 'TO', 'TO', 'PTKZU', '', 'PRT'],
'Preposition, comparative', ['', '', '', 'KOKOM', '', 'P'],
'Preposition, to', ['PRP', 'IN', 'TO', '', '', 'P'],
'Preposition', ['PRP', 'IN', 'IN', 'APPR', 'P', 'P'],
'Preposition, with article', ['', '', '', 'APPART', '', 'P'],
'Preposition, of', ['PRF', 'IN', 'IN', '', '', 'P'],
'Possessive', ['POS', '$', 'POS'],
'Postposition', ['', '', '', 'APPO'],
'Circumposition, right', ['', '', '', 'APZR', ''],
'Interjection, onomatopoeia or other isolate', ['ITJ', 'UH', 'UH', 'ITJ', 'IJ'],
'Onomatopoeia', ['', '', '', '', 'ON'],
'Punctuation', ['', '', '', '', 'PU', 'PN'],
'Punctuation, sentence ender', ['PUN', '.', '.', '', '', 'PN'],
'Punctuation, semicolon', ['PUN', '.', '.', '', '', 'PN'],
'Punctuation, colon or ellipsis', ['PUN', ':', ':'],
'Punctuation, comma', ['PUN', ',', ',', '$,'],
'Punctuation, dash', ['PUN', '-', '-'],
'Punctuation, dollar sign', ['PUN', '', '$'],
'Punctuation, left bracket', ['PUL', '(', '(', '$('],
'Punctuation, right bracket', ['PUR', ')', ')'],
'Punctuation, quotation mark, left', ['PUQ', '', '``'],
'Punctuation, quotation mark, right', ['PUQ', '', '"'],
'Punctuation, left bracket', ['PUL', '(', 'PPL'],
'Punctuation, right bracket', ['PUR', ')', 'PPR'],
'Punctuation, left square bracket', ['PUL', '(', 'LSB'],
'Punctuation, right square bracket', ['PUR', ')', 'RSB'],
'Punctuation, left curly bracket', ['PUL', '(', 'LCB'],
'Punctuation, right curly bracket', ['PUR', ')', 'RCB'],
'Unknown, foreign words (not in lexicon)', ['UNZ', '(FW-)', 'FW', '', 'FW'],
'Symbol', ['', '', 'SYM', 'XY'],
'Symbol, alphabetical', ['ZZ0', '', ''],
'Symbol, list item', ['', '', 'LS'],
# Not sure about these tags from the Chinese PTB.
'Aspect marker', ['', '', '', '', 'AS'], # ?
'Ba-construction', ['', '', '', '', 'BA'], # ?
'In relative', ['', '', '', '', 'DEC'], # ?
'Associative', ['', '', '', '', 'DEG'], # ?
'In V-de or V-de-R construct', ['', '', '', '', 'DER'], # ?
'For words ? ', ['', '', '', '', 'ETC'], # ?
'In long bei-construct', ['', '', '', '', 'LB'], # ?
'In short bei-construct', ['', '', '', '', 'SB'], # ?
'Sentence-final particle', ['', '', '', '', 'SP'], # ?
'Particle, other', ['', '', '', '', 'MSP'], # ?
'Before VP', ['', '', '', '', 'DEV'], # ?
'Verb, ? as main verb', ['', '', '', '', 'VE'], # ?
'Verb, ????', ['', '', '', '', 'VC'] # ?
]}

@ -1,71 +0,0 @@
{cat_to_category: {
'ADJ' => 'adjective',
'ADV' => 'adverb',
'CONJ' => 'conjunction',
'COOD' => 'conjunction',
'C' => 'complementizer',
'D' => 'determiner',
'N' => 'noun',
'P' => 'preposition',
'PN' => 'punctuation',
'SC' => 'conjunction',
'V' => 'verb',
'PRT' => 'particle'
},
cat_to_description: [
['ADJ', 'Adjective'],
['ADV', 'Adverb'],
['CONJ', 'Coordination conjunction'],
['C', 'Complementizer'],
['D', 'Determiner'],
['N', 'Noun'],
['P', 'Preposition'],
['SC', 'Subordination conjunction'],
['V', 'Verb'],
['COOD', 'Part of coordination'],
['PN', 'Punctuation'],
['PRT', 'Particle'],
['S', 'Sentence']
],
xcat_to_description: [
['COOD', 'Coordinated phrase/clause'],
['IMP', 'Imperative sentence'],
['INV', 'Subject-verb inversion'],
['Q', 'Interrogative sentence with subject-verb inversion'],
['REL', 'A relativizer included'],
['FREL', 'A free relative included'],
['TRACE', 'A trace included'],
['WH', 'A wh-question word included']
],
xcat_to_ptb: [
['ADJP', '', 'ADJP'],
['ADJP', 'REL', 'WHADJP'],
['ADJP', 'FREL', 'WHADJP'],
['ADJP', 'WH', 'WHADJP'],
['ADVP', '', 'ADVP'],
['ADVP', 'REL', 'WHADVP'],
['ADVP', 'FREL', 'WHADVP'],
['ADVP', 'WH', 'WHADVP'],
['CONJP', '', 'CONJP'],
['CP', '', 'SBAR'],
['DP', '', 'NP'],
['NP', '', 'NP'],
['NX', 'NX', 'NAC'],
['NP', 'REL', 'WHNP'],
['NP', 'FREL', 'WHNP'],
['NP', 'WH', 'WHNP'],
['PP', '', 'PP'],
['PP', 'REL', 'WHPP'],
['PP', 'WH', 'WHPP'],
['PRT', '', 'PRT'],
['S', '', 'S'],
['S', 'INV', 'SINV'],
['S', 'Q', 'SQ'],
['S', 'REL', 'SBAR'],
['S', 'FREL', 'SBAR'],
['S', 'WH', 'SBARQ'],
['SCP', '', 'SBAR'],
['VP', '', 'VP'],
['VP', '', 'VP'],
['', '', 'UK']
]}

@ -1,17 +0,0 @@
{tag_to_category: {
'C' => :complementizer,
'PN' => :punctuation,
'SC' => :conjunction
}
# Paris7 Treebank functional tags
=begin
SUJ (subject)
OBJ (direct object)
ATS (predicative complement of a subject)
ATO (predicative complement of a direct object)
MOD (modifier or adjunct)
A-OBJ (indirect complement introduced by à)
DE-OBJ (indirect complement introduced by de)
P-OBJ (indirect complement introduced by another preposition)
=end
}

@ -1,15 +0,0 @@
{escape_characters: {
'(' => '-LRB-',
')' => '-RRB-',
'[' => '-LSB-',
']' => '-RSB-',
'{' => '-LCB-',
'}' => '-RCB-'
},
phrase_tag_to_description: [
['S', 'Simple declarative clause'],
['SBAR', 'Clause introduced by a (possibly empty) subordinating conjunction'],
['SBARQ', 'Direct question introduced by a wh-word or a wh-phrase'],
['SINV', 'Inverted declarative sentence'],
['SQ', 'Inverted yes/no question']
]}
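The `escape_characters` map follows the Penn Treebank convention of replacing brackets with `-LRB-`-style tokens, so they cannot be confused with tree delimiters. A minimal sketch of token-level and string-level escaping built from that map follows. The helper names are hypothetical.

```ruby
# PTB escape map from the config above, plus its inverse.
PTB_ESCAPES = {
  '(' => '-LRB-', ')' => '-RRB-',
  '[' => '-LSB-', ']' => '-RSB-',
  '{' => '-LCB-', '}' => '-RCB-'
}.freeze
PTB_UNESCAPES = PTB_ESCAPES.invert.freeze
BRACKET_PATTERN = Regexp.union(PTB_ESCAPES.keys)

# Hypothetical helpers: whole-token mapping (tokens that are not
# brackets pass through unchanged) ...
def ptb_escape(token)
  PTB_ESCAPES.fetch(token, token)
end

def ptb_unescape(token)
  PTB_UNESCAPES.fetch(token, token)
end

# ... and in-string replacement for untokenized text.
def ptb_escape_text(text)
  text.gsub(BRACKET_PATTERN, PTB_ESCAPES)
end

puts ptb_escape_text('f(x) = [y]') # f-LRB-x-RRB- = -LSB-y-RSB-
```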

@ -1 +0,0 @@
[:extractors, :inflectors, :formatters, :learners, :lexicalizers, :processors, :retrievers]

@ -1,5 +0,0 @@
# Contains the core classes used by Treat.
module Treat::Core
p = Treat.paths.lib + 'treat/core/*.rb'
Dir.glob(p).each { |f| require f }
end

@ -1,108 +0,0 @@
# A DataSet contains an entity classification
# problem as well as data for entities that
# have already been classified, complete with
# references to these entities.
class Treat::Core::DataSet
# Used to serialize Procs.
silence_warnings do
require 'sourcify'
end
# The classification problem this
# data set holds data for.
attr_accessor :problem
# Items that have been already
# classified (training data).
attr_accessor :items
# References to the IDs of the
# original entities contained
# in the data set.
attr_accessor :entities
# Initialize the DataSet. Can be
# done with a Problem entity
# (thereby creating an empty set)
# or with a filename (representing
# a serialized data set which will
# then be deserialized and loaded).
def initialize(prob_or_file)
if prob_or_file.is_a?(String)
ds = self.class.
unserialize(prob_or_file)
@problem = ds.problem
@items = ds.items
@entities = ds.entities
else
@problem = prob_or_file
@items, @entities = [], []
end
end
# Add an entity to the data set.
# The entity's relevant features
# are calculated based on the
# classification problem, and a
# line with the results of the
# calculation is added to the
# data set, along with the ID
# of the entity.
def <<(entity)
@items << @problem.
export_item(entity)
@entities << entity.id
end
# Marshal the data set to the supplied
# file name. Marshal is used for speed;
# other serialization options may be
# provided in later versions. This
# method relies on the sourcify gem
# to transform Feature procs to strings,
# since procs/lambdas can't be serialized.
def serialize(file)
problem = @problem.dup
problem.features.each do |feature|
next unless feature.proc
feature.proc = feature.proc.to_source
end
data = [problem, @items, @entities]
File.open(file, 'w') do |f|
f.write(Marshal.dump(data))
end
problem.features.each do |feature|
next unless feature.proc
source = feature.proc[5..-1]
feature.proc = eval("Proc.new #{source}")
end
end
# Merge another data set into this one.
def merge(data_set)
if data_set.problem != @problem
raise Treat::Exception,
"Cannot merge two data sets that " +
"don't reference the same problem."
else
@items.concat(data_set.items)
@entities.concat(data_set.entities)
end
end
# Unserialize a data set file created
# by using the #serialize method.
def self.unserialize(file)
data = Marshal.load(File.binread(file))
problem, items, entities = *data
problem.features.each do |feature|
next unless feature.proc
source = feature.proc[5..-1]
feature.proc = eval("Proc.new #{source}")
end
data_set = Treat::Core::DataSet.new(problem)
data_set.items = items
data_set.entities = entities
data_set
end
end
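`DataSet#serialize` converts feature procs to source strings before calling `Marshal.dump`, because Marshal cannot serialize procs. The sketch below shows the round trip for plain data, and why `concat` rather than `<<` keeps merged items flat. `DataSetSketch` is a hypothetical stand-in that carries no problem or features.

```ruby
require 'tmpdir'

# Hypothetical minimal data set: items plus entity IDs, no procs.
DataSetSketch = Struct.new(:items, :entities) do
  def serialize(file)
    File.binwrite(file, Marshal.dump(to_a))
  end

  def self.unserialize(file)
    new(*Marshal.load(File.binread(file)))
  end

  # concat keeps the list flat; << would nest the other set's array.
  def merge!(other)
    items.concat(other.items)
    entities.concat(other.entities)
    self
  end
end

# Marshal raises TypeError for procs, which is why the original
# stores feature procs as source strings before dumping.
def marshalable?(obj)
  Marshal.dump(obj)
  true
rescue TypeError
  false
end

puts marshalable?(proc { 1 }) # false
```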

lib/treat/core/dsl.rb

@ -0,0 +1,23 @@
module Treat::Core::DSL
# Map all classes in Treat::Entities to
# a global builder function (entity, word,
# phrase, punctuation, symbol, list, etc.)
def self.included(base)
def method_missing(sym,*args,&block)
@@entities ||= Treat.core.entities.list
@@learning ||= Treat.core.learning.list
if @@entities.include?(sym)
klass = Treat::Entities.const_get(sym.cc)
return klass.build(*args)
elsif @@learning.include?(sym)
klass = Treat::Learning.const_get(sym.cc)
return klass.new(*args)
else
super(sym,*args,&block)
raise "Uncaught method ended up in Treat DSL."
end
end
end
end
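The DSL relies on `method_missing` to turn calls such as `word(...)` into builder calls. The self-contained sketch below shows the pattern with a matching `respond_to_missing?`, so that `respond_to?` agrees with what `method_missing` handles. `DemoEntities::Word`, `BuilderDSL` and `Script` are hypothetical. Unlike the original, nothing follows `super`, since `super` already raises `NoMethodError` for unknown names.

```ruby
# Hypothetical stand-in for a Treat::Entities class with a `build` factory.
module DemoEntities
  class Word
    attr_reader :value

    def self.build(value)
      new(value)
    end

    def initialize(value)
      @value = value
    end
  end
end

# Maps snake_case method names to builder classes.
module BuilderDSL
  BUILDERS = { word: DemoEntities::Word }.freeze

  def method_missing(sym, *args, &block)
    klass = BUILDERS[sym]
    return klass.build(*args) if klass
    super # raises NoMethodError for unknown names
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(sym, include_private = false)
    BUILDERS.key?(sym) || super
  end
end

class Script
  include BuilderDSL
end

puts Script.new.word('hello').value # hello
```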

@ -1,42 +0,0 @@
# Represents a feature to be used
# in a classification task.
class Treat::Core::Feature
# The name of the feature. If no
# proc is supplied, this assumes
# that the target of your classification
# problem responds to the method
# corresponding to this name.
attr_reader :name
# A proc that can be used to perform
# calculations before storing a feature.
attr_accessor :proc
# The default value of the feature.
attr_reader :default
# Initialize a feature for a classification
# problem. If two arguments are supplied,
# the second argument is assumed to be the
# default value. If three arguments are
# supplied, the second argument is the
# callback to generate the feature, and
# the third one is the default value.
def initialize(name, proc_or_default = nil, default = nil)
@name = name
if proc_or_default.is_a?(Proc)
@proc, @default =
proc_or_default, default
else
@proc = nil
@default = proc_or_default
end
end
# Custom comparison operator for features.
def ==(feature)
@name == feature.name &&
@proc == feature.proc &&
@default == feature.default
end
end
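A `Feature` holds a name, an optional proc and a default. One plausible way to evaluate it against an entity is sketched below. `FeatureSketch` and `feature_value` are hypothetical, and using the default only when the computed value is `nil` is an assumption about the intended semantics.

```ruby
# Hypothetical feature object with the same three fields as Feature.
FeatureSketch = Struct.new(:name, :proc, :default)

# Compute a feature for an entity: use the proc when present, otherwise
# call the method named after the feature; fall back to the default on nil.
def feature_value(feature, entity)
  value =
    if feature.proc
      feature.proc.call(entity)
    elsif entity.respond_to?(feature.name)
      entity.public_send(feature.name)
    end
  value.nil? ? feature.default : value
end

puts feature_value(FeatureSketch.new(:length, nil, 0), 'abc') # 3
```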

@ -1,17 +1,11 @@
# A dependency manager for Treat language plugins.
# It can be called by using Treat.install(language).
module Treat::Installer
# Usage: Treat::Installer.install('language')
module Treat::Core::Installer
# Require the Rubygem dependency installer.
silence_warnings do
require 'rubygems/dependency_installer'
end
require 'treat/version'
require 'schiphol'
# Address of the server with the files.
Server = 'www.louismullie.com'
Server = 's3.amazonaws.com/static-public-assets'
# Filenames for the Stanford packages.
StanfordPackages = {
@ -26,29 +20,34 @@ module Treat::Installer
:bin => File.absolute_path(Treat.paths.bin),
:models => File.absolute_path(Treat.paths.models)
}
# Install required dependencies and optional
# dependencies for a specific language.
def self.install(language = 'english')
# Require the Rubygem dependency installer.
silence_warnings do
require 'rubygems/dependency_installer'
end
@@installer = Gem::DependencyInstaller.new
if language == 'travis'
install_travis; return
end
l = "#{language.to_s.capitalize} language"
puts "\nTreat Installer, v. #{Treat::VERSION.to_s}\n\n"
begin
title "Installing core dependencies."
install_language_dependencies('agnostic')
title "Installing dependencies for the #{l}.\n"
install_language_dependencies(language)
# If gem is installed only, download models.
begin
Gem::Specification.find_by_name('punkt-segmenter')
@ -74,7 +73,7 @@ module Treat::Installer
end
end
# Minimal install for Travis CI.
def self.install_travis
install_language_dependencies(:agnostic)
@ -82,7 +81,7 @@ module Treat::Installer
download_stanford(:minimal)
download_punkt_models(:english)
end
def self.install_language_dependencies(language)
dependencies = Treat.languages[language].dependencies
@ -93,31 +92,31 @@ module Treat::Installer
end
def self.download_stanford(package = :minimal)
f = StanfordPackages[package]
url = "http://#{Server}/treat/#{f}"
loc = Schiphol.download(url,
loc = Schiphol.download(url,
download_folder: Treat.paths.tmp
)
puts "- Unzipping package ..."
dest = File.join(Treat.paths.tmp, 'stanford')
unzip_stanford(loc, dest)
model_dir = File.join(Paths[:models], 'stanford')
bin_dir = File.join(Paths[:bin], 'stanford')
origin = File.join(Paths[:tmp], 'stanford')
# Mac hidden files fix.
mac_remove = File.join(dest, '__MACOSX')
if File.readable?(mac_remove)
FileUtils.rm_rf(mac_remove)
end
unless File.readable?(bin_dir)
puts "- Creating directory bin/stanford ..."
FileUtils.mkdir_p(bin_dir)
end
unless File.readable?(model_dir)
puts "- Creating directory models/stanford ..."
FileUtils.mkdir_p(model_dir)
@ -128,18 +127,18 @@ module Treat::Installer
Dir.glob(File.join(origin, '*')) do |f|
next if ['.', '..'].include?(f)
if f.index('jar')
FileUtils.cp(f, File.join(Paths[:bin],
FileUtils.cp(f, File.join(Paths[:bin],
'stanford', File.basename(f)))
elsif FileTest.directory?(f)
FileUtils.cp_r(f, model_dir)
end
end
puts "- Cleaning up..."
FileUtils.rm_rf(origin)
'Done.'
end
def self.download_punkt_models(language)
@ -147,7 +146,7 @@ module Treat::Installer
f = "#{language}.yaml"
dest = "#{Treat.paths.models}punkt/"
url = "http://#{Server}/treat/punkt/#{f}"
loc = Schiphol.download(url,
loc = Schiphol.download(url,
download_folder: Treat.paths.tmp
)
unless File.readable?(dest)
@ -157,7 +156,7 @@ module Treat::Installer
puts "- Copying model file to models/punkt ..."
FileUtils.cp(loc, File.join(Paths[:models], 'punkt', f))
puts "- Cleaning up..."
FileUtils.rm_rf(Paths[:tmp] + Server)
@ -182,12 +181,11 @@ module Treat::Installer
begin
puts "Installing #{dependency}...\n"
@@installer.install(dependency)
rescue Exception => error
raise
puts "Couldn't install gem '#{dependency}' " +
"(#{error.message})."
rescue Gem::InstallError => error
puts "Warning: couldn't install " +
"gem '#{dependency}' (#{error.message})."
end
end
# Unzip a file to the destination path.
@ -195,7 +193,7 @@ module Treat::Installer
require 'zip/zip'
f_path = ''
Zip::ZipFile.open(file) do |zip_file|
zip_file.each do |f|
f_path = File.join(destination, f.name)

@ -1,251 +0,0 @@
# This module provides an abstract tree structure.
module Treat::Core
# This class is a node for an N-ary tree data structure
# with a unique identifier, text value, children, features
# (annotations) and dependencies.
#
# This class was partly based on the 'rubytree' gem.
# RubyTree is licensed under the BSD license and can
# be found at http://rubytree.rubyforge.org/rdoc/.
# I have made several modifications in order to better
# suit this library and to avoid ugly monkey patching.
class Node
# A string containing the node's value (or empty).
attr_accessor :value
# A unique identifier for the node.
attr_reader :id
# An array containing the children of this node.
attr_reader :children
# A hash containing the features of this node.
attr_accessor :features
# An array containing the dependencies that link this
# node to other nodes.
attr_accessor :dependencies
# A struct for dependencies. # Fix
Struct.new('Dependency',
:target, :type, :directed, :direction)
# The parent of the node.
attr_accessor :parent
# Initialize the node with its value and id.
# Setup containers for the children, features
# and dependencies of this node.
def initialize(value, id = nil)
@parent = nil
@value, @id = value, id
@children = []
@children_hash = {}
@features = {}
@dependencies = []
end
# Iterate over each child of the node.
# Non-recursive.
def each
@children.each { |child| yield child }
end
# Boolean - does the node have dependencies?
def has_dependencies?; !(@dependencies.size == 0); end
# Boolean - does the node have children?
def has_children?; !(@children.size == 0); end
# Boolean - does the node have a parent?
def has_parent?; !@parent.nil?; end
# Boolean - does the node have features?
def has_features?; !(@features.size == 0); end
# Boolean - does the node have the given feature?
def has_feature?(feature); @features.has_key?(feature); end
# Boolean - does the node not have a parent?
def is_root?; @parent.nil?; end
# Remove this node from its parent and set as root.
def set_as_root!; @parent = nil; self; end
# Boolean - is this node a leaf?
# This is overridden in leaf classes.
def is_leaf?; !has_children?; end
# Add the given node(s) as children of this
# node and return the first one. Also accepts
# an array, e.g. node << [child1, child2, child3]
def <<(nodes)
nodes = [nodes] unless nodes.is_a? Array
if nodes.include?(nil)
raise Treat::Exception,
'Trying to add a nil node.'
end
nodes.each do |node|
node.parent = self
@children << node
@children_hash[node.id] = node
end
nodes[0]
end
# Retrieve a child node by name or index.
def [](name_or_index)
if name_or_index == nil
raise Treat::Exception,
'Non-nil name or index needs to be provided.'
end
if name_or_index.kind_of?(Integer) &&
name_or_index < 1000
@children[name_or_index]
else
@children_hash[name_or_index]
end
end
# Remove the supplied node or id of a
# node from the children.
def remove!(ion)
return nil unless ion
if ion.is_a? Treat::Core::Node
@children.delete(ion)
@children_hash.delete(ion.id)
ion.set_as_root!
else
@children.delete(@children_hash[ion])
@children_hash.delete(ion)
end
end
# Remove all children.
def remove_all!
@children.each do |child|
child.set_as_root!
end
@children = []
@children_hash = {}
self
end
# Return the sibling at offset pos from this
# node (pos can be ..., -1, 0, 1, ...), or nil
# if there is no sibling at that offset.
def sibling(pos)
return nil if is_root?
i = @parent.children.index(self) + pos
return nil if i < 0
@parent.children.at(i)
end
# Return the sibling N positions to
# the left of this one.
def left(n = 1); sibling(-1*n); end
alias :previous_sibling :left
# Return the sibling N positions to the
# right of this one.
def right(n = 1); sibling(1*n); end
alias :next_sibling :right
# Return all brothers and sisters of this node.
def siblings
r = @parent.children.dup
r.delete(self)
r
end
# Total number of nodes in the subtree,
# including this one.
def size
@children.inject(1) do |sum, node|
sum += node.size
end
end
# Set the feature to the supplied value.
def set(feature, value)
@features ||= {}
@features[feature] = value
end
# Return a feature.
def get(feature)
return @value if feature == :value
return @id if feature == :id
@features[feature]
end
# Unset a feature.
def unset(*features)
if features.size == 1
@features.delete(features[0])
else
features.each do |feature|
@features.delete(feature)
end
end
end
# Return the depth of this node in the tree.
def depth
return 0 if is_root?
1 + parent.depth
end
alias :has? :has_feature?
# Link this node to the target node with
# the supplied dependency type.
def link(id_or_node, type = nil,
directed = true, direction = 1)
if id_or_node.is_a?(Treat::Core::Node)
id = root.find(id_or_node).id
else
id = id_or_node
end
@dependencies.each do |d|
return if d.target == id
end
@dependencies <<
Struct::Dependency.new(
id, type,
directed, direction
)
end
# Find the node in the tree with the given id.
def find(id_or_node)
if id_or_node.is_a?(Treat::Core::Node)
id = id_or_node.id
else
id = id_or_node
end
if @children_hash[id]
return @children_hash[id]
end
self.each do |child|
r = child.find(id)
return r if r.is_a? Treat::Core::Node
end
nil
end
# Find the root of the tree within which
# this node is contained.
def root
return self if !has_parent?
ancestor = @parent
while ancestor.has_parent?
ancestor = ancestor.parent
end
ancestor
end
end
end
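`Node` combines two containers: an ordered `@children` array and a `@children_hash` keyed by id, which gives direct lookup. Below is a minimal sketch of that structure together with the recursive `find`, `depth`, `root` and `size` logic. `MiniNode` is a hypothetical simplification and omits features, dependencies and index-based access.

```ruby
# Hypothetical, minimal re-implementation of the ideas in Treat::Core::Node:
# an ordered children array plus a hash indexed by id for direct lookup.
class MiniNode
  attr_reader :id, :children
  attr_accessor :parent, :value

  def initialize(value, id)
    @value, @id = value, id
    @parent = nil
    @children = []
    @children_hash = {}
  end

  # Append one or more nodes; returns the first one, like Node#<<.
  def <<(nodes)
    nodes = Array(nodes)
    nodes.each do |node|
      node.parent = self
      @children << node
      @children_hash[node.id] = node
    end
    nodes.first
  end

  # Check direct children by id first, then search each subtree.
  def find(id)
    return @children_hash[id] if @children_hash.key?(id)
    @children.each do |child|
      found = child.find(id)
      return found if found
    end
    nil
  end

  def depth
    parent.nil? ? 0 : 1 + parent.depth
  end

  def root
    node = self
    node = node.parent until node.parent.nil?
    node
  end

  # Number of nodes in the subtree, including this one.
  def size
    @children.sum(1) { |child| child.size }
  end
end

doc  = MiniNode.new('', :doc)
sent = MiniNode.new('', :s1)
leaf = MiniNode.new('hello', :w1)
doc << sent
sent << leaf

p doc.find(:w1).value # => "hello"
p leaf.depth          # => 2
p leaf.root.id        # => :doc
p doc.size            # => 3
```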

@ -1,49 +0,0 @@
# Defines a classification problem.
# - What question are we trying to answer?
# - What features are we going to look at
# to attempt to answer that question?
class Treat::Core::Problem
# The question we are trying to answer.
attr_reader :question
# An array of features that will be
# looked at in trying to answer the
# problem's question.
attr_reader :features
# Just the labels from the features.
attr_reader :labels
# Initialize the problem with a question
# and an arbitrary number of features.
def initialize(question, *features)
@question = question
@features = features
@labels = @features.map { |f| f.name }
end
# Custom comparison for problems.
def ==(problem)
@question == problem.question &&
@features == problem.features
end
# Return an array of all the entity's
# features, as defined by the problem.
# If include_answer is set to true, will
# append the answer to the problem after
# all of the features.
def export_item(e, include_answer = true)
line = []
@features.each do |feature|
r = feature.proc ?
feature.proc.call(e) :
e.send(feature.name)
line << (r || feature.default)
end
return line unless include_answer
line << (e.has?(@question.name) ?
e.get(@question.name) : @question.default)
line
end
end
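`Problem#export_item` turns one entity into one row of a data set. Each feature is computed through its callback if it has one, and otherwise by calling the method of the same name. Below is a minimal sketch of that flow, using hypothetical stand-ins (`SketchFeat`, `SketchWord`) for `Feature` and an entity.

```ruby
# Hypothetical stand-ins: a Struct for features and a tiny entity class.
SketchFeat = Struct.new(:name, :proc, :default)

class SketchWord
  def initialize(text, features = {})
    @text, @features = text, features
  end

  def to_s; @text; end
  def length; @text.size; end
  def has?(name); @features.key?(name); end
  def get(name); @features[name]; end
end

# Mirrors Problem#export_item: compute each feature (callback first,
# otherwise a method call), fall back to the default when the result is
# nil or false, then optionally append the answer.
def export_row(entity, features, question, include_answer = true)
  row = features.map do |f|
    value = f.proc ? f.proc.call(entity) : entity.public_send(f.name)
    value || f.default
  end
  return row unless include_answer
  row << (entity.has?(question.name) ? entity.get(question.name) : question.default)
end

features = [
  SketchFeat.new(:length, nil, 0),
  SketchFeat.new(:capitalized, ->(e) { e.to_s.match?(/\A[A-Z]/) ? 1 : nil }, 0)
]
question = SketchFeat.new(:is_name, nil, false)
city = SketchWord.new('Paris', is_name: true)

p export_row(city, features, question)                  # => [5, 1, true]
p export_row(SketchWord.new('the'), features, question) # => [3, 0, false]
p export_row(city, features, question, false)           # => [5, 1]
```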

lib/treat/core/server.rb
@ -0,0 +1,41 @@
class Treat::Core::Server
# Refer to http://rack.rubyforge.org/doc/classes/Rack/Server.html
# for possible options to configure.
def initialize(handler = 'thin', options = {})
raise "Implementation not finished."
require 'json'; require 'rack'
@handler, @options = handler.capitalize, options
end
def start
handler = Rack::Handler.const_get(@handler)
handler.run(self, @options)
end
def call(env)
headers = { 'content-type' => 'application/json' }
rack_input = env["rack.input"].read
if rack_input.strip == ''
return [400, headers, [
{ 'error' => 'Empty JSON request.' }.to_json
]]
end
rack_json = JSON.parse(rack_input)
unless rack_json['type'] &&
rack_json['value'] && rack_json['do']
return [400, headers, [
{ 'error' => 'Must specify "type", "value" and "do".' }.to_json
]]
end
if rack_json['conf']
# Set the configuration.
end
method = rack_json['type'].capitalize.intern
resp = send(method, rack_json['value']).do(rack_json['do'])
response = [rack_input.to_json]
[200, headers, response]
end
end
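`Server#call` follows the Rack contract: it receives an `env` hash and returns `[status, headers, body]`, where `body` must respond to `#each` and yield Strings. Below is a minimal sketch of the request validation, assuming a 400 status for malformed input and omitting the entity-building step. `SketchServer` is hypothetical and needs only the standard library, so it can be called directly with a `StringIO` as `rack.input`.

```ruby
require 'json'
require 'stringio'

# Hypothetical Rack-compatible app: validates the request shape used by
# Treat::Core::Server#call and echoes the parsed request back as JSON.
class SketchServer
  HEADERS = { 'content-type' => 'application/json' }.freeze
  REQUIRED = %w[type value do].freeze

  def call(env)
    input = env['rack.input'].read
    return error('Empty JSON request.') if input.strip.empty?
    begin
      request = JSON.parse(input)
    rescue JSON::ParserError
      return error('Malformed JSON request.')
    end
    unless request.is_a?(Hash) && REQUIRED.all? { |k| request[k] }
      return error('Must specify "type", "value" and "do".')
    end
    # A real implementation would build the entity and run the task here.
    [200, HEADERS, [JSON.generate(request)]]
  end

  private

  # The body must respond to #each and yield Strings, hence the array.
  def error(message)
    [400, HEADERS, [JSON.generate({ 'error' => message })]]
  end
end

app = SketchServer.new
status, _headers, _body = app.call('rack.input' => StringIO.new('{}'))
p status # => 400
ok = app.call('rack.input' => StringIO.new('{"type":"sentence","value":"Hi.","do":"tokenize"}'))
p ok[0]  # => 200
```

Outside this sketch, an app like this would be started by a Rack handler (such as the `thin` handler the original defaults to) rather than called by hand.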

@ -1,6 +0,0 @@
# Contains the textual model used by Treat.
module Treat::Entities
require 'treat/entities/entity'
p = Treat.paths.lib + 'treat/entities/*.rb'
Dir.glob(p).each { |f| require f }
end

@ -1,47 +0,0 @@
module Treat::Entities::Abilities::Copyable
require 'fileutils'
# What happens when it is a database-stored
# collection or document ?
def copy_into(collection)
unless collection.is_a?(
Treat::Entities::Collection)
raise Treat::Exception,
"Cannot copy an entity into " +
"something else than a collection."
end
if type == :document
copy_document_into(collection)
elsif type == :collection
copy_collection_into(collection)
else
raise Treat::Exception,
"Can only copy a document " +
"or collection into a collection."
end
end
def copy_collection_into(collection)
copy = dup
f = File.dirname(folder)
f = f.split(File::SEPARATOR)[-1]
f = File.join(collection.folder, f)
FileUtils.mkdir(f) unless
FileTest.directory?(f)
FileUtils.cp_r(folder, f)
copy.set :folder, f
copy
end
def copy_document_into(collection)
copy = dup
return copy unless file
f = File.basename(file)
f = File.join(collection.folder, f)
FileUtils.cp(file, f)
copy.set :file, f
copy
end
end

@ -1,83 +0,0 @@
# When Treat.debug is set to true, each call to
# #call_worker will result in a debug message being
# printed by the #print_debug function.
module Treat::Entities::Abilities::Debuggable
@@prev = nil
@@i = 0
# Explains what Treat is currently doing.
def print_debug(entity, task, worker, group, options)
targs = group.targets.map do |target|
target.to_s
end
if targs.size == 1
t = targs[0]
else
t = targs[0..-2].join(', ') +
' and/or ' + targs[-1]
end
genitive = targs.size > 1 ?
'their' : 'its'
doing = ''
human_task = task.to_s.gsub('_', ' ')
if group.type == :transformer ||
group.type == :computer
tt = human_task
tt = tt[0..-2] if tt[-1] == 'e'
ed = tt[-1] == 'd' ? '' : 'ed'
doing = "#{tt.capitalize}#{ed} #{t}"
elsif group.type == :annotator
if group.preset_option
opt = options[group.preset_option]
form = opt.to_s.gsub('_', ' ')
human_task[-1] = ''
human_task = form + ' ' + human_task
end
doing = "Annotated #{t} with " +
"#{genitive} #{human_task}"
end
if group.to_s.index('Formatters')
curr = doing +
' in format ' +
worker.to_s
else
curr = doing +
' using ' +
worker.to_s.gsub('_', ' ')
end
curr.gsub!('ss', 's') unless curr.index('class')
curr += '.'
if curr == @@prev
@@i += 1
else
if @@i > 1
Treat.core.entities.list.each do |e|
@@prev.gsub!(e.to_s, e.to_s + 's')
end
@@prev.gsub!('its', 'their')
@@prev = @@prev.split(' ').
insert(1, @@i.to_s).join(' ')
end
@@i = 0
puts @@prev # Last call doesn't get shown.
end
@@prev = curr
end
end
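For transformers and computers, `print_debug` builds its message by turning the task name into a past-tense verb. Below is a minimal sketch of just that heuristic, isolated in a hypothetical `past_tense` helper: drop a trailing "e", then append "ed" unless the stem already ends in "d".

```ruby
# Hypothetical helper isolating the past-tense heuristic used by
# Debuggable#print_debug.
def past_tense(task)
  verb = task.to_s.tr('_', ' ')
  verb = verb[0..-2] if verb.end_with?('e')
  suffix = verb.end_with?('d') ? '' : 'ed'
  "#{verb.capitalize}#{suffix}"
end

p past_tense(:tokenize) # => "Tokenized"
p past_tense(:parse)    # => "Parsed"
p past_tense(:chunk)    # => "Chunked"
```

Because this is a heuristic, it misfires on verbs that double their final consonant: `:stop` would become "Stoped".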

@ -1,46 +0,0 @@
# Registers occurrences of textual values inside
# all child entities. Useful to calculate frequency.
module Treat::Entities::Abilities::Registrable
# Registers an entity in the @registry hash.
def register(entity)
unless @registry
@count = 0
@registry = {
:value => {},
:position => {},
:type => {},
:id => {}
}
end
if entity.is_a?(Treat::Entities::Token) ||
entity.is_a?(Treat::Entities::Phrase)
val = entity.to_s.downcase
@registry[:value][val] ||= 0
@registry[:value][val] += 1
end
@registry[:id][entity.id] = true
@registry[:type][entity.type] ||= 0
@registry[:type][entity.type] += 1
@registry[:position][entity.id] = @count
@count += 1
@parent.register(entity) if has_parent?
end
# Backtrack up the tree to find a token registry,
# by default the one in the root node of any entity.
def registry(type = nil)
if has_parent? &&
type != self.type
@parent.registry(type)
else
@registry
end
end
end
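`register` updates the receiving node's counts and then forwards the same entity to its parent. Every ancestor, including the root, therefore ends up with counts for its whole subtree. Below is a minimal sketch of that propagation with a hypothetical `RegNode` class, which tracks only the value and type counts.

```ruby
# Hypothetical sketch of the registry idea in Registrable#register:
# each node counts lowercase values and entity types, and forwards
# every registration to its parent so the root sees the whole tree.
class RegNode
  attr_reader :parent, :registry

  def initialize(parent = nil)
    @parent = parent
    @registry = { value: Hash.new(0), type: Hash.new(0) }
  end

  def register(type, value = nil)
    @registry[:value][value.downcase] += 1 if value
    @registry[:type][type] += 1
    @parent.register(type, value) if @parent
  end
end

root = RegNode.new
sentence = RegNode.new(root)
sentence.register(:word, 'The')
sentence.register(:word, 'the')
sentence.register(:punctuation, '.')

p root.registry[:value]['the'] # => 2
p root.registry[:type][:word]  # => 2
```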

@ -1,36 +0,0 @@
module Treat::Entities
# Represents a collection of texts.
class Collection < Entity
# Initialize the collection with a folder
# containing the texts of the collection.
def initialize(folder = nil, id = nil)
super('', id)
if folder && !FileTest.directory?(folder)
FileUtils.mkdir(folder)
end
if folder
set :folder, folder
i = folder + '/.index'
set :index, i if FileTest.directory?(i)
end
end
# Works like the default <<, but if the
# file being added is a collection or a
# document, then copy that collection or
# document into this collection's folder.
def <<(entities, copy = true)
unless entities.is_a? Array
entities = [entities]
end
entities = entities.map do |entity|
if [:document, :collection].
include?(entity.type) && copy &&
@features[:folder] != nil
entity.copy_into(self)
else
entity
end
end
super(entities)
end
end
end

@ -1,10 +0,0 @@
module Treat::Entities
# Represents a document.
class Document < Entity
# Initialize a document with a file name.
def initialize(file = nil, id = nil)
super('', id)
set :file, file
end
end
end

lib/treat/entities/entities.rb
@ -0,0 +1,101 @@
module Treat::Entities
# * Collection and document classes * #
# Represents a collection.
class Collection < Entity; end
# Represents a document.
class Document < Entity; end
# * Sections and related classes * #
# Represents a section.
class Section < Entity; end
# Represents a page of text.
class Page < Section; end
# Represents a block of text.
class Block < Section; end
# Represents a list.
class List < Section; end
# * Zones and related classes * #
# Represents a zone of text.
class Zone < Entity; end
# Represents a title, subtitle or
# logical header of a text.
class Title < Zone; end
# Represents a paragraph (group
# of sentences and/or phrases).
class Paragraph < Zone; end
# * Groups and related classes * #
# Represents a group of tokens.
class Group < Entity; end
# Represents a group of words
# with a sentence ender (.!?)
class Sentence < Group; end
# Represents a group of words,
# with no sentence ender.
class Phrase < Group; end
# Represents a non-linguistic
# fragment (e.g. stray symbols).
class Fragment < Group; end
# * Tokens and related classes * #
# Represents a terminal element
# (leaf) in the text structure.
class Token < Entity; end
# Represents a word. Strictly,
# this is /^[[:alpha:]\-']+$/.
class Word < Token; end
# Represents an enclitic.
# Strictly, this is any of
# 'd 'll 'm 're 's 't 've or n't.
class Enclitic < Token; end
# Represents a number. Strictly,
# this is /^#?([0-9]+)(\.[0-9]+)?$/.
class Number < Token
def to_i; to_s.to_i; end
def to_f; to_s.to_f; end
end
# Represents a punctuation sign.
# Strictly, this is /^[[:punct:]\$]+$/.
class Punctuation < Token; end
# Represents a character that is neither
# a word, an enclitic, a number nor a
# punctuation character (e.g. @#$%&*).
class Symbol < Token; end
# Represents a url. This is (imperfectly)
# defined as /^(http|https):\/\/[a-z0-9]
# +([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}
# (([0-9]{1,5})?\/.*)?$/ix
class Url < Token; end
# Represents a valid RFC822 address.
# This is (imperfectly) defined as
# /.+\@.+\..+/ (fixme maybe?)
class Email < Token; end
# Represents a token whose type
# cannot be identified.
class Unknown < Token; end
end
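The token classes above are each defined by a pattern. Below is a minimal sketch of a classifier built from those documented patterns. It adapts them with `\A`/`\z` anchors, and the check order (enclitic, URL, email, number, word, punctuation, then symbol) is an assumption: Treat's own `token_from_string` is not shown here and may order the checks differently.

```ruby
# Hypothetical classifier built from the patterns documented above.
ENCLITICS = %w['d 'll 'm 're 's 't 've n't].freeze
URL_RE    = /\A(http|https):\/\/[a-z0-9]+([\-.][a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?\z/i
EMAIL_RE  = /\A.+@.+\..+\z/
NUMBER_RE = /\A#?([0-9]+)(\.[0-9]+)?\z/
WORD_RE   = /\A[[:alpha:]\-']+\z/
PUNCT_RE  = /\A[[:punct:]$]+\z/

def token_type(text)
  # Enclitics are checked first because "'s" would otherwise match WORD_RE.
  return :enclitic    if ENCLITICS.include?(text.downcase)
  return :url         if text.match?(URL_RE)
  return :email       if text.match?(EMAIL_RE)
  return :number      if text.match?(NUMBER_RE)
  return :word        if text.match?(WORD_RE)
  return :punctuation if text.match?(PUNCT_RE)
  :symbol
end

p token_type("'s")                      # => :enclitic
p token_type('http://example.com/page') # => :url
p token_type('john@example.com')        # => :email
p token_type('3.14')                    # => :number
p token_type("don't")                   # => :word
p token_type('!?')                      # => :punctuation
```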

@ -1,94 +1,106 @@
module Treat::Entities
module Abilities; end
# Require abilities.
p = Treat.paths.lib +
'treat/entities/abilities/*.rb'
Dir.glob(p).each { |f| require f }
# Basic tree structure.
require 'birch'
# The Entity class extends a basic tree structure
# (written in C for optimal speed) and represents
# any form of textual entity in a processing task
# (this could be a collection of documents, a
# single document, a single paragraph, etc.)
#
# Classes that extend Entity provide the concrete
# behavior corresponding to the relevant entity type.
# See entities.rb for a full list and description of
# the different entity types in the document model.
class Entity < ::Birch::Tree
class Entity < Treat::Core::Node
# A Symbol representing the lowercase
# version of the class name.
# A symbol representing the lowercase
# version of the class name. This is
# the only attribute that the Entity
# class adds to the Birch::Tree class.
attr_accessor :type
# Autoload all the modules in /entity.
path = File.expand_path(__FILE__)
patt = File.dirname(path) + '/entity/*.rb'
Dir.glob(patt).each { |f| require f }
# Implements support for #register, #registry.
include Registrable
# Implements support for #register,
# #registry, and #contains_* methods.
include Abilities::Registrable
# Implement support for #self.call_worker, etc.
extend Delegatable
# Implement support for #self.add_workers
extend Abilities::Delegatable
# Implement support for #self.print_debug, etc.
extend Debuggable
# Implement support for #self.print_debug and
# #self.invalid_call_msg
extend Abilities::Debuggable
# Implement support for #self.build and #self.from_*
extend Buildable
# Implement support for #self.build
# and #self.from_*
extend Abilities::Buildable
# Implement support for #apply (previously #do).
include Applicable
# Implement support for #do.
include Abilities::Doable
# Implement support for #frequency, #frequency_in,
# #frequency_of, #position, #position_from_end, etc.
include Countable
# Implement support for #frequency,
# #frequency_in_parent and #position_in_parent.
include Abilities::Countable
# Implement support for #magic.
include Abilities::Magical
# Implement support for over 100 #magic methods!
include Magical
# Implement support for #to_s, #inspect, etc.
include Abilities::Stringable
include Stringable
# Implement support for #check_has
# and #check_hasnt_children?
include Abilities::Checkable
# Implement support for #check_has and others.
include Checkable
# Implement support for #each_entity, as well as
# #entities_with_type, #ancestors_with_type,
# #entities_with_feature, #entities_with_category.
include Abilities::Iterable
# #entities_with_feature, #entities_with_category, etc.
include Iterable
# Implement support for #export to export
# a line of a data set based on a classification.
include Abilities::Exportable
# Implement support for #copy_into.
include Abilities::Copyable
# Implement support for #export, allowing to export
# a data set row from the receiving entity.
include Exportable
# Implement support for #self.compare_with
extend Abilities::Comparable
extend Comparable
# Initialize the entity with its value and
# (optionally) a unique identifier. By default,
# the object_id will be used as id.
def initialize(value = '', id = nil)
id ||= object_id
super(value, id)
id ||= object_id; super(value, id)
@type = :entity if self == Entity
@type ||= ucc(cl(self.class)).intern
@type ||= self.class.mn.ucc.intern
end
# Add an entity to the current entity.
# Registers the entity in the root node
# token registry if the entity is a leaf.
#
# @see Treat::Registrable
# Unsets the parent node's value; in order
# to keep the tree clean, only the leaf
# values are stored.
#
# Takes in a single entity or an array of
# entities. Returns the first child supplied.
# If a string or numeric value is supplied,
# it is first converted with #to_entity.
def <<(entities, clear_parent = true)
unless entities.is_a? Array
entities = [entities]
end
entities.each do |entity|
register(entity)
end
entities = (entities.is_a?(::String) ||
entities.is_a?(::Numeric)) ?
entities.to_entity : entities
entities = entities.is_a?(::Array) ?
entities : [entities]
# Register each entity in this node.
entities.each { |e| register(e) }
# Pass to the <<() method in Birch.
super(entities)
# Unset the parent value if necessary.
@parent.value = '' if has_parent?
entities[0]
# Return the first child.
return entities[0]
end
# Catch missing methods to support method-like
# access to features (e.g. entity.category
# instead of entity.features[:category]) and to
@ -102,29 +114,26 @@ module Treat::Entities
# sugar for the #self.build method.
def method_missing(sym, *args, &block)
return self.build(*args) if sym == nil
if !@features.has_key?(sym)
r = magic(sym, *args, &block)
return r unless r == :no_magic
begin
super(sym, *args, &block)
rescue NoMethodError
raise Treat::Exception,
if Treat::Workers.lookup(sym)
msg = "Method #{sym} cannot " +
"be called on a #{type}."
else
msg = "Method #{sym} does not exist."
msg += did_you_mean?(
Treat::Workers.methods, sym)
end
end
else
@features[sym]
end
return @features[sym] if @features.has_key?(sym)
result = magic(sym, *args, &block)
return result unless result == :no_magic
begin; super(sym, *args, &block)
rescue NoMethodError; invalid_call(sym); end
end
# Raises a Treat::Exception saying that the
# method called was invalid, and that the
# requested method does not exist. Also
# provides suggestions for misspellings.
def invalid_call(sym)
msg = Treat::Workers.lookup(sym) ?
"Method #{sym} can't be called on a #{type}." :
"Method #{sym} is not defined by Treat." +
Treat::Helpers::Help.did_you_mean?(
Treat::Workers.methods, sym)
raise Treat::Exception, msg
end
end
end
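`Entity#method_missing` lets features be read like methods, for example `entity.category` instead of `entity.features[:category]`. Below is a minimal sketch of that pattern with a hypothetical `FeatureBag` class. It also defines `respond_to_missing?`, which the code shown here does not, so that `respond_to?` stays consistent with the dynamic lookup.

```ruby
# Hypothetical class showing the feature-lookup pattern used by
# Entity#method_missing, plus respond_to_missing? for consistency.
class FeatureBag
  def initialize(features = {})
    @features = features
  end

  def method_missing(sym, *args, &block)
    # Only zero-argument calls are treated as feature reads.
    return @features[sym] if args.empty? && @features.key?(sym)
    super
  end

  def respond_to_missing?(sym, include_private = false)
    @features.key?(sym) || super
  end
end

bag = FeatureBag.new(category: :noun, tag: 'NN')
p bag.category          # => :noun
p bag.respond_to?(:tag) # => true
begin
  bag.stem
rescue NoMethodError => e
  puts "NoMethodError: #{e.name}" # prints "NoMethodError: stem"
end
```

Treat itself goes further: instead of a plain `NoMethodError`, it tries `magic` methods first and then raises a `Treat::Exception` with spelling suggestions.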

@ -1,8 +1,8 @@
# Implement support for the functions #do and #do_task.
module Treat::Entities::Abilities::Doable
module Treat::Entities::Entity::Applicable
# Perform the supplied tasks on the entity.
def do(*tasks)
def apply(*tasks)
tasks.each do |task|
if task.is_a?(Hash)
@ -25,6 +25,8 @@ module Treat::Entities::Abilities::Doable
end
self
end
alias :do :apply
# Perform an individual task on an entity
# given a worker and options to pass to it.
@ -33,7 +35,7 @@ module Treat::Entities::Abilities::Doable
entity_types = group.targets
f = nil
entity_types.each do |t|
f = true if is_a?(Treat::Entities.const_get(cc(t)))
f = true if is_a?(Treat::Entities.const_get(t.cc))
end
if f || entity_types.include?(:entity)
send(task, worker, options)

@ -3,7 +3,7 @@
# a string or a numeric object. This class
# is pretty much self-explanatory.
# FIXME how can we make this language independent?
module Treat::Entities::Abilities::Buildable
module Treat::Entities::Entity::Buildable
require 'schiphol'
require 'fileutils'
@ -15,7 +15,21 @@ module Treat::Entities::Abilities::Buildable
PunctRegexp = /^[[:punct:]\$]+$/
UriRegexp = /^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$/ix
EmailRegexp = /.+\@.+\..+/
Enclitics = %w['ll 'm 're 's 't 've]
Enclitics = [
# EXAMPLE:
"'d", # I'd => I would
"'ll", # I'll => I will
"'m", # I'm => I am
"'re", # We're => We are
"'s", # There's => There is
# Let's => Let us
"'t", # 'Twas => Archaic ('Twas the night)
"'ve", # They've => They have
"n't" # Can't => Can not
]
# Accepted formats of serialized files
AcceptedFormats = ['.xml', '.yml', '.yaml', '.mongo']
# Reserved folder names
Reserved = ['.index']
@ -23,23 +37,38 @@ module Treat::Entities::Abilities::Buildable
# Build an entity from anything (can be
# a string, numeric, folder, or file name
# representing a raw or serialized file).
def build(file_or_value, options = {})
def build(*args)
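# Normalize the arguments: no arguments means an
# empty value, a hash is passed on as-is, a single
# entity is wrapped in an array, and several
# arguments are treated as an array.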
if args.size == 0
file_or_value = ''
elsif args[0].is_a?(Hash)
file_or_value = args[0]
elsif args.size == 1
if args[0].is_a?(Treat::Entities::Entity)
args[0] = [args[0]]
end
file_or_value = args[0]
else
file_or_value = args
end
fv = file_or_value.to_s
if file_or_value.is_a?(Hash)
if fv == ''; self.new
elsif file_or_value.is_a?(Array)
from_array(file_or_value)
elsif file_or_value.is_a?(Hash)
from_db(file_or_value)
elsif self == Treat::Entities::Document ||
(fv.index('yml') || fv.index('yaml') ||
fv.index('xml') || fv.index('mongo'))
elsif self == Treat::Entities::Document || (is_serialized_file?(fv))
if fv =~ UriRegexp
from_url(fv, options)
from_url(fv)
else
from_file(fv, options)
from_file(fv)
end
elsif self == Treat::Entities::Collection
if FileTest.directory?(fv)
from_folder(fv, options)
from_folder(fv)
else
create_collection(fv)
end
@ -63,27 +92,34 @@ module Treat::Entities::Abilities::Buildable
# is user-created (i.e. by calling build
# instead of from_string directly).
def from_string(string, enforce_type = false)
# If calling using the build syntax (i.e. user-
# called), enforce the type that was supplied.
enforce_type = true if caller_method == :build
unless self == Treat::Entities::Entity
return self.new(string) if enforce_type
end
e = anything_from_string(string)
if enforce_type && !e.is_a?(self)
raise "Asked to build a #{cl(self).downcase} "+
raise "Asked to build a #{self.mn.downcase} "+
"from \"#{string}\" and to enforce type, "+
"but type detected was #{cl(e.class).downcase}."
"but type detected was #{e.class.mn.downcase}."
end
e
end
# Build an entity from an array of entities
# (or of values convertible with #to_entity).
def from_array(array)
obj = self.new
array.each do |el|
el = el.to_entity unless el.is_a?(Treat::Entities::Entity)
obj << el
end
obj
end
# Build a document from a URL.
def from_url(url, options)
def from_url(url)
unless self ==
Treat::Entities::Document
raise Treat::Exception,
@ -91,16 +127,22 @@ module Treat::Entities::Abilities::Buildable
'else than a document from a url.'
end
f = Schiphol.download(url,
:download_folder => Treat.paths.files,
:show_progress => Treat.core.verbosity.silence,
:rectify_extensions => true,
:max_tries => 3
)
begin
folder = Treat.paths.files
if folder[-1] == '/'
folder = folder[0..-2]
end
f = Schiphol.download(url,
download_folder: folder,
show_progress: !Treat.core.verbosity.silence,
rectify_extensions: true,
max_tries: 3)
rescue
raise Treat::Exception,
"Couldn't download file at #{url}."
end
options[:default_to] ||= 'html'
e = from_file(f, options)
e = from_file(f,'html')
e.set :url, url.to_s
e
@ -123,7 +165,7 @@ module Treat::Entities::Abilities::Buildable
# Build an entity from a folder with documents.
# Folders will be searched recursively.
def from_folder(folder, options)
def from_folder(folder)
return if Reserved.include?(folder)
@ -149,39 +191,43 @@ module Treat::Entities::Abilities::Buildable
c = Treat::Entities::Collection.new(folder)
folder += '/' unless folder[-1] == '/'
if !FileTest.directory?(folder)
FileUtils.mkdir(folder)
end
c.set :folder, folder
i = folder + '/.index'
c.set :index, i if FileTest.directory?(i)
Dir[folder + '*'].each do |f|
if FileTest.directory?(f)
c2 = Treat::Entities::Collection.
from_folder(f, options)
from_folder(f)
c.<<(c2, false) if c2
else
c.<<(Treat::Entities::Document.
from_file(f, options), false)
from_file(f), false)
end
end
c
return c
end
# Build a document from a raw or serialized file.
def from_file(file, options)
def from_file(file,def_fmt=nil)
if file.index('yml') ||
file.index('yaml') ||
file.index('xml') ||
file.index('mongo')
from_serialized_file(file, options)
if is_serialized_file?(file)
from_serialized_file(file)
else
fmt = Treat::Workers::Formatters::Readers::Autoselect.
detect_format(file, options[:default_to])
options[:_format] = fmt
from_raw_file(file, options)
fmt = Treat::Workers::Formatters::Readers::Autoselect.detect_format(file,def_fmt)
from_raw_file(file, fmt)
end
end
# Build a document from a raw file.
def from_raw_file(file, options)
def from_raw_file(file, def_fmt='txt')
unless self ==
Treat::Entities::Document
@ -195,32 +241,40 @@ module Treat::Entities::Abilities::Buildable
"Path '#{file}' does not "+
"point to a readable file."
end
d = Treat::Entities::Document.new(file)
options = {default_format: def_fmt}
d = Treat::Entities::Document.new
d.set :file, file
d.read(:autoselect, options)
end
# Build an entity from a serialized file.
def from_serialized_file(file, options)
def from_serialized_file(file)
if file.index('mongo')
options[:id] = file.scan( # Consolidate this
/([0-9]+)\.mongo/).first.first
from_db(:mongo, options)
else
unless File.readable?(file)
raise Treat::Exception,
"Path '#{file}' does not "+
"point to a readable file."
end
d = Treat::Entities::Document.new(file)
d.unserialize(:autoselect, options)
d.children[0].set_as_root! # Fix this
d.children[0]
unless File.readable?(file)
raise Treat::Exception,
"Path '#{file}' does not "+
"point to a readable file."
end
doc = Treat::Entities::Document.new
doc.set :file, file
format = nil
if File.extname(file) == '.yml' ||
File.extname(file) == '.yaml'
format = :yaml
elsif File.extname(file) == '.xml'
format = :xml
else
raise Treat::Exception,
"Unreadable serialized format for #{file}."
end
doc.unserialize(format)
doc.children[0].set_as_root! # Fix this
doc.children[0]
end
def is_serialized_file?(path_to_check)
(AcceptedFormats.include? File.extname(path_to_check)) && (File.file?(path_to_check))
end
def from_db(hash)
@ -238,15 +292,28 @@ module Treat::Entities::Abilities::Buildable
# Build any kind of entity from a string.
def anything_from_string(string)
case self.mn.downcase.intern
when :document
folder = Treat.paths.files
if folder[-1] == '/'
folder = folder[0..-2]
end
case cl(self).downcase.intern
when :document, :collection
now = Time.now.to_f
doc_file = folder+ "/#{now}.txt"
string.force_encoding('UTF-8')
File.open(doc_file, 'w') do |f|
f.puts string
end
from_raw_file(doc_file)
when :collection
raise Treat::Exception,
"Cannot create a document or " +
"Cannot create a " +
"collection from a string " +
"(need a readable file/folder)."
when :phrase
sentence_or_phrase_from_string(string)
group_from_string(string)
when :token
token_from_string(string)
when :zone
@ -258,7 +325,7 @@ module Treat::Entities::Abilities::Buildable
if string.gsub(/[\.\!\?]+/,
'.').count('.') <= 1 &&
string.count("\n") == 0
sentence_or_phrase_from_string(string)
group_from_string(string)
else
zone_from_string(string)
end
@ -269,15 +336,14 @@ module Treat::Entities::Abilities::Buildable
end
# This should be improved on.
def check_encoding(string)
string.encode("UTF-8", undef: :replace) # Fix
end
# Build a fragment, sentence or phrase from a string.
def sentence_or_phrase_from_string(string)
def group_from_string(string)
check_encoding(string)
if !(string =~ /[a-zA-Z]+/)
Treat::Entities::Fragment.new(string)
elsif string.count('.!?') >= 1
@@ -285,7 +351,6 @@ module Treat::Entities::Abilities::Buildable
else
Treat::Entities::Phrase.new(string)
end
end
# Build the right type of token
@@ -331,7 +396,7 @@ module Treat::Entities::Abilities::Buildable
end
end
def create_collection(fv)
FileUtils.mkdir(fv)
Treat::Entities::Collection.new(fv)
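The string heuristics in anything_from_string and group_from_string can be read as a small classifier. The sketch below is illustrative only: classify_string is a hypothetical helper that returns a symbol instead of building entities, and it assumes that the elsif branch hidden by the hunk break builds a Sentence (the removed entity definitions describe a Sentence as a group of words with a sentence ender, .!?).

```ruby
# Hypothetical helper: mirrors the string heuristics shown in the diff,
# but returns a symbol instead of building a Treat entity.
def classify_string(string)
  # Collapse runs of sentence enders, then require at most one to
  # remain and no line breaks for the text to count as a single unit.
  single_unit = string.gsub(/[.!?]+/, '.').count('.') <= 1 &&
                string.count("\n").zero?
  return :zone unless single_unit

  if string !~ /[a-zA-Z]+/
    :fragment   # no letters at all, e.g. stray symbols
  elsif string.count('.!?') >= 1
    :sentence   # assumption: the branch elided in the diff builds a Sentence
  else
    :phrase     # words without a sentence ender
  end
end

classify_string('Hello world')   # => :phrase
classify_string('Hi. Bye!')      # => :zone
```

Because runs of enders are collapsed before counting, a string such as "Wait?!" is still treated as a single sentence rather than a zone.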

@@ -1,7 +1,7 @@
# This module implements methods that are used
# by workers to determine if an entity is properly
# formatted before working on it.
module Treat::Entities::Abilities::Checkable
module Treat::Entities::Entity::Checkable
# Check if the entity has the given feature,
# and if so return it. If not, calculate the
@@ -15,7 +15,7 @@ module Treat::Entities::Abilities::Checkable
g2 = Treat::Workers.lookup(feature)
raise Treat::Exception,
"#{g1.type.to_s.capitalize} #{task} " +
"#{g1.type.to_s.capitalize} " +
"requires #{g2.type} #{g2.method}."
end

@@ -1,21 +1,21 @@
module Treat::Entities::Abilities::Comparable
# Allow comparison of entity hierarchy in DOM.
module Treat::Entities::Entity::Comparable
# Determines whether the receiving class
# is smaller, equal or greater in the DOM
# hierarchy compared to the supplied one.
def compare_with(klass)
i = 0; rank_a = nil; rank_b = nil
Treat.core.entities.order.each do |type|
klass2 = Treat::Entities.const_get(cc(type))
klass2 = Treat::Entities.const_get(type.cc)
rank_a = i if self <= klass2
rank_b = i if klass <= klass2
next if rank_a && rank_b
i += 1
end
return -1 if rank_a < rank_b
return 0 if rank_a == rank_b
return 1 if rank_a > rank_b
end
end
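compare_with ranks two classes by walking Treat.core.entities.order and testing subclass relationships with <=. The sketch below shows the same ranking idea on a toy hierarchy; the classes and the ORDER list are hypothetical stand-ins for Treat's entity types, and the loop is simplified to take the first type in the order that each class descends from.

```ruby
# Toy hierarchy standing in for Treat's entity classes (hypothetical).
class Entity; end
class Collection < Entity; end
class Document < Entity; end
class Sentence < Entity; end
class Token < Entity; end

# Hypothetical order, from the largest unit to the smallest.
ORDER = [Collection, Document, Sentence, Token].freeze

# Index of the first type in ORDER that klass is (or inherits from).
def rank(klass)
  ORDER.index { |k| klass <= k } or
    raise ArgumentError, "unknown entity type #{klass}"
end

# -1 if a ranks before b in ORDER, 0 if equal, 1 if after.
def compare_with(a, b)
  rank(a) <=> rank(b)
end

compare_with(Document, Token)   # => -1
```

Module#<= returns nil for unrelated classes, so the block simply skips them.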

@@ -1,4 +1,4 @@
module Treat::Entities::Abilities::Countable
module Treat::Entities::Entity::Countable
# Find the position of the current entity
# inside the parent entity, starting at 1.
@@ -41,6 +41,7 @@ module Treat::Entities::Abilities::Countable
# Returns the frequency of the given value
# in this entity.
def frequency_of(value)
value = value.downcase
if is_a?(Treat::Entities::Token)
raise Treat::Exception,
"Cannot get the frequency " +

@@ -0,0 +1,86 @@
# When Treat.debug is set to true, each call to
# #call_worker will result in a debug message being
# printed by the #print_debug function.
module Treat::Entities::Entity::Debuggable
# Previous state and counter.
@@prev, @@i = nil, 0
# Explains what Treat is currently doing.
# Fixme: last call will never get shown.
def print_debug(entity, task, worker, group, options)
# Get a list of the worker's targets.
targets = group.targets.map(&:to_s)
# List the worker's targets as either
# a single target or an and/or form
# (since it would be too costly to
# actually determine what target types
# were processed at runtime for each call).
t = targets.size == 1 ? targets[0] : targets[
0..-2].join(', ') + ' and/or ' + targets[-1]
# Add genitive for annotations (sing./plural)
genitive = targets.size > 1 ? 'their' : 'its'
# Set up an empty string and humanize task name.
doing, human_task = '', task.to_s.gsub('_', ' ')
# Base is "{task}-ed {a(n)|N} {target(s)}"
if [:transformer, :computer].include?(group.type)
tt = human_task
tt = tt[0..-2] if tt[-1] == 'e'
ed = tt[-1] == 'd' ? '' : 'ed'
doing = "#{tt.capitalize}#{ed} #{t}"
# Base is "Annotated {a(n)|N} {target(s)}"
elsif group.type == :annotator
if group.preset_option
opt = options[group.preset_option]
form = opt.to_s.gsub('_', ' ')
human_task[-1] = ''
human_task = form + ' ' + human_task
end
doing = "Annotated #{t} with " +
"#{genitive} #{human_task}"
end
# Form is '{base} in format {worker}'.
if group.to_s.index('Formatters')
curr = doing + ' in format ' + worker.to_s
# Form is '{base} using {worker}'.
else
curr = doing + ' using ' + worker.to_s.gsub('_', ' ')
end
# Remove any double pluralization that may happen.
curr.gsub!('ss', 's') unless curr.index('class')
# Accumulate repeated tasks.
@@i += 1 if curr == @@prev
# Change tasks, so output.
if curr != @@prev && @@prev
# Pluralize entity names if necessary.
if @@i > 1
Treat.core.entities.list.each do |e|
@@prev.gsub!(e.to_s, e.to_s + 's')
end
@@prev.gsub!('its', 'their')
@@prev = @@prev.split(' ').
insert(1, @@i.to_s).join(' ')
# Add determiner if singular.
else
@@prev = @@prev.split(' ').
insert(1, 'a').join(' ')
end
# Reset counter.
@@i = 0
# Write to stdout.
puts @@prev + '.'
end
@@prev = curr
end
end
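print_debug accumulates identical consecutive messages and, when the message changes, prints the previous one with either a count (plus pluralized entity names and "their" for "its") or the determiner "a". The sketch below reproduces that collapsing step over a plain array of messages. summarize and ENTITY_NAMES are hypothetical, it replaces whole words instead of calling gsub on substrings, and, unlike the streaming original (whose last message is never shown, per its FIXME), it also emits the final run.

```ruby
# Hypothetical list of entity names to pluralize.
ENTITY_NAMES = %w[word sentence].freeze

# Collapse consecutive duplicate messages into one summary line each.
def summarize(messages)
  messages.chunk_while { |a, b| a == b }.map do |run|
    words = run.first.split(' ')
    if run.size > 1
      # Pluralize entity names and the possessive, then insert the count.
      words.map! { |w| ENTITY_NAMES.include?(w) ? "#{w}s" : w }
      words.map! { |w| w == 'its' ? 'their' : w }
      words.insert(1, run.size.to_s)
    else
      # A single occurrence gets the determiner "a".
      words.insert(1, 'a')
    end
    words.join(' ') + '.'
  end
end

summarize(['Parsed sentence using worker b', 'Parsed sentence using worker b'])
# => ["Parsed 2 sentences using worker b."]
```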

@@ -1,7 +1,7 @@
# Makes a class delegatable, allowing calls
# on it to be forwarded to a worker class
# able to perform the appropriate task.
module Treat::Entities::Abilities::Delegatable
module Treat::Entities::Entity::Delegatable
# Add preset methods to an entity class.
def add_presets(group)
@@ -10,27 +10,25 @@ module Treat::Entities::Abilities::Delegatable
return unless opt
self.class_eval do
group.presets.each do |preset|
define_method(preset) do |worker=nil, options={}|
return get(preset) if has?(preset)
options = {opt => preset}.merge(options)
m = group.method
send(m, worker, options)
f = unset(m)
features[preset] = f if f
group.presets.each do |preset|
define_method(preset) do |worker=nil, options={}|
return get(preset) if has?(preset)
options = {opt => preset}.merge(options)
m = group.method
send(m, worker, options)
f = unset(m)
features[preset] = f if f
end
end
end
end
end
# Add the workers to perform a task on an entity class.
def add_workers(group)
self.class_eval do
task = group.method
add_presets(group)
define_method(task) do |worker=nil, options={}|
if worker.is_a?(Hash)
options, worker =
@@ -64,7 +62,7 @@ module Treat::Entities::Abilities::Delegatable
worker_not_found(worker, group)
end
worker = group.const_get(cc(worker.to_s).intern)
worker = group.const_get(worker.to_s.cc.intern)
result = worker.send(group.method, entity, options)
if group.type == :annotator && result
@@ -90,40 +88,32 @@ module Treat::Entities::Abilities::Delegatable
# Get the default worker for that language
# inside the given group.
def find_worker_for_language(language, group)
lang = Treat.languages[language]
cat = group.to_s.split('::')[2].downcase.intern
group = ucc(cl(group)).intern
group = group.mn.ucc.intern
if lang.nil?
raise Treat::Exception,
"No configuration file loaded for language #{language}."
end
workers = lang.workers
if !workers.respond_to?(cat) ||
!workers[cat].respond_to?(group)
workers = Treat.languages.agnostic.workers
end
if !workers.respond_to?(cat) ||
!workers[cat].respond_to?(group)
raise Treat::Exception,
"No #{group} is/are available for the " +
"#{language.to_s.capitalize} language."
end
workers[cat][group].first
end
# Return an error message and suggest possible typos.
def worker_not_found(klass, group)
"Algorithm '#{ucc(cl(klass))}' couldn't be "+
"found in group #{group}." + did_you_mean?(
group.list.map { |c| ucc(c) }, ucc(klass))
def worker_not_found(worker, group)
"Worker with name '#{worker}' couldn't be "+
"found in group #{group}." + Treat::Helpers::Help.
did_you_mean?(group.list.map { |c| c.ucc }, worker)
end
end
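add_presets uses class_eval and define_method to generate one method per preset. Each generated method returns the cached feature if present; otherwise it runs the group's task with the preset merged into the options, then moves the result from the task's feature name to the preset's name. A self-contained sketch of that pattern follows; Entity, tag, :tagset and the preset names are hypothetical stand-ins for a worker group, not Treat's API.

```ruby
class Entity
  attr_reader :features

  def initialize
    @features = {}
  end

  # Hypothetical task standing in for a worker call; it stores its
  # result under the task's own name, as an annotator would.
  def tag(_worker = nil, options = {})
    @features[:tag] = "tagged with #{options[:tagset]}"
  end

  # Define one method per preset that forwards to +task+.
  def self.add_presets(task, option, presets)
    presets.each do |preset|
      define_method(preset) do |worker = nil, options = {}|
        return @features[preset] if @features.key?(preset)
        # Caller-supplied options override the preset value.
        send(task, worker, { option => preset }.merge(options))
        result = @features.delete(task)   # drop the task's own entry (cf. unset)
        @features[preset] = result if result
      end
    end
  end

  add_presets(:tag, :tagset, %i[penn brown])
end

e = Entity.new
e.penn   # => "tagged with penn"
```

After the call, features holds only the :penn entry, so a second e.penn returns the cached value without rerunning the task.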

@@ -1,7 +1,7 @@
module Treat::Entities::Abilities::Exportable
module Treat::Entities::Entity::Exportable
def export(problem)
ds = Treat::Core::DataSet.new(problem)
ds = Treat::Learning::DataSet.new(problem)
each_entity(problem.question.target) do |e|
ds << e
end

@@ -1,4 +1,4 @@
module Treat::Entities::Abilities::Iterable
module Treat::Entities::Entity::Iterable
# Yields each entity of any of the supplied
# types in the children tree of this Entity.
@@ -6,12 +6,12 @@ module Treat::Entities::Abilities::Iterable
# #each. It does not yield the top element being
# recursed.
#
# This function NEEDS to be ported to C.
# This function NEEDS to be ported to C. #FIXME
def each_entity(*types)
types = [:entity] if types.size == 0
f = false
types.each do |t2|
if is_a?(Treat::Entities.const_get(cc(t2)))
if is_a?(Treat::Entities.const_get(t2.cc))
f = true; break
end
end
@@ -57,7 +57,7 @@ module Treat::Entities::Abilities::Iterable
def ancestor_with_type(type)
return unless has_parent?
ancestor = @parent
type_klass = Treat::Entities.const_get(cc(type))
type_klass = Treat::Entities.const_get(type.cc)
while not ancestor.is_a?(type_klass)
return nil unless (ancestor && ancestor.has_parent?)
ancestor = ancestor.parent
@@ -94,25 +94,17 @@ module Treat::Entities::Abilities::Iterable
end
# Number of children that have a given feature.
def num_children_with_feature(feature)
# If value is given, only children whose feature equals value are counted.
def num_children_with_feature(feature, value = nil, recursive = true)
i = 0
each do |c|
i += 1 if c.has?(feature)
m = method(recursive ? :each_entity : :each)
m.call do |c|
next unless c.has?(feature)
i += (value == nil ? 1 :
(c.get(feature) == value ? 1 : 0))
end
i
end
# Return the first element in the array, warning if not
# the only one in the array. Used for magic methods: e.g.,
# the magic method "word" if called on a sentence with many
# words, Treat will return the first word, but warn the user.
def first_but_warn(array, type)
if array.size > 1
warn "Warning: requested one #{type}, but" +
" there are many #{type}s in this entity."
end
array[0]
end
end
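The new num_children_with_feature picks either #each_entity (the whole subtree) or #each (direct children only) through method, and optionally counts only children whose feature equals value. A self-contained sketch on a hypothetical Node class, which implements simplified has?, get, each and each_entity in place of Treat's entities:

```ruby
class Node
  attr_reader :children

  def initialize(features = {}, children = [])
    @features = features
    @children = children
  end

  def has?(feature)
    @features.key?(feature)
  end

  def get(feature)
    @features[feature]
  end

  # Direct children only.
  def each(&block)
    @children.each(&block)
  end

  # Every descendant, depth first, excluding the receiver itself.
  def each_entity(&block)
    @children.each do |child|
      block.call(child)
      child.each_entity(&block)
    end
  end

  def num_children_with_feature(feature, value = nil, recursive = true)
    count = 0
    method(recursive ? :each_entity : :each).call do |c|
      next unless c.has?(feature)
      count += 1 if value.nil? || c.get(feature) == value
    end
    count
  end
end

tree = Node.new({}, [Node.new({ tag: :NN }, [Node.new({ tag: :VB })]),
                     Node.new({ tag: :NN })])
tree.num_children_with_feature(:tag)              # => 3
tree.num_children_with_feature(:tag, :NN)         # => 2
tree.num_children_with_feature(:tag, :VB, false)  # => 0
```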

@@ -1,27 +1,20 @@
module Treat::Entities::Abilities::Magical
module Treat::Entities::Entity::Magical
# Parse "magic methods", which allow the following
# syntaxes to be used (where 'word' can be replaced
# by any entity type, e.g. token, zone, etc.):
#
# - each_word : iterate over each entity of type word.
# - words: return an array of words in the entity.
# - each_word : iterate over each child of type word.
# - words: return an array of children words.
# - word: return the first word in the entity.
# - word_count: return the number of words in the entity.
# - words_with_*(value) (where is an arbitrary feature):
# return the words that have the given feature.
# - word_with_*(value) : return the first word with
# the feature specified by * in value.
#
# Also provides magical methods for types of words:
#
# - each_noun:
# - nouns:
# - noun:
# - noun_count:
# - nouns_with_*(value)
# - noun_with_*(value)
# - words_with_*(value) (where * is an arbitrary feature):
# return the words that have the given feature set to value.
#
# Also provides magical methods for types of words (each_noun,
# nouns, noun_count, nouns_with_*(value), noun_with_*(value), etc.)
# For this to be used, the words in the text must have been
# tokenized and categorized in the first place.
def magic(sym, *args)
# Cache this for performance.
@@ -80,9 +73,21 @@ module Treat::Entities::Abilities::Magical
elsif method =~ /^frequency_in_#{@@entities_regexp}$/
frequency_in($1.intern)
else
return :no_magic
return :no_magic # :-(
end
end
# Return the first element in the array, warning if not
# the only one in the array. Used for magic methods: e.g.,
# the magic method "word" if called on a sentence with many
# words, Treat will return the first word, but warn the user.
def first_but_warn(array, type)
if array.size > 1
warn "Warning: requested one #{type}, but" +
" there are many #{type}s in this entity."
end
array[0]
end
end
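magic maps method names such as words, word, word_count and each_word onto entity types using regular expressions built from the list of entity names. The sketch below shows the same kind of dispatch through method_missing on a toy class. TYPES, the [type, text] child pairs and of_type are hypothetical and far simpler than Treat's entity tree, and only four of the documented forms are covered.

```ruby
class Entity
  TYPES = %w[word punctuation].freeze   # hypothetical subset of types
  TYPE_RE = TYPES.join('|')

  # children: array of [type, text] pairs (a stand-in for real entities)
  def initialize(children)
    @children = children
  end

  def method_missing(sym, *args, &block)
    case sym.to_s
    when /\Aeach_(#{TYPE_RE})\z/  then of_type($1).each(&block)
    when /\A(#{TYPE_RE})s\z/      then of_type($1)
    when /\A(#{TYPE_RE})_count\z/ then of_type($1).size
    when /\A(#{TYPE_RE})\z/       then first_but_warn(of_type($1), $1)
    else super
    end
  end

  def respond_to_missing?(sym, include_private = false)
    sym.to_s.match?(/\A(?:each_(?:#{TYPE_RE})|(?:#{TYPE_RE})(?:s|_count)?)\z/) || super
  end

  private

  def of_type(type)
    @children.select { |t, _| t == type }.map(&:last)
  end

  # Same idea as first_but_warn in the diff.
  def first_but_warn(array, type)
    warn "Warning: requested one #{type}, but there are many #{type}s." if array.size > 1
    array[0]
  end
end

e = Entity.new([['word', 'Hello'], ['punctuation', ','], ['word', 'world']])
e.words        # => ["Hello", "world"]
e.word_count   # => 2
```

Defining respond_to_missing? alongside method_missing keeps respond_to? consistent with the names the dispatcher actually handles.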

@@ -0,0 +1,36 @@
# Registers the entities occurring in the subtree of
# a node as children are added. Also registers text
# occurrences for word groups and tokens (n-grams).
module Treat::Entities::Entity::Registrable
# Registers a token or phrase in the registry.
# The registry keeps track of children by id,
# by entity type, and also keeps the position
# of the entity in its parent entity.
def register(entity)
unless @registry
@count, @registry = 0,
{id: {}, value: {}, position:{}, type: {}}
end
if entity.is_a?(Treat::Entities::Token) ||
entity.is_a?(Treat::Entities::Group)
val = entity.to_s.downcase
@registry[:value][val] ||= 0
@registry[:value][val] += 1
end
@registry[:id][entity.id] = true
@registry[:type][entity.type] ||= 0
@registry[:type][entity.type] += 1
@registry[:position][entity.id] = @count
@count += 1
@parent.register(entity) if has_parent?
end
# Backtrack up the tree to find a token registry,
# by default the one in the root node of the tree.
def registry(type = nil)
(has_parent? && type != self.type) ?
@parent.registry(type) : @registry
end
end
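register keeps per-node counts of lowercased text values (tokens and groups only), counts of entity types and insertion positions, then forwards the same entity to the parent, so the root ends up indexing the whole tree. A reduced sketch with a hypothetical Node class, where a check on a type symbol stands in for the is_a? tests on Token and Group:

```ruby
class Node
  attr_reader :type, :value, :id, :registry

  def initialize(type, value = '', parent = nil)
    @type, @value, @parent = type, value, parent
    @id = object_id
  end

  def register(entity)
    @count ||= 0
    @registry ||= { id: {}, value: {}, position: {}, type: {} }
    # Only tokens and groups contribute text values (case-insensitive).
    if %i[token group].include?(entity.type)
      val = entity.value.downcase
      @registry[:value][val] = @registry[:value].fetch(val, 0) + 1
    end
    @registry[:id][entity.id] = true
    @registry[:type][entity.type] = @registry[:type].fetch(entity.type, 0) + 1
    @registry[:position][entity.id] = @count
    @count += 1
    # Propagate so every ancestor, up to the root, sees the entity.
    @parent&.register(entity)
  end
end

doc = Node.new(:document)
sentence = Node.new(:sentence, '', doc)
%w[The cat saw the dog].each { |w| sentence.register(Node.new(:token, w, sentence)) }
doc.registry[:value]['the']   # => 2
doc.registry[:type][:token]   # => 5
```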

@@ -1,18 +1,22 @@
# Gives entities the ability to be converted
# to string representations (#to_string, #to_s,
# #to_str, #inspect, #print_tree).
module Treat::Entities::Abilities::Stringable
# Return the entity's true string value in
# plain text format. Non-terminal entities
# will normally have an empty value.
def to_string; @value; end
module Treat::Entities::Entity::Stringable
# Returns the entity's true string value.
def to_string; @value.dup; end
# Returns an array of the children's string
# values, found by calling #to_s on them.
def to_a; @children.map { |c| c.to_s }; end
alias :to_ary :to_a
# Returns the entity's string value by
# imploding the value of all terminal
# entities in the subtree of that entity.
def to_s
@value != '' ? @value : implode.strip
has_children? ? implode.strip : @value.dup
end
# #to_str is the same as #to_s.
@@ -24,12 +28,10 @@ module Treat::Entities::Abilities::Stringable
def short_value(max_length = 30)
s = to_s
words = s.split(' ')
if s.length < max_length
s
else
words[0..2].join(' ') + ' [...] ' +
words[-2..-1].join(' ')
end
return s if (s.length < max_length) ||
!(words[0..2] && words[-2..-1])
words[0..2].join(' ') + ' [...] ' +
words[-2..-1].join(' ')
end
# Print out an ASCII representation of the tree.
@@ -38,33 +40,32 @@ module Treat::Entities::Abilities::Stringable
# Return an informative string representation
# of the entity.
def inspect
s = "#{cl(self.class)} (#{@id.to_s})"
name = self.class.mn
s = "#{name} (#{@id.to_s})"
if caller_method(2) == :inspect
@id.to_s
else
dependencies = []
@dependencies.each do |dependency|
dependencies <<
"#{dependency.target}#{dependency.type}"
edges = []
@edges.each do |edge|
edges <<
"#{edge.target}#{edge.type}"
end
s += " --- #{short_value.inspect}" +
" --- #{@features.inspect} " +
" --- #{dependencies.inspect} "
" --- #{edges.inspect} "
end
s
end
# Helper method to implode the string value of the subtree.
def implode
def implode(value = "")
return @value.dup if !has_children?
value = ''
each do |child|
if child.is_a?(Treat::Entities::Section)
value += "\n\n"
value << "\n\n"
end
if child.is_a?(Treat::Entities::Token) || child.value != ''
@@ -72,14 +73,14 @@ module Treat::Entities::Abilities::Stringable
child.is_a?(Treat::Entities::Enclitic)
value.strip!
end
value += child.to_s + ' '
value << child.to_s + ' '
else
value += child.implode
child.implode(value)
end
if child.is_a?(Treat::Entities::Title) ||
child.is_a?(Treat::Entities::Paragraph)
value += "\n\n"
value << "\n\n"
end
end
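The implode change replaces value += ... (which builds a new String on every concatenation) with value << ... and passes the buffer down through child.implode(value), so the whole subtree is appended into a single mutable string. A minimal sketch of that accumulator pattern on a hypothetical Node struct, ignoring Treat's spacing rules for sections, titles, paragraphs, punctuation and enclitics:

```ruby
Node = Struct.new(:value, :children) do
  def leaf?
    children.empty?
  end

  # Appends this subtree's text to +buffer+ in place and returns it.
  def implode(buffer = +'')
    return buffer << value if leaf?
    children.each do |child|
      if child.leaf?
        buffer << child.value << ' '
      else
        child.implode(buffer)   # the child writes into the same buffer
      end
    end
    buffer
  end
end

tree = Node.new('', [Node.new('Hello', []),
                     Node.new('', [Node.new('big', []), Node.new('world', [])])])
tree.implode.strip   # => "Hello big world"
```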

@@ -1,18 +0,0 @@
module Treat::Entities
# Represents a group of tokens.
class Group < Entity; end
# Represents a group of words
# with a sentence ender (.!?)
class Sentence < Group; end
# Represents a group of words,
# with no sentence ender.
class Phrase < Group; end
# Represents a non-linguistic
# fragment (e.g. stray symbols).
class Fragment < Group; end
end

@@ -1,13 +0,0 @@
module Treat::Entities
# Represents a section.
class Section < Entity; end
# Represents a page of text.
class Page < Section; end
# Represents a block of text
class Block < Section; end
# Represents a list.
class List < Section; end
end

Some files were not shown because too many files have changed in this diff.