Commit Graph

5 Commits

Author SHA1 Message Date
Li Jiang f2d7553cdc
Add support to custom text spliter (#270)
* Add support to custom text spliter function and a list of files or urls

* Add parameter to retrieve_config, add tests

* Fix tests

* Fix tests
2023-10-17 14:53:40 +00:00
Li Jiang fa6e2a52c0
Add support to customized vectordb and embedding functions (#161)
* Add custom embedding function

* Add support to custom vector db

* Improve docstring

* Improve docstring

* Improve docstring

* Add support to customized is_termination_msg fucntion

* Add a test for customize vector db with lancedb

* Fix tests

* Add test for embedding_function

* Update docstring
2023-10-10 12:53:18 +00:00
Li Jiang 19f8711c1b
Update num tokens from text (#149)
* Improve num_tokens_from_text

* Format

* Update comments

* Improve docstrings
2023-10-09 02:30:11 +00:00
Mohamed Attia a3547f82c4
Replace the use of `assert` in non-test code (#80)
* Replace `assert`s in the `conversable_agent` module with `if-log-raise`.

* Use a `logger` object in the `code_utils` module.

* Replace use of `assert` with `if-log-raise` in the `code_utils` module.

* Replace use of `assert` in the `math_utils` module with `if-not-raise`.

* Replace `assert` with `if` in the `oai.completion` module.

* Replace `assert` in the `retrieve_utils` module with an if statement.

* Add missing `not`.

* Blacken `completion.py`.

* Test `generate_reply` and `a_generate_reply` raise an assertion error
when there are neither `messages` nor a `sender`.

* Test `execute_code` raises an `AssertionError` when neither code nor
filename is provided.

* Test `split_text_to_chunks` raises when passed an invalid chunk mode.

* * Add `tiktoken` and `chromadb` to test dependencies as they're used in
the `test_retrieve_utils` module.

* Sort the test requirements alphabetically.
2023-10-03 17:52:50 +00:00
Aaron 4adbffa94b
retrieve_utils.py - Updated.py to have the ability to parse text from PDF Files (#50)
* UPDATE - Updated retrieve_utils.py to have the ability to parse text from pdf files

* UNDO - change to recursive condition

* UPDATE - updated agentchat_RetrieveChat.ipynb to clarify which file types are accepted to be in the docs path

* ADD - missing import

* UPDATE - setup.py to have PyPDF2 in retrievechat

* RE-ADD - urls

* ADD - tests for retrieve utils, and removed deprecated PyPdf2

* Update agentchat_RetrieveChat.ipynb

* Update retrieve_utils.py

Fix format

* Update retrieve_utils.py

Replace print with logger

* UPDATE - added more specific exception to PDF decryption try/catch

* FIX - typo, return statement at wrong indentation in extract_text_from_pdf

---------

Co-authored-by: Ward <award40@LAMU0CLP74YXVX6.uhc.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2023-10-01 10:22:58 +00:00