
# KAG Example: CSQA

English | 简体中文

The UltraDomain `cs.jsonl` dataset contains 10 documents in the field of Computer Science and 100 questions with their answers about those documents.

Here we demonstrate how to build a knowledge graph for those documents, generate answers to the questions with KAG, and compare the KAG-generated answers with those from other RAG systems.
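For a quick look at the dataset, a JSON-Lines file like `cs.jsonl` can be loaded one record per line. This is a minimal sketch; the `input` field name in the usage comment is an assumption, not something documented by this README:

```python
import json

def load_jsonl(path):
    """Load a JSON-Lines file into a list of dicts, one per non-blank line."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records

# Hypothetical usage -- the actual field names in cs.jsonl may differ:
# records = load_jsonl("cs.jsonl")
# print(len(records), sorted(records[0].keys()))
```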

## 1. Precondition

Please refer to Quick Start to install KAG and its dependency, the OpenSPG server, and to learn about using KAG in developer mode.

## 2. Steps to reproduce

### Step 1: Enter the example directory

```bash
cd kag/examples/csqa
```

### Step 2: (Optional) Prepare the data

Download UltraDomain `cs.jsonl` and execute `generate_data.py` to generate the data files in `./builder/data` and `./solver/data`. Since the generated files are already committed, this step is optional.

```bash
python generate_data.py
```

### Step 3: Configure models

Update the generative model configurations `openie_llm` and `chat_llm` and the representational model configuration `vectorize_model` in `kag_config.yaml`.

You need to fill in correct `api_key`s. If your model providers and model names differ from the default values, you also need to update `base_url` and `model`.

The `splitter` and `num_threads_per_chain` configurations may also be adjusted to match those of the systems you compare against.
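Putting those names together, the relevant part of `kag_config.yaml` is sketched below. Only the three section names and the `api_key`/`base_url`/`model` keys come from this README; the surrounding structure and all values are placeholder assumptions to be replaced with your own:

```yaml
# Illustrative fragment only -- keys not named in this README are assumptions.
openie_llm:
  api_key: your-api-key                  # required
  base_url: https://api.openai.com/v1    # update if your provider differs
  model: gpt-4o-mini                     # update to your model name
chat_llm:
  api_key: your-api-key
  base_url: https://api.openai.com/v1
  model: gpt-4o-mini
vectorize_model:
  api_key: your-api-key
  base_url: https://api.openai.com/v1
  model: text-embedding-3-small          # placeholder embedding model
```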

### Step 4: Project initialization

Initialize the project with the following command.

```bash
knext project restore --host_addr http://127.0.0.1:8887 --proj_path .
```

### Step 5: Commit the schema

Execute the following command to commit the schema `CsQa.schema`.

```bash
knext schema commit
```

### Step 6: Build the knowledge graph

Execute `indexer.py` in the `builder` directory to build the knowledge graph.

```bash
cd builder && python indexer.py && cd ..
```

### Step 7: Generate the answers

Execute `eval.py` in the `solver` directory to generate the answers.

```bash
cd solver && python eval.py && cd ..
```

The results are saved to `./solver/data/csqa_kag_answers.json`.
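A quick sanity check of the output file takes a few lines of Python. The assumption that the file is a JSON array of answer records is mine, so adjust to the actual structure if it differs:

```python
import json

def count_answers(path):
    """Return the number of records in a generated answers file.

    Assumes the file holds a JSON array of question/answer objects;
    adjust if the actual structure differs.
    """
    with open(path, encoding="utf-8") as f:
        answers = json.load(f)
    return len(answers)

# Hypothetical usage:
# print(count_answers("./solver/data/csqa_kag_answers.json"))
```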

### Step 8: (Optional) Get the answers generated by other systems

Follow the LightRAG Reproduce steps to generate answers to the questions and save the results to `./solver/data/csqa_lightrag_answers.json`. Since a copy is already committed, this step is optional.

### Step 9: Calculate the metrics

Update the LLM configurations in `summarization_metrics.py` and `factual_correctness.py`, then execute them to calculate the metrics.

```bash
python ./solver/summarization_metrics.py
python ./solver/factual_correctness.py
```

### Step 10: (Optional) Cleanup

To delete the checkpoints, execute the following commands.

```bash
rm -rf ./builder/ckpt
rm -rf ./solver/ckpt
```

To delete the KAG project and the related knowledge graph, execute a command similar to the following, replacing the OpenSPG server address and the KAG project id with the actual values.

```bash
curl http://127.0.0.1:8887/project/api/delete?projectId=1
```