KAG Example: CSQA
The UltraDomain cs.jsonl dataset contains 10 documents in Computer Science and 100 questions with their answers about those documents.
Here we demonstrate how to build a knowledge graph for those documents, generate answers to those questions with KAG, and compare the KAG-generated answers with those from other RAG systems.
1. Precondition
Please refer to Quick Start to install KAG and its dependency, the OpenSPG server, and to learn how to use KAG in developer mode.
2. Steps to reproduce
Step 1: Enter the example directory
cd kag/examples/csqa
Step 2: (Optional) Prepare the data
Download UltraDomain cs.jsonl and execute generate_data.py to generate data files in ./builder/data and ./solver/data. Since the generated files were committed, this step is optional.
python generate_data.py
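Before running generate_data.py, it can help to confirm that the download is a valid JSON Lines file. The following is a minimal sketch, not part of the example itself: the function name summarize_jsonl and the assumed location ./cs.jsonl are illustrative, and the script makes no assumption about the field names inside each record.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path):
    """Return (record_count, Counter of top-level keys) for a JSON Lines file."""
    count = 0
    keys = Counter()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)  # raises json.JSONDecodeError on a bad line
            count += 1
            if isinstance(record, dict):
                keys.update(record.keys())
    return count, keys


if __name__ == "__main__":
    path = Path("cs.jsonl")  # assumed download location; adjust as needed
    if path.exists():
        total, key_counts = summarize_jsonl(path)
        print(f"{total} records")
        for key, seen in sorted(key_counts.items()):
            print(f"  {key}: present in {seen} records")
    else:
        print(f"{path} not found; download it first")
```

The per-key counts show which fields appear in every record and which appear only in some.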
Step 3: Configure models
Update the generative model configurations openie_llm and chat_llm and the representational model configuration vectorize_model in kag_config.yaml.
You need to fill in the correct api_key values. If your model providers and model names differ from the default values, you also need to update base_url and model.
The splitter and num_threads_per_chain configurations may also be updated to match the settings used by the other systems.
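To review the model settings after editing, a small script can list every api_key, base_url and model line in kag_config.yaml. This is a rough sketch under stated assumptions: it treats the file as plain text and assumes each setting sits on its own `key: value` line, and the helper name find_model_settings is hypothetical. API key values are masked so they do not end up in terminal logs.

```python
import re
from pathlib import Path

# Matches lines such as "  api_key: abc" and captures the field and its value.
_SETTING = re.compile(r"^\s*(api_key|base_url|model)\s*:\s*(.*?)\s*$")


def find_model_settings(text):
    """Return (line_number, field, value) for each api_key/base_url/model line."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # ignore commented-out settings
        match = _SETTING.match(line)
        if match:
            field, value = match.groups()
            if field == "api_key" and value:
                value = "***"  # mask secrets before printing
            found.append((number, field, value))
    return found


if __name__ == "__main__":
    config = Path("kag_config.yaml")
    if config.exists():
        for number, field, value in find_model_settings(config.read_text(encoding="utf-8")):
            print(f"line {number}: {field} = {value or '<empty>'}")
    else:
        print("kag_config.yaml not found; run this from the example directory")
```

An empty value in the output points to a setting that still needs to be filled in.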
Step 4: Project initialization
Initialize the project with the following command.
knext project restore --host_addr http://127.0.0.1:8887 --proj_path .
Step 5: Commit the schema
Execute the following command to commit the schema CsQa.schema.
knext schema commit
Step 6: Build the knowledge graph
Execute indexer.py in the builder directory to build the knowledge graph.
cd builder && python indexer.py && cd ..
Step 7: Generate the answers
Execute eval.py in the solver directory to generate the answers.
cd solver && python eval.py && cd ..
The results are saved to ./solver/data/csqa_kag_answers.json.
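Steps 4 through 7 can also be chained in a small driver script that stops at the first failing command. The sketch below is only an illustration: the STEPS list and run_steps function are hypothetical names, the commands are copied from the steps above, and it assumes the script is started from kag/examples/csqa with the OpenSPG server at http://127.0.0.1:8887. It defaults to a dry run that only prints the commands; pass --run to execute them.

```python
import subprocess
import sys

# (description, argv, working directory relative to kag/examples/csqa)
STEPS = [
    ("restore project",
     ["knext", "project", "restore",
      "--host_addr", "http://127.0.0.1:8887", "--proj_path", "."],
     "."),
    ("commit schema", ["knext", "schema", "commit"], "."),
    # sys.executable reuses the interpreter that runs this script
    ("build knowledge graph", [sys.executable, "indexer.py"], "builder"),
    ("generate answers", [sys.executable, "eval.py"], "solver"),
]


def run_steps(steps=STEPS, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order."""
    printed = []
    for name, argv, cwd in steps:
        line = f"[{name}] (cwd={cwd}) {' '.join(argv)}"
        print(line)
        printed.append(line)
        if not dry_run:
            # check=True raises CalledProcessError, which halts the remaining steps
            subprocess.run(argv, cwd=cwd, check=True)
    return printed


if __name__ == "__main__":
    run_steps(dry_run="--run" not in sys.argv[1:])
```

Using the cwd argument instead of `cd builder && ... && cd ..` keeps the script's own working directory unchanged if a step fails.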
Step 8: (Optional) Get the answers generated by other systems
Follow the LightRAG Reproduce steps to generate answers to the questions and save the results to ./solver/data/csqa_lightrag_answers.json. Since a copy was committed, this step is optional.
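Before calculating metrics, a quick sanity check can confirm that both answer files parse as JSON and hold a comparable number of entries. This is a sketch under assumptions the source does not confirm: it assumes each file contains a top-level JSON array or object, and the helper name count_entries is hypothetical.

```python
import json
from pathlib import Path


def count_entries(path):
    """Return the number of top-level entries in a JSON array or object."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, (list, dict)):
        return len(data)
    raise ValueError(f"{path}: expected a JSON array or object at the top level")


if __name__ == "__main__":
    files = [
        Path("solver/data/csqa_kag_answers.json"),
        Path("solver/data/csqa_lightrag_answers.json"),
    ]
    counts = {}
    for path in files:
        if path.exists():
            counts[path.name] = count_entries(path)
            print(f"{path}: {counts[path.name]} entries")
        else:
            print(f"{path}: not found")
    if len(counts) == 2 and len(set(counts.values())) > 1:
        print("warning: the two files have different entry counts")
```

A mismatch in counts does not necessarily mean an error, but it is worth checking before comparing the two systems.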
Step 9: Calculate the metrics
Update the LLM configurations in summarization_metrics.py and factual_correctness.py and execute them to calculate the metrics.
python ./solver/summarization_metrics.py
python ./solver/factual_correctness.py
Step 10: (Optional) Cleanup
To delete the checkpoints, execute the following command.
rm -rf ./builder/ckpt
rm -rf ./solver/ckpt
To delete the KAG project and the related knowledge graph, execute a command similar to the following. Replace the OpenSPG server address and the KAG project id with the actual values. The URL is quoted so that the shell does not interpret the ? character.
curl "http://127.0.0.1:8887/project/api/delete?projectId=1"