Readme optimize (#65)

* update readme for cn,en

* update readme for cn,en

* update readme for cn,en

* minor format tweak

---------

Co-authored-by: huaidong.xhd <huaidong.xhd@antgroup.com>
蚂蚁田常 2024-11-21 19:26:23 +08:00 committed by GitHub
parent 421090e849
commit 8c80cdc98b
GPG Key ID: B5690EEEBB952194
3 changed files with 120 additions and 76 deletions

README.md

@ -1,60 +1,85 @@
# KAG: Knowledge Augmented Generation
[中文版文档](./README_cn.md)
[日本語版ドキュメント](./README_ja.md)
<div align="center">
<a href="https://spg.openkg.cn/en-US">
<img src="./_static/images/OpenSPG-1.png" width="520" alt="openspg logo">
</a>
</div>
## 1. What is KAG
<p align="center">
<a href="./README.md">English</a> |
<a href="./README_cn.md">简体中文</a> |
<a href="./README_ja.md">日本語版ドキュメント</a>
</p>
Retrieval-Augmented Generation (RAG) has helped bring large language models into domain applications. However, RAG suffers from a large gap between vector similarity and the relevance needed for knowledge reasoning, and it is insensitive to knowledge logic (such as numerical values, temporal relations, and expert rules). Both problems hinder the deployment of professional knowledge services.
<p align="center">
<a href='https://arxiv.org/pdf/2409.13731'><img src='https://img.shields.io/badge/arXiv-2409.13731-b31b1b'></a>
<a href="https://github.com/OpenSPG/KAG/releases/latest">
<img src="https://img.shields.io/github/v/release/OpenSPG/KAG?color=blue&label=Latest%20Release" alt="Latest Release">
</a>
<a href="https://github.com/OpenSPG/KAG/blob/main/LICENSE">
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="license">
</a>
</p>
On October 24, 2024, OpenSPG released v0.5, which officially introduced Knowledge Augmented Generation (KAG), a knowledge service framework for professional domains. The goal of KAG is to build a knowledge-enhanced LLM service framework in professional domains, supporting logical reasoning, factual Q&A, and more. KAG fully integrates the logical and factual characteristics of knowledge graphs (KGs). It also uses OpenIE to lower the barrier to converting domain documents into knowledge, and it alleviates KG sparsity through hybrid reasoning. As far as we know, KAG is the only RAG framework that supports both logical reasoning and multi-hop factual Q&A. Its core features include:
# 1. What is KAG?
* Knowledge and Chunk Mutual Indexing structure to integrate more complete contextual text information
* Knowledge alignment using conceptual semantic reasoning to alleviate the noise problem caused by OpenIE
* Schema-constrained knowledge construction to support the representation and construction of domain expert knowledge
* Logical form-guided hybrid reasoning and retrieval to support logical reasoning and multi-hop reasoning Q&A
KAG is a logical reasoning and Q&A framework based on the [OpenSPG](https://github.com/OpenSPG/openspg) engine and large language models. It is used to build logical reasoning and Q&A solutions for vertical-domain knowledge bases. KAG effectively overcomes the ambiguity of vector similarity calculation in traditional RAG and the noise that OpenIE introduces into GraphRAG. It supports logical reasoning, multi-hop factual Q&A, and related tasks, and it outperforms current SOTA methods on multi-hop Q&A.
KAG significantly outperforms NaiveRAG, HippoRAG, and other methods on multi-hop Q&A tasks: the F1 score improves by a relative 19.6% on hotpotQA and by a relative 33.5% on 2wiki. We have successfully applied KAG to Ant Group's professional knowledge Q&A tasks, such as e-government Q&A and e-health Q&A, where professionalism improved significantly compared with the traditional RAG method.
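A relative increase is measured against the baseline score, not in absolute percentage points. The sketch below shows the arithmetic only; the two F1 values are hypothetical placeholders chosen for illustration and are not taken from the KAG paper or benchmarks.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Return the relative improvement of `improved` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Hypothetical F1 scores, used only to illustrate the formula.
baseline_f1 = 0.50
improved_f1 = 0.60

# An absolute gain of 10 points on a 0.50 baseline is a 20% relative gain.
print(f"{relative_gain(baseline_f1, improved_f1):.1f}%")
```

Read this way, a relative gain of 19.6% means the new F1 is about 1.196 times the baseline F1, whatever the baseline happens to be.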
The goal of KAG is to build a knowledge-enhanced LLM service framework in professional domains, supporting logical reasoning, factual Q&A, etc. KAG fully integrates the logical and factual characteristics of the KGs. Its core features include:
- Knowledge and Chunk Mutual Indexing structure to integrate more complete contextual text information
- Knowledge alignment using conceptual semantic reasoning to alleviate the noise problem caused by OpenIE
- Schema-constrained knowledge construction to support the representation and construction of domain expert knowledge
- Logical form-guided hybrid reasoning and retrieval to support logical reasoning and multi-hop reasoning Q&A
⭐️ Star our repository to stay up-to-date with exciting new features and improvements! Get instant notifications for new releases! 🌟
![Star KAG](./_static/images/star-kag.gif)
## 2 Core Features
# 2. Core Features
### 2.1 Knowledge Representation
## 2.1 Knowledge Representation
In the context of private knowledge bases, unstructured data, structured information, and business expert experience often coexist. KAG references the DIKW hierarchy to upgrade SPG to a version that is friendly to LLMs. For unstructured data such as news, events, logs, and books, as well as structured data like transactions, statistics, and approvals, along with business experience and domain knowledge rules, KAG employs techniques such as layout analysis, knowledge extraction, property normalization, and semantic alignment to integrate raw business data and expert rules into a unified business knowledge graph.
In the context of private knowledge bases, unstructured data, structured information, and business expert experience often coexist. KAG references the DIKW hierarchy to upgrade SPG to a version that is friendly to LLMs.
For unstructured data such as news, events, logs, and books, as well as structured data like transactions, statistics, and approvals, along with business experience and domain knowledge rules, KAG employs techniques such as layout analysis, knowledge extraction, property normalization, and semantic alignment to integrate raw business data and expert rules into a unified business knowledge graph.
![KAG Diagram](./_static/images/kag-diag.jpg)
This makes it compatible with schema-free information extraction and schema-constrained expertise construction on the same knowledge type (e.g., entity type, event type), and supports a mutual-index representation between the graph structure and the original text chunks. This mutual-index representation helps build an inverted index based on the graph structure and promotes the unified representation and reasoning of logical forms.
This makes it compatible with schema-free information extraction and schema-constrained expertise construction on the same knowledge type (e.g., entity type, event type), and supports a mutual-index representation between the graph structure and the original text chunks.
### 2.2 Mixed Reasoning Guided by Logic Forms
This mutual-index representation helps build an inverted index based on the graph structure and promotes the unified representation and reasoning of logical forms.
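The idea of a mutual index can be sketched as two maps kept in sync: one from graph entities to the text chunks that mention them (the inverted index), and one from each chunk back to its entities. This is a minimal illustration only; `MutualIndex` and its methods are hypothetical names and are not part of the KAG or OpenSPG API.

```python
from collections import defaultdict


class MutualIndex:
    """Toy two-way index between graph entities and text chunks (hypothetical)."""

    def __init__(self):
        self.chunks = {}                            # chunk id -> original text
        self.entity_to_chunks = defaultdict(set)    # inverted index: entity -> chunk ids
        self.chunk_to_entities = defaultdict(set)   # forward index: chunk id -> entities

    def add_chunk(self, chunk_id, text, entities):
        """Store a chunk and link it to every entity extracted from it."""
        self.chunks[chunk_id] = text
        for entity in entities:
            self.entity_to_chunks[entity].add(chunk_id)
            self.chunk_to_entities[chunk_id].add(entity)

    def chunks_for(self, entity):
        """Graph -> text: ids of the chunks that mention an entity."""
        return sorted(self.entity_to_chunks.get(entity, set()))

    def entities_for(self, chunk_id):
        """Text -> graph: entities linked to a chunk."""
        return sorted(self.chunk_to_entities.get(chunk_id, set()))


index = MutualIndex()
index.add_chunk("c1", "KAG is built on the OpenSPG engine.", ["KAG", "OpenSPG"])
index.add_chunk("c2", "OpenSPG released v0.5.", ["OpenSPG"])

print(index.chunks_for("OpenSPG"))   # chunks reachable from a graph node
print(index.entities_for("c1"))      # graph nodes reachable from a chunk
```

Keeping both directions means a graph lookup can pull in the surrounding source text, and a text hit can be expanded into its graph neighborhood.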
## 2.2 Mixed Reasoning Guided by Logic Forms
![Logical Form Solver](./_static/images/kag-lf-solver.png)
KAG proposes a logical-form-guided hybrid solving and reasoning engine. The engine includes three types of operators: planning, reasoning, and retrieval, which transform natural language problems into problem-solving processes that combine language and symbols. In this process, each step can use a different operator, such as exact-match retrieval, text retrieval, numerical calculation, or semantic reasoning, thereby integrating four different problem-solving processes: retrieval, knowledge graph reasoning, language reasoning, and numerical calculation.
KAG proposes a logical-form-guided hybrid solving and reasoning engine.
The engine includes three types of operators: planning, reasoning, and retrieval, which transform natural language problems into problem-solving processes that combine language and symbols.
## 3. Release Notes
In this process, each step can use a different operator, such as exact-match retrieval, text retrieval, numerical calculation, or semantic reasoning, thereby integrating four different problem-solving processes: retrieval, knowledge graph reasoning, language reasoning, and numerical calculation.
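The step-by-step use of different operators can be sketched as a small dispatcher that runs a plan and stores each step's result for later steps. Everything here is an assumption made for illustration: the operator names, the `Step` structure, the toy data, and the hand-written plan (which a planning operator would normally produce) are hypothetical and do not reflect the actual kg-solver implementation.

```python
from dataclasses import dataclass

# Toy knowledge sources (hypothetical data for illustration).
FACTS = {("KAG", "built_on"): "OpenSPG"}              # (subject, relation) -> object
CHUNKS = ["OpenSPG released v0.5 in October 2024."]   # text corpus


def exact_match(args, memory):
    """Exact-match retrieval against the toy graph."""
    return FACTS.get((args["subject"], args["relation"]))


def text_retrieval(args, memory):
    """Keyword retrieval over chunks, using a value produced by an earlier step."""
    keyword = memory[args["keyword_from"]]
    return [chunk for chunk in CHUNKS if keyword in chunk]


def count(args, memory):
    """Numerical calculation: count the items produced by an earlier step."""
    return len(memory[args["of"]])


OPERATORS = {
    "exact_match": exact_match,
    "text_retrieval": text_retrieval,
    "count": count,
}


@dataclass
class Step:
    name: str       # key under which the result is stored
    operator: str   # which operator to run
    args: dict      # operator arguments


def solve(plan):
    """Run each step in order; later steps can read earlier results from memory."""
    memory = {}
    for step in plan:
        memory[step.name] = OPERATORS[step.operator](step.args, memory)
    return memory


# Hand-written plan standing in for the output of a planning operator.
plan = [
    Step("engine", "exact_match", {"subject": "KAG", "relation": "built_on"}),
    Step("passages", "text_retrieval", {"keyword_from": "engine"}),
    Step("n_passages", "count", {"of": "passages"}),
]
result = solve(plan)
print(result)
```

The design point is that each step names its operator explicitly, so graph lookups, text search, and arithmetic can be mixed within one solving process.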
### 3.1 Released Versions
* 2024.11.21 : Support for docs upload, model invocation concurrency settings, user experience optimization, etc.
* 2024.10.25 : KAG release
# 3. Release Notes
### 3.2 Future Plan
* 2024.12 : Domain knowledge injection, domain schema customization, QFS task support, visual query analysis, etc.
* 2025.01 : Logical reasoning optimization, conversational task support
* 2025.02 : kag-model release, KAG solutions for event reasoning knowledge graphs and medical knowledge graphs
* 2025.03 : KAG front-end open source, distributed build support, mathematical reasoning optimization
## 3.1 Latest Updates
* 2024.11.21 : Support for Word document upload, model invocation concurrency settings, user experience optimization, etc.
* 2024.10.25 : KAG initial release
## 4. How to use it
## 3.2 Future Plans
### 4.1 product-based (for ordinary users)
* Domain knowledge injection, domain schema customization, QFS task support, visual query analysis, etc.
* Logical reasoning optimization, conversational task support
* kag-model release, KAG solutions for event reasoning knowledge graphs and medical knowledge graphs
* KAG front-end open source, distributed build support, mathematical reasoning optimization
#### 4.1.1 Engine & Dependent Image Installation
# 4. Quick Start
## 4.1 product-based (for ordinary users)
### 4.1.1 Engine & Dependent Image Installation
* **Recommended System Version:**
@ -81,19 +106,20 @@ curl -sSL https://raw.githubusercontent.com/OpenSPG/openspg/refs/heads/master/de
docker compose -f docker-compose.yml up -d
```
#### 4.1.2 Use the product
### 4.1.2 Use the product
Navigate to the default URL of the KAG product with your browser: <http://127.0.0.1:8887>
See the [Product](https://openspg.yuque.com/ndx6g9/wc9oyq/rgd8ecefccwd1ga5) guide for a detailed introduction.
### 4.2 toolkit-based (for developers)
## 4.2 toolkit-based (for developers)
#### 4.2.1 Engine & Dependent Image Installation
### 4.2.1 Engine & Dependent Image Installation
Refer to section 4.1 to complete the installation of the engine and dependent images.
#### 4.2.2 Installation of KAG
### 4.2.2 Installation of KAG
**macOS / Linux developers**
@ -117,13 +143,13 @@ Refer to the 3.1 section to complete the installation of the engine & dependent
# Install KAG: cd KAG && pip install -e .
```
#### 4.2.3 Use the toolkit
### 4.2.3 Use the toolkit
Please refer to the [Quick Start](https://openspg.yuque.com/ndx6g9/wc9oyq/owp4sxbdip2u7uvv) guide for a detailed introduction to the toolkit. You can then use the built-in components to reproduce the performance results on the built-in datasets and apply those components to new business scenarios.
## 5. Technical Architecture
# 5. Technical Architecture
![Figure 1 KAG technical architecture](./_static/images/kag-arch.png)
![KAG technical architecture](./_static/images/kag-arch.png)
The KAG framework includes three parts: kg-builder, kg-solver, and kag-model. This release covers only the first two; kag-model will be open-sourced gradually in future releases.
@ -131,7 +157,7 @@ kg-builder implements a knowledge representation that is friendly to large-scale
kg-solver uses a logical symbol-guided hybrid solving and reasoning engine that includes three types of operators: planning, reasoning, and retrieval, to transform natural language problems into a problem-solving process that combines language and symbols. In this process, each step can use different operators, such as exact match retrieval, text retrieval, numerical calculation or semantic reasoning, so as to realize the integration of four different problem solving processes: Retrieval, Knowledge Graph reasoning, language reasoning and numerical calculation.
## 6. Contact us
# 6. Contact us
**GitHub**: <https://github.com/OpenSPG/KAG>
@ -139,11 +165,11 @@ kg-solver uses a logical symbol-guided hybrid solving and reasoning engine that
<img src="./_static/images/openspg-qr.png" alt="Contact Us: OpenSPG QR-code" width="200">
# Differences between KAG, RAG, and GraphRAG
# 7. Differences between KAG, RAG, and GraphRAG
**KAG introduction and applications**: <https://github.com/orgs/OpenSPG/discussions/52>
# Cite
# 8. Citation
If you use this software, please cite it as below:


@ -1,28 +1,46 @@
# KAG: A Knowledge Service Framework for Large Language Models
[English version](./README.md)
[日本語版ドキュメント](./README_ja.md)
<div align="center">
<a href="https://spg.openkg.cn/en-US">
<img src="./_static/images/OpenSPG-1.png" width="520" alt="openspg logo">
</a>
</div>
## 1. What is KAG
<p align="center">
<a href="./README.md">English</a> |
<a href="./README_cn.md">简体中文</a> |
<a href="./README_ja.md">日本語版ドキュメント</a>
</p>
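Retrieval-Augmented Generation (RAG) has advanced the integration of domain applications with large language models. However, RAG suffers from a large gap between vector similarity and the relevance required for knowledge reasoning, and from insensitivity to knowledge logic (such as numerical values, temporal relations, and expert rules), all of which hinder the deployment of professional knowledge services.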
<p align="center">
<a href='https://arxiv.org/pdf/2409.13731'><img src='https://img.shields.io/badge/arXiv-2409.13731-b31b1b'></a>
<a href="https://github.com/OpenSPG/KAG/releases/latest">
<img src="https://img.shields.io/github/v/release/OpenSPG/KAG?color=blue&label=Latest%20Release" alt="Latest Release">
</a>
<a href="https://github.com/OpenSPG/KAG/blob/main/LICENSE">
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?labelColor=d4eaf7&color=2e6cc4" alt="license">
</a>
</p>
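On October 24, 2024, OpenSPG released v0.5, officially launching Knowledge Augmented Generation (KAG), a knowledge service framework for professional domains. KAG aims to make full use of the strengths of knowledge graphs and vector retrieval to build a knowledge-enhanced LLM service framework for professional domains, supporting logical reasoning, factual Q&A, and more. KAG fully integrates the logical and factual characteristics of knowledge graphs, uses OpenIE to lower the barrier to turning domain documents into knowledge, and alleviates the sparsity of traditional knowledge graphs through hybrid reasoning. As far as we know, KAG is the only RAG framework that supports both logical reasoning and multi-hop factual Q&A. Its core features include:
# 1. What is KAG
* Knowledge and chunk mutual-indexing structure to integrate richer contextual text information
* Knowledge alignment using conceptual semantic reasoning to alleviate the noise introduced by OpenIE
* Schema-constrained knowledge construction to support the representation and construction of domain expert knowledge
KAG is a logical reasoning and Q&A framework based on the [OpenSPG](https://github.com/OpenSPG/openspg) engine and large language models, used to build logical reasoning and Q&A solutions for vertical-domain knowledge bases. KAG effectively overcomes the ambiguity of vector similarity calculation in traditional RAG and the noise that OpenIE introduces into GraphRAG. KAG supports logical reasoning, multi-hop factual Q&A, and more, and clearly outperforms current SOTA methods.
The goal of KAG is to build a knowledge-enhanced LLM service framework for professional domains, supporting logical reasoning, factual Q&A, and more. KAG fully integrates the logical and factual characteristics of KGs. Its core features include:
* Knowledge and chunk mutual-indexing structure to integrate richer contextual text information
* Knowledge alignment using conceptual semantic reasoning to alleviate the noise introduced by OpenIE
* Schema-constrained knowledge construction to support the representation and construction of domain expert knowledge
* Logical-symbol-guided hybrid reasoning and retrieval to enable logical reasoning and multi-hop reasoning Q&A
KAG significantly outperforms NaiveRAG, HippoRAG, and other methods on multi-hop Q&A tasks, with a relative F1 improvement of 19.6% on hotpotQA and 33.5% on 2wiki. We have successfully applied KAG to Ant Group's professional knowledge service scenarios, such as e-government Q&A and e-health Q&A, where professionalism and accuracy improved significantly compared with RAG methods.
⭐️ Click Star in the upper-right corner to follow KAG and receive real-time notifications of new releases! 🌟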
![Star KAG](./_static/images/star-kag.gif)
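## 2. KAG Core Features
# 2. KAG Core Features
## 2.1 LLM-Friendly Semantic Knowledge Management
### 2.1 LLM-Friendly Semantic Knowledge Management
In private knowledge base scenarios, unstructured data, structured information, and business expert experience often coexist. KAG proposes a knowledge representation framework that is friendly to large language models (LLMs): building on the DIKW (data, information, knowledge, and wisdom) hierarchy, it upgrades SPG to an LLM-friendly version named LLMFriSPG.
This allows it to accommodate both schema-free information extraction and schema-constrained expert knowledge construction on the same knowledge type (such as an entity type or event type), and to support a mutual-index representation between the graph structure and the original text chunks.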
@ -31,30 +49,33 @@ KAG 在多跳问答任务中显著优于 NaiveRAG、HippoRAG 等方法,在 hot
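![KAG diagram](./_static/images/kag-diag.jpg)
### 2.2 Hybrid Reasoning Engine Guided by Logical Symbols
## 2.2 Hybrid Reasoning Engine Guided by Logical Symbols
KAG proposes a logical-symbol-guided hybrid solving and reasoning engine. The engine includes three types of operators (planning, reasoning, and retrieval) that transform natural language questions into a problem-solving process combining language and symbols.
In this process, each step can use a different operator, such as exact-match retrieval, text retrieval, numerical calculation, or semantic reasoning, thereby integrating four different problem-solving processes: graph reasoning, logical computation, chunk retrieval, and LLM reasoning.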
![Logical Form Solver](./_static/images/kag-lf-solver.png)
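## 3. Release Notes
# 3. Release Notes
### 3.1 Released Versions
* 2024.11.21 : Support for Word document upload, knowledge base deletion, model invocation concurrency settings, user experience optimization, etc.
* 2024.10.25 : KAG released
## 3.1 Latest Updates
### 3.2 Future Plans
* 2024.12 : Domain knowledge injection, domain schema customization, summary generation task support, visual graph analysis and query, etc.
* 2025.01 : Logical reasoning optimization, conversational task support
* 2025.02 : kag-model release, KAG solutions for event reasoning knowledge graphs and medical knowledge graphs
* 2025.03 : KAG front-end open source, distributed build support, mathematical reasoning optimization
* 2024.11.21 : Support for Word document upload, knowledge base deletion, model invocation concurrency settings, user experience optimization, etc.
* 2024.10.25 : KAG initial release
## 4. How to Use
## 3.2 Future Plans
### 4.1 Product-based (for ordinary users)
* Domain knowledge injection, domain schema customization, summary generation task support, visual graph analysis and query, etc.
* Logical reasoning optimization, conversational task support
* kag-model release, KAG solutions for event reasoning knowledge graphs and medical knowledge graphs
* KAG front-end open source, distributed build support, mathematical reasoning optimization
#### 4.1.1 Engine & Dependent Image Installation
# 4. Quick Start
## 4.1 Product-based (for ordinary users)
### 4.1.1 Engine & Dependent Image Installation
* **Recommended System Version:**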
@ -81,19 +102,19 @@ curl -sSL https://raw.githubusercontent.com/OpenSPG/openspg/refs/heads/master/de
docker compose -f docker-compose.yml up -d
```
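#### 4.1.2 Usage
### 4.1.2 Usage
Open the default URL of the KAG product in your browser: <http://127.0.0.1:8887>
For details, see the [Product Guide](https://openspg.yuque.com/ndx6g9/0.0.5/bv9zc3gyi98k0oyx).
### 4.2 Toolkit-based (for developers)
## 4.2 Toolkit-based (for developers)
#### 4.2.1 Engine & Dependent Image Installation
### 4.2.1 Engine & Dependent Image Installation
Refer to section 4.1 to complete the installation of the engine and dependent images.
#### 4.2.2 KAG Installation
### 4.2.2 KAG Installation
**macOS / Linux developers**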
@ -117,16 +138,14 @@ docker compose -f docker-compose.yml up -d
# Install KAG: cd KAG && pip install -e .
```
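#### 4.2.3 Usage
### 4.2.3 Usage
Developers can refer to the [KAG examples](https://openspg.yuque.com/ndx6g9/0.5/vbbdp80vg0xf5n3k) and use KAG's built-in components to reproduce the results on the built-in datasets and to apply KAG to new scenarios.
## 5. Technical Architecture
# 5. Technical Architecture
![Figure 1 KAG technical architecture](./_static/images/kag-arch.png)
Figure 1 KAG technical architecture
![KAG technical architecture](./_static/images/kag-arch.png)
The KAG framework consists of three parts: kg-builder, kg-solver, and kag-model. This release covers only the first two; kag-model will be open-sourced gradually in later releases.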
@ -134,7 +153,7 @@ kg-builder 实现了一种对大型语言模型LLM友好的知识表示
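kg-solver uses a logical-form-guided hybrid solving and reasoning engine that includes three types of operators (planning, reasoning, and retrieval) to transform natural language questions into a problem-solving process that combines language and symbols. In this process, each step can use a different operator, such as exact-match retrieval, text retrieval, numerical calculation, or semantic reasoning, thereby integrating four different problem-solving processes: retrieval, knowledge graph reasoning, language reasoning, and numerical calculation.
## 6. Contact Us
# 6. Contact Us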
**GitHub**: <https://github.com/OpenSPG/KAG>
@ -142,11 +161,11 @@ kg-solver 采用逻辑形式引导的混合求解和推理引擎,该引擎包
<img src="./_static/images/openspg-qr.png" alt="Contact Us: OpenSPG QR-code" width="200">
# Differences between KAG, RAG, and GraphRAG
# 7. Differences between KAG, RAG, and GraphRAG
**KAG introduction and applications**: <https://github.com/orgs/OpenSPG/discussions/52>
# Citation
# 8. Citation
If you use this software, please cite it as follows:
@ -167,7 +186,6 @@ kg-solver 采用逻辑形式引导的混合求解和推理引擎,该引擎包
}
```
# License
[Apache License 2.0](LICENSE)

Binary file not shown.
