From 2a5de337936fe8bc47b7fa520d57db1d550bad23 Mon Sep 17 00:00:00 2001
From: xujiao <xj@debian>
Date: Fri, 18 Apr 2025 15:14:44 +0800
Subject: [PATCH] update docs

---
 README.md            | 404 ++++++++++++++++++++++++++++++++++++++++++-
 quick_start_guide.md |  72 ++++----
 2 files changed, 436 insertions(+), 40 deletions(-)

diff --git a/README.md b/README.md
index 2187f38..f775c38 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,400 @@
-# netrans 
-Netrans 是一套运行在unbuntu 20.04上的模型处理工具，提供命令行工具 netrans_cli 和 python api netrans_py， 其核心功能是将模型权重转换成在pnna芯片上运行的 nbg（network binary graph）格式（.nb 为后缀）。
+# Introduction to Netrans
 
-快速使用请参考[quick_start_guide.md](./quick_start_guide.md)
-详细说明请参考[introduction.md](./introduction.md) 
-具体示例请参考examples。
+Netrans is the AI compiler for the Pnna NPU. It provides a command-line tool, netrans_cli, and a Python API, netrans_py. It converts model weights into nbg (network binary graph) files (.nb suffix) that run on the Pnna NPU.
+
+## Project structure
+
+The Netrans directory is laid out as follows:
+
+```text
+netrans-ai-compiler/
+├── bin/                  # compiler executables
+├── netrans_cli/          # command-line tools
+├── netrans_py/           # Python API
+├── examples/             # example code
+└── setup.sh              # setup script
+
+```
+
+## Installation guide
+
+### System requirements
+
+- CPU: Intel® Core™ i5-6500 @ 3.2 GHz x4 with support for Intel® Advanced Vector Extensions (AVX)
+- RAM: at least 8 GB
+- Disk: 160 GB
+- OS: Ubuntu 20.04 LTS 64-bit with Python 3.8; other versions are not recommended
+
+### Installation steps
+
+- Install dependencies
+
+```shell
+sudo apt update
+sudo apt install build-essential
+```
+
+- Create a Python 3.8 environment
+
+```bash
+wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
+mkdir -p ~/app
+INSTALL_PATH="${HOME}/app/miniforge3"
+bash "Miniforge3-$(uname)-$(uname -m).sh" -b -p "${INSTALL_PATH}"
+echo "source \"${INSTALL_PATH}/etc/profile.d/conda.sh\"" >> "${HOME}/.bashrc"
+echo "source \"${INSTALL_PATH}/etc/profile.d/mamba.sh\"" >> "${HOME}/.bashrc"
+source ${HOME}/.bashrc
+mamba create -n netrans python=3.8 -y
+mamba activate netrans
+```
+
+- Download Netrans
+
+```bash
+cd ~/app
+git clone https://gitlink.org.cn/nudt_dsp/netrans.git
+```
+
+- Run the setup script
+
+```bash
+cd ~/app/netrans
+./setup.sh
+```
+
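+## Using Netrans
+
+Netrans provides model conversion examples for TensorFlow, Caffe, Darknet, ONNX, and PyTorch; see the [examples](./examples/index.rst).
+
+### Command-line tool
+
+Netrans CLI provides a simple command-line interface for compiling and optimizing models.
+Basic usage: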
+
+```bash
+load.sh model_path  # import the model
+config.sh model_path  # configure the parameters
+quantize.sh model_path quantize_type # quantize the model
+export.sh model_path quantize_type # export the model
+```
+
+For details, see [Using netrans_cli](netrans_cli.md).
+
+### Python API
+
+The Netrans Python API makes it easy to call the compiler from a Python script.
+Example:
+
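+```py3
+from netrans import Netrans
+model_path = 'examples/darknet/yolov4_tiny'
+netrans_path = "netrans/bin" # can be omitted if the Netrans path is already exported in the environment
+
+# Initialize Netrans
+net = Netrans(model_path,netrans=netrans_path)
+# Import the model. `import` is a reserved word, so net.import() is a SyntaxError; load() is assumed here to mirror load.sh, confirm the name in netrans_py.md
+net.load()
+# Configure the normalization parameters used in preprocessing
+net.config(scale=1,mean=0)
+# Quantize the model
+net.quantize("uint8")
+# Export the model
+net.export()
+
+# Quantize directly to int16 and export, reusing the inputmeta configured above
+net.model2nbg(quantize_type = "int16", inputmeta=True)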
+```
+
+For details, see [Using netrans_py](netrans_py.md).
+
+## Supported models
+
+The table below lists the mainstream frameworks that Netrans supports.
+
+|Input format|Description|
+|:---|---|
+| Caffe|All Caffe models are supported |
+| TensorFlow|Versions 1.4.x, 2.0.x, 2.3.x, 2.6.x, 2.8.x, 2.10.x, and 2.12.x; models saved with tf.io.write_graph() |
+| ONNX|ONNX up to 1.14.0, opset up to 19 |
+| PyTorch | PyTorch up to 1.5.1 |
+| Darknet |Darknet models listed on the [official site](https://pjreddie.com/darknet/) |
+
+<font color="#dd0000">Note:</font> Because PyTorch builds its graphs dynamically, we recommend exporting PyTorch models to ONNX and then converting them with Netrans.
+
+## Supported operators
+
+### Supported Caffe operators
+
+```{table}
+:class: noheader
+|   |   |   |
+|:---| -- | -- |
+absval   |   innerproduct   |   reorg
+axpy   |   lrn   |   roipooling
+batchnorm/bn   |   l2normalizescale   |   relu
+convolution   |   leakyrelu   |   reshape
+concat   |   lstm   |   reverse
+convolutiondepthwise   |   normalize   |   swish
+dropout   |   poolwithargmax   |   slice
+depthwiseconvolution   |   permute   |   scale
+deconvolution   |   prelu   |   shufflechannel
+elu   |   pooling   |   softmax
+eltwise   |   priorbox   |   sigmoid
+flatten   |   proposal   |   tanh
+```
+
+### Supported TensorFlow operators
+
+```{table}
+:class: noheader
+|   |   |   |
+|:---| -- | -- |
+tf.abs   |   tf.nn.rnn_cell_GRUCell   |   tf.negative
+tf.add   |   tf.nn.dynamic_rnn   |   tf.pad
+tf.nn.bias_add   |   tf.nn.rnn_cell_GRUCell   |   tf.transpose
+tf.add_n   |   tf.greater   |   tf.nn.avg_pool
+tf.argmin   |   tf.greater_equal   |   tf.nn.max_pool
+tf.argmax   |   tf.image.resize_bilinear   |   tf.reduce_mean
+tf.batch_to_space_nd   |   tf.image.resize_nearest_neighbor   |   tf.nn.max_pool_with_argmax
+tf.nn.batch_normalization   |   tf.contrib.layers.instance_norm   |   tf.pow
+tf.nn.fused_batchnorm   |   tf.nn.fused_batch_norm   |   tf.reduce_mean
+tf.cast   |   tf.stack   |   tf.reduce_sum
+tf.clip_by_value   |   tf.nn.sigmoid   |   tf.reverse
+tf.concat   |   tf.signal.frame   |   tf.reverse_sequence
+tf.nn.conv1d   |   tf.slice   |   tf.nn.relu
+tf.nn.conv2d   |   tf.nn.softmax   |   tf.nn.relu6
+tf.nn.depthwise_conv2d   |   tf.space_to_batch_nd   |   tf.rsqrt
+tf.nn.conv1d   |   tf.space_to_depth   |   tf.realdiv
+tf.nn.conv3d   |   tf.nn.local_response_normalization   |   tf.reshape
+tf.image.crop_and_resize   |   tf.nn.l2_normalize   |   tf.expand_dims
+tf.nn.conv2d_transpose   |   tf.nn.rnn_cell_LSTMCell / tf.nn.dynamic_rnn   |   tf.squeeze
+tf.depth_to_space   |   tf.rnn_cell.LSTMCell   |   tf.strided_slice
+tf.equal   |   tf.less   |   tf.sqrt
+tf.exp   |   tf.less_equal   |   tf.square
+tf.nn.elu   |   tf.logical_or   |   tf.subtract
+tf.nn.embedding_lookup   |   tf.logical_and   |   tf.scatter_nd
+tf.maximum   |   tf.nn.leaky_relu   |   tf.split
+tf.floor   |   tf.multiply   |   tf.nn.swish
+tf.matmul   |   tf.nn.moments   |   tf.tile
+tf.floordiv   |   tf.minimum   |   tf.nn.tanh
+tf.gather_nd   |   tf.matmul   |   tf.unstack
+tf.gather   |   tf.batch_matmul   |   tf.where
+tf.nn.embedding_lookup   |   tf.not_equal   |   tf.select
+```
+  
+### Supported ONNX operators
+
+```{table}
+:class: noheader
+|   |   |   |
+|:---| -- | -- |
+ArgMin   |   LeakyRelu   |   ReverseSequence
+ArgMax   |   Less   |   ReduceMax
+Add   |   LSTM   |   ReduceMin
+Abs   |   MatMul   |   ReduceL1
+And   |   Max   |   ReduceL2
+BatchNormalization   |   Min   |   ReduceLogSum
+Clip   |   MaxPool   |   ReduceLogSumExp
+Cast   |   AveragePool   |   ReduceSumSquare
+Concat   |   GlobalAveragePool   |   Reciprocal
+ConvTranspose   |      |   Resize
+Conv   |   GlobalMaxPool   |   Sum
+Div   |   MaxPool   |   SpaceToDepth
+Dropout   |   AveragePool   |   Sqrt
+DepthToSpace   |   Mul   |   Split
+DequantizeLinear   |   Neg   |   Slice
+Equal   |   Or   |   Squeeze
+Exp   |   PRelu   |   Softmax
+Elu   |   Pad   |   Sub
+Expand   |   Pow   |   Sigmoid
+Floor   |   QuantizeLinear   |   Softsign
+InstanceNormalization   |   QLinearMatMul   |   Softplus
+Gemm   |   QLinearConv   |   Sin
+Gather   |   Relu   |   Tile
+Greater   |   Reshape   |   Transpose
+GatherND   |   Squeeze   |   Tanh
+GRU   |   Unsqueeze   |   Upsample
+LogSoftmax   |   Flatten   |   Where
+LRN   |   ReduceSum   |   Xor
+Log   |   ReduceMean   |
+```
+
+### Supported Darknet operators
+
+```{table}
+:class: noheader
+|   |   |   |
+|:---| -- | -- |
+avgpool   |   maxpool   |   softmax
+batch_normalize   |   mish   |   shortcut
+connected   |   region   |   scale_channels
+convolutional   |   reorg   |   swish
+depthwise_convolutional   |   relu   |   upsample
+leaky   |   route   |   yolo
+logistic
+```
+
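+<!-- ## Data preparation
+
+Models trained in different frameworks need different files; all of them must be placed in the same directory.
+The model name and the file names must match.
+
+### caffe
+
+To convert a Caffe model, the model project directory should contain:
+
+- the model structure definition file, ending in .prototxt
+- the model weights file, ending in .caffemodel
+- dataset.txt, a text file listing data paths (images and NPY are supported)
+
+Using lenet_caffe as an example, the initial directory is: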
+
+```bash
+lenet_caffe/
+├── 0.jpg                   # calibration data
+├── dataset.txt             # file listing the data paths
+├── lenet_caffe.caffemodel  # Caffe model weights
+└── lenet_caffe.prototxt    # Caffe model structure
+```
+
+### tensorflow
+
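+To convert a TensorFlow model, the model project directory should contain:
+
+- .pb file: the frozen graph model
+- inputs_outputs.txt: defines the input and output nodes
+- dataset.txt: lists the data paths
+
+Using lenet as an example, the initial directory is: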
+
+```bash
+lenet/
+├── 0.jpg                # calibration data
+├── dataset.txt          # file listing the data paths
+├── inputs_outputs.txt   # input and output node definitions
+└── lenet.pb             # frozen graph model
+```
+
+### darknet
+
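+To convert a Darknet model, prepare:
+
+- .cfg file: the network structure configuration
+- .weights file: the trained weights
+- dataset.txt: lists the data paths
+
+Using yolov4_tiny as an example, the initial directory is: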
+
+```bash
+yolov4_tiny/
+├── 0.jpg                 # calibration data
+├── dataset.txt           # file listing the data paths
+├── yolov4_tiny.cfg       # network structure configuration
+└── yolov4_tiny.weights   # pretrained weights
+```
+
+### onnx
+
+To convert an ONNX model, prepare:
+
+- .onnx file: the network model
+- dataset.txt: lists the data paths
+
+Using yolov5s as an example, the initial directory is:
+
+```bash
+yolov5s/
+├── 0.jpg          # calibration data
+├── dataset.txt    # file listing the data paths
+└── yolov5s.onnx   # network model
+``` -->
+
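+## Configuration file
+
+Inputmeta.yml is the configuration template generated by config. It configures the input-layer datasets for the Netrans intermediate model.
+Quantization, inference, export, and image-to-dat conversion in Netrans all rely on this file.
+The contents of Inputmeta.yml look like this: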
+
+```yaml
+%YAML 1.2
+---
+# !!!This file disallow TABs!!!
+# "category" allowed values: "image, undefined"
+# "database" allowed types: "H5FS, SQLITE, TEXT, LMDB, NPY, GENERATOR"
+# "tensor_name" only support in H5FS database
+# "preproc_type" allowed types:"IMAGE_RGB, IMAGE_RGB888_PLANAR, IMAGE_RGB888_PLANAR_SEP,
+# IMAGE_I420,
+# IMAGE_NV12, IMAGE_YUV444, IMAGE_GRAY, IMAGE_BGRA, TENSOR"
+input_meta:
+  databases:
+  - path: dataset.txt
+    type: TEXT
+    ports:
+    - lid: data_0
+      category: image
+      dtype: float32
+      sparse: false
+      tensor_name:
+      layout: nhwc
+      shape:
+      - 50
+      - 224
+      - 224
+      - 3
+      preprocess:
+        reverse_channel: false
+        mean:
+        - 103.94
+        - 116.78
+        - 123.67
+        scale: 0.017
+        preproc_node_params:
+          preproc_type: IMAGE_RGB
+          add_preproc_node: false
+          preproc_perm:
+          - 0
+          - 1
+          - 2
+          - 3
+    - lid: label_0
+      redirect_to_output: true
+      category: undefined
+      tensor_name:
+      dtype: float32
+      shape:
+      - 1
+      - 1
+```
+
+Parameter reference:
+
+```{table}
+:widths: 20, 80
+:align: left
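+| Parameter | Description |
+| :---  | ---  |
+| input_meta  | Declares the preprocessing configuration. |
+| databases  | Data configuration, consisting of path, type, and ports. |
+| path  | Path to the dataset file, either absolute or relative to the execution directory. Defaults to dataset.txt; changing it is not recommended. |
+| type  | Format of the dataset file; always TEXT. |
+| ports  | Points to an input, or a redirected input, of the network. Only one input is currently supported; if your network has multiple inputs, contact @ccyh. |
+| lid  | The lid of the input layer. |
+| category  | Category of the input. Set to one of: image (image input) or undefined (any other type of input). |
+| dtype  | Data type of the input tensor, used when sending data to the input port of the pnna network. Supported types are float32 and quantized. |
+| sparse  | Whether the network tensor is stored in sparse format. Set to one of: true (sparse format) or false (compressed format). |
+| tensor_name  | Leave this parameter empty. |
+| layout  | Layout of the input tensor. Use nchw for Caffe, Darknet, ONNX, and PyTorch models; use nhwc for TensorFlow, TensorFlow Lite, and Keras models. |
+| shape  | Shape of the tensor. The first dimension, shape[0], is the number of inputs per batch, which lets several inputs be sent to the network in one inference operation. If the batch dimension is 0, --batch-size must be given on the command line. If the batch dimension is greater than 1, the batch size in inputmeta.yml is used and --batch-size on the command line is ignored. |
+| fitting  | Reserved field. |
+| preprocess  | Preprocessing steps and their order. Preprocessing supports the four parameters below; the order in which they appear is the order in which they are applied. |
+| reverse_channel  | Whether to preserve the channel order. Set to one of: true (preserve the channel order) or false (do not preserve it). Use true for TensorFlow and TensorFlow Lite models. |
+| mean  | Mean value for each channel. |
+| scale  | Scaling value for the tensor. The mean and scale normalize the input tensor according to (inputTensor - mean) × scale. |
+| preproc_node_params  | Preprocessing node parameters; enables the preprocessing task in the OVxlib C project. |
+| add_preproc_node  | Controls whether a preprocessing node is inserted in the OVxlib C project. Boolean, one of [true, false]; when true, a preprocessing layer is added to the exported application, configured by the parameters below. Those parameters take effect only when add_preproc_node is true. |
+| preproc_type  | Input type of the preprocessing node. A string, one of [IMAGE_RGB, IMAGE_RGB888_PLANAR, IMAGE_YUV420, IMAGE_GRAY, IMAGE_BGRA, TENSOR]. |
+| preproc_perm  | Permutation applied to the preprocessing node's input. |
+| redirect_to_output  | Special property that redirects a database tensor to the graph output. When it is set on a port, the network builder automatically generates an output layer for that port so that the post-processing file can work directly on the tensor from the database. For a classification network, the lid "label_0" in the example above is the lid of the dataset's labels. redirect_to_output must be set to true for the post-processing file to access the database tensor directly, and the label lid must match the lid of labels_tensor defined in the post-processing file. Boolean, one of [true, false]: true sends the data of the input port represented by the tensor straight to the network output; false does not. |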
+```
+
+Edit the generated inputmeta file to match the parameters of your specific model.
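+
+As a worked instance of the normalization formula `(inputTensor - mean) × scale`, the sketch below plugs in the first-channel mean (103.94) and the scale (0.017) from the sample Inputmeta.yml above. The pixel values 0 and 255 are illustrative assumptions, not values taken from Netrans.
+
+```latex
+\begin{aligned}
+y &= (x - \text{mean}) \times \text{scale} \\
+x = 255:\quad y &= (255 - 103.94) \times 0.017 = 2.56802 \\
+x = 0:\quad y &= (0 - 103.94) \times 0.017 = -1.76698
+\end{aligned}
+```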
diff --git a/quick_start_guide.md b/quick_start_guide.md
index 289d0e8..338720a 100644
--- a/quick_start_guide.md
+++ b/quick_start_guide.md
@@ -9,24 +9,35 @@
 - RAM: at least 8 GB
 
 ## Installing Netrans
-创建 conda 环境 .
+
+Create a Python 3.8 environment
+
 ```bash
-conda create -n netrans python=3.8 -y
-conda activate netrans
+wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
+mkdir -p ~/app
+INSTALL_PATH="${HOME}/app/miniforge3"
+bash "Miniforge3-$(uname)-$(uname -m).sh" -b -p "${INSTALL_PATH}"
+echo "source \"${INSTALL_PATH}/etc/profile.d/conda.sh\"" >> "${HOME}/.bashrc"
+echo "source \"${INSTALL_PATH}/etc/profile.d/mamba.sh\"" >> "${HOME}/.bashrc"
+source ${HOME}/.bashrc
+mamba create -n netrans python=3.8 -y
+mamba activate netrans
 ```
 
-下载 Netrans .
+Download Netrans
+
 ```bash
-mkdir -p ~/app
 cd ~/app
 git clone https://gitlink.org.cn/nudt_dsp/netrans.git
 ```
 
-安装 Netrans。
+Configure Netrans
+
 ```bash
 cd ~/app/netrans
 ./setup.sh
 ```
+
 ## Compiling the yolov5s model with Netrans
 
 Enter the working directory
@@ -37,7 +48,7 @@ cd ~/app/netrans/examples/onnx
 
 此时目录如下：
 
-```
+```text
 onnx/
 ├── README.md
 └── yolov5s
@@ -46,21 +57,21 @@ onnx/
     └── yolov5s.onnx
 ```
 
-### 使用 netrans_cli 编译 yolov5s 
+### Compiling yolov5s with netrans_cli
 
 #### Import the model
 
 ```bash
-import.sh yolov5s
+load.sh yolov5s
 ```
 
 This command generates .json and .data files containing the model information in the project directory.
 
 The yolov5s directory now looks like this:
-```
+
+```text
 yolov5s/
 ├── 0.jpg
-├── dataset.txt
 ├── yolov5s.data
 ├── yolov5s.json
 └── yolov5s.onnx
@@ -75,7 +86,8 @@ config.sh yolov5s
 ```
 
 The yolov5s directory now looks like this:
-```
+
+```text
 yolov5s/
 ├── 0.jpg
 ├── dataset.txt
@@ -85,25 +97,28 @@ yolov5s/
 └── yolov5s.onnx
 
 ```
+
 Following the preprocessing parameters of yolov5s, change scale in the yml to 0.003921568627.
 Open the `yolov5s_inputmeta.yml` file and edit lines 30-33:
-```
+
+```text
         scale:
         - 0.003921568627
         - 0.003921568627
         - 0.003921568627
-
 ```
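+
+The scale value is 1/255 rounded to twelve decimal places, which maps 8-bit pixel values into [0, 1]. A quick check, assuming the mean is left at 0 (the mean is not stated in this guide):
+
+```latex
+\begin{aligned}
+\text{scale} &= \tfrac{1}{255} \approx 0.003921568627 \\
+x = 255:\quad (255 - 0) \times 0.003921568627 &\approx 1.0 \\
+x = 0:\quad (0 - 0) \times 0.003921568627 &= 0
+\end{aligned}
+```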
 
 #### Quantize the model
-生成 unit8 量化的量化参数文件。
+
+Generate the quantization parameter file for uint8 quantization
+
 ```bash
 quantize.sh yolov5s uint8
-
 ```
+
 The yolov5s directory now looks like this:
 
-```
+```text
 yolov5s/
 ├── 0.jpg
 ├── dataset.txt
@@ -115,35 +130,22 @@ yolov5s/
 ```
 
 #### Export the model
-导出 unit8 量化的模型项目工程。
+
+Export the uint8-quantized model project
 
 ```bash
 export.sh yolov5s uint8
 ```
+
 The yolov5s directory now looks like this:
 
-```
+```text
 yolov5s/
 ├── 0.jpg
 ├── dataset.txt
 ├── wksp
 │   └── asymmetric_affine
-│       ├── BUILD
-│       ├── dump_core_graph.json
-│       ├── graph.json
-│       ├── main.c
-│       ├── makefile.linux
-│       ├── network_binary.nb
-│       ├── vnn_global.h
-│       ├── vnn_post_process.c
-│       ├── vnn_post_process.h
-│       ├── vnn_pre_process.c
-│       ├── vnn_pre_process.h
-│       ├── vnn_yolov5sasymmetricaffine.c
-│       ├── vnn_yolov5sasymmetricaffine.h
-│       ├── yolov5sasymmetricaffine.2012.vcxproj
-│       ├── yolov5s_asymmetric_affine.export.data
-│       └── yolov5sasymmetricaffine.vcxproj
+│       └── network_binary.nb
 ├── yolov5s_asymmetric_affine.quantize
 ├── yolov5s.data
 ├── yolov5s_inputmeta.yml