!1147 Fix: add the bf16 dtype option to avoid affecting training precision

Merge pull request !1147 from 黄宇豪/master
黄宇豪 2024-03-26 01:09:59 +00:00 committed by i-robot
parent 8cc8b1e919
commit 1cd3206f58
9 changed files with 20 additions and 8 deletions
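This change adds `--params-dtype bf16` to the weight-conversion commands below (and drops a hard-coded fp16 setting in the checkpoint loader) so that bf16 Hugging Face weights are not cast through fp16 during conversion. The snippet below is only an illustrative sketch of the numerical motivation, not part of the change itself; the tensor values are made up for demonstration.

```python
# Illustrative sketch (not repository code): casting a bf16 checkpoint
# through fp16 can alter weights, which is what --params-dtype bf16 avoids.
import torch

w = torch.tensor([65536.0, 1.0e-7], dtype=torch.bfloat16)  # both representable in bf16

via_fp16 = w.to(torch.float16)    # 65536 overflows fp16's range to inf; 1e-7 is rounded coarsely as a subnormal
kept_bf16 = w.to(torch.bfloat16)  # no-op cast, values are preserved

print(via_fp16)   # roughly: tensor([inf, 1.1921e-07], dtype=torch.float16)
print(kept_bf16)  # roughly: tensor([6.5536e+04, 1.0012e-07], dtype=torch.bfloat16)
```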

View File

@@ -345,6 +345,7 @@ python tools/checkpoint/util.py \
--load-dir ./baichuan-13B-hf \
--save-dir ./baichuan-13B-mt \
--tokenizer-model ./baichuan-13B-hf/tokenizer.model \
--params-dtype bf16 \
--w-pack True
```

View File

@@ -338,6 +338,7 @@ python tools/checkpoint/util.py \
--load-dir ./baichuan-13B-hf \
--save-dir ./baichuan-13B-mt \
--tokenizer-model ./baichuan-13B-hf/tokenizer.model \
--params-dtype bf16 \
--w-pack True
```

View File

@@ -108,6 +108,7 @@ python tools/checkpoint/util.py \
--load-dir ./baichuan2-7B-hf \
--save-dir ./baichuan2-7B-mt \
--tokenizer-model ./baichuan2-7B-hf/tokenizer.model \
--params-dtype bf16 \
--w-pack True
```
@@ -327,6 +328,7 @@ python tools/checkpoint/util.py \
--load-dir ./baichuan2-13B-hf \
--save-dir ./baichuan2-13B-mt \
--tokenizer-model ./baichuan2-13B-hf/tokenizer.model \
--params-dtype bf16 \
--w-pack True
```

View File

@@ -112,6 +112,7 @@ python tools/checkpoint/util.py \
--load-dir ./baichuan2-7B-hf \
--save-dir ./baichuan2-7B-mt \
--tokenizer-model ./baichuan2-7B-hf/tokenizer.model \
--params-dtype bf16 \
--w-pack True
```
Any Megatron weights with parallel slicing strategy --> Any Megatron weights with parallel slicing strategy
@@ -331,6 +332,7 @@ python tools/checkpoint/util.py \
--load-dir ./baichuan2-13B-hf \
--save-dir ./baichuan2-13B-mt \
--tokenizer-model ./baichuan2-13B-hf/tokenizer.model \
--params-dtype bf16 \
--w-pack True
```

View File

@@ -769,7 +769,8 @@ pip install -r requirements.txt
--target-pipeline-parallel-size 4 \
--load-dir ./llama2-70b-hf/ \
--save-dir ./load_ckpt \
--tokenizer-model ./llama2-70b-hf/tokenizer.model
--tokenizer-model ./llama2-70b-hf/tokenizer.model \
--params-dtype bf16
```
4.2 Convert Llama-2-34B weights from Hugging Face format to Megatron format
@@ -786,7 +787,8 @@ pip install -r requirements.txt
--target-pipeline-parallel-size 4 \
--load-dir ./codellama-34b-hf \
--save-dir ./load_ckpt \
--tokenizer-model ./llama2-70b-hf/tokenizer.model
--tokenizer-model ./llama2-70b-hf/tokenizer.model \
--params-dtype bf16
```
4.3 Convert Llama-2-70B weights from Megatron format to Hugging Face format

View File

@@ -765,7 +765,8 @@ pip install -r requirements.txt
--target-pipeline-parallel-size 4 \
--load-dir ./codellama-34b-hf \
--save-dir ./load_ckpt \
--tokenizer-model ./llama2-70b-hf/tokenizer.model
--tokenizer-model ./llama2-70b-hf/tokenizer.model \
--params-dtype bf16
```
Any Megatron weights with parallel slicing strategy --> Any Megatron weights with parallel slicing strategy.

View File

@@ -90,7 +90,8 @@
--tokenizer-model ../Mixtral-8x7B-v0.1/tokenizer.model \
--target-tensor-parallel-size 1 \
--target-pipeline-parallel-size 8 \
--target-expert-parallel-size 2
--target-expert-parallel-size 2 \
--params-dtype bf16
cd ..
```
@@ -140,7 +141,8 @@ python ./tools/preprocess_data.py \
--tokenizer-model ../Mixtral-8x7B-v0.1/tokenizer.model \
--target-tensor-parallel-size 1 \
--target-pipeline-parallel-size 8 \
--target-expert-parallel-size 2
--target-expert-parallel-size 2 \
--params-dtype bf16
```
2. Any Megatron weights with parallel slicing strategy --> Any Megatron weights with parallel slicing strategy
***(This scenario is generally used to reconfigure the sliced model weights, for example after training with a dual-node 16-card EP2-PP8 strategy and then running inference on a single-node 8-card TP8 setup.)***
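As a concrete illustration of that re-slicing scenario, the sketch below assembles a hypothetical megatron-to-megatron conversion with new target parallel sizes; the loader/saver names, model-type value, and paths are assumptions, not values taken from this repository's documentation.

```python
# Hypothetical sketch: re-slice a Megatron checkpoint trained with EP2-PP8 on
# 2 nodes x 8 cards into a TP8 layout for single-node inference.
import subprocess

cmd = [
    "python", "tools/checkpoint/util.py",
    "--model-type", "GPT",
    "--loader", "megatron",                  # assumed loader for an existing Megatron checkpoint
    "--saver", "megatron",
    "--load-dir", "./mixtral-mt-ep2-pp8",    # hypothetical input checkpoint dir
    "--save-dir", "./mixtral-mt-tp8",        # hypothetical output checkpoint dir
    "--target-tensor-parallel-size", "8",
    "--target-pipeline-parallel-size", "1",
    "--target-expert-parallel-size", "1",
    "--params-dtype", "bf16",                # keep bf16, matching this fix
]
subprocess.run(cmd, check=True)
```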

View File

@@ -89,7 +89,8 @@ Recommended hardware configuration for inference:
--tokenizer-model ../Mixtral-8x7B-v0.1/tokenizer.model \
--target-tensor-parallel-size 1 \
--target-pipeline-parallel-size 8 \
--target-expert-parallel-size 2
--target-expert-parallel-size 2 \
--params-dtype bf16
cd ..
```
@@ -139,7 +140,8 @@ python ./tools/preprocess_data.py \
--tokenizer-model ../Mixtral-8x7B-v0.1/tokenizer.model \
--target-tensor-parallel-size 1 \
--target-pipeline-parallel-size 8 \
--target-expert-parallel-size 2
--target-expert-parallel-size 2 \
--params-dtype bf16
```
2. Any Megatron weights with parallel slicing strategy --> Any Megatron weights with parallel slicing strategy
***(This scenario is generally used to reconfigure the sliced model weights, for example after training with a dual-node 16-card EP2-PP8 strategy and then running inference on a single-node 8-card TP8 setup.)***

View File

@@ -75,7 +75,6 @@ def load_args_from_checkpoint(args):
args.use_rotary_position_embeddings = True
args.swiglu = True
args.tokenizer_type = "Llama2Tokenizer"
args.fp16 = True
args.normalization = "RMSNorm"
args.add_bias_linear = False
args.untie_embeddings_and_output_weights = True
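With the hard-coded `args.fp16 = True` removed, the checkpoint dtype can follow the `--params-dtype` flag instead. Below is a minimal sketch of that pattern; the `apply_params_dtype` helper and its wiring are assumptions, not the repository's actual code.

```python
# Sketch with assumed names: derive dtype flags from a --params-dtype style
# option instead of hard-coding args.fp16 = True in the loader.
import torch

def apply_params_dtype(args, params_dtype: str = "bf16"):
    """Hypothetical helper: keep args.params_dtype / args.fp16 / args.bf16 consistent."""
    mapping = {"fp32": torch.float32, "fp16": torch.float16, "bf16": torch.bfloat16}
    args.params_dtype = mapping[params_dtype]
    args.fp16 = params_dtype == "fp16"
    args.bf16 = params_dtype == "bf16"
    return args
```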