!1147 Fix: add the bf16 dtype option to avoid affecting training precision

Merge pull request !1147 from 黄宇豪/master
黄宇豪 2024-03-26 01:09:59 +00:00 committed by i-robot
parent 8cc8b1e919
commit 1cd3206f58
9 changed files with 20 additions and 8 deletions
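This change adds `--params-dtype bf16` to the weight-conversion commands below (and drops a hard-coded fp16 setting in the checkpoint loader) so that bf16 Hugging Face weights are not cast through fp16 during conversion. The snippet below is only an illustrative sketch of the numerical motivation, not part of the change itself; the tensor values are made up for demonstration.

```python
# Illustrative sketch (not repository code): casting a bf16 checkpoint
# through fp16 can alter weights, which is what --params-dtype bf16 avoids.
import torch

w = torch.tensor([65536.0, 1.0e-7], dtype=torch.bfloat16)  # both representable in bf16

via_fp16 = w.to(torch.float16)    # 65536 overflows fp16's range to inf; 1e-7 is rounded coarsely as a subnormal
kept_bf16 = w.to(torch.bfloat16)  # no-op cast, values are preserved

print(via_fp16)   # roughly: tensor([inf, 1.1921e-07], dtype=torch.float16)
print(kept_bf16)  # roughly: tensor([6.5536e+04, 1.0012e-07], dtype=torch.bfloat16)
```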

View File

@@ -345,6 +345,7 @@ python tools/checkpoint/util.py \
--load-dir ./baichuan-13B-hf \
--save-dir ./baichuan-13B-mt \
--tokenizer-model ./baichuan-13B-hf/tokenizer.model \
--params-dtype bf16 \
--w-pack True
```

View File

@@ -338,6 +338,7 @@ python tools/checkpoint/util.py \
--load-dir ./baichuan-13B-hf \
--save-dir ./baichuan-13B-mt \
--tokenizer-model ./baichuan-13B-hf/tokenizer.model \
--params-dtype bf16 \
--w-pack True
```

View File

@@ -108,6 +108,7 @@ python tools/checkpoint/util.py \
--load-dir ./baichuan2-7B-hf \
--save-dir ./baichuan2-7B-mt \
--tokenizer-model ./baichuan2-7B-hf/tokenizer.model \
--params-dtype bf16 \
--w-pack True
```
@@ -327,6 +328,7 @@ python tools/checkpoint/util.py \
--load-dir ./baichuan2-13B-hf \
--save-dir ./baichuan2-13B-mt \
--tokenizer-model ./baichuan2-13B-hf/tokenizer.model \
--params-dtype bf16 \
--w-pack True
```

View File

@@ -112,6 +112,7 @@ python tools/checkpoint/util.py \
--load-dir ./baichuan2-7B-hf \
--save-dir ./baichuan2-7B-mt \
--tokenizer-model ./baichuan2-7B-hf/tokenizer.model \
--params-dtype bf16 \
--w-pack True
```
Any Megatron weights with parallel slicing strategy --> Any Megatron weights with parallel slicing strategy
@@ -331,6 +332,7 @@ python tools/checkpoint/util.py \
--load-dir ./baichuan2-13B-hf \
--save-dir ./baichuan2-13B-mt \
--tokenizer-model ./baichuan2-13B-hf/tokenizer.model \
--params-dtype bf16 \
--w-pack True
```

View File

@@ -769,7 +769,8 @@ pip install -r requirements.txt
--target-pipeline-parallel-size 4 \
--load-dir ./llama2-70b-hf/ \
--save-dir ./load_ckpt \
--tokenizer-model ./llama2-70b-hf/tokenizer.model
--tokenizer-model ./llama2-70b-hf/tokenizer.model \
--params-dtype bf16
```
4.2 Convert Llama-2-34B weights from Hugging Face format to Megatron format
@@ -786,7 +787,8 @@ pip install -r requirements.txt
--target-pipeline-parallel-size 4 \
--load-dir ./codellama-34b-hf \
--save-dir ./load_ckpt \
--tokenizer-model ./llama2-70b-hf/tokenizer.model
--tokenizer-model ./llama2-70b-hf/tokenizer.model \
--params-dtype bf16
```
4.3 Convert Llama-2-70B weights from Megatron format to Hugging Face format

View File

@@ -765,7 +765,8 @@ pip install -r requirements.txt
--target-pipeline-parallel-size 4 \
--load-dir ./codellama-34b-hf \
--save-dir ./load_ckpt \
--tokenizer-model ./llama2-70b-hf/tokenizer.model
--tokenizer-model ./llama2-70b-hf/tokenizer.model \
--params-dtype bf16
```
Any Megatron weights with parallel slicing strategy --> Any Megatron weights with parallel slicing strategy.

View File

@@ -90,7 +90,8 @@
--tokenizer-model ../Mixtral-8x7B-v0.1/tokenizer.model \
--target-tensor-parallel-size 1 \
--target-pipeline-parallel-size 8 \
--target-expert-parallel-size 2
--target-expert-parallel-size 2 \
--params-dtype bf16
cd ..
```
@@ -140,7 +141,8 @@ python ./tools/preprocess_data.py \
--tokenizer-model ../Mixtral-8x7B-v0.1/tokenizer.model \
--target-tensor-parallel-size 1 \
--target-pipeline-parallel-size 8 \
--target-expert-parallel-size 2
--target-expert-parallel-size 2 \
--params-dtype bf16
```
2. Any Megatron weights with parallel slicing strategy --> Any Megatron weights with parallel slicing strategy
***(This scenario is generally used to reconfigure the sliced model weights, for example after training with a dual-node 16-card EP2-PP8 strategy and then running inference on a single-node 8-card TP8 setup.)***
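As a concrete illustration of that re-slicing scenario, the sketch below assembles a hypothetical megatron-to-megatron conversion with new target parallel sizes; the loader/saver names, model-type value, and paths are assumptions, not values taken from this repository's documentation.

```python
# Hypothetical sketch: re-slice a Megatron checkpoint trained with EP2-PP8 on
# 2 nodes x 8 cards into a TP8 layout for single-node inference.
import subprocess

cmd = [
    "python", "tools/checkpoint/util.py",
    "--model-type", "GPT",
    "--loader", "megatron",                  # assumed loader for an existing Megatron checkpoint
    "--saver", "megatron",
    "--load-dir", "./mixtral-mt-ep2-pp8",    # hypothetical input checkpoint dir
    "--save-dir", "./mixtral-mt-tp8",        # hypothetical output checkpoint dir
    "--target-tensor-parallel-size", "8",
    "--target-pipeline-parallel-size", "1",
    "--target-expert-parallel-size", "1",
    "--params-dtype", "bf16",                # keep bf16, matching this fix
]
subprocess.run(cmd, check=True)
```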

View File

@@ -89,7 +89,8 @@ Recommended hardware configuration for inference:
--tokenizer-model ../Mixtral-8x7B-v0.1/tokenizer.model \
--target-tensor-parallel-size 1 \
--target-pipeline-parallel-size 8 \
--target-expert-parallel-size 2
--target-expert-parallel-size 2 \
--params-dtype bf16
cd ..
```
@@ -139,7 +140,8 @@ python ./tools/preprocess_data.py \
--tokenizer-model ../Mixtral-8x7B-v0.1/tokenizer.model \
--target-tensor-parallel-size 1 \
--target-pipeline-parallel-size 8 \
--target-expert-parallel-size 2
--target-expert-parallel-size 2 \
--params-dtype bf16
```
2. Any Megatron weights with parallel slicing strategy --> Any Megatron weights with parallel slicing strategy
***(This scenario is generally used to reconfigure the sliced model weights, for example after training with a dual-node 16-card EP2-PP8 strategy and then running inference on a single-node 8-card TP8 setup.)***

View File

@@ -75,7 +75,6 @@ def load_args_from_checkpoint(args):
args.use_rotary_position_embeddings = True
args.swiglu = True
args.tokenizer_type = "Llama2Tokenizer"
args.fp16 = True
args.normalization = "RMSNorm"
args.add_bias_linear = False
args.untie_embeddings_and_output_weights = True
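With the hard-coded `args.fp16 = True` removed, the checkpoint dtype can follow the `--params-dtype` flag instead. Below is a minimal sketch of that pattern; the `apply_params_dtype` helper and its wiring are assumptions, not the repository's actual code.

```python
# Sketch with assumed names: derive dtype flags from a --params-dtype style
# option instead of hard-coding args.fp16 = True in the loader.
import torch

def apply_params_dtype(args, params_dtype: str = "bf16"):
    """Hypothetical helper: keep args.params_dtype / args.fp16 / args.bf16 consistent."""
    mapping = {"fp32": torch.float32, "fp16": torch.float16, "bf16": torch.bfloat16}
    args.params_dtype = mapping[params_dtype]
    args.fp16 = params_dtype == "fp16"
    args.bf16 = params_dtype == "bf16"
    return args
```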