!1147 Fix: add the bf16 params-dtype field to avoid degrading training precision
Merge pull request !1147 from 黄宇豪/master
parent 8cc8b1e919
commit 1cd3206f58
@@ -345,6 +345,7 @@ python tools/checkpoint/util.py \
 --load-dir ./baichuan-13B-hf \
 --save-dir ./baichuan-13B-mt \
 --tokenizer-model ./baichuan-13B-hf/tokenizer.model \
+--params-dtype bf16 \
 --w-pack True
 ```
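Every documentation hunk in this change makes the same edit: it threads `--params-dtype bf16` through the weight-conversion examples so that the converted checkpoint presumably keeps bf16 precision instead of being cast to the converter's default dtype. For reference, a full baichuan-13B conversion would look roughly like the sketch below; only the flags visible in the hunks come from this diff, while `--model-type`, `--loader`, `--saver`, and the tensor-parallel size are assumptions about how tools/checkpoint/util.py is typically invoked:

```shell
# Hypothetical full invocation. --model-type/--loader/--saver and the
# tensor-parallel size are assumptions (not shown in this diff); the
# remaining flags are taken from the hunks above and below.
python tools/checkpoint/util.py \
    --model-type GPT \
    --loader llama2_hf \
    --saver megatron \
    --target-tensor-parallel-size 8 \
    --load-dir ./baichuan-13B-hf \
    --save-dir ./baichuan-13B-mt \
    --tokenizer-model ./baichuan-13B-hf/tokenizer.model \
    --params-dtype bf16 \
    --w-pack True
```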
@@ -338,6 +338,7 @@ python tools/checkpoint/util.py \
 --load-dir ./baichuan-13B-hf \
 --save-dir ./baichuan-13B-mt \
 --tokenizer-model ./baichuan-13B-hf/tokenizer.model \
+--params-dtype bf16 \
 --w-pack True
 ```
@@ -108,6 +108,7 @@ python tools/checkpoint/util.py \
 --load-dir ./baichuan2-7B-hf \
 --save-dir ./baichuan2-7B-mt \
 --tokenizer-model ./baichuan2-7B-hf/tokenizer.model \
+--params-dtype bf16 \
 --w-pack True
 ```
@@ -327,6 +328,7 @@ python tools/checkpoint/util.py \
 --load-dir ./baichuan2-13B-hf \
 --save-dir ./baichuan2-13B-mt \
 --tokenizer-model ./baichuan2-13B-hf/tokenizer.model \
+--params-dtype bf16 \
 --w-pack True
 ```
@@ -112,6 +112,7 @@ python tools/checkpoint/util.py \
 --load-dir ./baichuan2-7B-hf \
 --save-dir ./baichuan2-7B-mt \
 --tokenizer-model ./baichuan2-7B-hf/tokenizer.model \
+--params-dtype bf16 \
 --w-pack True
 ```

 Any Megatron weights with parallel slicing strategy --> Any Megatron weights with parallel slicing strategy
@@ -331,6 +332,7 @@ python tools/checkpoint/util.py \
 --load-dir ./baichuan2-13B-hf \
 --save-dir ./baichuan2-13B-mt \
 --tokenizer-model ./baichuan2-13B-hf/tokenizer.model \
+--params-dtype bf16 \
 --w-pack True
 ```
@@ -769,7 +769,8 @@ pip install -r requirements.txt
 --target-pipeline-parallel-size 4 \
 --load-dir ./llama2-70b-hf/ \
 --save-dir ./load_ckpt \
---tokenizer-model ./llama2-70b-hf/tokenizer.model
+--tokenizer-model ./llama2-70b-hf/tokenizer.model \
+--params-dtype bf16
 ```

 4.2 Convert Llama-2-34B weights from huggingface format to megatron format
@@ -786,7 +787,8 @@ pip install -r requirements.txt
 --target-pipeline-parallel-size 4 \
 --load-dir ./codellama-34b-hf \
 --save-dir ./load_ckpt \
---tokenizer-model ./llama2-70b-hf/tokenizer.model
+--tokenizer-model ./llama2-70b-hf/tokenizer.model \
+--params-dtype bf16
 ```

 4.3 Convert Llama-2-70B weights from megatron format to huggingface format
@@ -765,7 +765,8 @@ pip install -r requirements.txt
 --target-pipeline-parallel-size 4 \
 --load-dir ./codellama-34b-hf \
 --save-dir ./load_ckpt \
---tokenizer-model ./llama2-70b-hf/tokenizer.model
+--tokenizer-model ./llama2-70b-hf/tokenizer.model \
+--params-dtype bf16
 ```

 Any Megatron weights with parallel slicing strategy --> Any Megatron weights with parallel slicing strategy.
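The Llama-2-70B and CodeLlama-34B hunks above also fix a missing line continuation: the old `--tokenizer-model` line had no trailing backslash, so appending `--params-dtype bf16` alone would have left the new flag outside the command. A sketch of the resulting command, with the flags not shown in this diff (`--model-type`, `--loader`, `--saver`, and the tensor-parallel size) filled in as assumptions:

```shell
# Hypothetical full Llama-2-70B conversion. The leading flags are assumptions;
# the tail (from --target-pipeline-parallel-size onward) matches the hunks above.
python tools/checkpoint/util.py \
    --model-type GPT \
    --loader llama2_hf \
    --saver megatron \
    --target-tensor-parallel-size 8 \
    --target-pipeline-parallel-size 4 \
    --load-dir ./llama2-70b-hf/ \
    --save-dir ./load_ckpt \
    --tokenizer-model ./llama2-70b-hf/tokenizer.model \
    --params-dtype bf16
```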
@@ -90,7 +90,8 @@
 --tokenizer-model ../Mixtral-8x7B-v0.1/tokenizer.model \
 --target-tensor-parallel-size 1 \
 --target-pipeline-parallel-size 8 \
---target-expert-parallel-size 2
+--target-expert-parallel-size 2 \
+--params-dtype bf16
 cd ..
 ```
@@ -140,7 +141,8 @@ python ./tools/preprocess_data.py \
 --tokenizer-model ../Mixtral-8x7B-v0.1/tokenizer.model \
 --target-tensor-parallel-size 1 \
 --target-pipeline-parallel-size 8 \
---target-expert-parallel-size 2
+--target-expert-parallel-size 2 \
+--params-dtype bf16
 ```

 2. Any Megatron weights with parallel slicing strategy --> Any Megatron weights with parallel slicing strategy
 ***(This scenario is typically used to re-shard the weights of an already sliced model, e.g. after training finishes under a dual-node 16-card EP2-PP8 strategy and you want to run inference on a single node with 8 cards under TP8)***
@@ -89,7 +89,8 @@ Recommended hardware configuration for inference:
 --tokenizer-model ../Mixtral-8x7B-v0.1/tokenizer.model \
 --target-tensor-parallel-size 1 \
 --target-pipeline-parallel-size 8 \
---target-expert-parallel-size 2
+--target-expert-parallel-size 2 \
+--params-dtype bf16
 cd ..
 ```
@@ -139,7 +140,8 @@ python ./tools/preprocess_data.py \
 --tokenizer-model ../Mixtral-8x7B-v0.1/tokenizer.model \
 --target-tensor-parallel-size 1 \
 --target-pipeline-parallel-size 8 \
---target-expert-parallel-size 2
+--target-expert-parallel-size 2 \
+--params-dtype bf16
 ```

 2. Any Megatron weights with parallel slicing strategy --> Any Megatron weights with parallel slicing strategy
 ***(This scenario is generally used to reconfigure the sliced model weights, such as training on a dual-node 16-card EP2-PP8 strategy, and then wanting to infer on a single-node 8-card TP8)***
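The note above describes re-sharding a checkpoint trained under a dual-node EP2-PP8 layout so it can be served on a single node with TP8; presumably the same dtype flag should be carried through that Megatron-to-Megatron conversion as well. A minimal sketch of such a re-shard, assuming a megatron loader/saver pair and hypothetical checkpoint directories (only the target-parallelism flags and `--params-dtype bf16` appear in this diff):

```shell
# Hypothetical re-shard of a trained Mixtral-8x7B checkpoint:
# dual-node EP2-PP8 training layout -> single-node TP8 for inference.
# Loader/saver names and directory paths are assumptions.
python tools/checkpoint/util.py \
    --model-type GPT \
    --loader megatron \
    --saver megatron \
    --load-dir ./mixtral-8x7b-ep2-pp8 \
    --save-dir ./mixtral-8x7b-tp8 \
    --target-tensor-parallel-size 8 \
    --target-pipeline-parallel-size 1 \
    --target-expert-parallel-size 1 \
    --params-dtype bf16
```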
@@ -75,7 +75,6 @@ def load_args_from_checkpoint(args):
     args.use_rotary_position_embeddings = True
     args.swiglu = True
     args.tokenizer_type = "Llama2Tokenizer"
-    args.fp16 = True
     args.normalization = "RMSNorm"
     args.add_bias_linear = False
     args.untie_embeddings_and_output_weights = True