Bloom 【Model contributed by Ascend】



Bloom-7B

Training

Here's a hardware summary of pre-training Bloom-7B:

Hardware	Value
NPU	8 x Ascend NPUs

Script

Clone the repository to your local server:

git clone https://gitee.com/ascend/ModelLink.git
git clone https://github.com/NVIDIA/Megatron-LM.git
cd Megatron-LM
git checkout -f bcce6f
cp -r megatron ../ModelLink/
cd ..
cd ModelLink
git checkout 1.0
mkdir logs
mkdir model_from_hf
mkdir dataset
mkdir ckpt

Build environment

# python3.8
conda create -n test python=3.8
conda activate test

# install torch and torch_npu 
pip install torch-2.1.0-cp38-cp38m-linux_aarch64.whl
pip install torch_npu-2.1.0.XXX-cp38-cp38m-linux_aarch64.whl
pip install apex-0.1_ascend*-cp38-cp38m-linux_aarch64.whl

# modify the path according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# install MindSpeed
git clone https://gitee.com/ascend/MindSpeed.git
cd MindSpeed
git checkout 224ae35e8fc96778f957029d1371ddb623452a50
pip install -r requirements.txt 
pip3 install -e .
cd ..

# install other packages
pip install -r requirements.txt

Prepare pretrained weights

Download the Bloom-7B checkpoint from Hugging Face:

mkdir ./model_from_hf/Bloom-7B/
cd ./model_from_hf/Bloom-7B/
wget https://huggingface.co/bigscience/bloom/resolve/main/special_tokens_map.json
wget https://huggingface.co/bigscience/bloom/resolve/main/tokenizer.json
wget https://huggingface.co/bigscience/bloom/resolve/main/tokenizer_config.json
...
cd ../../

Weight conversion

HuggingFace weights --> Megatron weights (This scenario is generally used to train open-source HuggingFace models on Megatron)

python tools/checkpoint/util.py \
    --model-type GPT \
    --loader loader_bloom_hf \
    --saver saver_megatron \
    --target-tensor-parallel-size 8 \
    --target-pipeline-parallel-size 1 \
    --load-dir ./model_from_hf/Bloom-7B/ \
    --save-dir ./model_weights/Bloom-7B-v0.1-tp8-pp1/ \
    --tokenizer-model None
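Before converting, it can help to confirm that the chosen layout fits the hardware. The sketch below assumes the usual Megatron relation WORLD_SIZE = TP × PP × DP, where DP is the data-parallel size; `derive_dp` is a hypothetical helper, not part of ModelLink or tools/checkpoint/util.py. For the 8 NPUs and the TP=8, PP=1 layout above it gives DP=1.

```shell
# Hypothetical helper (not part of ModelLink): derive the data-parallel size,
# assuming WORLD_SIZE = TP * PP * DP as in Megatron-style training.
derive_dp() {
    local world_size=$1 tp=$2 pp=$3
    local model_parallel=$((tp * pp))
    # The NPU count must be a whole multiple of TP * PP.
    if [ $((world_size % model_parallel)) -ne 0 ]; then
        echo "error: $world_size NPUs not divisible by TP*PP=$model_parallel" >&2
        return 1
    fi
    echo $((world_size / model_parallel))
}

# Bloom-7B: 8 NPUs, --target-tensor-parallel-size 8, --target-pipeline-parallel-size 1
echo "DP=$(derive_dp 8 8 1)"
```

The same relation covers the Bloom-176B inference layout: 5 machines × 8 NPUs = 40 = 8 × 5, so `derive_dp 40 8 5` also gives 1.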

Megatron weights with any parallel slicing strategy --> HuggingFace weights (This scenario is generally used to convert a trained Megatron model back to the HuggingFace format)

# Modify the ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh
python tools/checkpoint/util.py \
    --model-type GPT \
    --loader megatron \
    --saver megatron \
    --save-model-type save_huggingface_llama \
    --load-dir ./model_weights/Bloom-7B-v0.1-tp8-pp1/ \
    --target-tensor-parallel-size 1 \
    --target-pipeline-parallel-size 1 \
    --embed-layernorm \
    --save-dir ./model_from_hf/Bloom-7B/   # <-- Fill in the original HF model path here, new weights will be saved in ./model_from_hf/Bloom-7B/mg2hg/

Prepare dataset

Download the Alpaca dataset from Hugging Face:

# download datasets
cd dataset/
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..

# prepare datasets
mkdir ./dataset/Bloom-7B/
python ./tools/preprocess_data.py \
  --input ./dataset/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
  --tokenizer-name-or-path ./model_from_hf/Bloom-7B/ \
  --output-prefix ./dataset/Bloom-7B/alpaca \
  --workers 4 \
  --log-interval 1000 \
  --tokenizer-type PretrainedFromHF
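Megatron-style preprocessing writes an indexed dataset as a `.bin`/`.idx` pair named `<output-prefix>_<json-key>_<level>`; with the default `text` key and document-level output, the prefix ./dataset/Bloom-7B/alpaca becomes ./dataset/Bloom-7B/alpaca_text_document, which matches the DATA_PATH used by the pre-training script. The hypothetical helper below assumes those defaults and only checks that both files exist before training is launched.

```shell
# Hypothetical helper: build the DATA_PATH prefix that training expects and
# verify that the .bin/.idx pair from preprocessing exists.
# Assumes the default JSON key "text" and document-level output.
check_data_prefix() {
    local data_path ext
    data_path="${1}_text_document"
    for ext in bin idx; do
        if [ ! -f "$data_path.$ext" ]; then
            echo "missing: $data_path.$ext" >&2
            return 1
        fi
    done
    echo "$data_path"
}

if DATA_PATH=$(check_data_prefix ./dataset/Bloom-7B/alpaca); then
    echo "DATA_PATH=$DATA_PATH"
fi
```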

Configure the Bloom-7B pre-training script examples/bloom/pretrain_bloom_ptd_7B.sh (Bloom-7B does not support Flash Attention):

# modify the script according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh 

CKPT_SAVE_DIR="./ckpt/Bloom-7B/"
DATA_PATH="./dataset/Bloom-7B/alpaca_text_document"
TOKENIZER_PATH="./model_from_hf/Bloom-7B/"
CKPT_LOAD_DIR="./model_weights/Bloom-7B-v0.1-tp8-pp1/"

Launch Bloom-7B pre-training script: examples/bloom/pretrain_bloom_ptd_7B.sh

bash examples/bloom/pretrain_bloom_ptd_7B.sh

Note: For multi-machine training where the machines do not share storage for the data, add the parameter --no-shared-storage. With this parameter, non-master nodes determine from the distributed parameters whether they need to load data, and check the corresponding cache and generated data.
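A minimal sketch of applying this note, assuming the launch command can accept extra arguments: the flag is added only when the nodes do not share storage. SHARED_STORAGE, EXTRA_ARGS, and `build_storage_args` are hypothetical names; how the extra argument reaches the training command depends on how the launch script builds its argument list.

```shell
# Hypothetical: export SHARED_STORAGE=false when the nodes have no shared file system.
SHARED_STORAGE=${SHARED_STORAGE:-true}

build_storage_args() {
    # Emit --no-shared-storage only when storage is NOT shared across nodes.
    if [ "$1" = "false" ]; then
        printf '%s\n' "--no-shared-storage"
    fi
}

EXTRA_ARGS=$(build_storage_args "$SHARED_STORAGE")
echo "extra training args: ${EXTRA_ARGS:-<none>}"
```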

Performance

Machine performance

The performance of Bloom-7B on Ascend NPUs compared with the reference:

Device	Model	Total iterations	Throughput (samples/s)	Throughput (tokens/s/p)	Single-step time (s/step)
NPUs	Bloom-7B	1000	7.95	2034	64.55
Reference	Bloom-7B	1000	9.894	2525	19.40
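The two throughput columns are related by tokens/s/p = samples/s × sequence length ÷ number of devices. The table does not state the sequence length; the sketch below assumes 2048 tokens and the 8 NPUs from the hardware summary, which reproduces the NPU row to within rounding (about 2035 versus the reported 2034).

```shell
# Convert samples/s into tokens/s per device.
# Assumptions (not stated in the table): sequence length 2048, 8 devices.
tokens_per_sec_per_device() {
    awk -v s="$1" -v l="$2" -v n="$3" 'BEGIN { printf "%.1f\n", s * l / n }'
}

tokens_per_sec_per_device 7.95 2048 8   # prints 2035.2
```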

Inference Bloom-7B

Configure the Bloom-7B inference script: tasks/inference/generate_bloom_7b_ptd.sh

# modify the script according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh 
 
# modify script model path and tokenizer path
CHECKPOINT="./model_weights/Bloom-7B-v0.1-tp8-pp1/"
TOKENIZER_PATH="./model_from_hf/Bloom-7B/"

Launch Bloom-7B inference script: tasks/inference/generate_bloom_7b_ptd.sh

bash tasks/inference/generate_bloom_7b_ptd.sh

Some inference samples are as follows:

Evaluation Bloom-7B

Configure the Bloom-7B evaluation script: tasks/evaluation/evaluate_bloom_7B_ptd.sh

source /usr/local/Ascend/ascend-toolkit/set_env.sh 

# modify script model path and tokenizer path
CHECKPOINT="./model_weights/Bloom-7B-v0.1-tp8-pp1/"
TOKENIZER_PATH="./model_from_hf/Bloom-7B/"
# configure task and data path
DATA_PATH="your dataset path"
TASK="your task"

Launch Bloom-7B evaluation script:

bash tasks/evaluation/evaluate_bloom_7B_ptd.sh

Evaluation results

dataset	subject_num	question_num	reference_acc	NPU acc
MMLU	57	14042	0.254	0.251

Bloom-176B

Training

The BLOOM model is described in "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model":

Scao, Teven Le, et al. "Bloom: A 176b-parameter open-access multilingual language model." arXiv preprint arXiv:2211.05100 (2022).

Here's a hardware summary of pre-training Bloom-176B:

Hardware	Value
NPU	12x8 Ascend 910

Script

Clone the repository to your local server:

git clone https://gitee.com/ascend/ModelLink.git
git clone https://github.com/NVIDIA/Megatron-LM.git
cd Megatron-LM
git checkout -f bcce6f
cp -r megatron ../ModelLink/
cd ..
cd ModelLink
git checkout 1.0
mkdir logs
mkdir model_from_hf
mkdir dataset
mkdir ckpt

Build environment

# python3.8
conda create -n test python=3.8
conda activate test

# install torch and torch_npu 
pip install torch-2.1.0-cp38-cp38m-linux_aarch64.whl
pip install torch_npu-2.1.0.XXX-cp38-cp38m-linux_aarch64.whl
pip install apex-0.1_ascend*-cp38-cp38m-linux_aarch64.whl

# modify the path according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# install MindSpeed
git clone https://gitee.com/ascend/MindSpeed.git
cd MindSpeed
git checkout 224ae35e8fc96778f957029d1371ddb623452a50
pip install -r requirements.txt 
pip3 install -e .
cd ..

# install other packages
pip install -r requirements.txt

Prepare pretrained weights

Download the Bloom-176B tokenizer from Hugging Face:

mkdir ./model_from_hf/Bloom-176B/
cd ./model_from_hf/Bloom-176B/
wget https://huggingface.co/bigscience/bloom/resolve/main/special_tokens_map.json
wget https://huggingface.co/bigscience/bloom/resolve/main/tokenizer.json
wget https://huggingface.co/bigscience/bloom/resolve/main/tokenizer_config.json
...
cd ../../

Weight conversion

HuggingFace weights --> Megatron weights (This scenario is generally used to train open-source HuggingFace models on Megatron)

python tools/checkpoint/util.py \
    --model-type GPT \
    --loader loader_bloom_hf \
    --saver saver_megatron \
    --target-tensor-parallel-size 8 \
    --target-pipeline-parallel-size 5 \
    --load-dir ./model_from_hf/Bloom-176B/ \
    --save-dir ./model_weights/Bloom-176B-v0.1-tp8-pp5/ \
    --tokenizer-model None \
    --params-dtype bf16

Megatron weights with any parallel slicing strategy --> HuggingFace weights (This scenario is generally used to convert a trained Megatron model back to the HuggingFace format)

# Modify the ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh
python tools/checkpoint/util.py \
    --model-type GPT \
    --loader megatron \
    --saver megatron \
    --save-model-type save_huggingface_llama \
    --load-dir ./model_weights/Bloom-176B-v0.1-tp8-pp5/ \
    --target-tensor-parallel-size 1 \
    --target-pipeline-parallel-size 1 \
    --embed-layernorm \
    --params-dtype bf16 \
    --save-dir ./model_from_hf/Bloom-176B/   # <-- Fill in the original HF model path here, new weights will be saved in ./model_from_hf/Bloom-176B/mg2hg/
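Tensor parallelism splits attention heads across TP ranks and pipeline parallelism assigns whole transformer layers to stages, so in Megatron-style models the head count generally has to be divisible by TP and, without uneven stage splits, the layer count by PP. The hypothetical `check_split` helper below applies this to BLOOM-176B's published shape (70 layers, 112 attention heads, per the BLOOM paper) with the TP=8, PP=5 layout used in the conversion above.

```shell
# Hypothetical pre-conversion check (not part of tools/checkpoint/util.py):
# heads must split evenly across TP ranks, layers evenly across PP stages.
check_split() {
    local num_layers=$1 num_heads=$2 tp=$3 pp=$4
    if [ $((num_heads % tp)) -ne 0 ]; then
        echo "error: $num_heads heads not divisible by TP=$tp" >&2
        return 1
    fi
    if [ $((num_layers % pp)) -ne 0 ]; then
        echo "error: $num_layers layers not divisible by PP=$pp" >&2
        return 1
    fi
    echo "heads/TP-rank=$((num_heads / tp)) layers/PP-stage=$((num_layers / pp))"
}

# BLOOM-176B (70 layers, 112 heads) with TP=8, PP=5
check_split 70 112 8 5   # prints heads/TP-rank=14 layers/PP-stage=14
```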

Prepare dataset

Download the Alpaca dataset from Hugging Face:

# download datasets
cd dataset/
wget https://huggingface.co/datasets/tatsu-lab/alpaca/resolve/main/data/train-00000-of-00001-a09b74b3ef9c3b56.parquet
cd ..

# process datasets  
mkdir ./dataset/Bloom-176B/
python ./tools/preprocess_data.py \
  --input ./dataset/train-00000-of-00001-a09b74b3ef9c3b56.parquet \
  --tokenizer-name-or-path ./model_from_hf/Bloom-176B/ \
  --output-prefix ./dataset/Bloom-176B/alpaca \
  --workers 4 \
  --log-interval 1000 \
  --tokenizer-type PretrainedFromHF

Configure the Bloom-176B pre-training script examples/bloom/pretrain_bloom_176b.sh (Bloom-176B does not support Flash Attention):

# Set MASTER_ADDR: use localhost on the master node, and the IP address of the master node on all other nodes.
MASTER_ADDR=localhost

# Set the rank of this node: 0 on the master node, increasing by one (1, 2, ...) on each of the other nodes.
NODE_RANK=0

# modify the datasets path and tokenizer path
TOKENIZER_NAME_OR_PATH=./model_from_hf/Bloom-176B/
DATA_PATH=./dataset/Bloom-176B/alpaca_text_document
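Each node runs the same script with its own values. The helper below is a hypothetical sketch that prints the pair for a given node rank, following the rules in the comments above: localhost on the master node (rank 0) and the master's IP address everywhere else. 192.0.2.10 is a placeholder address.

```shell
# Hypothetical helper: print the MASTER_ADDR/NODE_RANK pair for one node.
node_settings() {
    local node_rank=$1 master_ip=$2 master_addr
    if [ "$node_rank" -eq 0 ]; then
        master_addr=localhost      # the master node refers to itself
    else
        master_addr=$master_ip     # every other node points at the master
    fi
    echo "MASTER_ADDR=$master_addr NODE_RANK=$node_rank"
}

# 12 nodes => ranks 0..11; placeholder master IP
for rank in 0 1 11; do
    node_settings "$rank" 192.0.2.10
done
```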

Launch Bloom-176B pre-training script: examples/bloom/pretrain_bloom_176b.sh

Run examples/bloom/pretrain_bloom_176b.sh on every node in the cluster.

bash examples/bloom/pretrain_bloom_176b.sh

Performance

Machine Performance

The performance of Bloom-176B on Ascend NPUs compared with the reference:

Devices	Model	Total iterations	Throughput (tokens/s/p)
NPUs	Bloom-176B	1000	100
Reference	Bloom-176B	NA	107
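To compare the rows at a glance, the NPU throughput can be expressed as a percentage of the reference. The awk one-liner below only divides the reported table values; `relative_throughput` is a hypothetical name.

```shell
# NPU throughput as a percentage of the reference throughput (tokens/s/p).
relative_throughput() {
    awk -v npu="$1" -v ref="$2" 'BEGIN { printf "%.1f%%\n", 100 * npu / ref }'
}

relative_throughput 100 107     # Bloom-176B: prints 93.5%
relative_throughput 2034 2525   # Bloom-7B:   prints 80.6%
```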

Inference Bloom-176B

Configure the Bloom-176B inference script: tasks/inference/generate_bloom_176b_ptd.sh

# modify the script according to your own ascend-toolkit path
source /usr/local/Ascend/ascend-toolkit/set_env.sh 
 
# modify script model path and tokenizer path
CHECKPOINT="./model_weights/Bloom-176B-v0.1-tp8-pp5/"
TOKENIZER_PATH="./model_from_hf/Bloom-176B/"

Launch the Bloom-176B inference script tasks/inference/generate_bloom_176b_ptd.sh. Bloom-176B inference requires 5 machines, so the weights must be converted with tp=8 and pp=5, as in the HuggingFace --> Megatron conversion above.

bash tasks/inference/generate_bloom_176b_ptd.sh

Some inference samples are as follows:

Evaluation Bloom-176B

Configure the Bloom-176B evaluation script: tasks/evaluation/evaluate_bloom_176B_ptd.sh

source /usr/local/Ascend/ascend-toolkit/set_env.sh 

# modify script model path and tokenizer path
CHECKPOINT="./model_weights/Bloom-176B-v0.1-tp8-pp5/"
TOKENIZER_PATH="./model_from_hf/Bloom-176B/"
# configure task and data path
DATA_PATH="your dataset path"
TASK="your task"

Launch Bloom-176B evaluation script:

bash tasks/evaluation/evaluate_bloom_176B_ptd.sh

Evaluation results

dataset	reference_acc	NPU acc
boolq	/	0.645
