VinBERT combines two powerful Vietnamese language models: Vintern-1b-v2 and PhoBERT. With VinBERT, we create a language model optimized for Vietnamese-language applications, including tasks such as text classification, entity extraction, and more.
- VinBERT leverages the strengths of Vintern-1b-v2 and PhoBERT, providing high efficiency and accuracy for Vietnamese NLP applications.
- It supports distributed training across multiple GPUs and on AWS SageMaker infrastructure, saving time and resources.
- cuda: Data parallelism and model parallelism are supported with the `nccl` backend.
- xla: Data parallelism is supported with the `xla` backend.
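The device-to-backend mapping above can be sketched as a small selection helper (an illustrative sketch only; `pick_backend` and `BACKENDS` are hypothetical names, not part of this repository):

```python
# Hypothetical helper: map a device type to the distributed backend
# listed above (cuda -> nccl, xla -> xla). Not part of this repository.
BACKENDS = {
    "cuda": "nccl",  # data parallelism and model parallelism
    "xla": "xla",    # data parallelism only
}

def pick_backend(device: str) -> str:
    """Return the process-group backend for a given device type."""
    try:
        return BACKENDS[device]
    except KeyError:
        raise ValueError(f"unsupported device type: {device}") from None
```

The returned string would then be passed as the `backend` argument when initializing the process group.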
- An AWS account with access to Sagemaker.
- An environment set up to interact with AWS CLI and Sagemaker.
- Quota to use `ml.p4d.24xlarge` and `ml.trn1.32xlarge` instances.
pip install -r requirements.txt
- Prepare the environment: pull the flash-attn base Docker image `vantufit/flash-attn-cuda` from Docker Hub:
docker pull vantufit/flash-attn-cuda
- Run the job on GPU (`ml.p4d.24xlarge`):
- Configure parameters such as instance type, number of GPUs, and batch size.
- Run the following command to initiate the job:
export INSTANCE=ml.p4d.24xlarge
python training.py
- Run the job on Trainium (`ml.trn1.32xlarge`):
export INSTANCE=ml.trn1.32xlarge
python training.py
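The two launch commands above differ only in the `INSTANCE` value. A minimal sketch of how `training.py` might turn that variable into launch parameters (assumptions: `job_config` is a hypothetical helper, and the accelerator counts of 8 GPUs on `ml.p4d.24xlarge` and 16 Trainium chips on `ml.trn1.32xlarge` reflect the standard hardware of those instance types):

```python
import os

# Accelerator counts per supported instance type (8 NVIDIA A100 GPUs on
# p4d.24xlarge, 16 AWS Trainium chips on trn1.32xlarge).
ACCELERATORS = {"ml.p4d.24xlarge": 8, "ml.trn1.32xlarge": 16}

def job_config(instance=None):
    """Build launch parameters; falls back to the INSTANCE env variable.

    Hypothetical helper for illustration, not part of training.py.
    """
    instance = instance or os.environ.get("INSTANCE", "ml.p4d.24xlarge")
    if instance not in ACCELERATORS:
        raise ValueError(f"unsupported instance type: {instance}")
    return {
        "entry_point": "training.py",
        "instance_type": instance,
        "num_devices": ACCELERATORS[instance],
    }
```

Validating the instance type up front fails fast on a typo in `INSTANCE` instead of partway through job submission.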
- Implement tensor parallelism with `neuronx_distributed`
- Monitor the training process