ATLAS provides guidance on training effective multilingual models
Researchers introduced ATLAS, a set of adaptive transfer scaling laws for massively multilingual language models that offers guidance on mixing data and training models for languages beyond English. The study spanned 774 training runs with models from 10M to 8B parameters, covered data from 400+ languages, and evaluated on 48 languages. ATLAS estimates transfer synergies between 1,400 language pairs, yielding scaling laws that account for cross-lingual transfer when building multilingual models.
ATLAS provides a practical approach to choosing model size, data volume, and language mixture when training multilingual models. This can help developers balance the mix of languages in the training data against model size, leading to more efficient model development.
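The summary does not give ATLAS's fitted functional form, so as a minimal sketch, the snippet below uses a Chinchilla-style loss L(N, D) = E + A/N^α + B/D^β with made-up constants to illustrate how a scaling law of this kind guides the model-size versus data-volume tradeoff under a fixed compute budget. Every constant, and the C ≈ 6ND compute rule, is an assumption here, not ATLAS's published fit.

```python
import numpy as np

# Minimal sketch, not the fitted ATLAS law: a Chinchilla-style loss
# L(N, D) = E + A/N^alpha + B/D^beta with made-up constants, used to
# pick a model size N under a fixed compute budget C ~= 6*N*D.
E, A, B, ALPHA, BETA = 1.7, 400.0, 1100.0, 0.34, 0.28  # hypothetical fit

def loss(n_params: float, n_tokens: float) -> float:
    return E + A / n_params**ALPHA + B / n_tokens**BETA

C = 1e21                          # fixed training FLOPs (assumption)
sizes = np.logspace(7, 9.9, 30)   # candidate sizes, roughly 10M..8B params
# For each candidate size, spend the remaining compute on training tokens.
losses = [loss(n, C / (6 * n)) for n in sizes]
best = sizes[int(np.argmin(losses))]
print(f"compute-optimal size under this toy law: {best:.2e} params")
```

ATLAS's contribution, per the summary, is extending this kind of tradeoff with per-language data terms and fitted cross-lingual transfer, so the data axis becomes a mixture over languages rather than a single token count.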
Use ATLAS to optimize the training of a multilingual language model for a specific set of languages, such as Spanish, French, and German, and evaluate the performance improvement.
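As a toy version of that exercise (not ATLAS's actual procedure), the sketch below grid-searches Spanish/French/German data mixtures under the same hypothetical loss. The transfer-matrix entries are invented placeholders for the kind of language-pair synergies ATLAS estimates, and the model size and token budget are arbitrary.

```python
import numpy as np

# Toy mixture search over [es, fr, de]; all numbers are hypothetical.
# T[i, j] ~ how much one token of language j's data is worth to language i;
# ATLAS fits synergies like these empirically, the values here are invented.
T = np.array([
    [1.00, 0.35, 0.20],   # Spanish, helped by French and German data
    [0.38, 1.00, 0.22],   # French
    [0.18, 0.21, 1.00],   # German
])

N = 1e9         # fixed 1B-parameter model (assumption)
D_TOTAL = 2e10  # fixed 20B-token training budget (assumption)
E, A, B, ALPHA, BETA = 1.7, 400.0, 1100.0, 0.34, 0.28  # same toy constants as above

def mean_loss(mix: tuple[float, float, float]) -> float:
    # Transfer-adjusted effective tokens per language, then per-language loss.
    d_eff = T @ (np.asarray(mix) * D_TOTAL)
    return float(np.mean(E + A / N**ALPHA + B / d_eff**BETA))

# Enumerate mixtures in 5% steps that sum to 1 with every share positive.
grid = [(i / 20, j / 20, (20 - i - j) / 20)
        for i in range(1, 19) for j in range(1, 20 - i)]
best = min(grid, key=mean_loss)
print(f"best es/fr/de mix: {best}, mean loss: {mean_loss(best):.3f}")
```

Comparing the chosen mixture against a uniform one on held-out per-language evaluations would complete the exercise of measuring the improvement.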