Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model

0

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model | NVIDIA Developer Blog

“We are excited to introduce the DeepSpeed- and Megatron-powered Megatron-Turing Natural Language Generation model (MT-NLG), the largest and the most powerful monolithic transformer language model trained to date, with 530 billion parameters. It is the result of a joint effort between Microsoft and NVIDIA to advance the state of the art in AI for natural language generation.

As the successor to Turing NLG 17B and Megatron-LM, MT-NLG has 3x the number of parameters compared to the existing largest model of this type and demonstrates unmatched accuracy in a broad set of natural language tasks such as:

  • Completion prediction
  • Reading comprehension
  • Commonsense reasoning
  • Natural language inferences
  • Word sense disambiguation

The 105-layer, transformer-based MT-NLG improved upon the prior state-of-the-art models in zero-, one-, and few-shot settings and set the new standard for large-scale language models in both model scale and quality…”

Source: developer.nvidia.com/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/

October 15, 2021
0 Comments
Inline Feedbacks
View all comments

Subscribe to our Digest