Mastering LLM Techniques: Inference Optimization | NVIDIA Technical Blog
“Most of the popular decoder-only LLMs (GPT-3, for example) are pretrained on the causal language modeling objective, essentially as next-word predictors. These LLMs take a series of tokens as input and generate subsequent tokens autoregressively until they meet a stopping criterion (a limit on the number of tokens to generate or a list of stop words, for example) or until they generate a special <end> token marking the end of generation. This process involves two phases: the prefill phase and the decode phase.

Note that tokens are the atomic parts of language that a model processes. One token is approximately four English characters. All natural-language inputs are converted to tokens before being fed into the model…”
Source: developer.nvidia.com/blog/mastering-llm-techniques-inference-optimization/
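To make the two phases concrete, below is a minimal sketch of greedy autoregressive decoding using the Hugging Face transformers library (the model choice and generation parameters are illustrative assumptions, not part of the article). The first forward pass over the whole prompt is the prefill phase; every subsequent single-token forward pass, reusing the cached keys and values, is a decode step. The loop stops on either criterion the excerpt mentions: a token budget or the model emitting its end-of-sequence token.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # assumed here for illustration; any causal LM works

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

@torch.no_grad()
def generate(prompt: str, max_new_tokens: int = 32) -> str:
    # Prefill phase: one forward pass over the entire prompt,
    # populating the key-value cache for all prompt tokens.
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    generated = input_ids
    next_input = input_ids
    past_key_values = None

    for _ in range(max_new_tokens):  # stopping criterion 1: token budget
        outputs = model(next_input,
                        past_key_values=past_key_values,
                        use_cache=True)
        past_key_values = outputs.past_key_values  # reused each decode step

        # Greedy choice: take the highest-probability next token.
        next_token = outputs.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        generated = torch.cat([generated, next_token], dim=-1)

        # Stopping criterion 2: the model emits its end-of-sequence token.
        if next_token.item() == tokenizer.eos_token_id:
            break

        # Decode phase: each later pass feeds only the newest token,
        # since earlier positions are already in the KV cache.
        next_input = next_token

    return tokenizer.decode(generated[0], skip_special_tokens=True)

print(generate("Most decoder-only LLMs are pretrained as"))
```

In production, sampling strategies (temperature, top-k, top-p) typically replace the argmax, but the prefill/decode split is the same.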
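The four-characters-per-token figure is a rough average for English; actual token boundaries depend on the tokenizer's learned vocabulary. A quick way to see both points, using the GPT-2 byte-pair-encoding tokenizer purely as an illustrative stand-in:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Most of the popular decoder-only LLMs are next-word predictors."
token_ids = tokenizer.encode(text)
tokens = tokenizer.convert_ids_to_tokens(token_ids)

print(tokens)                       # subword pieces, e.g. ['Most', 'Ġof', 'Ġthe', ...]
print(len(text) / len(token_ids))  # roughly 4 characters per token on English text
```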