Transformer models: an introduction and catalog — 2022 Edition
Transformer models: an introduction and catalog — 2022 Edition
“Transformers are a class of deep learning models that are defined by some architectural traits. They were first introduced in the now famous Attention is All you Need paper by Google researchers in 2017 (the paper has accumulated a whooping 38k citations in only 5 years) and associated blog post.
The Transformer architecture is a specific instance of the encoder-decoder models that had become popular just over the 2–3 years prior. Up until that point however, attention was just one of the mechanisms used by these models, which were mostly based on LSTM (Long Short Term Memory) and other RNN (Recurrent Neural Networks) variations. The key insight of the Transformers paper was that, as the title implies, attention could be used as the only mechanism to derive dependencies between input and output…”
Source: amatriain.net/blog/transformer-models-an-introduction-and-catalog-2d1e9039f376/