A Multi-Axis Approach for Vision Transformer and MLP Models


“Convolutional neural networks have been the dominant machine learning architecture for computer vision since the introduction of AlexNet in 2012. Recently, inspired by the evolution of Transformers in natural language processing, attention mechanisms have been prominently incorporated into vision models. These attention methods boost some parts of the input data while minimizing other parts so that the network can focus on small but important parts of the data. The Vision Transformer (ViT) has created a new landscape of model designs for computer vision that is completely free of convolution. ViT regards image patches as a sequence of words, and applies a Transformer encoder on top. When trained on sufficiently large datasets, ViT demonstrates compelling performance on image recognition…”
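To make the patch-as-token idea above concrete, here is a minimal NumPy sketch of how an image can be split into non-overlapping patches and linearly projected into a token sequence for a Transformer encoder. The patch size (16), embedding dimension (512), and random projection are illustrative assumptions, not the settings used in the paper:

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened non-overlapping patches."""
    H, W, C = image.shape
    P = patch_size
    assert H % P == 0 and W % P == 0, "image dims must be divisible by patch size"
    # (H/P, P, W/P, P, C) -> (H/P, W/P, P, P, C) -> (num_patches, P*P*C)
    patches = image.reshape(H // P, P, W // P, P, C)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, P * P * C)
    return patches

rng = np.random.default_rng(0)
image = rng.standard_normal((224, 224, 3))
tokens = patchify(image, 16)            # (196, 768): 14x14 patches, each 16*16*3 values
embed = rng.standard_normal((768, 512)) # stand-in for a learned linear projection
sequence = tokens @ embed               # (196, 512) token embeddings fed to the encoder
print(sequence.shape)
```

Each of the 196 rows of `sequence` plays the role of one "word" in ViT's input sequence; a real model would add positional embeddings and a class token before the encoder.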

Source: https://research.google/blog/a-multi-axis-approach-for-vision-transformer-and-mlp-models/

Paper: https://arxiv.org/pdf/2204.01697

Paper: https://openaccess.thecvf.com/content/CVPR2022/papers/Tu_MAXIM_Multi-Axis_MLP_for_Image_Processing_CVPR_2022_paper.pdf

August 31, 2025