SmolVLA: Efficient Vision Language Action Model – LeRobot

“SmolVLA represents a significant step towards democratizing advanced robotics. It is a Vision-Language-Action model engineered for efficiency, enabling complex robotic tasks to be performed using systems with modest computational resources. The “Smol” designation refers to its compact architecture, approximately 450 million parameters, a figure notably smaller than many contemporary VLA models, which can scale into the billions of parameters. This reduced scale is pivotal, as it allows SmolVLA to be trained and deployed on consumer-grade GPUs, thereby lowering the barrier to entry for researchers, developers, and educational institutions. The overarching goal is to foster broader innovation in intelligent robotics by providing a capable yet accessible VLA framework…”
Source: learnopencv.com/smolvla-lerobot-vision-language-action-model/
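
The consumer-GPU claim in the excerpt can be sanity-checked with back-of-the-envelope arithmetic. The sketch below estimates how much memory the weights alone would occupy at a few common numeric precisions, taking the excerpt's figure of roughly 450 million parameters at face value. The precision list and the helper name `weight_memory_gb` are illustrative assumptions and do not come from the article or from LeRobot. Activations, gradients, and optimizer state are deliberately left out, so actual training needs several times more memory than these figures.

```python
# Rough weight-memory estimate for a ~450M-parameter model.
# Assumption: weights only; activations, gradients, optimizer state,
# and framework overhead are NOT included.

PARAMS = 450_000_000  # approximate parameter count quoted in the excerpt

# Bytes per parameter for a few common storage precisions (illustrative choice).
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Return weight storage in gigabytes, using 1 GB = 1e9 bytes."""
    return num_params * bytes_per_param / 1e9


# Hypothetical helper output: precision name -> estimated gigabytes.
estimates = {
    name: weight_memory_gb(PARAMS, size) for name, size in BYTES_PER_PARAM.items()
}

for name, gb in estimates.items():
    print(f"{name}: {gb:.2f} GB")
```

Under these assumptions the weights come to about 1.8 GB in 32-bit floats and about 0.9 GB in 16-bit, which is small next to the memory of typical consumer GPUs. A model with several billion parameters would need roughly ten times as much for weights alone.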