Microsoft drops Florence-2, a unified model to handle a variety of vision tasks

“Today, Microsoft’s Azure AI team dropped a new vision foundation model called Florence-2 on Hugging Face.

Available under a permissive MIT license, the model can handle a variety of vision and vision-language tasks using a unified, prompt-based representation. It comes in two sizes — 232M and 771M parameters — and already excels at tasks such as captioning, object detection, visual grounding and segmentation, performing on par or better than many large vision models out there.

While the real-world performance of the model is yet to be tested, the work is expected to give enterprises a single, unified approach to handle different types of vision applications. This will save investments on separate task-specific vision models that fail to go beyond their primary function, without extensive fine-tuning…”

Source: venturebeat.com/ai/microsoft-drops-florence-2-a-unified-model-to-handle-a-variety-of-vision-tasks/

Paper: https://arxiv.org/pdf/2311.06242

Source: https://huggingface.co/papers/2311.06242
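
To make the "unified, prompt-based representation" from the excerpt concrete, here is a minimal sketch of how one checkpoint can be pointed at different tasks by changing only the task token in the prompt. It is an illustration, not the authors' reference code. The task tokens and the processor/generate/post-process calls follow the Florence-2 model card on Hugging Face as I understand it and may change between releases. The names TASK_PROMPTS, build_prompt and run_task are hypothetical helpers introduced for this example, and the model id "microsoft/Florence-2-base" is assumed to be the smaller of the two published sizes.

```python
# Task tokens as listed on the Florence-2 model card (assumed current; verify before use).
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "object_detection": "<OD>",
    "visual_grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
    "segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
}


def build_prompt(task, text=""):
    """Hypothetical helper: task token plus optional free text (e.g. a phrase to ground)."""
    return TASK_PROMPTS[task] + text


def run_task(image, task, text="", model_id="microsoft/Florence-2-base"):
    """Run one task on a PIL image. Downloads the model on first call.

    Imports are kept inside the function so the prompt helper above
    can be used without transformers installed.
    """
    from transformers import AutoModelForCausalLM, AutoProcessor

    # The checkpoint ships custom modeling code, hence trust_remote_code=True.
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

    inputs = processor(text=build_prompt(task, text), images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
    )
    # Keep special tokens: the post-processor parses location tokens from the raw text.
    raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        raw, task=TASK_PROMPTS[task], image_size=(image.width, image.height)
    )
```

With a PIL image loaded as img, run_task(img, "object_detection") and run_task(img, "visual_grounding", "a green car") would go through the same model and differ only in the prompt string, which is the point of the unified design.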

June 23, 2024

