Build VLM-Powered Visual AI Agents Using NVIDIA NIM and NVIDIA VIA Microservices


“Traditional video analytics applications and their development workflow are typically built on fixed-function, limited models that are designed to detect and identify only a select set of predefined objects.

With generative AI, NVIDIA NIM microservices, and foundation models, you can now build applications with fewer models that have broad perception and rich contextual understanding.

The new class of generative AI models, vision language models (VLMs), powers visual AI agents that can understand natural language prompts and perform visual question answering. These agents unlock entirely new application possibilities for a wide range of industries. They significantly streamline app development workflows and deliver transformative new perception capabilities, such as image or video summarization, interactive visual Q&A, and visual alerts.

These visual AI agents will be deployed throughout factories, warehouses, retail stores, airports, traffic intersections, and more. They’ll help operations teams make better decisions using richer insights generated from natural interactions.

The NVIDIA NIM and NVIDIA VIA microservices accelerate the development of visual AI agents. In this post, we show you how to build a visual AI agent with these two technologies, using a summarization microservice that processes large volumes of video with VLMs and NIM microservices and produces curated summaries.

NVIDIA VIA uses the OpenAI GPT-4o model as the VLM by default…”
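As the excerpt notes, VIA uses GPT-4o as its default VLM. To make the visual question answering idea concrete, here is a minimal sketch of how a single-frame VLM request can be assembled, assuming an OpenAI-compatible chat-completions API (the payload shape follows that convention; the model name, prompt, and helper function are illustrative assumptions, not VIA's internal implementation):

```python
import base64

def build_vlm_request(image_bytes: bytes, prompt: str, model: str = "gpt-4o") -> dict:
    """Assemble an OpenAI-compatible chat-completions payload that pairs
    a natural-language prompt with a base64-encoded image frame.

    This is a hypothetical helper for illustration; VIA manages frame
    sampling and VLM calls internally.
    """
    # Encode the raw frame as base64 so it can be embedded in a data URL.
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                # Multimodal content: text prompt plus the image itself.
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                    },
                ],
            }
        ],
        "max_tokens": 256,
    }

# Example: ask the VLM to summarize what is happening in one frame.
payload = build_vlm_request(b"\xff\xd8\xff\xe0fake-jpeg-bytes", "Summarize this scene.")
```

A summarization pipeline would repeat such calls over sampled frames or clips and then aggregate the per-segment answers into a curated summary.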

Source: developer.nvidia.com/blog/build-vlm-powered-visual-ai-agents-using-nvidia-nim-and-nvidia-via-microservices

July 31, 2024
