

Agentic Vision in Gemini 3 Flash treats image understanding as an active investigation rather than passive observation. The capability combines visual reasoning with code execution, enabling the model to formulate plans and manipulate images step by step, grounding its answers in visual evidence rather than probabilistic guessing.
Key features include the ability to zoom and inspect fine-grained details, annotate images by drawing bounding boxes and labels, and perform visual math and plotting through Python code execution. The system operates through an agentic Think, Act, Observe loop where the model analyzes queries, generates and executes code to manipulate images, and observes the transformed results before providing final responses. Code execution with Gemini 3 Flash delivers a consistent 5-10% quality boost across most vision benchmarks.
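The Think, Act, Observe loop described above can be sketched in pure Python. Everything here is illustrative, not the Gemini API: the `think`, `act`, and `run_agent` names are assumptions, and a nested list stands in for real pixel data.

```python
# Hypothetical sketch of the Think -> Act -> Observe loop: plan a step,
# execute code against the image, and feed the transformed result back
# into the working context before answering.

def think(query: str, observations: list) -> str:
    """Plan the next manipulation based on the query and what has been
    observed so far (here: a trivial two-step plan)."""
    if not observations:
        return "crop_to_region"   # first, zoom into the area of interest
    return "answer"               # enough visual evidence gathered

def act(step: str, image: list) -> list:
    """Execute the planned manipulation (here: a plain crop on a
    nested-list 'image' standing in for real pixels)."""
    if step == "crop_to_region":
        return [row[1:3] for row in image[1:3]]  # zoom into a 2x2 region
    return image

def run_agent(query: str, image: list) -> list:
    """Loop until the plan says we can answer, appending each
    transformed image back into the context."""
    observations = []
    while True:
        step = think(query, observations)
        if step == "answer":
            return observations
        observations.append(act(step, image))

# A 4x4 "image"; the agent zooms into the center before answering.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
context = run_agent("what is in the center?", image)  # -> [[[5, 6], [9, 10]]]
```

The key property is that the final answer is conditioned on the observed crop, not on a single pass over the full image.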
Rather than taking a single static glance, the model is trained to implicitly zoom in and inspect specific areas when detecting fine-grained details. It formulates multi-step plans to crop, rotate, annotate, analyze, and otherwise manipulate images via Python code execution, then appends the transformed images back into its context window for better understanding.
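The per-step manipulations named above (crop, rotate, zoom) are ordinary image transforms. A minimal sketch, using a nested list as a stand-in for pixel data; these helpers are illustrative, not the model's actual generated code:

```python
# Illustrative versions of the kinds of manipulations the model can
# generate and execute at each step of its plan.

def crop(image, top, left, height, width):
    """Cut out a rectangular region to inspect it in isolation."""
    return [row[left:left + width] for row in image[top:top + height]]

def rotate90(image):
    """Rotate 90 degrees clockwise (e.g. to fix a sideways scan)."""
    return [list(row) for row in zip(*image[::-1])]

def zoom(image, factor):
    """Nearest-neighbour upscale to inspect fine-grained detail."""
    return [[image[r // factor][c // factor]
             for c in range(len(image[0]) * factor)]
            for r in range(len(image) * factor)]
```

In the real system, each transformed image is appended back into the model's context, so the next reasoning step sees the result of the previous manipulation.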
Benefits include improved accuracy on visual tasks; for example, a building plan validation platform reported a 5% accuracy improvement. Demonstrated use cases include inspecting high-resolution building plans for code compliance, counting objects by drawing bounding boxes, and parsing complex data tables to generate professional visualizations. The system replaces probabilistic guessing with verifiable execution for more reliable results.
Target users include developers building AI applications, with integrations available via the Gemini API in Google AI Studio and Vertex AI. The capability is also rolling out in the Gemini app, and developers can experiment with code execution tools in the AI Studio Playground. Future plans include expanding implicit code-driven behaviors, adding more tools like web search, and extending the capability to other model sizes.
The product targets developers building AI applications that need advanced visual reasoning, including building plan validation platforms, visual data analysis tools, and image processing applications, and who require grounded visual understanding with code execution, available through the Gemini API in Google AI Studio and Vertex AI.