Gemini 3.5

A complete guide to the Gemini 3.5 series, starting with Gemini 3.5 Flash (I/O 2026). Flash outperforms Gemini 3.1 Pro on several coding and agentic benchmarks, Google reports up to 4x higher output-token throughput, and Gemini 3.5 Pro is targeted for June 2026.

Gemini 3.5 is Google’s next model family for coding, multimodal work, and agentic execution. The first public model, Gemini 3.5 Flash, changes the usual Flash versus Pro tradeoff: it beats Gemini 3.1 Pro on Terminal-Bench 2.1, MCP Atlas, Finance Agent v2, and several other agent evaluations.

That makes Gemini 3.5 Flash more than a fast, low-cost option. For many development workflows, it can serve as the primary execution model when the work is clearly scoped, testable, and tool-driven.

Verdent helps teams evaluate that choice against real repository tasks instead of abstract scores. Plan-First Intelligence turns the request into success criteria, tests, and review points, while Parallel Power runs model implementations in isolated workspaces so teams can compare quality, speed, and cost.

For buyers and engineering leads, the practical question is no longer whether Flash is merely cheaper than Pro. It is where Gemini 3.5 Flash is strong enough to ship work, where Gemini 3.1 Pro still belongs in review or migration flows, and how to prepare for Gemini 3.5 Pro as public details arrive.

What Is Gemini 3.5

Gemini 3.5 is Google’s model family for coding, multimodal tasks, and agentic execution. The series starts with Gemini 3.5 Flash, which Google introduced at I/O 2026 on May 19, 2026.

Gemini 3.5 Flash became the default model in the Gemini app and Search AI Mode. Developers can access it through Google AI Studio and the Gemini API.

The model is designed for tool use, subagent coordination, and longer reasoning workflows. It can inspect context, call tools, continue after tool results, and preserve reasoning context across compatible requests.

For software teams, the important shift is practical: Gemini 3.5 Flash can support repository analysis, implementation, testing, debugging, and correction in one workflow. It is not only a chat model or a draft generator.

This makes Gemini 3.5 most relevant for teams that want to compare fast agent execution against stronger reasoning models. The right evaluation should measure completed work, not only first-response quality.

Gemini 3.5 Flash Overview

Gemini 3.5 Flash combines a 1,048,576-token input context window, high output throughput, tool-oriented behavior, and four configurable thinking levels.

| Specification | Gemini 3.5 Flash |
| --- | --- |
| Release date | May 19, 2026 |
| Model ID | gemini-3.5-flash |
| Input context | 1,048,576 tokens |
| Maximum output | 65,536 tokens |
| Knowledge cutoff | January 2025 |
| Standard input price | $1.50 per 1M tokens |
| Standard output price | $9 per 1M tokens |
| Cached input price | $0.15 per 1M tokens |
| Batch input price | $0.75 per 1M tokens |
| Batch output price | $4.50 per 1M tokens |
| Computer Use API tool | Not supported |

Google reports up to four times higher output-token throughput. This measures output tokens per second, not total task completion time. A coding task can still be limited by dependency installation, test runtime, tool latency, review time, and failed repair loops.

Thinking Levels

Gemini 3.5 Flash supports four thinking levels:

  • minimal for simple classification, rewriting, and direct answers
  • low for routine code edits and short debugging tasks
  • medium for general agent tasks, repository inspection, and multi-step changes
  • high for architecture choices, difficult debugging, and higher-risk reasoning

The default setting is medium.

Lower thinking levels can reduce latency and token use. Higher thinking levels can improve planning quality, but they can also increase token consumption. Teams should record thinking level, input size, output size, tool output, retries, and test failures when they compare models.
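The mapping from task type to thinking level can be made explicit in an evaluation harness. The sketch below is illustrative only: the task category names and the `pick_thinking_level` helper are hypothetical, and it does not call the Gemini API. It encodes the examples from the list above and falls back to the documented default of medium.

```python
from enum import Enum


class ThinkingLevel(str, Enum):
    MINIMAL = "minimal"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical task categories, taken from the examples in the list above.
LEVEL_BY_TASK = {
    "classification": ThinkingLevel.MINIMAL,
    "rewriting": ThinkingLevel.MINIMAL,
    "routine_edit": ThinkingLevel.LOW,
    "short_debugging": ThinkingLevel.LOW,
    "repository_inspection": ThinkingLevel.MEDIUM,
    "multi_step_change": ThinkingLevel.MEDIUM,
    "architecture": ThinkingLevel.HIGH,
    "difficult_debugging": ThinkingLevel.HIGH,
}


def pick_thinking_level(task_type: str) -> ThinkingLevel:
    """Return the level for a task type, falling back to the default (medium)."""
    return LEVEL_BY_TASK.get(task_type, ThinkingLevel.MEDIUM)


print(pick_thinking_level("routine_edit").value)  # low
print(pick_thinking_level("unknown_task").value)  # medium
```

Keeping the choice in one table makes it easy to log the level alongside token counts and retries for each run.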

Context caching can reduce repeated input costs when the same repository or documentation appears across requests. Batch processing can reduce asynchronous workload costs by 50%. These features are most useful when the team has repeatable tasks, stable context, and enough volume to justify measurement.

View official Gemini API pricing.

For production planning, treat the published prices and limits as inputs to an evaluation rather than a final cost estimate. Long-context prompts, preserved reasoning, tool output, retries, and higher thinking levels can all change the real token profile of a coding task.
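As a rough starting point, the published prices can be turned into a per-request estimate. This is a minimal sketch, not a billing tool: the `Usage` and `estimate_cost_usd` names are hypothetical, the token counts are made up, and it assumes that thinking tokens are counted as output and that cached tokens bill at the cached rate in both standard and batch mode. Check the official pricing page before relying on either assumption.

```python
from dataclasses import dataclass

# USD per 1M tokens, copied from the specification table in this guide.
PRICES = {
    "standard": {"input": 1.50, "output": 9.00},
    "batch": {"input": 0.75, "output": 4.50},
}
CACHED_INPUT_PRICE = 0.15


@dataclass
class Usage:
    input_tokens: int             # uncached input tokens
    output_tokens: int            # assumed to include any billed thinking tokens
    cached_input_tokens: int = 0  # tokens served from a context cache


def estimate_cost_usd(usage: Usage, mode: str = "standard") -> float:
    """Estimate one request's cost in USD under the assumptions stated above."""
    price = PRICES[mode]
    total = (
        usage.input_tokens * price["input"]
        + usage.cached_input_tokens * CACHED_INPUT_PRICE
        + usage.output_tokens * price["output"]
    )
    return total / 1_000_000


run = Usage(input_tokens=200_000, output_tokens=20_000)
print(f"standard: ${estimate_cost_usd(run):.4f}")           # standard: $0.4800
print(f"batch:    ${estimate_cost_usd(run, 'batch'):.4f}")  # batch:    $0.2400

cached = Usage(input_tokens=50_000, output_tokens=20_000, cached_input_tokens=150_000)
print(f"cached:   ${estimate_cost_usd(cached):.4f}")        # cached:   $0.2775
```

In this example the output tokens dominate once most of the input is cached, which is why retries and higher thinking levels matter more than raw context size for repeated tasks.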

Key Benchmarks vs 3.1 Pro

Gemini 3.5 Flash leads Gemini 3.1 Pro on several coding and agentic benchmarks.

| Benchmark | Gemini 3.5 Flash | Gemini 3.1 Pro |
| --- | --- | --- |
| Terminal-Bench 2.1 | 76.2% | 70.3% |
| SWE-Bench Pro Public | 55.1% | 54.2% |
| MCP Atlas | 83.6% | 78.2% |
| OSWorld-Verified | 78.4% | 76.2% |
| Finance Agent v2 | 57.9% | 43.0% |
| GDPval-AA | 1656 Elo | 1314 Elo |
| CharXiv Reasoning | 84.2% | 83.3% |

Terminal-Bench 2.1 measures terminal-based agent tasks. The model must plan, run commands, inspect results, and continue toward a goal. Gemini 3.5 Flash scores 76.2%, compared with 70.3% for Gemini 3.1 Pro.

Finance Agent v2 shows the largest percentage-point lead in this set. Gemini 3.5 Flash scores 57.9%, while Gemini 3.1 Pro scores 43.0%, a 14.9-point difference.
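The percentage-point leads can be recomputed directly from the table. The short script below uses only the scores listed above; GDPval-AA is left out because it is reported in Elo rather than percent.

```python
# Percent scores from the table above: (Gemini 3.5 Flash, Gemini 3.1 Pro).
# GDPval-AA is omitted because it is reported in Elo, not percent.
SCORES = {
    "Terminal-Bench 2.1": (76.2, 70.3),
    "SWE-Bench Pro Public": (55.1, 54.2),
    "MCP Atlas": (83.6, 78.2),
    "OSWorld-Verified": (78.4, 76.2),
    "Finance Agent v2": (57.9, 43.0),
    "CharXiv Reasoning": (84.2, 83.3),
}

# Lead in percentage points, rounded to one decimal like the source figures.
leads = {name: round(flash - pro, 1) for name, (flash, pro) in SCORES.items()}
largest = max(leads, key=leads.get)

for name, lead in sorted(leads.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: +{lead} points")
print("Largest lead:", largest)  # Largest lead: Finance Agent v2
```

The leads range from 0.9 points (SWE-Bench Pro Public, CharXiv Reasoning) to 14.9 points (Finance Agent v2), so the size of the advantage depends heavily on the benchmark.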

These numbers make Gemini 3.5 Flash worth testing for tool-heavy development workflows. They do not prove it will outperform Gemini 3.1 Pro on every repository, framework, or production codebase.

A good internal benchmark should include the same repository, same prompt, same acceptance criteria, same time limit, same model access path, and same review standard. Track whether the model changed the right files, passed the right tests, avoided regressions, and produced code that a reviewer would merge.
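One way to enforce "same conditions" is to record them per run and refuse to compare runs that differ. The sketch below is a hypothetical helper, not part of Verdent or any Google tooling; the field names and sample values are assumptions chosen to mirror the list in the paragraph above.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class RunConditions:
    repo_commit: str
    prompt: str
    acceptance_criteria: tuple
    time_limit_minutes: int
    access_path: str       # e.g. direct API or a router; hypothetical labels
    review_standard: str


def differing_conditions(a: RunConditions, b: RunConditions) -> list:
    """Return the names of conditions that differ between two runs."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


flash_run = RunConditions(
    repo_commit="abc1234",
    prompt="Add retry logic to the upload client",
    acceptance_criteria=("unit tests pass", "no new lint errors"),
    time_limit_minutes=30,
    access_path="gemini-api",
    review_standard="team-checklist-v1",
)
pro_run = replace(flash_run, time_limit_minutes=60)

print(differing_conditions(flash_run, flash_run))  # []
print(differing_conditions(flash_run, pro_run))    # ['time_limit_minutes']
```

A non-empty result means the comparison is not like-for-like and the scores should not be read side by side.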

Verdent Plan Mode helps define that repeatable test before any model starts coding. Teams can compare correctness, test results, time, cost, diff quality, and reviewer findings across models.

From Leaderboard Lead to Repository Proof

A benchmark lead is a reason to test, not a reason to standardize. The failure mode to avoid is blind AI, where a model starts coding before the task, constraints, and acceptance criteria are clear.

Verdent reported a 76.1% resolution rate on SWE-bench Verified. That result comes from a broader workflow: plan the task, execute the change, run checks, repair failures, and review the final patch.

For model selection, the useful question is not whether Gemini 3.5 Flash is impressive on a leaderboard. The useful question is whether it completes your real tickets with lower cost, shorter cycle time, and acceptable review quality.

Gemini 3.5 Pro: What We Know

Google announced Gemini 3.5 Pro at I/O 2026 and said the model was already in internal use. Google gave a June 2026 target window but did not provide an exact public release date.

As of June 12, 2026, no public API model ID is available for Gemini 3.5 Pro. Final pricing, context limits, throughput, and tool support are also unconfirmed.

Gemini 3.5 Pro is expected to focus on deeper reasoning. Gemini 3.5 Flash focuses more on speed, cost control, and agentic throughput. The likely evaluation question is whether Pro adds enough reasoning quality to justify higher latency or cost on complex work.

Teams can prepare before launch by defining the tasks they will use to compare Flash and Pro. Good candidates include architecture-sensitive refactors, bug fixes with unclear root causes, multi-file feature work, and changes where a weaker model often passes tests but misses hidden requirements.

The evaluation should include clear acceptance criteria, required checks, cost limits, and a reviewer pass. If Gemini 3.5 Pro becomes available, teams can run the same task through both models and compare completed work rather than relying on marketing claims.

Track model updates in the Verdent changelog.

Before committing to Pro, compare the current workflow against Gemini 2.5 Pro to spot differences in reasoning depth and tool behavior.

For source-level validation, the Google blog is worth checking after you understand the Gemini 3.5 workflow described here.

3.5 Flash vs 3 Flash vs 3.1 Pro

Gemini 3.5 Flash replaces Gemini 3 Flash for most new workloads. Gemini 3.1 Pro remains useful when deeper reasoning matters more than speed.

| Evaluation | 3.5 Flash | 3 Flash | 3.1 Pro |
| --- | --- | --- | --- |
| Terminal-Bench 2.1 | 76.2% | 58.0% | 70.3% |
| Finance Agent v2 | 57.9% | 42.6% | 43.0% |
| Humanity’s Last Exam | 40.2% | 33.7% | 44.4% |
| MRCR v2 at 128K | 77.3% | 67.2% | 84.9% |

Choose Gemini 3.5 Flash for fast agent loops, coding tasks, tool use, repository inspection, and workflows where the model can run checks and repair failures. Its benchmark profile is strongest when execution speed and tool coordination matter.

Choose Gemini 3.1 Pro for harder reasoning tasks. It leads 3.5 Flash on Humanity’s Last Exam and MRCR v2 at 128K, which makes it a better candidate for tasks that require deeper abstraction, long-context reasoning, or careful synthesis.

Use Gemini 3 Flash mainly as a legacy option. It may still matter for existing workflows, compatibility constraints, or features not yet available in Gemini 3.5 Flash.

Many teams should not choose only one model. A practical pattern is to use Gemini 3.5 Flash as the implementation model and use a stronger reasoning model for planning or review. This separates fast execution from high-stakes judgment.

The best choice depends on the task type, failure cost, latency target, and review process. If a task is easy to test, Flash may be enough. If a task requires judgment across ambiguous requirements, Pro-class reasoning may still be worth the added cost.
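The split between an implementation model and a stronger planner or reviewer can be written as a small routing rule. This is a sketch under stated assumptions: the `Task` fields, the `assign_models` helper, and the rule itself are hypothetical, and `PRO_CLASS` is a placeholder label rather than a real model ID.

```python
from dataclasses import dataclass

FLASH = "gemini-3.5-flash"     # model ID from the specification table
PRO_CLASS = "pro-class-model"  # placeholder label, not a real model ID


@dataclass
class Task:
    easy_to_test: bool
    ambiguous_requirements: bool
    failure_cost: str  # "low", "medium", or "high"


def assign_models(task: Task) -> dict:
    """Flash always implements; a Pro-class model plans and reviews riskier work."""
    needs_judgment = (
        task.ambiguous_requirements
        or task.failure_cost == "high"
        or not task.easy_to_test
    )
    planner_and_reviewer = PRO_CLASS if needs_judgment else FLASH
    return {
        "planning": planner_and_reviewer,
        "implementation": FLASH,
        "review": planner_and_reviewer,
    }


bounded_fix = Task(easy_to_test=True, ambiguous_requirements=False, failure_cost="low")
risky_refactor = Task(easy_to_test=True, ambiguous_requirements=True, failure_cost="high")

print(assign_models(bounded_fix)["review"])     # gemini-3.5-flash
print(assign_models(risky_refactor)["review"])  # pro-class-model
```

The thresholds here are deliberately simple; a team would tune them from its own evaluation results rather than from benchmark tables.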

Teams still maintaining older stacks can compare migration tradeoffs with Gemini 3 Flash before standardizing on 3.5 Flash.

When details such as limits or setup steps matter, Google AI documentation can help confirm the latest implementation surface.

Agentic Capabilities

Gemini 3.5 Flash is designed for multi-step work. It can continue after tool calls, read test results, revise code, and move through a task without requiring a human to restate each step.

Long-Running Tasks

The model can move from analysis to implementation, then run tests and correct failures. This matters for development work because many useful changes are not single-prompt tasks.

A strong long-running workflow usually includes repository inspection, plan creation, file edits, test execution, failure analysis, and a final review. Gemini 3.5 Flash is a better fit when those steps are explicit and the success criteria are measurable.

The large context window supports bigger repositories and related documentation. Teams should still avoid stuffing the full repository into every request. Relevant files, failing tests, architecture notes, and coding conventions usually matter more than raw volume.
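Selecting context by relevance rather than volume can be sketched as a greedy packing step. Everything here is illustrative: the `ContextItem` type, the priority scheme, the file names, and the token counts are assumptions, and a real harness would measure tokens with the provider's tokenizer.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    path: str
    tokens: int
    priority: int  # lower number = more relevant to the task


def pack_context(items: list, budget_tokens: int) -> list:
    """Greedily keep the most relevant items that fit within the token budget."""
    chosen, used = [], 0
    for item in sorted(items, key=lambda i: (i.priority, i.tokens)):
        if used + item.tokens <= budget_tokens:
            chosen.append(item)
            used += item.tokens
    return chosen


candidates = [
    ContextItem("src/legacy/big_module.py", 40_000, 2),
    ContextItem("docs/architecture.md", 2_500, 1),
    ContextItem("src/upload.py", 3_000, 0),
    ContextItem("tests/test_upload.py", 1_200, 0),
    ContextItem("CONTRIBUTING.md", 800, 1),
]

selected = pack_context(candidates, budget_tokens=8_000)
print([item.path for item in selected])
# ['tests/test_upload.py', 'src/upload.py', 'CONTRIBUTING.md', 'docs/architecture.md']
print(sum(item.tokens for item in selected))  # 7500
```

The large legacy module is skipped even though the window could hold it, because the failing test, the file under change, and the conventions already fill the budget the team chose.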

Subagent Deployment

The model can assign focused tasks to subagents. Each subagent can handle a separate workstream, such as inspecting one module, proposing tests, updating documentation, or validating an implementation path.

Subagents work best when tasks have clear boundaries. They are less effective when every task edits the same files, depends on the same unresolved decision, or requires constant cross-worker coordination.
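A quick overlap check can flag subagent tasks that should not run in parallel. The sketch below assumes each task declares the files it expects to edit; the task names, file paths, and `conflicting_pairs` helper are hypothetical.

```python
from itertools import combinations


def conflicting_pairs(tasks: dict) -> list:
    """Return (task_a, task_b, shared_files) for every pair that touches the same files."""
    conflicts = []
    for (name_a, files_a), (name_b, files_b) in combinations(tasks.items(), 2):
        shared = files_a & files_b
        if shared:
            conflicts.append((name_a, name_b, shared))
    return conflicts


planned = {
    "inspect-auth": {"src/auth.py"},
    "add-tests": {"tests/test_auth.py"},
    "update-docs": {"docs/auth.md", "src/auth.py"},
}

print(conflicting_pairs(planned))
# [('inspect-auth', 'update-docs', {'src/auth.py'})]
```

Tasks with no conflicts are candidates for parallel workers; conflicting ones are better sequenced or merged into one workstream.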

Verdent’s parallel workflow is built around this constraint. Independent workers can explore separate implementations or subtasks while workspace isolation keeps their changes from colliding too early.

Faster Tool Loops

Higher output throughput can shorten code generation and repeated tool interactions. It can make the model feel more responsive during inspect-edit-test cycles.

Total execution time still depends on external tools. Dependency installation, type checks, unit tests, integration tests, linters, and build steps can dominate the wall-clock time of a coding task.

For evaluation, measure end-to-end completion time, not only output speed. A model that writes code quickly but needs many repair loops may be slower than a model that produces a smaller, cleaner first patch.
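A back-of-the-envelope model shows why repair loops can outweigh output speed. All numbers below are hypothetical and chosen only to illustrate the arithmetic: each loop costs generation time (tokens divided by throughput) plus the time the external tools take.

```python
def wall_clock_seconds(
    output_tokens_per_loop: int,
    tokens_per_second: float,
    tool_seconds_per_loop: float,
    loops: int,
) -> float:
    """Total time = loops * (generation time + tool time per loop)."""
    generation = output_tokens_per_loop / tokens_per_second
    return loops * (generation + tool_seconds_per_loop)


# Hypothetical fast model: 4x the throughput, but five inspect-edit-test loops.
fast_many_loops = wall_clock_seconds(4_000, 400.0, 120.0, loops=5)

# Hypothetical slower model: a cleaner first patch that passes after two loops.
slow_few_loops = wall_clock_seconds(4_000, 100.0, 120.0, loops=2)

print(fast_many_loops)  # 650.0
print(slow_few_loops)   # 320.0
```

With a two-minute test suite, generation time is a small share of each loop, so the number of loops drives the total.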

Thought Preservation

Compatible requests can preserve reasoning context. This can reduce repeated analysis across related steps, especially when the model needs to remember constraints, prior tool results, or unresolved risks.

Preserved context still consumes tokens. Teams should keep useful context and discard stale details, failed branches, noisy logs, and obsolete assumptions.
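On the harness side, curating what gets carried forward can be as simple as a filter. This sketch does not model how the Gemini API preserves reasoning state; it only illustrates pruning the notes a harness re-sends between steps. The `Note` type and the kind labels are hypothetical.

```python
from dataclasses import dataclass

# Kinds of notes the paragraph above suggests discarding.
DROP_KINDS = {"failed_branch", "noisy_log", "obsolete_assumption"}


@dataclass
class Note:
    kind: str
    text: str
    stale: bool = False


def prune(notes: list) -> list:
    """Keep notes that are still useful: not a dropped kind and not marked stale."""
    return [n for n in notes if n.kind not in DROP_KINDS and not n.stale]


history = [
    Note("constraint", "Public API of upload client must not change"),
    Note("tool_result", "pytest: 2 failed, 41 passed", stale=True),
    Note("tool_result", "pytest: 43 passed"),
    Note("failed_branch", "Tried swapping the HTTP library; reverted"),
    Note("noisy_log", "pip install output"),
    Note("unresolved_risk", "Retry may double-post on timeout"),
]

for note in prune(history):
    print(note.kind, "-", note.text)
```

Only the constraint, the latest test result, and the open risk survive, which keeps the carried context small without losing what the next step needs.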

Verdent applies these capabilities inside a structured workflow. Verdent Manager divides goals into smaller tasks. Workspace Isolation gives each task a separate Git worktree. Reviewer checks the result before integration.
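For readers unfamiliar with Git worktrees, the isolation step amounts to one `git worktree add` per task. The sketch below only builds the command and does not describe how Verdent implements it; the branch naming scheme, directory layout, and `worktree_command` helper are assumptions.

```python
from pathlib import Path


def worktree_command(repo_root: Path, task_id: str, base_ref: str = "HEAD") -> list:
    """Build a `git worktree add` command that gives one task its own checkout and branch."""
    branch = f"task/{task_id}"                                # hypothetical naming scheme
    checkout = repo_root.parent / f"{repo_root.name}-{task_id}"  # sibling directory
    return [
        "git", "-C", str(repo_root),
        "worktree", "add",
        "-b", branch,     # create a new branch for this task
        str(checkout),    # where the separate working tree lives
        base_ref,         # commit the new branch starts from
    ]


cmd = worktree_command(Path("/repos/app"), "fix-login")
print(" ".join(cmd))

# To actually create the worktree, pass the list to subprocess:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Because each task edits files in its own directory and branch, two models can work from the same starting commit without overwriting each other.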

Gemini 3.5 Flash is a stronger fit when the task can be decomposed into clear steps: inspect the codebase, make a bounded change, run tests, read failures, and repair. It is a weaker fit for vague product direction, tightly coupled edits across many owners, or work where the model cannot safely run the checks needed to verify its own changes.

The goal is not just faster code generation. The goal is faster work that remains testable, reviewable, and safe to merge.

For workloads that need stronger reasoning before long tool chains, Gemini 3 Pro offers a useful comparison point for deciding where extra latency is worthwhile.

Before you budget a real project around Gemini 3.5, check the claims here against Google DeepMind's published model information.

Using Gemini 3.5 in Verdent

Gemini 3.5 Flash is not in Verdent’s published built-in model list.

Verdent supports BYOK for Anthropic, OpenAI, and OpenRouter. Direct Google AI Studio keys are not listed. If OpenRouter provides gemini-3.5-flash for your account, you may be able to configure it through Verdent’s model settings.

Open Settings -> Models -> Configure Models, select OpenRouter, add your key, and enable the model if it appears for your account.

Read the Verdent BYOK guide.

A practical evaluation has four steps:

  1. Define the task. Set the scope, files or modules in play, expected behavior, and acceptance criteria.
  2. Isolate the work. Give each model a separate workspace so changes do not overwrite each other.
  3. Keep conditions equal. Use the same repository state, prompt, time limit, tools, and required checks.
  4. Review the result. Compare tests, diff quality, regressions, time, token cost, and reviewer feedback.
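The review step can be reduced to a filter and a sort once results are recorded. This is a minimal sketch with hypothetical field names and made-up results: it treats a patch as merge-ready only if tests pass with no regressions and no blocking reviewer findings, then orders the merge-ready patches by cost and time.

```python
from dataclasses import dataclass


@dataclass
class Result:
    model: str
    tests_passed: bool
    regressions: int
    reviewer_blockers: int
    minutes: float
    cost_usd: float


def merge_ready(r: Result) -> bool:
    return r.tests_passed and r.regressions == 0 and r.reviewer_blockers == 0


def rank(results: list) -> list:
    """Merge-ready results only, cheapest first, ties broken by time."""
    ready = [r for r in results if merge_ready(r)]
    return sorted(ready, key=lambda r: (r.cost_usd, r.minutes))


# Made-up results for illustration only.
results = [
    Result("model-a", True, 0, 0, 12.0, 0.40),
    Result("model-b", True, 0, 0, 25.0, 1.10),
    Result("model-c", True, 1, 0, 8.0, 0.20),
]

print([r.model for r in rank(results)])  # ['model-a', 'model-b']
```

The fastest, cheapest run (model-c) is excluded because it introduced a regression, which is the point of comparing completed work instead of first responses.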

Use a real development task rather than a simple demo prompt. A good test may touch several files, require existing conventions, trigger at least one automated check, and expose whether the model can recover from failures.

Verdent keeps each implementation separate and lets independent models review final changes. This creates a clearer model comparison because it measures completed work instead of the first response.

When comparing Gemini 3.5 Flash with a Pro model, keep the evaluation small enough to review but realistic enough to expose failure modes. A good test includes project conventions, automated checks, and a reviewer pass that looks for hidden regressions rather than only whether the patch compiles.

The final decision should be based on merge-ready output. If Flash finishes faster but creates review risk, use it for bounded implementation tasks. If Pro finds issues that Flash misses, use Pro for planning, review, or higher-risk work.

Frequently Asked Questions

How much does Gemini 3.5 Flash cost?

Gemini 3.5 Flash standard pricing is $1.50 per million input tokens and $9 per million output tokens. Cached input costs $0.15 per million tokens. Batch pricing is $0.75 per million input tokens and $4.50 per million output tokens. Real task cost can vary based on context length, thinking level, tool output, retries, and test-repair loops.

What context window does it support?

Gemini 3.5 Flash supports a 1,048,576-token input context window and up to 65,536 output tokens. Large context helps with repository analysis and documentation-heavy tasks, but teams should still include the most relevant files, errors, tests, and constraints rather than sending unnecessary context.

Does it support Computer Use?

Not natively. Gemini 3.5 Flash scored 78.4% on OSWorld-Verified, but that benchmark used an external control harness. Google’s native Computer Use API tool is listed as not supported for this model.

When will Gemini 3.5 Pro launch?

Google announced a June 2026 target window for Gemini 3.5 Pro but did not announce an exact date. As of June 12, 2026, there is no public API model ID, final pricing, or confirmed context limit for Gemini 3.5 Pro.

Can I use Gemini 3.5 Flash in Verdent?

Gemini 3.5 Flash is not listed as a built-in Verdent model at this time. Verdent supports BYOK for Anthropic, OpenAI, and OpenRouter. If OpenRouter provides gemini-3.5-flash for your account, check Verdent’s model configuration settings to see whether you can enable it.

Put Flash and Pro on the Same Ticket

Give Gemini 3.5 Flash and a Pro model the same repository, task, acceptance criteria, tests, and time limit. Keep their work isolated, then compare the final patches for correctness, review quality, regressions, cost, and speed. Verdent can run that comparison as a structured workflow instead of turning the developer into the coordinator for every step.
