Pavel (Pasha) Simakov - Meet the Agent: The Brain Behind Gemini CLI

Meet the Agent: The Brain Behind Gemini CLI

by Pasha Simakov, 2025-07-18

Understanding a large and complex codebase is one of the hardest challenges in software engineering. Even with experience, it often takes significant time to discover how components interact, what rules they follow, and where critical logic resides. This is where AI-powered tools, combined with the right heuristics, can make a real difference.

I’m Pasha Simakov, a Google software engineer passionate about building intelligent systems that help developers work faster and smarter. Over the years, the meaning of “intelligent” has evolved—today, I focus on leveraging LLM-powered tools like Gemini CLI to improve code comprehension, streamline workflows, and increase developer productivity. I also share these techniques with others through 1-on-1 sessions and group masterclasses.

To demonstrate this in action, let’s take a simple yet challenging question: How do Gemini CLI agents work? With my local tools set up for full-text search grounding, I can quickly answer this question by exploring the codebase. This article shows exactly how I approached it and what I found.

Beyond showing how Gemini CLI agents work, I also want to show that the CLI is extensible, which makes it adaptable to a wide range of engineering tasks.

Let’s Begin

I’ve already set up my local tools for full-text search grounding. Now, let’s launch Gemini CLI and ask about its agents.

GEMINI CLI PROMPT: I'm interested in understanding how Gemini CLI works: what LLM agents it uses and what those agents do. Use full-text-search("agent") for grounding.

That's a great question. Based on the files, it appears Gemini CLI doesn't use multiple "agents" but has a central "agent" concept. This "agent" is the core AI-powered logic that drives the CLI's interactive capabilities.

To understand what the agent is and what it does, let's look at two key files that your search revealed:

gemini-cli/packages/cli/src/acp/acp.ts: This file likely defines the technical interface for the agent.
gemini-cli/packages/core/src/core/prompts.ts: This file likely contains the system prompt, which is the set of instructions that tells the agent how to behave.
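
To make "full-text search grounding" concrete, here is a minimal sketch of the idea, not the author's actual tool (which the article does not show). The `SearchHit` shape, the `fullTextSearch` function, and the sample file contents are all hypothetical and exist only for illustration.

```typescript
// Hypothetical shape of a search hit; the real tool's output is not shown in the article.
interface SearchHit {
  file: string;
  line: number; // 1-based line number
  text: string;
}

// Scan an in-memory map of file path -> contents for a case-insensitive term.
function fullTextSearch(files: Record<string, string>, term: string): SearchHit[] {
  const needle = term.toLowerCase();
  const hits: SearchHit[] = [];
  for (const file of Object.keys(files)) {
    const lines = (files[file] || "").split("\n");
    lines.forEach((text, index) => {
      if (text.toLowerCase().indexOf(needle) !== -1) {
        hits.push({ file, line: index + 1, text: text.trim() });
      }
    });
  }
  return hits;
}

// Illustrative sample data only; these are not the real file contents.
const sampleFiles: Record<string, string> = {
  "packages/cli/src/acp/acp.ts": "export interface Agent {\n  authenticate(): Promise<void>;\n}",
  "README.md": "Gemini CLI\nA command-line tool.",
};

const searchHits = fullTextSearch(sampleFiles, "agent");
```

Each hit carries a file path and a line number, which is the kind of concrete evidence a model can cite when it names specific files instead of guessing.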

The Gemini CLI Agent Interface in TypeScript

Okay, the file acp.ts gives us the technical definition of the agent. "ACP" likely stands for "Agent-Client Protocol". This file defines the communication layer between the CLI's user interface (the "client") and the AI logic (the "agent").

The Agent interface is defined at the very end of the file:

export interface Agent {
  // Initializes the agent's state.
  initialize(params: InitializeParams): Promise<InitializeResponse>;

  // Begins the authentication process.
  authenticate(): Promise<void>;

  // Allows the user to send a message to the agent.
  sendUserMessage(params: SendUserMessageParams): Promise<void>;

  // Cancels the current generation.
  cancelSendMessage(): Promise<void>;
}

This tells us that, from a technical standpoint, the agent is an object with four main capabilities: initializing, authenticating, accepting user messages, and canceling the current generation.
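
As a rough illustration of what it means to program against this interface, the sketch below implements it with a stand-in that only records calls. The parameter and response shapes (`InitializeParams`, `InitializeResponse`, `SendUserMessageParams`) are assumptions made up for this example, since the excerpt above does not show them, and `RecordingAgent` and `runSession` are hypothetical names.

```typescript
// Assumed shapes for illustration; the real definitions in acp.ts are not shown here.
interface InitializeParams { protocolVersion: string; }
interface InitializeResponse { protocolVersion: string; isAuthenticated: boolean; }
interface SendUserMessageParams { text: string; }

interface Agent {
  initialize(params: InitializeParams): Promise<InitializeResponse>;
  authenticate(): Promise<void>;
  sendUserMessage(params: SendUserMessageParams): Promise<void>;
  cancelSendMessage(): Promise<void>;
}

// A stand-in agent that records each call instead of talking to a model.
class RecordingAgent implements Agent {
  readonly calls: string[] = [];
  private authenticated = false;

  async initialize(params: InitializeParams): Promise<InitializeResponse> {
    this.calls.push("initialize");
    return { protocolVersion: params.protocolVersion, isAuthenticated: this.authenticated };
  }

  async authenticate(): Promise<void> {
    this.calls.push("authenticate");
    this.authenticated = true;
  }

  async sendUserMessage(params: SendUserMessageParams): Promise<void> {
    this.calls.push("sendUserMessage: " + params.text);
  }

  async cancelSendMessage(): Promise<void> {
    this.calls.push("cancelSendMessage");
  }
}

// A client drives the agent purely through the interface, never the concrete class.
async function runSession(agent: Agent): Promise<void> {
  await agent.initialize({ protocolVersion: "0.0.0-example" });
  await agent.authenticate();
  await agent.sendUserMessage({ text: "How do Gemini CLI agents work?" });
}
```

Because `runSession` depends only on `Agent`, the same client code could drive a real model-backed implementation or a test double like this one.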

The Gemini CLI System Prompt (aka System Instructions)

However, this only tells us how the CLI interacts with the agent, not what the agent does. The real "personality" and logic of the agent comes from its system prompt. Your search also found prompts.ts, which should contain these instructions. Let's examine that file.

This file is the "brain" of the agent. It contains the system prompt, which is a detailed set of instructions that dictates the agent's behavior, goals, and limitations.

Here’s a summary of what the agent does, based on these instructions:

Core Identity

"You are an interactive CLI agent specializing in software engineering tasks." This is the agent's fundamental purpose: it is designed to help with coding.

Core Mandates (Its Rules)

The agent is instructed to follow a strict set of rules to ensure it's safe and effective:

Follow Conventions: It must rigorously analyze and adhere to the existing project's conventions, style, and structure. It won't just add code in any style; it tries to mimic the surrounding code.
Verify Dependencies: It never assumes a library is available. It must check package.json, requirements.txt, etc., before using a new dependency.
Be Proactive but Cautious: It will try to fulfill your request thoroughly but will ask for clarification if the request is ambiguous or goes beyond the original scope.
Use Absolute Paths: It must always resolve file paths to their absolute form before using file system tools.
Explain Critical Commands: For safety, it must explain any shell command that modifies the file system before running it.
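
The "Verify Dependencies" rule can be pictured as a simple check against the project manifest. The sketch below is only an illustration of that idea under stated assumptions: the agent itself follows the rule through its prompt and tools, not through this code, and `PackageManifest`, `isDependencyDeclared`, and the sample versions are hypothetical.

```typescript
// Minimal, assumed subset of a package.json manifest.
interface PackageManifest {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Returns true only when the library is explicitly declared in the manifest.
function isDependencyDeclared(manifest: PackageManifest, library: string): boolean {
  const declared: Record<string, string> = {
    ...manifest.dependencies,
    ...manifest.devDependencies,
  };
  // hasOwnProperty avoids false positives from inherited keys such as "toString".
  return Object.prototype.hasOwnProperty.call(declared, library);
}

// Illustrative manifest; the package versions are made up for this example.
const exampleManifest: PackageManifest = {
  dependencies: { react: "^18.0.0" },
  devDependencies: { vitest: "^1.0.0" },
};

const reactIsDeclared = isDependencyDeclared(exampleManifest, "react");
const lodashIsDeclared = isDependencyDeclared(exampleManifest, "lodash");
```

Here `reactIsDeclared` is `true` and `lodashIsDeclared` is `false`, so a cautious tool would use the first library freely and stop to ask before importing the second.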

Primary Workflow (How It Works)

For any software engineering task (like fixing a bug or adding a feature), the agent follows a specific 4-step process:

Understand: It uses tools like search_file_content and glob to explore the codebase, understand file structures, and find relevant code. It reads files to get context.
Plan: It creates a step-by-step plan to address your request and shares it with you.
Implement: It uses tools like replace, write_file, and run_shell_command to execute the plan and modify the code.
Verify: After making changes, it runs the project's tests and any available linting or type-checking commands (npm run test, tsc, etc.) to ensure the changes are correct and meet the project's quality standards.
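
The four steps above can be sketched as an ordered pipeline that stops at the first failing phase. This is a conceptual model only: the real agent is an LLM following prompt instructions, not a hard-coded state machine, and `Phase`, `PhaseHandler`, `WorkflowResult`, and `runWorkflow` are hypothetical names introduced for this example.

```typescript
// The four phases named in the system prompt summary.
type Phase = "understand" | "plan" | "implement" | "verify";

const PHASES: readonly Phase[] = ["understand", "plan", "implement", "verify"];

// Hypothetical handler type: each phase reports whether it succeeded.
type PhaseHandler = (task: string) => boolean;

interface WorkflowResult {
  completed: Phase[];
  failedAt: Phase | null;
}

// Run the phases in order and stop at the first one that fails.
function runWorkflow(task: string, handlers: Record<Phase, PhaseHandler>): WorkflowResult {
  const completed: Phase[] = [];
  for (const phase of PHASES) {
    if (!handlers[phase](task)) {
      return { completed, failedAt: phase };
    }
    completed.push(phase);
  }
  return { completed, failedAt: null };
}

// Example: every phase succeeds except verification (say, a failing test run).
const workflowOutcome = runWorkflow("fix a bug", {
  understand: () => true,
  plan: () => true,
  implement: () => true,
  verify: () => false,
});
```

In this run `workflowOutcome.failedAt` is `"verify"`, which mirrors the point of the last step: a change is not considered done until the project's own checks pass.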

In Summary

Combining both files, we get a complete picture:

The Agent interface in acp.ts is the technical "socket" that plugs the AI logic into the CLI application.
The system prompt in prompts.ts is the "mind" of the agent. It defines its personality, rules, and workflow, instructing it to be a careful, context-aware, and methodical software engineering assistant that uses a variety of tools to understand, plan, implement, and verify changes to your code.

How It Works

This article highlighted full-text search grounding as just one of many techniques available for enhancing LLM-powered code exploration. By combining such methods with structured workflows, engineers can significantly improve the accuracy and efficiency of AI-assisted development on large, production-grade projects. To dive deeper into these advanced techniques and practical applications, consider joining my Gemini CLI Masterclass.

PS: Written with and about CLI Version 0.1.12

Gemini CLI Masterclass Articles

Articles in the Gemini CLI Masterclass series:

Architecting AI Memory: Lessons from Gemini CLI (2025/8/13)
Inside the Mind: Gemini CLI's System Prompts Deep Dive (2025/7/19)
Meet the Agent: The Brain Behind Gemini CLI (2025/7/18)

Want to learn more? Need help?

If you'd like to learn these techniques in depth, join my Gemini CLI Masterclass.

Note: Gemini CLI Masterclass is not a Google product. It is not developed, funded, supported, or approved by Google LLC. Gemini CLI itself is a Google product; references here are for educational and practical AI development purposes.