Architecting AI Memory: Lessons from Gemini CLI

by Pasha Simakov, 2025-08-13

Understanding a large and complex codebase is one of the hardest challenges in software engineering. Even with experience, it often takes significant time to discover how components interact, what rules they follow, and where critical logic resides. This is where AI-powered tools, combined with the right heuristics, can make a real difference.

I’m Pasha Simakov, a Google software engineer passionate about building intelligent systems that help developers work faster and smarter. Over the years, the meaning of “intelligent” has evolved—today, I focus on leveraging LLM-powered tools like Gemini CLI to improve code comprehension, streamline workflows, and increase developer productivity. I also share these techniques with others through 1-on-1 sessions and group masterclasses.

One of the things I love most about the Gemini CLI is how extensible it is. Out of the box, it’s already a powerful tool for interacting with LLMs—but the real magic happens when you start adding your own extensions. With just a handful of custom commands, prompts, data connectors, and RAG integrations, you can push it far beyond its defaults and make it feel like it was built for your exact workflow.

In this article, I set out to answer one specific question: How does Gemini CLI manage long conversations and stay within an LLM’s context limits? It’s a small slice of the system, but one that reveals just how thoughtfully the tool is designed—and why that matters for anyone building with LLMs.

Introduction: The Challenge of LLM Conversational Memory

Spend enough time working with conversational AI and you start to notice something: the chat history is more than a transcript — it’s state. And like any other form of state in software systems, it’s a living, mutable resource that must be managed with care.

An ever-expanding conversation history is both a blessing and a curse. On one hand, it lets the model carry the thread of a conversation across long, complex exchanges. On the other, it increases processing costs, adds latency, and risks losing clarity as irrelevant material accumulates. The more noise in the history, the harder it becomes for the model to focus on what truly matters.

The authors of the Gemini CLI seem to have understood this challenge deeply. They didn’t offer a single, one-size-fits-all mechanism for managing state. Instead, they built a spectrum of controls — tools that match different situations, levels of urgency, and styles of working.

  • /clear is the decisive break: a complete erasure of history.
  • /chat is for deliberate preservation and restoration, the equivalent of a well-organized archive.
  • /compress is the nuanced option, distilling the past into a focused, structured memory snapshot.

These aren’t just “features” — they’re philosophies. Together, they form a comprehensive approach to context curation that goes beyond merely logging conversation turns. The toolkit recognizes that developers have different workflows, different needs, and different tolerances for risk when it comes to managing the state of their interactions.

The Hard Reset: Wiping the Slate Clean with /clear

Every developer has had the experience: you’re deep in a debugging conversation with your AI assistant, working through a tricky bug. Then, midstream, you pivot to something else — perhaps designing a new feature. Yet the model still insists on anchoring its thinking in the earlier debugging context. It answers your design question as if you’re still knee-deep in the same stack trace from ten minutes ago.

This is the perfect moment for /clear.

The command’s implementation is elegantly minimal. In packages/cli/src/ui/commands/clearCommand.ts, the action function makes two calls:

  1. Backend reset: geminiClient.resetChat() — defined in packages/core/src/core/client.ts — destroys all backend conversational state. That means every turn, token, and tool call vanishes. It’s the equivalent of tearing out all the pages of your session notebook and tossing them away.
  2. Frontend reset: context.ui.clear() — a user interface cleanup so you see a pristine chat area with no visual remnants of the past exchange.
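
In simplified form, the command looks roughly like this. This is a sketch, not a verbatim excerpt: the CommandContext stand-in below is trimmed down to just the surface the action needs.

    // Minimal stand-ins for the CLI's command types (illustrative).
    interface CommandContext {
      services: { config?: { getGeminiClient(): { resetChat(): Promise<void> } } };
      ui: { clear(): void };
    }

    // Sketch of clearCommand: the action performs exactly the two calls above.
    export const clearCommand = {
      name: 'clear',
      description: 'Clear the screen and conversation history',
      action: async (context: CommandContext) => {
        // 1. Backend reset: every turn, token, and tool call is discarded.
        await context.services.config?.getGeminiClient().resetChat();
        // 2. Frontend reset: leave a pristine chat area.
        context.ui.clear();
      },
    };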

This isn’t a feature you reach for in nuanced cases. It’s for clean breaks — starting a completely new and unrelated task, or recovering from a conversational dead end. It’s also valuable for mental hygiene: a way to declutter your own thinking space. When your mind feels tethered to an old problem, /clear lets you sever that rope entirely.

Some might see it as crude — it discards everything without exception — but that’s exactly the point. In state management, sometimes the most powerful tool is the simplest one: a big, red reset button.

The Manual Archive: Saving and Swapping Contexts with /chat

Human memory is imprecise, fading and distorting over time. AI conversation history is not — it’s a deterministic data structure, exact in its record of every exchange. This means it can be saved, stored, and reloaded without loss. The /chat command suite takes this potential and turns it into an accessible, everyday tool for developers.

At its top level, the command entry point lives in packages/cli/src/ui/commands/chatCommand.ts. Its job is to parse the subcommand — save, resume, list, and others — and dispatch the request to the right handler. This routing is clean and intentional, keeping parsing concerns separate from the underlying logic.

The heavy lifting happens in the CheckpointManager (packages/core/src/checkpoint/manager.ts). When you run:

    /chat save my-feature

the save method:

  • Retrieves the current conversation history from the GeminiClient.
  • Serializes it into a checkpoint file.
  • Writes it to the user’s configuration directory with the given tag.

Later, when you execute:

    /chat resume my-feature

the load method reads the file back, deserializes it, and calls setHistory() on the GeminiClient. This replaces the active session’s context wholesale, as if you had never left that conversation.
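
Assuming the shape described above, the round trip can be sketched like this. The ChatClient interface and the checkpoint-<tag>.json file naming are illustrative stand-ins, not the CLI’s actual types:

    import * as fs from 'node:fs/promises';
    import * as path from 'node:path';

    // Stand-in for the GeminiClient surface the manager needs.
    interface ChatClient {
      getHistory(): unknown[];
      setHistory(history: unknown[]): void;
    }

    class CheckpointManager {
      constructor(private configDir: string, private client: ChatClient) {}

      // /chat save <tag>: snapshot the live history to disk.
      async save(tag: string): Promise<void> {
        const history = this.client.getHistory();
        const file = path.join(this.configDir, `checkpoint-${tag}.json`);
        await fs.writeFile(file, JSON.stringify(history, null, 2));
      }

      // /chat resume <tag>: read the checkpoint back and swap it in wholesale.
      async load(tag: string): Promise<void> {
        const file = path.join(this.configDir, `checkpoint-${tag}.json`);
        this.client.setHistory(JSON.parse(await fs.readFile(file, 'utf8')));
      }
    }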

This architecture enables powerful workflows:

  • Branching: Save the current state before exploring a risky tangent. If it doesn’t work out, resume the original branch instantly.
  • Project isolation: Keep separate states for different projects, switching between them without interference.
  • Session preservation: Capture a productive discussion at its peak and return to it days or weeks later without loss.

If /clear is the reset button, /chat is the filing cabinet — neat, organized, and ready to serve the exact state you need at any time.

The Intelligent Continuation: State Distillation with /compress

While /clear and /chat are explicit actions — either destroying or preserving the entirety of history — /compress is about refinement. It acknowledges that a long conversation has both essential and expendable parts. The trick is to keep the essentials and distill the rest into something lighter but still useful.

The logic for triggering this process lives in tryCompressChat (packages/core/src/core/client.ts). There are two ways it can start:

  • Manual trigger: At any time, you can type /compress. This is a way of saying, “I think the context is too bloated — clean it up now.”
  • Automatic trigger: Before executing a prompt, the system checks whether the history has reached ~70% of the model’s maximum context window. If so, compression runs automatically. This prevents hard failures when the limit is reached and helps maintain responsiveness.
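
A minimal sketch of the automatic path, assuming a token-counting helper on the client (countHistoryTokens is invented for illustration; only tryCompressChat is named above):

    // Sketch: run before each prompt; compress once history passes the threshold.
    const COMPRESSION_THRESHOLD = 0.7; // fraction of the model's context window

    async function maybeCompress(
      client: {
        countHistoryTokens(): Promise<number>; // illustrative helper
        tryCompressChat(): Promise<void>;
      },
      contextWindow: number,
    ): Promise<void> {
      const used = await client.countHistoryTokens();
      if (used >= COMPRESSION_THRESHOLD * contextWindow) {
        await client.tryCompressChat(); // same path the manual /compress takes
      }
    }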

When /compress runs, it partitions the conversation:

  • The most recent 30% of turns are preserved exactly.
  • The older 70% are summarized into a single structured <state_snapshot> in XML.

Why XML? Because structure matters. In getCompressionPrompt (packages/core/src/core/prompts.ts), the model is temporarily assigned the role of a “state summarization component” and told to output specific tags: <overall_goal>, <key_knowledge>, <current_plan>, and more. These represent the irreducible essentials of conversational memory — the minimum required to reconstruct the mental state of the session.
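
For illustration, a compressed snapshot might look something like this (the contents are invented; the tag set follows the prompt described above):

    <state_snapshot>
      <overall_goal>Fix the flaky retry logic in the upload service.</overall_goal>
      <key_knowledge>
        - Retries are scheduled in uploader.ts with exponential backoff.
        - The failure only reproduces when the server returns HTTP 429.
      </key_knowledge>
      <current_plan>
        1. [DONE] Reproduce the failure with a mock 429 response.
        2. [TODO] Cap the backoff and add jitter.
      </current_plan>
    </state_snapshot>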

There’s a subtle refinement here: the split between “recent” and “older” turns is adjusted so it always begins at a user message. This prevents a question from being cut away from its answer, or a tool request from being severed from its result. It’s a detail that avoids frustrating discontinuities in the compressed session.
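
In code, the idea is roughly this; it is a sketch over an array of turns, and the real implementation may measure the 70% point differently:

    // Sketch: aim the split at ~70% of the history, then walk forward until
    // it lands on a user turn, so no question/answer or tool-call/tool-result
    // pair is severed.
    interface Turn { role: 'user' | 'model'; }

    function findSplitIndex(history: Turn[], keepFraction = 0.3): number {
      let index = Math.floor(history.length * (1 - keepFraction));
      while (index < history.length && history[index].role !== 'user') {
        index++;
      }
      return index; // history[0..index) is summarized; history[index..] is kept
    }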

After generating the structured summary, the system discards the old history entirely and rebuilds a fresh session containing:

  • The XML <state_snapshot>
  • A short acknowledgment from the model
  • The preserved recent turns

The rebuild follows an immutable pattern: instead of editing the existing history in place, a new one is constructed from clean parts. That guarantees predictability and removes the risk of subtle corruption in the conversation record.
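
A sketch of that rebuild, under the same assumptions as the snippets above (the model’s acknowledgment text is illustrative):

    // Sketch: build a brand-new history rather than mutating the old one.
    interface HistoryItem { role: 'user' | 'model'; parts: { text: string }[]; }

    function rebuildHistory(
      stateSnapshotXml: string,
      recentTurns: HistoryItem[], // the preserved ~30%
    ): HistoryItem[] {
      return [
        { role: 'user', parts: [{ text: stateSnapshotXml }] },
        { role: 'model', parts: [{ text: 'Got it. Continuing from that state.' }] },
        ...recentTurns,
      ];
    }

    // Usage: client.setHistory(rebuildHistory(snapshotXml, recentTurns));
    // Nothing from the old history array survives the swap.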

Conclusion: Lessons from Gemini CLI’s Approach

The Gemini CLI’s state management design reflects a core truth: conversational history is not an infinite append-only log. It’s a working set, constantly in flux, and must be managed intentionally.

With /clear, /chat, and /compress, developers have tools for:

  • Starting from nothing
  • Preserving exactly what exists
  • Refining a large history into a focused, usable form

The /compress command, in particular, illustrates an important principle for working with large, probabilistic models: be prescriptive. Assign explicit roles. Demand structured, machine-readable output. This discipline transforms “summarization” from an unreliable request into a consistent, predictable process.

Above all, the CLI respects the user’s judgment. It doesn’t force one workflow; it gives a spectrum of options, trusting developers to pick the right tool for the right job. That combination of flexibility and engineering rigor is what makes it a model worth studying.

PS: Written with and about CLI Version 0.1.20

Gemini CLI Masterclass Articles

Articles in the Gemini CLI Masterclass series:
  • Architecting AI Memory: Lessons from Gemini CLI (2025-08-13)
  • Inside the Mind: Gemini CLI's System Prompts Deep Dive (2025-07-19)
  • Meet the Agent: The Brain Behind Gemini CLI (2025-07-18)

Want to learn more? Need help?

If you'd like to learn these techniques in depth, join my Gemini CLI Masterclass.


Note: The Gemini CLI Masterclass is not a Google product; it is not developed, funded, supported, or approved by Google LLC. Gemini CLI itself is a Google product; references to it here are for educational and practical AI development purposes.