Observational Memory is a new type of memory system that gives AI agents human-like memory capabilities. It achieves state-of-the-art performance on benchmarks like LongMemEval while keeping the context window completely stable. The system is text-based and works with prompt caching from providers such as Anthropic and OpenAI.
The core mechanism compresses context into observations, loosely mirroring how human memory works. Incoming messages are appended until they reach a token threshold (30k tokens by default), at which point an observer agent compresses them into new observations. Observations are formatted text resembling log lines, using a three-date model for temporal reasoning and emoji-based prioritization (🔴 important, 🟡 maybe important, 🟢 info only). They are kept in two blocks: one holding compressed observations and one holding raw messages awaiting compression. A sketch of the format follows.
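To make the format concrete, here is a minimal sketch of what a log-style observation entry could look like. The field names (`observedAt`, `occurredAt`, `relevantUntil`) and the rendering are illustrative assumptions, not Mastra's actual schema; only the three-date model and the emoji priorities come from the description above.

```ts
// Illustrative sketch only: field names are assumptions, not Mastra's schema.
type Priority = "🔴" | "🟡" | "🟢"; // important / maybe important / info only

interface Observation {
  observedAt: Date;     // when the observer agent recorded the observation
  occurredAt?: Date;    // when the underlying event happened, if known
  relevantUntil?: Date; // how long the fact is expected to stay relevant
  priority: Priority;
  text: string;
}

// Render observations as the kind of log-style text block an LLM reads well.
function formatObservations(observations: Observation[]): string {
  return observations
    .map((o) => {
      const dates = [
        `observed=${o.observedAt.toISOString().slice(0, 10)}`,
        o.occurredAt && `occurred=${o.occurredAt.toISOString().slice(0, 10)}`,
        o.relevantUntil && `until=${o.relevantUntil.toISOString().slice(0, 10)}`,
      ]
        .filter(Boolean)
        .join(" ");
      return `${o.priority} [${dates}] ${o.text}`;
    })
    .join("\n");
}

console.log(
  formatObservations([
    {
      observedAt: new Date("2025-01-15"),
      occurredAt: new Date("2025-01-10"),
      priority: "🔴",
      text: "User's production deploy is blocked by a failing migration.",
    },
    {
      observedAt: new Date("2025-01-15"),
      priority: "🟢",
      text: "User prefers pnpm over npm.",
    },
  ])
);
```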
The system runs two background agents: an observer agent that compresses messages into observations when the message threshold is reached, and a reflector agent that garbage-collects unimportant observations when the observations block hits its own threshold (40k tokens by default). Because messages are only appended between threshold events, the prompt prefix stays stable and prompt caching gets full cache hits on every turn, as sketched below.
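Here is a simplified sketch of that two-threshold loop, assuming helpers for token counting and for invoking the two agents. `countTokens`, `runObserver`, and `runReflector` are hypothetical stand-ins, not Mastra's API; the threshold defaults are the ones stated above.

```ts
// Hypothetical helper signatures; the real implementation lives in Mastra.
declare function countTokens(text: string): number;
declare function runObserver(messages: string[]): Promise<string>;  // messages -> new observations
declare function runReflector(observations: string): Promise<string>; // GC unimportant entries

const MESSAGE_THRESHOLD = 30_000;     // raw messages awaiting compression
const OBSERVATION_THRESHOLD = 40_000; // compressed observations block

interface MemoryState {
  observations: string;      // compressed observations block
  pendingMessages: string[]; // raw messages appended since last compression
}

async function appendMessage(state: MemoryState, message: string): Promise<void> {
  // Appending keeps the prompt prefix identical turn-to-turn, so the
  // provider's prompt cache keeps hitting until a threshold fires.
  state.pendingMessages.push(message);

  if (countTokens(state.pendingMessages.join("\n")) >= MESSAGE_THRESHOLD) {
    // Observer: compress the raw messages into observations.
    const newObservations = await runObserver(state.pendingMessages);
    state.observations += "\n" + newObservations;
    state.pendingMessages = [];
  }

  if (countTokens(state.observations) >= OBSERVATION_THRESHOLD) {
    // Reflector: garbage-collect observations that are no longer important.
    state.observations = await runReflector(state.observations);
  }
}
```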
On benchmarks, it scores 94.87% on LongMemEval with gpt-5-mini, more than 3 points higher than any previously recorded score, and 84.23% with gpt-4o, beating the gpt-4o oracle by 2 points. Beyond benchmarks, it keeps context windows from being overwhelmed by tool-call results and enables aggressive prompt caching to cut token costs, which is especially valuable for parallelizable agents that generate large amounts of context quickly.
The system is designed for developers of coding agents, browser agents built on Playwright, deep research agents that browse many URLs, and other agentic systems that generate substantial context. It's available in Mastra, with adapters for LangChain, the Vercel AI SDK, OpenCode, and others coming soon. The implementation is open source and uses text-based formatting optimized for LLMs.
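For orientation, a hypothetical wiring sketch is shown below, assuming the `Memory` class from `@mastra/memory`. The option shape is an assumption for illustration, not the library's documented API; consult the Mastra docs for the actual configuration surface.

```ts
import { Memory } from "@mastra/memory";

// ASSUMED option shape for illustration only; the thresholds mirror the
// defaults described above (30k messages, 40k observations).
const memory = new Memory({
  observational: {
    messageTokenThreshold: 30_000,     // observer compresses past this point
    observationTokenThreshold: 40_000, // reflector garbage-collects past this point
  },
} as any); // cast because the option shape here is hypothetical
```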