What Is HyperMemory?
An AI backend that stores memories – and actually knows how to find them again.
Stores memories
Any chunk of text – a conversation, a note, a fact – stored and indexed
Finds them later
Ask a question in plain English and get back the most relevant memories
Forgets gracefully
Old, unused memories fade – like human memory, but on a schedule
Builds a knowledge graph
Extracts who, what, and how things relate – so it can reason across memories
The Problem It Solves
AI assistants forget everything between conversations. HyperMemory is the persistent memory layer that gives them a real past.
A conversation happens
You talk to an AI assistant about your project, your preferences, your life
HyperMemory stores it
The conversation gets sent to the API, indexed, and filed away
Later, you ask a question
"What database did we decide to use?" – and HyperMemory finds the right memory
The answer comes back
Not a keyword match – a semantically relevant result, synthesized into a plain-English answer
When You Store a Memory
Imagine you send this to the API endpoint POST /mcp/memory/store:
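The request body itself isn't reproduced here, so as an illustration only, a minimal payload might carry the same fields the Memory model stores (content, user_id, importance) – the exact schema is an assumption, not the documented API:

```json
{
  "content": "Had a great conversation about switching from MySQL to PostgreSQL for the new project.",
  "user_id": "alice",
  "importance": 0.7
}
```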
Here is what happens next – click through to trace the journey:
Seven Generations of Getting Smarter
HyperMemory didn't start this complex. It was built layer by layer – each version adding a new superpower on top of the last.
Hybrid Search
Semantic + temporal + keyword + fact – four ways to find a memory at once
Knowledge Graph
Entities, relationships, multi-hop reasoning – it understands who knows what
Memory Tiers
Short-term, medium-term, long-term, archive – just like how humans remember
PRISM Retrieval
A 3-agent loop that finds exactly the right evidence for complex questions
The enhanced-query-v5 endpoint uses all five strategies at once, weighted and blended. Older endpoints still exist for backwards compatibility.
The Database Trio
Three completely different databases, each solving a problem the others can't.
The filing cabinet
The concept library
The quick-reference board
PostgreSQL – The Source of Truth
Everything real lives here. Memories, extracted facts, entities, relationships, tiers. If it's deleted from PostgreSQL, it's gone.
class Memory(Base):
__tablename__ = "memories"
id = Column(Integer, primary_key=True)
content = Column(Text, nullable=False)
summary = Column(Text)
user_id = Column(String, index=True)
importance = Column(Float, default=0.5)
timestamp = Column(DateTime)
This is a class called Memory – a blueprint for what a memory looks like in the database
The table in the database is named "memories"
Every memory gets a unique number ID (like a ticket stub)
The actual text you stored – required, can never be blank
A 1-2 sentence LLM summary of the memory (v5 feature)
Which user this memory belongs to – indexed for fast lookups
How important this memory is, from 0.0 (trivial) to 1.0 (critical)
When it was created – used for time-based queries
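To make the model above concrete, here is a self-contained sketch that declares the same table against a throwaway in-memory SQLite database and stores one memory. The imports, engine setup, and sample data are assumptions for illustration – the real service wires these up elsewhere:

```python
from datetime import datetime, timezone

from sqlalchemy import Column, DateTime, Float, Integer, String, Text, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Memory(Base):
    __tablename__ = "memories"
    id = Column(Integer, primary_key=True)
    content = Column(Text, nullable=False)      # required, never blank
    summary = Column(Text)                      # optional LLM summary
    user_id = Column(String, index=True)        # indexed for fast lookups
    importance = Column(Float, default=0.5)     # 0.0 trivial .. 1.0 critical
    timestamp = Column(DateTime)                # used for time-based queries

engine = create_engine("sqlite://")             # in-memory DB, just for the demo
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Memory(content="Switched from MySQL to PostgreSQL.",
                       user_id="alice",
                       timestamp=datetime.now(timezone.utc)))
    session.commit()
    stored = session.query(Memory).filter_by(user_id="alice").one()
    importance = stored.importance              # default 0.5 applied on insert
```

Note that the 0.5 importance was never set explicitly – SQLAlchemy applied the column default when the row was inserted.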
Qdrant – The Concept Library
Qdrant stores memories as vectors – numerical fingerprints of meaning. When you search, it finds the memories whose meaning is closest to your query.
hypermemory_embeddings_v2
The raw content of each memory – what was actually stored
…_facts
Each extracted fact triple – entity, attribute, value – as its own vector
…_v5_summaries
The LLM-generated summaries – often more semantically dense than the raw text
…_v7_events
Episodic event segments – slices of memory grouped by what was happening
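The real system embeds text with an ML model and lets Qdrant run the nearest-neighbor search; the toy sketch below substitutes a word-count "embedding" and brute-force cosine similarity just to show the mechanic. Every name here is illustrative, not the actual API:

```python
import math

VOCAB: dict[str, int] = {}          # toy vocabulary, stand-in for a real embedding model

def embed(text: str) -> list[float]:
    """Map text to a fixed-size vector by counting words (not a real embedding)."""
    words = text.lower().split()
    for w in words:
        VOCAB.setdefault(w, len(VOCAB))
    vec = [0.0] * 64
    for w in words:
        vec[VOCAB[w]] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# A "collection" is just memory id -> vector.
collection = {
    1: embed("switched from mysql to postgresql for the new project"),
    2: embed("made lunch plans with the design team"),
}

def search(query: str, top_k: int = 1) -> list[int]:
    """Return the ids of the memories closest in meaning to the query."""
    qv = embed(query)
    ranked = sorted(collection, key=lambda mid: cosine(qv, collection[mid]), reverse=True)
    return ranked[:top_k]
```

A query like `search("the postgresql project")` ranks memory 1 first because it shares more of the query's "meaning" (here, crudely, its words) than the lunch memory does.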
Redis – The Quick-Reference Board
Redis holds recent search results in memory so they can be returned instantly, without re-running expensive searches. Like a whiteboard next to the filing cabinet – frequently needed answers written in marker, cleared at the end of the day.
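The role Redis plays here is classic cache-aside with a TTL. A minimal pure-Python sketch of that pattern – an in-process stand-in for Redis, not the project's actual code:

```python
import time

class QueryCache:
    """In-process stand-in for Redis: values expire after a TTL."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._store[key]            # lazily evict expired entries
            return None
        return value

    def set(self, key: str, value) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)

def answer_query(query: str, cache: QueryCache, run_search):
    cached = cache.get(query)
    if cached is not None:
        return cached                       # hit: skip the expensive search
    result = run_search(query)              # miss: hit the databases
    cache.set(query, result)
    return result
```

With real Redis the same shape is a `GET` followed by a `SETEX` on miss. This is also why a Redis crash is harmless here: the board empties, and the next query simply repopulates it from PostgreSQL and Qdrant.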
Watch how these three databases collaborate on a typical query:
Check Your Understanding
A user asks: "What were my thoughts about moving to the cloud?" – which database finds the answer first?
Your Redis server crashes and restarts. What happens to your memories?
You want to find all facts about "database choices" even though some memories say "PostgreSQL" and others say "relational DB." Which system makes this possible?
The Extraction Factory
How raw text becomes structured knowledge – automatically, in the background, without blocking you.
Memory arrives
"Had a great conversation about switching from MySQL to PostgreSQL for the new project."
LLM reads it
An LLM (GPT-4o-mini via OpenRouter) receives the text and a prompt asking for structured facts
Facts extracted
{entity: "PostgreSQL", attribute: "database-type", value: "relational"}
{entity: "project", attribute: "switching-from", value: "MySQL"}
Graph built
Entities (PostgreSQL, MySQL, project) and their relationships get added to the knowledge graph
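The LLM side of this pipeline can't run on the page, but the shape of the exchange is simple: send the text with an instruction to return JSON triples, then validate whatever comes back. A hedged sketch – the prompt wording and parsing strategy are assumptions, not the service's actual prompt:

```python
import json

def build_fact_prompt(text: str) -> str:
    # Assumed prompt shape; the production prompt is not shown in this doc.
    return ("Extract facts from the text as a JSON array of objects with keys "
            "entity, attribute, value. Return only JSON.\n\nText: " + text)

def parse_facts(llm_reply: str) -> list[dict]:
    """Keep only well-formed triples; a flaky model reply shouldn't crash the worker."""
    try:
        raw = json.loads(llm_reply)
    except json.JSONDecodeError:
        return []
    return [f for f in raw
            if isinstance(f, dict) and {"entity", "attribute", "value"} <= f.keys()]

# Canned reply standing in for the real GPT-4o-mini call:
reply = ('[{"entity": "PostgreSQL", "attribute": "database-type", "value": "relational"},'
         ' {"entity": "project", "attribute": "switching-from", "value": "MySQL"}]')
facts = parse_facts(reply)
```

The defensive parse matters: the worker runs unattended, so a malformed model reply should yield zero facts, not an exception.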
The Background Worker
Extraction doesn't block your API call. An async worker runs in the background, polling for unprocessed memories every 30 seconds.
class ExtractionWorker:
def __init__(self,
batch_size: int = 10,
poll_interval: int = 30,
graph_extraction_enabled: bool = True,
event_segmentation_enabled: bool = True,
):
The background extraction service
Its setup options:
Process up to 10 memories at a time
Check for new unprocessed memories every 30 seconds
Should we also extract entities and relationships? (yes)
Should we slice memories into narrative events? (yes)
Running extraction inside the POST /store call would be slow. By doing it asynchronously, the API returns instantly and extraction happens quietly in the background.
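One tick of that worker loop reduces to: fetch a batch of unprocessed memories, run the pipelines over each, report how many were handled. A simplified synchronous sketch – function names and the list-backed queue are illustrative stand-ins for the real database poll:

```python
def poll_once(fetch_unprocessed, pipelines, batch_size: int = 10) -> int:
    """One tick of the worker: process up to batch_size memories, return the count."""
    batch = fetch_unprocessed(batch_size)
    for memory in batch:
        for run_pipeline in pipelines:      # facts, graph, events
            run_pipeline(memory)
    return len(batch)

# Demo: a plain list stands in for the "unprocessed memories" query.
queue = [f"memory-{i}" for i in range(25)]

def fetch_unprocessed(n):
    taken = queue[:n]
    del queue[:n]
    return taken

processed = []
counts = [poll_once(fetch_unprocessed, [processed.append]) for _ in range(3)]
```

With 25 pending memories and the default batch size of 10, three ticks drain the queue in batches of 10, 10, and 5 – the real worker just spaces those ticks 30 seconds apart.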
Three Extraction Pipelines
The ExtractionWorker runs three separate pipelines on each memory:
Fact Extraction
Pulls out entity-attribute-value triples. "PostgreSQL is a relational database." → {entity, attribute, value}
Graph Extraction
Identifies entities (nouns) and relationships (verbs). Builds a web of connections that can be traversed later.
Event Segmentation
Groups a window of recent memories into narrative episodes – "the database migration project" as one coherent event.
Inside the Extraction Pipeline
Here is what happens inside the worker when it picks up a new memory:
Check Your Understanding
You store a memory via the API at 3:00pm. At what point do facts get extracted from it?
You stored "the team uses MySQL" a month ago. Now you store "switched to PostgreSQL." What does HyperMemory do with these contradictory facts?
The Search Brain
Five strategies. One question. The right answer – even when the words don't match.
Summary Search
Match by meaning of the LLM summary
Fact Search
Match against entity-attribute-value triples
Graph Search
Traverse entity relationships multi-hop
BM25 Keyword
Classic keyword relevance ranking
Semantic Fallback
Raw content vector similarity
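When each of the five strategies returns a relevance score for a candidate memory, blending them is just a weighted sum. The weights below are illustrative placeholders – the actual v5 tuning is not shown in this doc:

```python
# Illustrative weights (assumed); they sum to 1.0 so the blend stays in 0..1.
STRATEGY_WEIGHTS = {
    "summary": 0.30,
    "fact": 0.25,
    "graph": 0.20,
    "bm25": 0.15,
    "semantic": 0.10,
}

def blended_score(scores: dict[str, float]) -> float:
    """Combine per-strategy scores (each 0..1) into one ranking score.
    A strategy that returned nothing simply contributes 0."""
    return sum(w * scores.get(name, 0.0) for name, w in STRATEGY_WEIGHTS.items())
```

A memory that only keyword-matched (`{"bm25": 1.0}`) scores far below one that also matched on summary and facts, which is exactly the point of blending.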
A Query Travels Through Seven Stages
Before any database is touched, your query goes through analysis, expansion, and classification. Here is the full pipeline:
Two Types of "Similar"
Keyword search and semantic search find different things. HyperMemory runs both and blends the results.
BM25 Keyword
Looks for the same words. Query "PostgreSQL" finds memories with "PostgreSQL" in them – but misses "Postgres" or "relational DB."
Term frequency × inverse document frequency
Semantic / Vector
Looks for the same meaning. "What database do we use?" finds "switched from MySQL to Postgres" because both are about database decisions.
Cosine similarity between meaning-vectors
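The keyword side can be made concrete with a stripped-down BM25. The sketch below implements the standard tf × idf scoring with the usual k1 and b defaults – a teaching version, not the project's implementation – and shows the exact-word blind spot described above:

```python
import math

def bm25_score(query: list[str], doc: list[str], corpus: list[list[str]],
               k1: float = 1.5, b: float = 0.75) -> float:
    """Score one tokenized doc against a tokenized query over a small corpus."""
    n = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n
    score = 0.0
    for term in query:
        df = sum(1 for d in corpus if term in d)        # document frequency
        if df == 0:
            continue
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        tf = doc.count(term)                            # term frequency in this doc
        norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
        score += idf * tf * (k1 + 1) / norm
    return score

corpus = [
    "switched from mysql to postgresql".split(),
    "we like postgres a lot".split(),
]
query = ["postgresql"]
scores = [bm25_score(query, doc, corpus) for doc in corpus]
```

The first document scores above zero; the second scores exactly zero, because "postgres" is not the literal token "postgresql". That gap is what the semantic strategy fills.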
vector_engine = RetrievalEngine(
postgres_adapter=db_adapter,
vector_adapter=vector_adapter,
synthesis_model=settings.EXTRACTION_MODEL,
synthesis_base_url=settings.EXTRACTION_BASE_URL,
synthesis_api_key=settings.EXTRACTION_API_KEY,
)
Create the search engine (assembled at startup)
Give it access to PostgreSQL (for facts and memories)
Give it access to Qdrant (for vector / semantic search)
Which LLM to use for answer synthesis (writing a plain-English answer)
Where that LLM lives (OpenRouter in this case)
The API key to authenticate with the LLM service
PRISM – The Research Robot
For complex questions that need multiple memory pieces to answer, v7 adds PRISM: a 3-agent loop that keeps refining until it has exactly the right evidence.
Decomposer
Breaks the complex question into sub-questions. "What did we decide about the stack?" → "What database?" + "What framework?" + "What hosting?"
Selector
Given all candidate memories, removes the ones that are distractors – not actually relevant to answering the question
Adder
Looks at what the Selector kept and asks "what's missing?" – adds bridging memories that connect the dots
Iterate (up to 3×)
Selector and Adder cycle until the evidence set stabilizes – no new adds, no new removals
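Structurally, the Selector/Adder cycle is a fixed-point iteration over the evidence set. A toy sketch with the agents replaced by plain functions – in the real system both are LLM calls, and everything here (including the sample memories) is illustrative:

```python
def prism_refine(question: str, candidates: set[str],
                 keep, find_missing, max_rounds: int = 3) -> set[str]:
    """Cycle Selector (prune) and Adder (bridge) until the set stops changing."""
    evidence = set(candidates)
    for _ in range(max_rounds):
        kept = {m for m in evidence if keep(question, m)}       # Selector
        added = find_missing(question, kept)                    # Adder
        if kept == evidence and added <= kept:
            break                                               # stable: done
        evidence = kept | added
    return evidence

# Toy agents: drop anything about lunch, and insist a framework memory is present.
keep = lambda q, m: "lunch" not in m
find_missing = lambda q, kept: {"framework: FastAPI"} - kept

result = prism_refine("what did we decide about the stack?",
                      {"database: PostgreSQL", "lunch plans"},
                      keep, find_missing)
```

Round one prunes the distractor and adds the bridging memory; round two changes nothing, so the loop stops early – the same stabilization rule described above, well under the 3-round cap.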
Check Your Understanding
A user types "AI model" but all memories say "LLM." Which part of the pipeline helps bridge this gap?
Which search strategy has the highest weight in v5 – and why?
A user asks: "How did our thinking about databases and infrastructure evolve over the last year?" Which feature handles this best?
Memory That Lives and Dies
Not all memories are equal. HyperMemory knows which ones to keep, which to fade, and which to bury.
NOISE
1-hour lifespan. Ephemeral chatter. Auto-fades.
TACTICAL
24-hour lifespan. Short-term useful info.
MEDIUM
30-day lifespan. Project-level context.
LONG TERM
180-day lifespan. Important decisions.
PERMANENT
Never decays. Core identity facts.
The Heat Score – A Memory's Life Force
Every memory has a heat score between 0.0 and 1.0. High heat = hot memory, actively used, should be kept close. Low heat = cold memory, forgotten, getting moved to the archive.
visit_score = min(1.0, math.log10(visit_count + 1) / 3.0)
recency_score = math.exp(-hours_since / 168)
heat = (
0.4 * visit_score
+ 0.3 * recency_score
+ 0.3 * importance_score
)
Visit score: how many times has this been accessed? (log scale – 10 visits ≈ 0.35, 1,000 visits ≈ 1.0)
Recency score: how recently was it accessed? (exponential decay – falls to roughly 37% after each week of inactivity)
The final heat is a weighted sum of three factors:
40% from visit frequency – how often it gets used
30% from recency – how recently it was used
30% from its base importance score – how significant it was when stored
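Wrapping the formula in a function makes the numbers checkable. This is a direct restatement of the snippet above with `math` imported; the example inputs are made up:

```python
import math

def heat(visit_count: int, hours_since: float, importance_score: float) -> float:
    visit_score = min(1.0, math.log10(visit_count + 1) / 3.0)
    recency_score = math.exp(-hours_since / 168)    # 168 hours = one week
    return 0.4 * visit_score + 0.3 * recency_score + 0.3 * importance_score

# A memory visited 1,000 times, touched an hour ago, stored as important:
hot = heat(1000, 1, 0.9)
# The same memory untouched for three months:
cold = heat(24, 24 * 90, 0.9)
```

Note how the recency term collapses to nearly zero after three months, while the visit and importance terms keep the memory from dropping to zero heat outright.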
The Memory Tier Ladder
Memories are constantly being promoted and demoted based on their heat score. A background worker runs every hour and reorganizes them.
Short-term (STM): Every new memory starts here. Quick access, recent context. Promoted when heat is high enough and age > 24h.
Medium-term (MTM): Active memories that are proving useful. Actively maintained. Promoted when heat stays high and age > 7 days.
Long-term (LTM): Core knowledge. Rarely changes. High-value facts you always need. Demoted only if it goes cold for a long stretch (age > 30 days).
Archive: Heat < 0.1 for 30+ days. Still exists but deprioritized in search results.
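The ladder can be sketched as a pure function of tier, heat, and age. The heat cutoffs for promotion aren't stated in this doc, so the 0.5 and 0.7 thresholds below are assumed placeholders – only the age gates and the archive rule (heat < 0.1 for 30+ days) come from the text:

```python
def next_tier(tier: str, heat: float, age_hours: float) -> str:
    """One gardening pass for a single memory; 0.5/0.7 thresholds are assumed."""
    days = age_hours / 24
    if heat < 0.1 and days > 30:
        return "archive"                        # cold for a month: bury it
    if tier == "stm" and heat >= 0.5 and age_hours > 24:
        return "mtm"                            # proving useful: promote
    if tier == "mtm" and heat >= 0.7 and days > 7:
        return "ltm"                            # consistently hot: core knowledge
    return tier                                 # otherwise stay put
```

The hourly DecayScheduler would apply something of this shape to every memory in its batch, which is why a memory can only climb one rung per pass.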
The Hourly Gardening Round
Every hour, the DecayScheduler wakes up and runs its batch. Watch what happens:
Check Your Understanding
A memory was accessed 500 times last year but hasn't been touched in 3 months. Will it have a high heat score?
You have a memory: "The company was founded on April 1, 2020." It hasn't been accessed in two years. What happens to it?
You store an important memory today. Where does it start, and how does it reach LTM?
You now understand HyperMemory.
You know how memories are stored across three databases, how facts are extracted in the background, how five search strategies combine to find the right answer, and how a memory's importance is managed over time. That's not just knowing what buttons to click โ that's understanding the machinery behind the buttons.