Module 1

What Is HyperMemory?

An AI backend that stores memories and actually knows how to find them again.

🧠

Stores memories

Any chunk of text (a conversation, a note, a fact) is stored and indexed

🔍

Finds them later

Ask a question in plain English and get back the most relevant memories

📉

Forgets gracefully

Old, unused memories fade, like human memory but on a schedule

🕸️

Builds a knowledge graph

Extracts who, what, and how things relate, so it can reason across memories

Think of it this way: your phone's photo gallery stores images. HyperMemory stores knowledge, and it understands the meaning of what you stored, not just the exact words.

The Problem It Solves

AI assistants forget everything between conversations. HyperMemory is the persistent memory layer that gives them a real past.

1

A conversation happens

You talk to an AI assistant about your project, your preferences, your life

2

HyperMemory stores it

The conversation gets sent to the API, indexed, and filed away

3

Later, you ask a question

You ask "What database did we decide to use?" and HyperMemory finds the right memory

4

The answer comes back

Not a keyword match: a semantically relevant result, synthesized into a plain-English answer

When You Store a Memory

Imagine you send this to the API endpoint POST /mcp/memory/store:

"Had a great conversation about switching from MySQL to PostgreSQL."

Here is what happens next, step by step:

📱 Your App → 🚪 FastAPI → 🔮 Qdrant → ⚙️ Extractor → 🗄️ PostgreSQL
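In code, storing a memory is one HTTP POST. Here is a minimal client sketch; only the endpoint path comes from this walkthrough, while the request body fields (`content`, `user_id`, `importance`) and the base URL are assumptions modeled on the Memory columns shown in Module 2:

```python
import json
from urllib import request

API_BASE = "http://localhost:8000"  # assumption: wherever the FastAPI app is served

def build_store_payload(content: str, user_id: str, importance: float = 0.5) -> dict:
    # Field names mirror the Memory model's columns (an assumption,
    # not a documented request schema).
    return {"content": content, "user_id": user_id, "importance": importance}

def store_memory(content: str, user_id: str) -> dict:
    body = json.dumps(build_store_payload(content, user_id)).encode()
    req = request.Request(
        f"{API_BASE}/mcp/memory/store",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    # The call returns quickly: fact extraction happens later, in the background.
    with request.urlopen(req) as resp:
        return json.load(resp)
```

The key point the sketch makes: the client never waits for extraction; the POST returns as soon as the memory is filed.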

Seven Generations of Getting Smarter

HyperMemory didn't start this complex. It was built layer by layer, each version adding a new superpower on top of the last.

v1

Hybrid Search

Semantic + temporal + keyword + fact: four ways to find a memory at once

v3

Knowledge Graph

Entities, relationships, multi-hop reasoning: it understands who knows what

v5

Memory Tiers

Short-term, medium-term, long-term, archive, just like how humans remember

v7

PRISM Retrieval

A 3-agent loop that finds exactly the right evidence for complex questions

Key insight: Every version is still running. The enhanced-query-v5 endpoint uses all five strategies at once, weighted and blended. Older endpoints still exist for backwards compatibility.
Module 2

The Database Trio

Three completely different databases, each solving a problem the others can't.

Analogy: Think of a record store. The filing cabinets hold everything in order by artist and album (PostgreSQL). The listening booths let you find music by vibe, not title (Qdrant). The front desk sticky notes remind the clerk of what you played last week (Redis).
🗄️
PostgreSQL

The filing cabinet

🔮
Qdrant

The concept library

📋
Redis

The quick-reference board


PostgreSQL: The Source of Truth

Everything real lives here. Memories, extracted facts, entities, relationships, tiers. If it's deleted from PostgreSQL, it's gone.

CODE

class Memory(Base):
    __tablename__ = "memories"
    id = Column(Integer, primary_key=True)
    content = Column(Text, nullable=False)
    summary = Column(Text)
    user_id = Column(String, index=True)
    importance = Column(Float, default=0.5)
    timestamp = Column(DateTime)
          
PLAIN ENGLISH

This is a class called Memory: a blueprint for what a memory looks like in the database

The table in the database is named "memories"

Every memory gets a unique number ID (like a ticket stub)

The actual text you stored; required, can never be blank

A 1-2 sentence LLM summary of the memory (v5 feature)

Which user this memory belongs to, indexed for fast lookups

How important this memory is, from 0.0 (trivial) to 1.0 (critical)

When it was created, used for time-based queries

Why a separate summary column? Sometimes the meaning of a memory is buried in a long conversation. A short summary captures the essence better for searching. Both are stored, and both are embedded for vector search.

Qdrant: The Concept Library

Qdrant stores memories as vectors: numerical fingerprints of meaning. When you search, it finds the memories whose meaning is closest to your query.

collection

hypermemory_embeddings_v2

The raw content of each memory, what was actually stored

collection

…_facts

Each extracted fact triple (entity, attribute, value) as its own vector

collection

…_v5_summaries

The LLM-generated summaries, often more semantically dense than the raw text

collection

…_v7_events

Episodic event segments: slices of memory grouped by what was happening

Why four collections? Different questions need different granularity. "What database did we choose?" is best answered by a fact. "What was the vibe of that meeting?" is best answered by an event summary. Having four collections means four ways to find the right answer.
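The "four ways to find the right answer" idea can be sketched in a few lines. This toy version uses plain dicts in place of the Qdrant collections and brute-force cosine scoring; in the real system the query vector is sent to Qdrant, which does the ranking itself:

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors (the similarity metric used
    # for vector search throughout this document).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_hits(collections, query_vec, top_k=3):
    # Score every item in every collection and keep the best overall,
    # remembering which collection each hit came from.
    scored = [
        (cosine(vec, query_vec), name, item_id)
        for name, items in collections.items()
        for item_id, vec in items.items()
    ]
    return sorted(scored, reverse=True)[:top_k]
```

Because each hit carries its source collection, the caller can see whether a fact, a summary, an event, or raw content won the ranking for a given question.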

Redis: The Quick-Reference Board

Redis holds recent search results in memory so they can be returned instantly, without re-running expensive searches. It is like a whiteboard next to the filing cabinet: frequently needed answers written in marker, cleared at the end of the day.
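This is the classic cache-aside pattern: check Redis, fall back to the real search on a miss, then write the result back with a time-to-live. A self-contained sketch with an in-process stand-in for Redis (the key format and the 5-minute TTL are assumptions, not documented values):

```python
import json
import time

class FakeRedis:
    """Tiny in-process stand-in for Redis, just to show the pattern.
    get/setex mirror the redis-py method names."""
    def __init__(self):
        self._data = {}

    def get(self, key):
        value, expires = self._data.get(key, (None, 0.0))
        return value if time.time() < expires else None

    def setex(self, key, ttl, value):
        self._data[key] = (value, time.time() + ttl)

cache = FakeRedis()

def cached_search(query, run_search):
    key = f"search:{query}"        # assumed key scheme
    hit = cache.get(key)
    if hit is not None:            # cache hit: skip the expensive pipeline
        return json.loads(hit)
    results = run_search(query)    # cache miss: run the real search
    cache.setex(key, 300, json.dumps(results))  # keep for 5 minutes
    return results
```

The second identical query never touches the databases, which is exactly the "answered from the whiteboard" behavior the analogy describes.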

On a typical query, the three databases collaborate: Redis answers instantly if the result is cached, Qdrant finds semantically similar candidates, and PostgreSQL supplies the full memory records.

Check Your Understanding

A user asks: "What were my thoughts about moving to the cloud?" Which database finds the answer first?

Your Redis server crashes and restarts. What happens to your memories?

You want to find all facts about "database choices" even though some memories say "PostgreSQL" and others say "relational DB." Which system makes this possible?

Module 3

The Extraction Factory

How raw text becomes structured knowledge: automatically, in the background, without blocking you.

Analogy: Think of a journalist reading a story and pulling out the key facts to file separately: "John (person) works at (relationship) Acme Corp (organization)." HyperMemory does this for every memory you store, automatically, while you move on to something else.
1

Memory arrives

"Had a great conversation about switching from MySQL to PostgreSQL for the new project."

2

LLM reads it

An LLM (GPT-4o-mini via OpenRouter) receives the text and a prompt asking for structured facts

3

Facts extracted

{entity: "PostgreSQL", attribute: "database-type", value: "relational"}
{entity: "project", attribute: "switching-from", value: "MySQL"}

4

Graph built

Entities (PostgreSQL, MySQL, project) and their relationships get added to the knowledge graph
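Steps 2 and 3 hinge on asking the LLM for structured output and validating what comes back. A sketch of that parsing side (the prompt wording and the JSON response shape are hypothetical, modeled on the triples shown above):

```python
import json

# Hypothetical prompt; the real extraction prompt is not shown in this module.
EXTRACTION_PROMPT = """Extract entity-attribute-value facts from the text below.
Respond with a JSON list of objects with keys "entity", "attribute", "value".

Text: {text}"""

def parse_fact_triples(llm_response: str) -> list:
    # Keep only well-formed triples; LLMs occasionally drop a key,
    # so defensive validation matters before anything hits the database.
    triples = json.loads(llm_response)
    required = {"entity", "attribute", "value"}
    return [t for t in triples if required <= set(t.keys())]
```

Each validated triple then becomes a row in PostgreSQL and a vector in the facts collection.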

The Background Worker

Extraction doesn't block your API call. An async worker runs in the background, polling for unprocessed memories every 30 seconds.

CODE

class ExtractionWorker:
    def __init__(self,
        batch_size: int = 10,
        poll_interval: int = 30,
        graph_extraction_enabled: bool = True,
        event_segmentation_enabled: bool = True,
    ):
        ...  # body elided in this excerpt
          
PLAIN ENGLISH

The background extraction service

Its setup options:

Process up to 10 memories at a time

Check for new unprocessed memories every 30 seconds

Should we also extract entities and relationships? (yes)

Should we slice memories into narrative events? (yes)

Why async? Fact extraction involves calling an LLM API, and that call can take 2-5 seconds. If the API waited for extraction before responding, every POST /store call would be slow. By doing it asynchronously, the API returns instantly and extraction happens quietly in the background.
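The poll-and-sleep cycle the text describes looks roughly like this. It is a sketch, not the actual worker; the `max_cycles` parameter exists only so the loop can be exercised without running forever:

```python
import asyncio

async def run_worker(fetch_batch, extract,
                     poll_interval: int = 30,
                     batch_size: int = 10,
                     max_cycles=None):
    """Hypothetical main loop of an ExtractionWorker-style service."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        batch = await fetch_batch(batch_size)   # unprocessed memories, up to batch_size
        for memory in batch:
            await extract(memory)               # LLM calls happen off the request path
        if not batch:
            await asyncio.sleep(poll_interval)  # nothing to do; wait before polling again
        cycles += 1
```

Because `fetch_batch` and `extract` are passed in, the same loop shape works for all three pipelines the worker runs.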

Three Extraction Pipelines

The ExtractionWorker runs three separate pipelines on each memory:

📋

Fact Extraction

Pulls out entity-attribute-value triples. "PostgreSQL is a relational database." → {entity, attribute, value}

FactExtractor → LLM
🕸️

Graph Extraction

Identifies entities (nouns) and relationships (verbs). Builds a web of connections that can be traversed later.

GraphExtractor → EntityStore
📅

Event Segmentation

Groups a window of recent memories into narrative episodes: "the database migration project" as one coherent event.

EventSegmenter (v7)
The ADD/UPDATE/DELETE trick: When a new fact conflicts with an existing one, the OperationClassifier decides what to do. If the system stored "team uses MySQL" yesterday and today extracts "team prefers PostgreSQL," it doesn't add a contradiction; it marks the old fact as superseded and adds the new one.
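The supersede-don't-contradict behavior can be sketched as a tiny classifier. This toy version handles ADD, UPDATE, and an exact-duplicate SKIP case; the real OperationClassifier also handles DELETE and works on richer signals than string equality:

```python
def classify_operation(new_fact: dict, existing: list) -> str:
    # Same entity + attribute with a different value: the old fact
    # gets marked superseded, so this is an UPDATE, not a contradiction.
    for old in existing:
        if (old["entity"], old["attribute"]) == (new_fact["entity"], new_fact["attribute"]):
            return "SKIP" if old["value"] == new_fact["value"] else "UPDATE"
    return "ADD"  # no overlap with anything stored: just add it
```

Run against the MySQL/PostgreSQL example from the text, the conflicting fact classifies as UPDATE rather than piling up alongside the old one.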

Inside the Extraction Pipeline

When the worker picks up a new memory, it runs the three pipelines in sequence: facts are extracted, entities and relationships are added to the graph, and recent memories are re-segmented into narrative events.

Check Your Understanding

You store a memory via the API at 3:00pm. At what point do facts get extracted from it?

You stored "the team uses MySQL" a month ago. Now you store "switched to PostgreSQL." What does HyperMemory do with these contradictory facts?

Module 4

The Search Brain

Five strategies. One question. The right answer, even when the words don't match.

The retrieval problem: Searching for text with keywords fails when people use different words for the same thing. "What database do we prefer?" won't find a memory that says "switched to Postgres." HyperMemory uses five complementary strategies so no relevant memory escapes.
🔮

Summary Search

weight: 0.45

Match by meaning of the LLM summary

📋

Fact Search

weight: 0.25

Match against entity-attribute-value triples

🕸️

Graph Search

weight: 0.20

Traverse entity relationships multi-hop

🔤

BM25 Keyword

weight: 0.10

Classic keyword relevance ranking

🌐

Semantic Fallback

weight: 0.15

Raw content vector similarity
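Blending the five strategies is, at its core, a weighted sum per memory. A sketch using the weights listed above (note they sum to 1.15 as printed, so the real engine presumably normalizes per-strategy scores before blending; this sketch applies them as given):

```python
# Weights for the five v5 strategies, as listed in the cards above.
WEIGHTS = {
    "summary": 0.45,
    "fact": 0.25,
    "graph": 0.20,
    "keyword": 0.10,
    "semantic": 0.15,
}

def blend_scores(per_strategy):
    # per_strategy maps strategy name -> {memory_id: score}.
    # Combine into one weighted score per memory, then rank.
    combined = {}
    for strategy, scores in per_strategy.items():
        weight = WEIGHTS[strategy]
        for memory_id, score in scores.items():
            combined[memory_id] = combined.get(memory_id, 0.0) + weight * score
    return sorted(((s, m) for m, s in combined.items()), reverse=True)
```

A memory that scores moderately on two strategies can outrank one that scores highly on a single low-weight strategy, which is the whole point of blending.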

A Query Travels Through Seven Stages

Before any database is touched, your query goes through analysis, expansion, and classification. Here is the full pipeline:

📥 Query In → 🔎 Analyzer → ➕ Expander → ⚖️ Weights → 🧠 Engine

Two Types of "Similar"

Keyword search and semantic search find different things. HyperMemory runs both and blends the results.

BM25 Keyword

Looks for the same words. Query "PostgreSQL" finds memories with "PostgreSQL" in them, but misses "Postgres" or "relational DB."

score ∝ tf(term) × idf(term)

Term frequency × inverse document frequency

Semantic / Vector

Looks for the same meaning. "What database do we use?" finds "switched from MySQL to Postgres" because both are about database decisions.

score = cosine(query_vec, doc_vec)

Cosine similarity between meaning-vectors
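The contrast is easy to demonstrate. A bare-bones tf × idf scorer (the core idea behind BM25, minus its term-saturation and length-normalization refinements) gives the query "postgresql" a zero score against a memory that only says "postgres", which is exactly the gap the semantic side covers:

```python
import math

def tf_idf_score(query_terms, doc, corpus):
    # doc and each corpus entry are lists of lowercased tokens.
    score = 0.0
    n = len(corpus)
    for term in query_terms:
        tf = doc.count(term)                       # term frequency in this doc
        df = sum(1 for d in corpus if term in d)   # how many docs contain it
        if tf and df:
            score += tf * math.log(n / df)         # rarer terms weigh more
    return score
```

Vector search has no such blind spot: "postgresql" and "postgres" land close together in embedding space even though they never share a token.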

CODE

vector_engine = RetrievalEngine(
    postgres_adapter=db_adapter,
    vector_adapter=vector_adapter,
    synthesis_model=settings.EXTRACTION_MODEL,
    synthesis_base_url=settings.EXTRACTION_BASE_URL,
    synthesis_api_key=settings.EXTRACTION_API_KEY,
)
          
PLAIN ENGLISH

Create the search engine (assembled at startup)

Give it access to PostgreSQL (for facts and memories)

Give it access to Qdrant (for vector / semantic search)

Which LLM to use for answer synthesis (writing a plain-English answer)

Where that LLM lives (OpenRouter in this case)

The API key to authenticate with the LLM service

PRISM: The Research Robot

For complex questions that need multiple memory pieces to answer, v7 adds PRISM: a 3-agent loop that keeps refining until it has exactly the right evidence.

Analogy: Imagine a detective with two assistants. The detective breaks down the question. The first assistant (Selector) prunes out the red herrings. The second assistant (Adder) makes sure nothing important was left out. They argue back and forth until the evidence set is just right.
1

Decomposer

Breaks the complex question into sub-questions. "What did we decide about the stack?" → "What database?" + "What framework?" + "What hosting?"

2

Selector

Given all candidate memories, removes the ones that are distractors, not actually relevant to answering the question

3

Adder

Looks at what the Selector kept and asks "what's missing?", then adds bridging memories that connect the dots

↺

Iterate (up to 3×)

Selector and Adder cycle until the evidence set stabilizes: no new adds, no new removals
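The control flow of that cycle fits in a few lines. A sketch where the Selector and Adder are passed in as plain functions (in the real system each role is an LLM-backed agent):

```python
def prism_loop(candidates, select, add, max_iters=3):
    """Iterate Selector then Adder until the evidence set stops
    changing, or until the iteration cap is hit."""
    evidence = set(candidates)
    for _ in range(max_iters):
        kept = select(evidence)        # Selector prunes distractors
        augmented = add(kept)          # Adder fills in missing bridges
        if augmented == evidence:      # stable: no new adds, no new removals
            return augmented
        evidence = augmented
    return evidence
```

The fixed-point check is what makes the loop safe: once neither agent wants to change the set, there is nothing left to argue about.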

Check Your Understanding

A user types "AI model" but all memories say "LLM." Which part of the pipeline helps bridge this gap?

Which search strategy has the highest weight in v5, and why?

A user asks: "How did our thinking about databases and infrastructure evolve over the last year?" Which feature handles this best?

Module 5

Memory That Lives and Dies

Not all memories are equal. HyperMemory knows which ones to keep, which to fade, and which to bury.

Analogy: Think of a greenhouse. Seeds that get regular water and sunlight grow into strong plants. Seeds that are ignored wilt and get composted. HyperMemory's decay system is the gardener, checking every memory hourly, keeping the important ones alive and clearing out what you no longer need.
⚡

NOISE

1-hour lifespan. Ephemeral chatter. Auto-fades.

📌

TACTICAL

24-hour lifespan. Short-term useful info.

📂

MEDIUM

30-day lifespan. Project-level context.

๐Ÿ›๏ธ

LONG TERM

180-day lifespan. Important decisions.

💎

PERMANENT

Never decays. Core identity facts.

The Heat Score: A Memory's Life Force

Every memory has a heat score between 0.0 and 1.0. High heat = hot memory, actively used, should be kept close. Low heat = cold memory, forgotten, getting moved to the archive.

CODE

import math

def heat_score(visit_count: int, hours_since: float, importance_score: float) -> float:
    # Log-scaled visit frequency: caps at 1.0 around 1,000 visits
    visit_score = min(1.0, math.log10(visit_count + 1) / 3.0)
    # Exponential recency decay with a one-week (168-hour) time constant
    recency_score = math.exp(-hours_since / 168)
    # Weighted blend: 40% frequency, 30% recency, 30% base importance
    return 0.4 * visit_score + 0.3 * recency_score + 0.3 * importance_score
          
PLAIN ENGLISH

Visit score: how many times has this been accessed? (log scale: 10 visits scores about a third, 1,000 visits about 1.0)

Recency score: how recently was it accessed? (shrinks to roughly a third, 1/e, after 7 days of inactivity)

The final heat is a weighted sum of three factors:

40% from visit frequency, how often it gets used

30% from recency, how recently it was used

30% from its base importance score, how significant it was when stored

Why logarithmic for visit count? A memory visited 100 times shouldn't be 100x more important than one visited once. The log scale compresses the range, preventing super-popular memories from crowding out newer, also-relevant ones.

The Memory Tier Ladder

Memories are constantly being promoted and demoted based on their heat score. A background worker runs every hour and reorganizes them.

STM: Short-Term Memory

Every new memory starts here. Quick access, recent context.

Promoted when heat > 0.6
and age > 24h
↕
MTM: Medium-Term Memory

Active memories that are proving useful. Actively maintained.

Promoted when heat > 0.7
and age > 7 days
↕
LTM: Long-Term Memory

Core knowledge. Rarely changes. High-value facts you always need.

Demoted when heat < 0.2
and age > 30 days
↓
ARCHIVE: Cold Storage

Heat < 0.1 for 30+ days. Still exists but deprioritized in search results.
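The ladder's promotion and demotion rules translate directly into a threshold check. A sketch using the numbers above (the real scheduler also applies batching and the temporal protection described below, and its exact thresholds may differ):

```python
def next_tier(tier: str, heat: float, age_hours: float) -> str:
    """Decide where a memory moves on the next hourly pass."""
    if tier == "STM" and heat > 0.6 and age_hours > 24:
        return "MTM"                      # promoted: hot and old enough
    if tier == "MTM" and heat > 0.7 and age_hours > 7 * 24:
        return "LTM"                      # promoted: consistently useful
    if tier == "LTM" and heat < 0.2 and age_hours > 30 * 24:
        return "ARCHIVE"                  # demoted: cold for a month
    return tier                           # otherwise: stay put
```

Running this against every memory once an hour is essentially what the DecayScheduler's gardening round does.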

The Hourly Gardening Round

Every hour, the DecayScheduler wakes up, recomputes heat for a batch of memories, and promotes, demotes, or archives them accordingly.

Temporal protection: Memories with dated facts, like "the project launched on March 5th," are protected from aggressive decay. You should always be able to ask "when did X happen?" and get an accurate answer.

Check Your Understanding

A memory was accessed 500 times last year but hasn't been touched in 3 months. Will it have a high heat score?

You have a memory: "The company was founded on April 1, 2020." It hasn't been accessed in two years. What happens to it?

You store an important memory today. Where does it start, and how does it reach LTM?

You now understand HyperMemory.

You know how memories are stored across three databases, how facts are extracted in the background, how five search strategies combine to find the right answer, and how a memory's importance is managed over time. That's not just knowing what buttons to click; that's understanding the machinery behind the buttons.