# Working Memory

Working memory stores the **current conversation** for a session. It holds messages, tracks context, and automatically summarizes old messages when the conversation gets too long.

## What Working Memory Does

1. **Stores conversation messages** — The chat history for a session
2. **Tracks session data** — Arbitrary key-value data that lives only in this session
3. **Automatically summarizes** — When messages exceed the token limit, older messages are summarized and removed
4. **Promotes memories** — Structured memories added here get indexed in long-term storage

## Quick Reference

| Feature | Details |
|---------|---------|
| **Scope** | One session |
| **Lifespan** | Persistent (default) or TTL-based |
| **Storage** | Redis JSON |
| **Key Feature** | Automatic summarization |
| **Search** | None (use long-term memory for search) |

## Data Structure

Working memory contains:

| Field | Description |
|-------|-------------|
| `messages` | Conversation history (role/content pairs) |
| `context` | **Summary of older messages** (populated by auto-summarization) |
| `memories` | Structured memory records that get promoted to long-term storage |
| `data` | Arbitrary JSON key-value storage for the session |
| `user_id` | Owner of this session |
| `namespace` | Logical grouping |
| `ttl_seconds` | Optional expiration time |

## Automatic Summarization

When your conversation exceeds the model's context window, working memory automatically:

1. **Summarizes older messages** into a compact summary
2. **Stores the summary** in the `context` field
3. **Removes the summarized messages** to free space
4. **Keeps recent messages** intact

This happens transparently—you don't need to trigger it.

### How It Works

The server tracks token usage against your model's context window. When messages exceed a threshold (default: 70% of the context window), summarization kicks in:

```
Messages: [msg1, msg2, msg3, msg4, msg5, msg6, msg7, msg8, msg9, msg10]
                  ↓ (exceeds threshold)
                  ↓ summarize older messages
Context:  "User discussed trip planning to Paris, preferences for museums..."
Messages: [msg8, msg9, msg10]  ← recent messages preserved
```

### Finding the Summary

The summary is stored in the `context` field of working memory:

```python
# After summarization has occurred
working_memory = await get_working_memory("session_123")

print(working_memory.context)
# "User discussed trip planning to Paris, preferences for museums and food,
#  budget constraints around $3000, and interest in Impressionist art..."

print(working_memory.messages)
# [recent messages only]
```

### Monitoring Summarization

The `WorkingMemoryResponse` includes fields to track context usage:

```python
response = await get_working_memory("session_123")

# How much of the total context window is used (0-100%)
print(response.context_percentage_total_used)  # e.g., 45.2

# How close to triggering summarization (0-100%)
print(response.context_percentage_until_summarization)  # e.g., 64.5
# When this hits 100%, summarization triggers
```

### Configuring Summarization

| Environment Variable | Default | Description |
|---------------------|---------|-------------|
| `SUMMARIZATION_THRESHOLD` | `0.7` | Fraction of context window that triggers summarization |
| `GENERATION_MODEL` | `gpt-4o-mini` | Model used for summarization |
| `PROGRESSIVE_SUMMARIZATION_PROMPT` | (see below) | Custom prompt for summarization |

The summarization prompt can be customized. It must include `{prev_summary}` and `{messages_joined}` placeholders:

```bash
PROGRESSIVE_SUMMARIZATION_PROMPT="Your custom prompt with {prev_summary} and {messages_joined}..."
```

## Storing Messages

The primary use of working memory is storing conversation messages:

```python
from datetime import datetime, UTC
import ulid

working_memory = WorkingMemory(
    session_id="chat_123",
    messages=[
        MemoryMessage(
            role="user",
            content="I'm planning a trip to Paris next month",
            id=ulid.ULID(),
            created_at=datetime.now(UTC)
        ),
        MemoryMessage(
            role="assistant",
            content="What type of activities interest you?",
            id=ulid.ULID(),
            created_at=datetime.now(UTC)
        ),
    ]
)
```

> **⚠️ Always provide `created_at` timestamps**
>
> This ensures correct message ordering and proper temporal context when promoting to long-term memory. Omitting `created_at` triggers a deprecation warning—it will become required in a future version.

## Session-Specific Data

Use the `data` field for temporary information that doesn't need to persist across conversations:

```python
working_memory = WorkingMemory(
    session_id="chat_123",
    data={
        "current_topic": "trip_planning",
        "user_timezone": "America/New_York",
    }
)
```

## Structured Memories

Use the `memories` field for facts that should persist beyond this session:

```python
working_memory = WorkingMemory(
    session_id="chat_123",
    memories=[
        MemoryRecord(
            text="User is planning a trip to Paris next month",
            id="trip_planning_paris",
            memory_type="episodic",
            topics=["travel"],
            entities=["Paris"]
        )
    ]
)
```

These are automatically promoted to long-term storage and become searchable across all sessions.

> **Key distinction:**
> - `data` → session-only, not searchable, not persisted beyond session
> - `memories` → promoted to long-term storage, searchable, persistent

## Memory Promotion to Long-Term Storage

Memories added to the `memories` field are automatically promoted to long-term storage:

1. Server identifies memories with `persisted_at=null`
2. Generates vector embeddings
3. Indexes in long-term storage
4. Updates working memory with `persisted_at` timestamps

You can also configure **background extraction** to automatically extract memories from conversation messages:

```python
working_memory = WorkingMemory(
    session_id="chat_123",
    messages=[...],
    long_term_memory_strategy=MemoryStrategyConfig(
        strategy="discrete",  # or "summary", "preferences", "custom"
        config={}
    ),
)
```

See [Memory Extraction Strategies](memory-extraction-strategies.md) for configuration options.

## API Reference

```http
# Get working memory
GET /v1/working-memory/{session_id}?namespace=demo&model_name=gpt-4o

# Set working memory (replaces existing)
PUT /v1/working-memory/{session_id}?ttl_seconds=3600

# Delete working memory
DELETE /v1/working-memory/{session_id}?namespace=demo
```

## TTL and Persistence

Working memory is **persistent by default**. Set `ttl_seconds` to auto-expire:

```python
# Persistent (default)
working_memory = WorkingMemory(session_id="chat_123", messages=[...])

# Expires after 1 hour
working_memory = WorkingMemory(session_id="chat_123", messages=[...], ttl_seconds=3600)
```

**Use TTL for:** temporary sessions, privacy requirements, resource constraints.

**Keep persistent for:** conversation history, multi-turn context, support applications.

## Reconstruction from Long-Term Memory

With `INDEX_ALL_MESSAGES_IN_LONG_TERM_MEMORY=true`, working memory can be reconstructed after TTL expiration:

1. Messages are indexed in long-term memory as they flow through
2. When working memory expires, messages remain in long-term storage
3. Requesting an expired session reconstructs it from long-term memory

This lets you use TTL to save Redis memory while maintaining conversation continuity.

## Configuration Reference

| Variable | Default | Description |
|----------|---------|-------------|
| `SUMMARIZATION_THRESHOLD` | `0.7` | Fraction of context window that triggers summarization |
| `GENERATION_MODEL` | `gpt-4o-mini` | Model for summarization |
| `PROGRESSIVE_SUMMARIZATION_PROMPT` | (built-in) | Custom summarization prompt |
| `LONG_TERM_MEMORY` | `true` | Enable long-term memory features |
| `INDEX_ALL_MESSAGES_IN_LONG_TERM_MEMORY` | `false` | Index messages for reconstruction |

See the [Configuration Guide](configuration.md) for all options.

## Related Documentation

- [Long-term Memory](long-term-memory.md) — Persistent, cross-session storage
- [Memory Integration Patterns](memory-integration-patterns.md) — How to integrate memory
- [Memory Extraction Strategies](memory-extraction-strategies.md) — Automatic memory extraction
- [LLM Providers](llm-providers.md) — Configure OpenAI, Anthropic, Bedrock, Ollama