Memory & RAG
Assemble Context (RAG)
One-shot context assembly for LLM prompts — memories + timeline + graph
POST
Purpose-built for retrieval-augmented generation (RAG). Given a user query, the endpoint retrieves the most relevant memories, recent events, and graph relationships, then returns a pre-formatted context string ready to prepend to your LLM prompt.
This is the easiest way to add memory to your AI chat. One call → formatted context.
See Pricing & Billing for the full rate card.
Best practice: Always check if
Request
Headers
Bearer token with your API key
application/json
Body
The user’s query. This drives retrieval.
Chat session ID for hierarchical session context.
Number of memories to retrieve. Max 100.
Maximum assembled context length in tokens. Older chunks truncated if exceeded.
Include knowledge-graph relationships.
Include recent events from EventStoreDB.
Response
The key field — a pre-formatted context string you can prepend directly to your LLM system message.
Structured context with separate
chunks, sources, graph_context, and timeline sections.Status message (e.g., “Context assembled from 8 memories and 3 recent events”).
Complete RAG example
Billing
Flat **15/month. Response headers:| Header | Meaning |
|---|---|
x-credit-balance | Wallet balance after this charge |
x-credit-charged | 0.000500 |
x-billing-tx | Audit row UUID |
If the memory service is unreachable and the endpoint returns the graceful-degradation empty context (see below), you are not charged — the auto-refund guard reverses the deduction.
Graceful degradation
If the Memory service is unavailable,/memory/context returns prompt_ready: "" instead of failing. Your application can continue with no context — the SLM chat will still work, just without memory-grounded responses.
No-context response
prompt_ready is non-empty before prepending it.