
Overview

60db’s Memory system gives your AI applications a long-term, searchable memory. It stores user preferences, conversation history, knowledge base documents, and arbitrary facts, then retrieves the most relevant ones on demand using hybrid semantic + keyword search. Built on a multi-layer retrieval architecture combining vector search with a knowledge graph, Memory enables:
  • Personalized AI chat — the SLM remembers user preferences across sessions
  • Knowledge base Q&A — ingest docs and retrieve grounded answers
  • Multi-user collaboration — shared “team” memory collections
  • Graph-aware recall — find related concepts through knowledge-graph traversal

Hybrid Search

Semantic (vector) + keyword (BM25) scoring with configurable weights

Context Assembly

One-shot endpoint returns an LLM-ready context string for any query

Graph Relationships

Extracted facts link together as a knowledge graph

Multi-Collection

Personal, team, knowledge, and hive (cross-collection) memory types

Document Upload

Upload PDFs, Office docs, and scanned images — text extraction + OCR built in

91+ Formats

PDF, DOCX, XLSX, PPTX, EML, MSG, HTML, scanned images with built-in OCR

Core concepts

Memory collections

Memories live in collections scoped to your workspace. Each collection is one of:
| Kind | Visibility | Who can write |
| --- | --- | --- |
| personal | Only the owning user | Owner + admin (automatic per-user) |
| team | All members of the workspace | Owner/admin create; all members write |
| knowledge | All members (read-only reference) | Owner/admin only |
| hive | Cross-collection shared facts | Owner/admin only |
Your personal collection is created automatically the first time you use Memory. Team/knowledge/hive collections are created by owners/admins.

Memory types

When you store a memory, you specify its type:
  • user — private, user-scoped facts. Auto-extracted from conversations or manually entered.
  • knowledge — reference content (docs, policies, FAQs). Read by all members.
  • hive — workspace-wide shared facts that appear in every user’s search results.

Two search modes

  • Fast — single-query hybrid search. Returns in ~100ms. Default.
  • Thinking — multi-query expansion with reranking. Better recall quality, slower (~1-2s).

Storing memories

import { SixtyDBClient } from '60db';

const client = new SixtyDBClient('your-api-key');

// Store a personal memory
await client.memory.ingest({
  text: "I prefer vegetarian food and am lactose intolerant",
  type: "user",
  infer: true, // extract structured facts via LLM
});

// Store a knowledge base entry
await client.memory.ingest({
  text: "Our refund policy allows returns within 30 days of purchase.",
  title: "Refund Policy",
  type: "knowledge",
  collection: "company_handbook",
});

Response

{
  "success": true,
  "data": {
    "collection_id": "user_abc123",
    "collection_label": "Personal Memories",
    "memory_type": "user",
    "total_queued": 1,
    "results": [
      { "id": "mem_01HV...", "status": "pending", "message": "Queued for processing" }
    ]
  }
}
Memories are processed asynchronously. Poll GET /memory/:id/status to check if ingestion is complete.
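A small polling loop covers this. The endpoint path comes from this doc; the response shape (`{ data: { status } }` with `pending`/`completed`/`failed` values) and the backoff schedule are assumptions for illustration:

```javascript
// Sketch: poll ingestion status with capped exponential backoff.
async function waitForIngest(memoryId, apiKey, maxAttempts = 10) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(`https://api.60db.com/memory/${memoryId}/status`, {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    const { data } = await res.json();
    if (data.status !== 'pending') return data.status; // 'completed' or 'failed'
    await new Promise(resolve => setTimeout(resolve, backoffMs(attempt)));
  }
  throw new Error(`Memory ${memoryId} still pending after ${maxAttempts} checks`);
}

// Exponential backoff capped at 5 s: 250 ms, 500 ms, 1 s, 2 s, 4 s, 5 s, ...
function backoffMs(attempt) {
  return Math.min(250 * 2 ** attempt, 5000);
}
```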

Uploading documents

For longer-form content — PDFs, Word docs, spreadsheets, scanned pages, emails — use POST /memory/documents/extract. You upload the raw file; the server handles format detection, OCR, and chunking. Under the hood, 60db runs your file through its document extraction engine (91+ formats with built-in OCR for scanned documents), splits the extracted text into overlapping chunks, and ingests each chunk as a knowledge-type memory in one batch.
// Upload a PDF from a file input
const form = new FormData();
form.append('file', pdfFile);           // File or Blob
form.append('collection', 'company_handbook');
form.append('type', 'knowledge');
form.append('title', 'Employee Handbook 2026');

const res = await fetch('https://api.60db.com/memory/documents/extract', {
  method: 'POST',
  headers: { 'Authorization': 'Bearer sk_your_api_key' },
  body: form,  // do NOT set Content-Type — browser adds multipart boundary
});

const { data } = await res.json();
console.log(`Uploaded ${data.filename}: ${data.chunks} chunks (${data.characters} chars)`);

What you can upload

| Category | Formats |
| --- | --- |
| Documents | PDF, DOCX, DOC, ODT, RTF, TXT, MD, HTML, EPUB |
| Spreadsheets | XLSX, XLS, CSV, ODS |
| Presentations | PPTX, PPT, ODP |
| Email | EML, MSG, PST, MBOX |
| Images (OCR) | PNG, JPG, JPEG, TIFF, BMP, GIF |
| Code & structured | JSON, XML, YAML, LaTeX |
| Archives | ZIP, TAR, GZIP, 7Z (extracted recursively) |
Max file size: 200 MB. Max chunks per document: 100 (tune chunk_size for larger docs).

Response

{
  "success": true,
  "data": {
    "collection_id": "company_handbook",
    "collection_label": "Company Handbook",
    "filename": "handbook.pdf",
    "chunks": 18,
    "characters": 24680,
    "memory_type": "knowledge",
    "total_queued": 18,
    "results": [ /* one {id, status, message} per chunk */ ],
    "metadata": {
      "source": "document_upload",
      "mime_type": "application/pdf",
      "page_count": 24,
      "detected_languages": ["eng"],
      "total_chunks": 18
    }
  }
}
The returned metadata.page_count and detected_languages come from the document extraction engine and are useful for displaying upload progress or filtering by source language. Scanned PDFs will list the OCR-detected language codes (eng, fra, spa, etc.).

Chunking controls

Two optional form fields tune how text is split:
| Field | Default | Description |
| --- | --- | --- |
| chunk_size | 1500 | Max characters per chunk (200–8000) |
| chunk_overlap | 200 | Characters of overlap between chunks (< chunk_size) |
See the API reference for a tuning table by document type.
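Because out-of-range values are rejected server-side, you may want to validate these fields before appending them to the upload form. A minimal sketch using the ranges from the table above; `chunkParams` is a hypothetical helper, not part of the SDK:

```javascript
// Sketch: validate chunking fields client-side before upload.
function chunkParams(chunkSize = 1500, chunkOverlap = 200) {
  if (chunkSize < 200 || chunkSize > 8000) {
    throw new RangeError(`chunk_size ${chunkSize} outside 200-8000`);
  }
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError('chunk_overlap must be >= 0 and < chunk_size');
  }
  // Form fields are strings, matching the multipart upload shown earlier
  return { chunk_size: String(chunkSize), chunk_overlap: String(chunkOverlap) };
}

// Usage with the upload form from the example above:
// for (const [k, v] of Object.entries(chunkParams(3000, 300))) form.append(k, v);
```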

Searching memories

const results = await client.memory.search({
  query: "What are my dietary preferences?",
  mode: "fast",
  max_results: 10,
  alpha: 0.8, // 0.8 semantic / 0.2 keyword
  recency_bias: 0.1,
});

results.data.sources.forEach(source => {
  console.log(source.title, source.text, source.score);
});
| Parameter | Default | Description |
| --- | --- | --- |
| mode | fast | fast (single-query) or thinking (multi-query reranked) |
| alpha | 0.8 | Weight of semantic search: 0 = keyword only, 1 = semantic only |
| recency_bias | 0.0 | Weight given to newer memories (0–1) |
| max_results | 10 | Max results returned (capped at 50) |
| graph_context | false | Include knowledge-graph relationships in response |

Context assembly (RAG for SLM chat)

The /memory/context endpoint is purpose-built for retrieval-augmented generation. Given a user query, it fetches the most relevant memories, recent events, and graph relationships, and returns a pre-formatted context string ready to prepend to your LLM prompt.
// Before calling the SLM, assemble context
const ctx = await client.memory.context({
  query: userMessage,
  session_id: chatSessionId,
  top_k: 8,
  max_context_length: 2000,
  include_timeline: true,
  include_graph: false,
});

// Prepend to system message
const systemMessage = `You are a helpful assistant.

## Relevant memories
${ctx.data.prompt_ready}`;

// Call SLM chat with enriched context
const response = await client.chat.completions.create({
  model: '60db-tiny',
  messages: [
    { role: 'system', content: systemMessage },
    { role: 'user', content: userMessage },
  ],
});

Built-in SLM Chat integration

When using 60db’s UI at /app/slm-chat, there’s a Memory toggle next to Auto-clear. When enabled, every message you send is pre-processed:
  1. Your message is sent to /memory/context with your session ID
  2. Relevant memories and recent events are fetched (semantic + keyword + temporal)
  3. The returned prompt_ready string is prepended to the system message
  4. The enriched prompt goes to the SLM
This means your AI chat remembers context across sessions automatically. Toggle it off if you want a fresh, memoryless conversation.
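The same four-step flow can be reproduced programmatically, reusing the SDK calls shown earlier in this doc. `enrichSystemMessage` and `chatWithMemory` are illustrative names, not SDK methods:

```javascript
// Sketch of the enrichment flow above.
function enrichSystemMessage(base, promptReady) {
  // Step 3: prepend the prompt_ready string to the system message.
  // An empty context (e.g. during an outage) leaves the prompt unchanged.
  if (!promptReady) return base;
  return `${base}\n\n## Relevant memories\n${promptReady}`;
}

async function chatWithMemory(client, sessionId, userMessage) {
  // Steps 1-2: fetch relevant memories and recent events for this session
  const ctx = await client.memory.context({
    query: userMessage,
    session_id: sessionId,
  });
  // Steps 3-4: enrich the system message and call the SLM
  const system = enrichSystemMessage(
    'You are a helpful assistant.',
    ctx.data.prompt_ready,
  );
  return client.chat.completions.create({
    model: '60db-tiny',
    messages: [
      { role: 'system', content: system },
      { role: 'user', content: userMessage },
    ],
  });
}
```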

Collections management

// List your collections
const collections = await client.memory.collections.list();

// Create a team collection (admin/owner only)
await client.memory.collections.create({
  collection_id: "customer_support",
  label: "Customer Support KB",
  kind: "team",
  shared: true,
});

Role-based access

Memory operations are gated by your workspace role:
Role-gated actions (roles: Owner, Admin, Developer, Member, Viewer):
  • Search memories
  • Create personal memory
  • Delete own memory
  • Delete any memory
  • Create team collection
  • Create knowledge/hive collections
  • Export memories

API key access

To use Memory via an API key (for programmatic access), the key must have the memory scope. When creating an API key in Settings → Developers, check the “Memory & RAG” box.
// API key with memory scope
const response = await fetch('https://api.60db.com/memory/search', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer sk_your_api_key',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ query: 'user preferences', max_results: 5 }),
});

Pricing

Memory is pay-as-you-go — no subscription, no seat pricing, no minimum commitment. You pay only for the operations you run, deducted from a single workspace wallet you top up via Stripe, Razorpay, or Dodo Payments.
OperationRate
Ingest a memory$0.0001 per 1,000 characters
Upload a document (extract)$0.003 per MB
Upload a document (ingest)$0.0001 per 1,000 extracted characters
Search (hybrid recall)$0.0003 per query
Context assembly (LLM-ready)$0.0005 per query
Real-world cost examples:
  • A knowledge base with 100 MB of docs + 10,000 searches/month → ~$23/month
  • A personal assistant with 1,000 user memories + 500 searches/day → ~$5/month
  • A support bot with 1 GB of docs + 100,000 searches/month → ~$53/month
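To sanity-check a budget of your own, the rate table folds into a small estimator. This is a back-of-envelope sketch: actual billing is metered server-side, and only the four listed operations are modeled:

```javascript
// Rates from the pricing table above, in USD.
const RATES = {
  ingestPer1kChars: 0.0001, // ingest + document ingest, per 1,000 characters
  extractPerMb: 0.003,      // document extraction, per MB
  searchPerQuery: 0.0003,   // hybrid recall
  contextPerQuery: 0.0005,  // LLM-ready context assembly
};

function estimateMonthlyUsd({ ingestChars = 0, extractMb = 0, searches = 0, contexts = 0 }) {
  return (
    (ingestChars / 1000) * RATES.ingestPer1kChars +
    extractMb * RATES.extractPerMb +
    searches * RATES.searchPerQuery +
    contexts * RATES.contextPerQuery
  );
}

// e.g. 10,000 searches/month on their own: 10000 * 0.0003 = $3.00
```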
Compared to proprietary memory services at $249–$5,000/month flat, 60db Memory is 5–50x cheaper for most workloads — and you only pay for what you actually use.
Every billable request returns these response headers so you can track spend without polling:
x-credit-balance: 9.465200    ← wallet balance after this charge
x-credit-charged: 0.000300    ← amount charged for this request
x-billing-tx: 84ffd09e-...    ← audit row UUID
  • Automatic refunds — if a request fails after being charged (upstream outage, corrupt file, etc.), the charge is reversed automatically and logged as a compensating row in transaction_log. No support tickets required.
  • Never billed — listing collections, creating collections, checking memory status, deleting memories, and GET /memory/usage are always free, so you can still manage your data when the wallet is empty.
See the full Pricing & Billing reference for rate details, refund policy, and the complete header/error reference.
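These headers can be read off any billable response. A small sketch (the helper name is illustrative) that works with the Fetch API's `Headers` object, or anything exposing `.get(name)`:

```javascript
// Sketch: pull the billing headers off a billable response.
function readBillingHeaders(headers) {
  return {
    balance: parseFloat(headers.get('x-credit-balance')), // wallet after charge
    charged: parseFloat(headers.get('x-credit-charged')), // this request's cost
    txId: headers.get('x-billing-tx'),                    // audit row UUID
  };
}

// const res = await fetch(...);
// const { balance, charged } = readBillingHeaders(res.headers);
```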

Handling insufficient credits

When the wallet runs out, billable endpoints return HTTP 402:
{
  "success": false,
  "message": "Insufficient credits",
  "error_code": "INSUFFICIENT_CREDITS",
  "details": {
    "required": 0.0003,
    "available": 0.00001,
    "shortfall": 0.00029
  }
}
Your client should catch error_code === "INSUFFICIENT_CREDITS" and prompt the user to top up. Here’s a pattern for the search endpoint:
async function searchWithTopUpPrompt(query) {
  const res = await fetch('https://api.60db.com/memory/search', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer sk_your_api_key',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ query, max_results: 10 }),
  });

  if (res.status === 402) {
    const err = await res.json();
    showTopUpModal({
      shortfall: err.details.shortfall,
      available: err.details.available,
      topUpUrl: 'https://app.60db.com/app/billing',
    });
    return null;
  }

  const data = await res.json();
  console.log(`Balance: $${res.headers.get('x-credit-balance')}`);
  return data;
}

Tracking usage

Call GET /memory/usage to get a monthly spend breakdown by operation type. This is what powers the Spend this month card on the 60db Memory dashboard.
const res = await fetch('https://api.60db.com/memory/usage?period=current_month', {
  headers: { 'Authorization': 'Bearer sk_your_api_key' },
});
const { data } = await res.json();

console.log(`Spent $${data.total.net_spend_usd.toFixed(4)} this month`);
console.log(`Wallet: $${data.billing_owner.current_balance_usd.toFixed(4)}`);

Failure handling

The Memory service is designed to degrade gracefully:
  • If the memory layer is temporarily unreachable, POST /memory/ingest queues your memory in a retry table, returns 202 Accepted, and automatically refunds the charge so you aren’t billed for work that didn’t happen.
  • POST /memory/context returns an empty prompt on outage — your SLM chat still works, just without memory context. No charge when context is empty.
  • POST /memory/search returns 503 — the UI shows a “Memory temporarily unavailable” banner without blocking other features. Auto-refunded.
  • POST /memory/documents/extract auto-refunds the extract fee if extraction fails (corrupt file, empty PDF, OCR error). If extraction succeeds but the wallet can’t cover the post-extraction ingest fee, the extract fee is refunded and a 402 is returned.
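One way to centralize these cases client-side is a small status classifier. The mapping below follows the status codes described above; the returned action names are illustrative, not part of the API:

```javascript
// Sketch: map the degradation behaviors above onto client-side actions.
function classifyMemoryResponse(status) {
  switch (status) {
    case 200: return 'ok';
    case 202: return 'queued';               // ingest parked in the retry table
    case 402: return 'insufficient_credits'; // prompt the user to top up
    case 503: return 'memory_unavailable';   // show a banner; charge auto-refunded
    default:  return status >= 500 ? 'retry' : 'error';
  }
}
```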

Limits

  • Ingest batch: Up to 100 memories per request
  • Memory text: Max 100,000 characters per entry
  • Query length: Max 2,000 characters
  • Results: Max 50 per search (refine query for more precise results)
  • Context length: Max 16,000 tokens assembled per request
  • Document upload: Max 200 MB per file, max 100 chunks per document
  • Rate limit: 30 ingests/min per workspace
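When ingesting large datasets, the 100-memories-per-request cap means splitting the payload into batches; pacing to 30 requests/min (e.g. one batch every 2 seconds) is left to the caller. A minimal sketch:

```javascript
// Sketch: split a large payload into ingest batches within the
// 100-memories-per-request cap.
function toBatches(items, batchSize = 100) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// 250 memories → 3 requests: 100 + 100 + 50
```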

Further reading