Upload Document

POST /memory/documents/extract

Upload a document to have it extracted, chunked, and ingested into a memory collection in a single request. 60db’s document extraction engine handles 91+ file formats and includes built-in OCR for scanned PDFs and images, so the client only needs to post the raw file. Format detection, OCR, chunking, and ingestion all happen on the server.

Supported formats (partial list — 91 total):

Documents: PDF, DOCX, DOC, ODT, RTF, TXT, MD, HTML, EPUB
Spreadsheets: XLSX, XLS, CSV, ODS
Presentations: PPTX, PPT, ODP
Email: EML, MSG, PST, MBOX
Images (OCR): PNG, JPG, JPEG, TIFF, BMP, GIF
Code & structured: JSON, XML, YAML, LaTeX, Markdown variants
Archives: ZIP, TAR, GZIP, 7Z (extracted recursively)

Request

Headers

Authorization (string, required): Bearer token with your API key.
Content-Type (string): multipart/form-data

Body (multipart/form-data)

file (required): The document to extract. Max 200 MB per file.
collection (string): Collection ID to store the extracted chunks in. Defaults to the caller’s personal collection.
type (string, default "knowledge"): Memory type for the ingested chunks. One of: user, knowledge, hive. For document uploads, knowledge is almost always the right choice.
title (string): Display title for the document. Defaults to the uploaded filename. When the document produces multiple chunks, each chunk is labeled "{title} (part N/M)".
chunk_size (integer, default 1500): Maximum characters per chunk. Larger chunks preserve more context but are less precise for recall. Minimum 200, maximum 8000.
chunk_overlap (integer, default 200): Characters of overlap between adjacent chunks. Helps preserve sentences that span chunk boundaries. Must be less than chunk_size.
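
The chunking limits above can be checked on the client before any bytes are sent. The following is a minimal sketch, not part of any 60db SDK: `validate_chunk_params` is a hypothetical helper name, and rejecting a negative overlap is an assumption (the reference only states that the overlap must be less than chunk_size).

```python
def validate_chunk_params(chunk_size: int = 1500, chunk_overlap: int = 200) -> dict:
    """Check chunking parameters against the documented limits.

    Returns the two values as multipart form fields (strings) when valid.
    """
    # Documented range: minimum 200, maximum 8000 characters per chunk.
    if not 200 <= chunk_size <= 8000:
        raise ValueError(f"chunk_size must be between 200 and 8000, got {chunk_size}")
    # Assumption: a negative overlap is meaningless, so reject it here.
    if chunk_overlap < 0:
        raise ValueError(f"chunk_overlap must not be negative, got {chunk_overlap}")
    # Documented rule: overlap must be strictly less than chunk_size.
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be less than chunk_size")
    return {"chunk_size": str(chunk_size), "chunk_overlap": str(chunk_overlap)}
```

Calling `validate_chunk_params(3000, 0)` returns the form fields for the spreadsheet preset, while `validate_chunk_params(100)` raises a `ValueError` locally instead of costing a round trip.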

Response

success (boolean): true on success.
data (object):
  collection_id (string): The collection the chunks were stored in.
  collection_label (string): Human-readable collection name.
  filename (string): Original filename of the uploaded document.
  chunks (integer): Number of chunks produced from the extracted text.
  characters (integer): Total characters of extracted text.
  memory_type (string): The memory type used for ingest (user, knowledge, or hive).
  total_queued (integer): Number of memories queued for processing (equals chunks).
  results (array): Array of {id, status, message}, one entry per chunk ingested. Use the IDs with GET /memory/:id/status to poll for processing completion.
  metadata (object): Extracted document metadata, including mime_type, filename, page_count (for PDFs), detected_languages, and total_chunks.

Examples

curl -X POST https://api.60db.ai/memory/documents/extract \
  -H "Authorization: Bearer your-api-key" \
  -F "file=@quarterly-report.pdf" \
  -F "collection=company_handbook" \
  -F "type=knowledge" \
  -F "title=Q4 2026 Report"

{
  "success": true,
  "data": {
    "collection_id": "company_handbook",
    "collection_label": "Company Handbook",
    "filename": "quarterly-report.pdf",
    "chunks": 18,
    "characters": 24680,
    "memory_type": "knowledge",
    "total_queued": 18,
    "results": [
      { "id": "mem_01HV8K...", "status": "pending", "message": "Queued for processing" },
      { "id": "mem_01HV8L...", "status": "pending", "message": "Queued for processing" }
    ],
    "metadata": {
      "source": "document_upload",
      "filename": "quarterly-report.pdf",
      "mime_type": "application/pdf",
      "page_count": 24,
      "detected_languages": ["eng"],
      "total_chunks": 18
    }
  }
}
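
For callers who are not using curl, the same request can be assembled with the Python standard library. This is a sketch under stated assumptions: `build_upload_request` is a hypothetical helper, the multipart body is built by hand to avoid third-party dependencies, and nothing is sent until the returned request is passed to `urllib.request.urlopen`.

```python
import mimetypes
import urllib.request
import uuid

API_URL = "https://api.60db.ai/memory/documents/extract"


def build_upload_request(api_key: str, filename: str, file_bytes: bytes,
                         fields: dict) -> urllib.request.Request:
    """Build (but do not send) the multipart POST shown in the curl example."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields such as collection, type, and title.
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode("utf-8"),
        ]
    # The file part; the content type is guessed from the filename.
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        f"Content-Type: {content_type}".encode(),
        b"",
        file_bytes,
        f"--{boundary}--".encode(),
        b"",
    ]
    return urllib.request.Request(
        API_URL,
        data=b"\r\n".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )

# Usage (performs a real, billable upload, so it is left commented out):
# with open("quarterly-report.pdf", "rb") as fh:
#     req = build_upload_request("your-api-key", "quarterly-report.pdf", fh.read(),
#                                {"collection": "company_handbook",
#                                 "type": "knowledge", "title": "Q4 2026 Report"})
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```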

Pipeline

When you POST a file, 60db runs it through this pipeline:

Validate — file present, type allowed, under 200 MB, collection accessible.
Extract — the document extraction engine detects the format (PDF, DOCX, image, etc.) and returns plain text plus metadata (mime_type, page_count, tables, quality_score). OCR is applied automatically for scanned PDFs and images.
Chunk — split the extracted text into overlapping segments of chunk_size characters with chunk_overlap character overlap.
Register collection — ensure the target collection is ready (idempotent, cached).
Ingest — stream all chunks into the memory layer in a single batch.
Return — the response includes one {id, status, message} entry per chunk. Processing continues asynchronously.

For very large documents, prefer a higher chunk_size (e.g. 3000) to keep the chunk count low. The endpoint rejects uploads that produce more than 100 chunks with a TOO_MANY_CHUNKS error, so split such files before upload or use a larger chunk size.
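
To judge ahead of time whether a document is likely to hit the 100-chunk limit, the chunk count can be approximated from the character count. The sketch below assumes a plain sliding window that advances by chunk_size minus chunk_overlap characters; the server's chunker may split differently (the sample response reports 18 chunks for 24,680 characters, whereas this model gives 19), so treat the result as a rough estimate. Both function names and the candidate sizes are hypothetical.

```python
MAX_CHUNKS = 100  # documented limit; more chunks yields TOO_MANY_CHUNKS


def estimate_chunks(characters: int, chunk_size: int = 1500, chunk_overlap: int = 200) -> int:
    """Approximate chunk count under a simple sliding-window model."""
    if characters <= 0:
        return 0
    if characters <= chunk_size:
        return 1
    step = chunk_size - chunk_overlap  # new characters covered by each later chunk
    remaining = characters - chunk_size
    return 1 + (remaining + step - 1) // step  # integer ceiling division


def smallest_fitting_chunk_size(characters: int, chunk_overlap: int = 200,
                                candidates=(1500, 2000, 2500, 3000, 4000, 6000, 8000)):
    """Return the first candidate chunk_size estimated to stay within MAX_CHUNKS.

    Returns None when even the largest candidate is estimated to exceed the
    limit, in which case the file should be split before upload.
    """
    for size in candidates:
        if estimate_chunks(characters, size, chunk_overlap) <= MAX_CHUNKS:
            return size
    return None
```

Because the estimate can be off by a chunk or so, it is safer to leave some headroom rather than aim for exactly 100.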

Tuning

Document type	Recommended `chunk_size`	`chunk_overlap`	Notes
Technical docs / API refs	1500	200	Default. Balances recall precision and context.
Long-form prose / books	2500	300	Fewer chunks, more context per result.
FAQs / short snippets	800	100	Higher precision — each Q&A becomes its own chunk.
Spreadsheet exports	3000	0	Tables should stay contiguous; overlap hurts.
Scanned PDFs (OCR)	2000	250	OCR adds whitespace noise; slightly longer chunks help.

Billing

Document upload is billed in two stages: you pay for the extraction and for the resulting ingest.

Stage	Rate	When charged	Refund on failure
Extract fee	$0.003 per MB	Before extraction runs	Yes, auto
Ingest fee	$0.0001 per 1,000 extracted characters	After extraction, before ingest	Yes, auto

The two charges are separate rows in transaction_log (MEMORY_EXTRACT and MEMORY_INGEST) so you can distinguish extraction cost from storage cost in your reporting. Example — uploading a 2 MB PDF that extracts to 50,000 characters of text:

Extract fee: 2 × $0.003         = $0.006
Ingest fee:  (50,000 / 1000) × $0.0001 = $0.005
Total:                            $0.011
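
The same arithmetic can be written as a small helper for budgeting. This sketch assumes both fees scale linearly with no rounding or minimum charge, which the rate table does not confirm; `estimate_upload_cost` is a hypothetical name. Decimal is used so that small fees are not distorted by floating-point error.

```python
from decimal import Decimal

EXTRACT_RATE_PER_MB = Decimal("0.003")         # from the rate table
INGEST_RATE_PER_1K_CHARS = Decimal("0.0001")   # from the rate table


def estimate_upload_cost(size_mb, characters: int) -> dict:
    """Estimate the extract, ingest, and total fees in dollars."""
    extract_fee = Decimal(str(size_mb)) * EXTRACT_RATE_PER_MB
    ingest_fee = Decimal(characters) / Decimal(1000) * INGEST_RATE_PER_1K_CHARS
    return {"extract": extract_fee, "ingest": ingest_fee, "total": extract_fee + ingest_fee}
```

For the 2 MB, 50,000-character example, `estimate_upload_cost(2, 50000)["total"]` equals `Decimal("0.011")`, matching the worked total above.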

Response headers on success:

Header	Meaning
`x-credit-balance`	Wallet balance after the extract fee was deducted (set before ingest runs)
`x-credit-charged`	Just the extract fee
`x-credit-charged-total`	Extract fee + ingest fee combined
`x-billing-tx`	UUID of the extract audit row (the ingest row is linked via metadata)
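
Since x-credit-charged covers only the extract fee and x-credit-charged-total covers both stages, the ingest fee can be derived by subtraction. The sketch below assumes the header values are plain decimal strings, which this page does not specify; `billing_summary` is a hypothetical helper.

```python
from decimal import Decimal


def billing_summary(headers: dict) -> dict:
    """Split the billing response headers into per-stage amounts."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    extract_fee = Decimal(h["x-credit-charged"])
    total = Decimal(h["x-credit-charged-total"])
    return {
        "extract_fee": extract_fee,
        "ingest_fee": total - extract_fee,
        "total": total,
        # Balance is set after the extract fee, before the ingest charge.
        "balance_after_extract": Decimal(h["x-credit-balance"]),
        "extract_tx": h["x-billing-tx"],
    }
```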

Special failure case — if extraction succeeds but your wallet can’t cover the post-extraction ingest charge, the extract fee is automatically refunded and the response is 402 INSUFFICIENT_CREDITS with details.extract_fee_refunded populated. You pay nothing for the failed attempt. See Pricing & Billing for the full policy.

Error responses

Status	Code	Meaning
400	`NO_FILE`	No file was attached to the request.
400	`INVALID_TYPE`	`type` must be `user`, `knowledge`, or `hive`.
402	`INSUFFICIENT_CREDITS`	Wallet cannot cover the extract fee (pre-charge) OR the post-extraction ingest fee (in which case `details.extract_fee_refunded` is populated).
403	`POLICY_DENY`	Your role (e.g. `viewer`) is not allowed to create memories.
404	`COLLECTION_NOT_FOUND`	The specified `collection` doesn’t exist or you can’t access it.
413	`TOO_MANY_CHUNKS`	Document produced more than 100 chunks. Increase `chunk_size` or split the file. Full auto-refund.
422	`EMPTY_EXTRACTION`	Document contains no extractable text. Full auto-refund.
422	`EMPTY_CHUNKS`	Chunking produced zero segments. Full auto-refund.
422	`EXTRACTION_FAILED`	Extraction engine rejected the file. Full auto-refund.
503	`MEMORY_INFRA_NOT_READY`	The workspace’s memory layer is still provisioning. Retry in ~10s. Full auto-refund.
503	`EXTRACTION_SERVICE_UNAVAILABLE`	Document extraction is temporarily unavailable. Full auto-refund.
202	`MEMORY_QUEUED`	Memory layer is temporarily unreachable; chunks were queued and will retry automatically. Both extract and ingest fees are refunded because the work won’t actually happen.
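
One way to act on these codes in client code is to map each one to a next step. The grouping and action names below are illustrative assumptions, not part of the API. The 10-second delay comes from the table, while the 30-second delay for EXTRACTION_SERVICE_UNAVAILABLE is a guess because no retry interval is documented. How the code is read from the response body is left to the caller, since the error body shape is not shown on this page.

```python
# Codes described as temporary, with a suggested wait in seconds.
RETRY_DELAY_SECONDS = {
    "MEMORY_INFRA_NOT_READY": 10.0,          # table says retry in ~10s
    "EXTRACTION_SERVICE_UNAVAILABLE": 30.0,  # assumed; no delay documented
}

# Codes that will keep failing until the request or the file is changed.
FIX_REQUEST_CODES = {
    "NO_FILE", "INVALID_TYPE", "POLICY_DENY", "COLLECTION_NOT_FOUND",
    "EMPTY_EXTRACTION", "EMPTY_CHUNKS", "EXTRACTION_FAILED",
}


def next_action(code: str):
    """Return (action, delay_seconds) for a documented error code."""
    if code == "MEMORY_QUEUED":
        return ("none", None)  # server retries the queued chunks itself
    if code in RETRY_DELAY_SECONDS:
        return ("retry", RETRY_DELAY_SECONDS[code])
    if code == "TOO_MANY_CHUNKS":
        return ("increase_chunk_size", None)
    if code == "INSUFFICIENT_CREDITS":
        return ("add_credits", None)
    if code in FIX_REQUEST_CODES:
        return ("fix_request", None)
    return ("unknown", None)
```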

Checking ingestion status

The endpoint returns immediately once chunks are queued — full embedding/indexing happens asynchronously. Poll GET /memory/:id/status with any of the returned chunk IDs to check progress:

curl https://api.60db.ai/memory/mem_01HV8K.../status \
  -H "Authorization: Bearer your-api-key"

Statuses: pending → processing → ready (or failed).
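
The status progression above lends itself to a simple polling loop. This sketch does not call the API itself: `fetch_status` is a caller-supplied function (hypothetical) that performs GET /memory/:id/status and returns the status string, because the shape of that response is not shown on this page. The interval and timeout defaults are arbitrary choices.

```python
import time

TERMINAL_STATUSES = {"ready", "failed"}


def wait_until_done(memory_id: str, fetch_status, interval: float = 2.0,
                    timeout: float = 300.0, sleep=time.sleep, clock=time.monotonic) -> str:
    """Poll until the chunk reaches 'ready' or 'failed', then return that status.

    Raises TimeoutError if neither terminal status is seen within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status(memory_id)
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"{memory_id} still '{status}' after {timeout}s")
        sleep(interval)  # pending or processing: wait before asking again
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real waiting. For documents with many chunks, checking each returned ID in turn stays well under typical rate limits at this interval, though the limit for the status endpoint is not stated here.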

Size limits

Per file: 200 MB
Chunks per document: 100 (use a larger chunk_size to fit bigger files)
Chunk text length: 100,000 characters
Supported languages for OCR: 100+ languages including English, Spanish, French, German, Chinese, Japanese, Arabic, Hindi, and more
Rate limit: Same as POST /memory/ingest/batch (30 uploads/min per workspace on default plans)
