Extract text from a document (PDF, DOCX, XLSX, scanned images, …) with built-in OCR and ingest it into a memory collection in a single request.
The `type` field accepts `user`, `knowledge`, or `hive`. For document uploads, `knowledge` is almost always the right choice.

The document is split into segments of `chunk_size` characters with `chunk_overlap` character overlap, and each chunk is stored as a separate memory titled `"{title} (part N/M)"`. OCR is applied automatically for scanned PDFs and images. On success the response returns `true`, along with extraction metadata (`mime_type`, `page_count`, `tables`, `quality_score`) and one `{id, status, message}` entry per chunk. Processing continues asynchronously.

Recommended chunking settings by document type:

| Document type | Recommended chunk_size | chunk_overlap | Notes |
|---|---|---|---|
| Technical docs / API refs | 1500 | 200 | Default. Balances recall precision and context. |
| Long-form prose / books | 2500 | 300 | Fewer chunks, more context per result. |
| FAQs / short snippets | 800 | 100 | Higher precision — each Q&A becomes its own chunk. |
| Spreadsheet exports | 3000 | 0 | Tables should stay contiguous; overlap hurts. |
| Scanned PDFs (OCR) | 2000 | 250 | OCR adds whitespace noise; slightly longer chunks help. |
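The splitting behaviour these settings control can be pictured as a sliding window over the extracted text. The sketch below is illustrative only — the service's actual segmentation may differ (e.g. it may snap to word or sentence boundaries):

```python
def chunk_text(text: str, chunk_size: int = 1500, chunk_overlap: int = 200) -> list[str]:
    """Split text into chunk_size-character segments, each overlapping
    the previous segment by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# A 4,000-character document with the defaults (1500 / 200) advances
# 1,300 characters per chunk, producing 3 chunks; each would be stored
# as a separate memory titled "{title} (part N/3)".
parts = chunk_text("x" * 4000)
```

Larger `chunk_size` values mean fewer, more context-rich chunks; larger `chunk_overlap` values trade storage for continuity across chunk boundaries.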
Two fees apply, charged at different stages of the request:

| Stage | Rate | When charged | Refund on failure |
|---|---|---|---|
| Extract fee | $0.003 per MB | Before extraction runs | Yes, auto |
| Ingest fee | $0.0001 per 1,000 extracted characters | After extraction, before ingest | Yes, auto |
Both fees are recorded as separate rows in the `transaction_log` (`MEMORY_EXTRACT` and `MEMORY_INGEST`) so you can distinguish extraction cost from storage cost in your reporting.
Example — uploading a 2 MB PDF that extracts to 50,000 characters of text:
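Applying the rates from the table above (a sketch of the arithmetic, not an official calculator):

```python
# Extract fee: $0.003 per MB, charged before extraction runs
extract_fee = 2 * 0.003                   # 2 MB -> $0.006

# Ingest fee: $0.0001 per 1,000 extracted characters,
# charged after extraction but before ingest
ingest_fee = (50_000 / 1_000) * 0.0001    # 50,000 chars -> $0.005

# Total charged for the request
total = extract_fee + ingest_fee          # -> $0.011
```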
The response includes billing headers:

| Header | Meaning |
|---|---|
| `x-credit-balance` | Wallet balance after the extract fee was deducted (set before ingest runs) |
| `x-credit-charged` | Just the extract fee |
| `x-credit-charged-total` | Extract fee + ingest fee combined |
| `x-billing-tx` | UUID of the extract audit row (the ingest row is linked via metadata) |
If, after extraction, the wallet cannot cover the ingest fee, the request fails with `402 INSUFFICIENT_CREDITS` and `details.extract_fee_refunded` populated — the extract fee is refunded automatically. You pay nothing for the failed attempt.
See Pricing & Billing for the full policy.
| Status | Code | Meaning |
|---|---|---|
| 400 | NO_FILE | No file was attached to the request. |
| 400 | INVALID_TYPE | type must be user, knowledge, or hive. |
| 402 | INSUFFICIENT_CREDITS | Wallet cannot cover the extract fee (pre-charge) OR the post-extraction ingest fee (in which case details.extract_fee_refunded is populated). |
| 403 | POLICY_DENY | Your role (e.g. viewer) is not allowed to create memories. |
| 404 | COLLECTION_NOT_FOUND | The specified collection doesn’t exist or you can’t access it. |
| 413 | TOO_MANY_CHUNKS | Document produced more than 100 chunks. Increase chunk_size or split the file. Full auto-refund. |
| 422 | EMPTY_EXTRACTION | Document contains no extractable text. Full auto-refund. |
| 422 | EMPTY_CHUNKS | Chunking produced zero segments. Full auto-refund. |
| 422 | EXTRACTION_FAILED | Extraction engine rejected the file. Full auto-refund. |
| 503 | MEMORY_INFRA_NOT_READY | The workspace’s memory layer is still provisioning. Retry in ~10s. Full auto-refund. |
| 503 | EXTRACTION_SERVICE_UNAVAILABLE | Document extraction is temporarily unavailable. Full auto-refund. |
| 202 | MEMORY_QUEUED | Memory layer is temporarily unreachable; chunks were queued and will retry automatically. Both extract and ingest fees are refunded because the work has not actually been performed in this request. |
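Given this table, a reasonable client treats the two 503 codes as transient and everything else as terminal. Below is a sketch of such a retry policy; `send` is a placeholder for whatever function performs the actual upload and returns `(status, code)`:

```python
import time

# Codes the error table marks as retryable after a short wait
TRANSIENT = {"MEMORY_INFRA_NOT_READY", "EXTRACTION_SERVICE_UNAVAILABLE"}

def upload_with_retry(send, max_attempts: int = 5, delay: float = 10.0):
    """Retry transient 503s (the table suggests ~10 s for infra readiness);
    surface every other status to the caller immediately."""
    for _ in range(max_attempts):
        status, code = send()
        if status == 503 and code in TRANSIENT:
            time.sleep(delay)
            continue
        return status, code
    return status, code  # last transient failure after exhausting attempts
```

Since every failure mode in the table auto-refunds, retrying is safe from a billing perspective.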
Poll `GET /memory/:id/status` with any of the returned chunk IDs to check progress. A chunk moves through `pending → processing → ready` (or `failed`).
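A simple polling loop over that lifecycle might look like the sketch below, where `fetch_status` is a placeholder wrapping the `GET /memory/:id/status` call and returning the status string:

```python
import time

def wait_until_done(fetch_status, poll_interval: float = 1.0, timeout: float = 120.0):
    """Poll until the chunk reaches a terminal state ("ready" or "failed")."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in ("ready", "failed"):
            return status
        time.sleep(poll_interval)   # still "pending" or "processing"
    raise TimeoutError("chunk did not reach a terminal state in time")
```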
To ingest larger files, increase `chunk_size` so the document stays within the 100-chunk limit. `POST /memory/ingest/batch` is rate-limited to 30 uploads/min per workspace on default plans.
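To stay under the 30 uploads/min default without tripping the limiter, a client can space consecutive batch calls by at least 60 / 30 = 2 seconds. A minimal client-side pacing sketch (the injectable `now`/`sleep` parameters are just for testability):

```python
import time

RATE_LIMIT_PER_MIN = 30
MIN_INTERVAL = 60.0 / RATE_LIMIT_PER_MIN   # 2.0 seconds between uploads

def paced(uploads, now=time.monotonic, sleep=time.sleep):
    """Run each zero-argument upload callable, enforcing MIN_INTERVAL
    between consecutive calls, and yield each result."""
    last = None
    for upload in uploads:
        if last is not None:
            wait = MIN_INTERVAL - (now() - last)
            if wait > 0:
                sleep(wait)
        last = now()
        yield upload()
```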