Document Extraction

Extract structured data from any URL or content using LLM-powered analysis, guided by a user-defined schema or a natural language prompt.

POST /v1/documents/extract

Overview

This endpoint lets your agent extract structured data from a webpage or from raw text. Provide either a URL, which the service scrapes, or the raw content itself, together with either a JSON schema or a natural language extraction prompt. URLs are scraped via Firecrawl, extraction is performed by a Claude LLM, and the result is returned as clean structured JSON.

Parameters

url (string)
URL to scrape and extract from. Either url or content is required.

content (string)
Pre-scraped text content to extract from. Use this if you already have the content. Either url or content is required.

extraction_prompt (string)
Natural language instructions for what to extract. Either extraction_prompt or schema is required.

schema (string)
JSON schema defining the structure to extract. Either schema or extraction_prompt is required.
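
The "either/or" rules above can be checked on the client before a request is sent. The following is a minimal sketch, not an official client: the helper name build_extract_payload is hypothetical, and serializing the schema with json.dumps is an assumption based on the parameter being documented as a string. The page does not say whether both members of a pair may be sent together, so the sketch only enforces that at least one of each pair is present.

```python
import json
from typing import Any, Dict, Optional


def build_extract_payload(
    url: Optional[str] = None,
    content: Optional[str] = None,
    extraction_prompt: Optional[str] = None,
    schema: Optional[Dict[str, Any]] = None,
) -> Dict[str, str]:
    """Build a request body for POST /v1/documents/extract (hypothetical helper)."""
    # Documented rule: either url or content is required.
    if not url and not content:
        raise ValueError("either url or content is required")
    # Documented rule: either extraction_prompt or schema is required.
    if not extraction_prompt and not schema:
        raise ValueError("either extraction_prompt or schema is required")

    payload: Dict[str, str] = {}
    if url:
        payload["url"] = url
    if content:
        payload["content"] = content
    if extraction_prompt:
        payload["extraction_prompt"] = extraction_prompt
    if schema:
        # Assumption: schema is documented as a string, so the JSON schema
        # is serialized to text here rather than sent as a nested object.
        payload["schema"] = json.dumps(schema)
    return payload
```

For example, build_extract_payload(url="https://example.com/pricing", extraction_prompt="List each plan with its price") yields a body containing only the url and extraction_prompt keys.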

Example Response

{
  "success": true,
  "data": {
    "extracted": {
      "products": [
        { "name": "Starter", "price": "$29/mo", "features": ["1K API calls", "Email support"] },
        { "name": "Pro", "price": "$99/mo", "features": ["10K API calls", "Priority support"] }
      ]
    },
    "source_url": "https://example.com/pricing",
    "content_length": 4523,
    "truncated": false
  },
  "metadata": {
    "provider_used": "llm_extraction",
    "providers_tried": ["llm_extraction"],
    "response_time_ms": 5200,
    "request_id": "req_doc_001"
  },
  "credits_used": 3
}
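
A response shaped like the example above could be unpacked as follows. This is a sketch under assumptions: read_extraction is a hypothetical helper, SAMPLE is an abridged copy of the example response, and the shape of a failed response is not documented on this page, so the sketch simply raises when success is not true.

```python
import json
from typing import Any, Dict

# Abridged copy of the example response, used only for illustration.
SAMPLE = """
{
  "success": true,
  "data": {
    "extracted": {
      "products": [
        { "name": "Starter", "price": "$29/mo", "features": ["1K API calls", "Email support"] }
      ]
    },
    "source_url": "https://example.com/pricing",
    "content_length": 4523,
    "truncated": false
  },
  "metadata": { "request_id": "req_doc_001", "response_time_ms": 5200 },
  "credits_used": 3
}
"""


def read_extraction(body: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the commonly needed fields out of a decoded response body."""
    if not body.get("success"):
        # Assumption: the error format is not documented here, so no
        # attempt is made to interpret it.
        raise RuntimeError("extraction failed: %r" % (body,))
    data = body["data"]
    return {
        "extracted": data["extracted"],
        # The truncated flag is read so callers can treat the result as
        # possibly incomplete when it is set.
        "truncated": data.get("truncated", False),
        "request_id": body.get("metadata", {}).get("request_id"),
        "credits_used": body.get("credits_used"),
    }


result = read_extraction(json.loads(SAMPLE))
```

Keeping request_id alongside the extracted data makes it easier to refer to a specific call later, for instance when reporting a problem.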

Get Started

Use this API through the O-mega platform. Create an API key in your dashboard, then call the endpoint with your key in the Authorization header.
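
As a rough sketch of that call using only the Python standard library: the host in BASE_URL is a hypothetical placeholder, since this page does not give the API's base URL, and the "Bearer" prefix is an assumption, since the page only says the key goes in the Authorization header. The function names build_request and send are likewise illustrative.

```python
import json
import urllib.request
from typing import Any, Dict

# Hypothetical placeholder: the real host is not stated on this page.
BASE_URL = "https://api.example.com"


def build_request(
    api_key: str, payload: Dict[str, Any], base_url: str = BASE_URL
) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /v1/documents/extract."""
    return urllib.request.Request(
        url=base_url + "/v1/documents/extract",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: Bearer scheme; check your dashboard for the exact format.
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating build_request from send keeps the request inspectable before any network traffic occurs. The 30 second timeout is an arbitrary choice for this sketch.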

Try Document Extraction

Test Document Extraction in the interactive playground. No setup required.
