Extract
Fetch a URL and return clean markdown.

Fetch any URL and receive clean, LLM-ready markdown along with page metadata and extracted links. KnowledgeSDK handles JavaScript rendering, anti-bot protection, and HTML cleanup automatically.

POST /v1/extract (auth: x-api-key)

Request body

url (string, required)

The URL to extract. Must be a valid HTTP or HTTPS URL.
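
As a rough illustration of the request described above, the following Python sketch builds (but does not send) a POST /v1/extract call using only the standard library. The base URL `https://api.example.com` and the helper name `build_extract_request` are hypothetical placeholders, and sending `x-api-key` as a request header is an assumption based on the endpoint summary; check your account settings for the real host.

```python
import json
import urllib.request
from urllib.parse import urlparse

# Assumption: hypothetical base URL; replace with the real API host.
BASE_URL = "https://api.example.com"


def build_extract_request(target_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/extract request."""
    parsed = urlparse(target_url)
    # The endpoint accepts only HTTP or HTTPS URLs, so reject anything else early.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a valid HTTP(S) URL: {target_url!r}")

    # The request body carries a single required field: url.
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/extract",
        data=body,
        method="POST",
        headers={
            "x-api-key": api_key,  # assumed to be sent as a request header
            "Content-Type": "application/json",
        },
    )
```

To send it, pass the returned object to `urllib.request.urlopen` and decode the JSON body of the reply. Validating the scheme on the client side is optional; it only avoids a round trip for URLs the endpoint would not accept.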

Response

url (string)

The URL that was extracted (after any redirects).

markdown (string)

The page content converted to clean markdown. Navigation, footers, scripts, and styles are automatically stripped.

title (string | null)

The page title extracted from the <title> tag.

description (string | null)

The page description extracted from the <meta name="description"> or <meta property="og:description"> tag.

links (string[])

An array of absolute URLs found on the page (up to 500 links).

durationMs (number)

The time in milliseconds the extraction took to complete.
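
The response fields listed above could be mapped onto a typed structure along these lines. This is a minimal sketch, not an official client: the class name `ExtractResult` and the method `from_json` are hypothetical, and the nullable fields (`title`, `description`) are modelled as `Optional[str]`.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExtractResult:
    """Hypothetical container mirroring the documented response fields."""

    url: str                    # final URL after any redirects
    markdown: str               # cleaned page content
    title: Optional[str]        # may be null in the response
    description: Optional[str]  # may be null in the response
    links: List[str]            # absolute URLs found on the page
    duration_ms: float          # durationMs in the JSON payload

    @classmethod
    def from_json(cls, raw: str) -> "ExtractResult":
        """Parse a JSON response body into an ExtractResult."""
        data = json.loads(raw)
        return cls(
            url=data["url"],
            markdown=data["markdown"],
            title=data.get("title"),
            description=data.get("description"),
            links=list(data.get("links", [])),
            duration_ms=data["durationMs"],
        )
```

Callers should check `title` and `description` for `None` before using them, since both are documented as nullable.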

Code examples

Example response

{
  "url": "https://example.com",
  "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples in documents...",
  "title": "Example Domain",
  "description": "Example Domain for documentation purposes.",
  "links": [
    "https://www.iana.org/domains/example"
  ],
  "durationMs": 1243
}

Need to extract and make content searchable? Use /v1/business instead. It runs the full pipeline (extract, classify, index, and expose the content through semantic search) in a single API call.