Sitemap

Discover all URLs on a website.

Discover all URLs on a website by parsing its sitemap. KnowledgeSDK checks multiple common sitemap locations, handles sitemap indexes (nested sitemaps), and automatically tries www/non-www variations.

POST /v1/sitemap (authenticated with x-api-key)

Request body

url (string, required)

The website URL to discover pages for. Must be a valid HTTP or HTTPS URL. You can pass any page on the site -- KnowledgeSDK will automatically locate the sitemap from the domain root.
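A request to this endpoint might be built as in the sketch below. The page does not state the API host, so `BASE_URL` is a placeholder to replace. The sketch also assumes that the body is JSON and that `x-api-key` is sent as an HTTP header. The helper names `build_sitemap_request` and `fetch_sitemap` are illustrative and not part of any official client.

```python
import json
import urllib.request

# Assumption: the real API host is not given on this page; replace this placeholder.
BASE_URL = "https://api.example.invalid"


def build_sitemap_request(site_url: str, api_key: str,
                          base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/sitemap request."""
    # The docs require a valid HTTP or HTTPS URL, so reject anything else early.
    if not site_url.startswith(("http://", "https://")):
        raise ValueError("url must be a valid HTTP or HTTPS URL")
    # Assumption: the request body is a JSON object with a single "url" field.
    body = json.dumps({"url": site_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/sitemap",
        data=body,
        method="POST",
        # Assumption: the API key travels in an "x-api-key" header.
        headers={"Content-Type": "application/json", "x-api-key": api_key},
    )


def fetch_sitemap(site_url: str, api_key: str, base_url: str = BASE_URL) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_sitemap_request(site_url, api_key, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Because any page on the site is accepted, a call such as `fetch_sitemap("https://example.com/pricing", api_key)` should be equivalent to passing the domain root.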

Response

url (string)

The original URL that was provided.

urls (string[])

An array of all discovered URLs from the sitemap (up to 5,000 URLs).

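count (number)

The number of URLs returned in urls. For sitemaps larger than the 5,000-URL limit, this is lower than the total number of URLs on the site.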

Code examples

Example response

{
  "url": "https://example.com",
  "urls": [
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/pricing",
    "https://example.com/docs",
    "https://example.com/blog",
    "https://example.com/blog/getting-started",
    "https://example.com/blog/api-reference"
  ],
  "count": 7
}
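As one way to consume a response of this shape, the sketch below groups the returned URLs by their first path segment. The function name `count_by_section` and the `"(root)"` label are illustrative choices, not part of the API.

```python
from collections import Counter
from urllib.parse import urlparse


def count_by_section(response: dict) -> Counter:
    """Count returned URLs by their first path segment."""
    sections = Counter()
    for page_url in response["urls"]:
        path = urlparse(page_url).path.strip("/")
        # URLs with an empty path (the home page) are grouped under "(root)".
        section = path.split("/")[0] if path else "(root)"
        sections[section] += 1
    return sections


example_response = {
    "url": "https://example.com",
    "urls": [
        "https://example.com/",
        "https://example.com/about",
        "https://example.com/pricing",
        "https://example.com/docs",
        "https://example.com/blog",
        "https://example.com/blog/getting-started",
        "https://example.com/blog/api-reference",
    ],
    "count": 7,
}

sections = count_by_section(example_response)
```

For the example response above, the three blog URLs fall under `blog`, and every other URL forms its own section.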

Sitemap discovery

KnowledgeSDK works through the following steps in order and stops as soon as it finds valid results:

1. Standard paths

Checks /sitemap.xml, /sitemap_index.xml, /sitemap/sitemap.xml, and /wp-sitemap.xml on the given domain.

2. Sitemap indexes

If a sitemap index is found, KnowledgeSDK fetches up to 10 child sitemaps and merges all discovered URLs.

3. www/non-www fallback

If no sitemap is found on the original domain, KnowledgeSDK tries the alternate version (adds or removes www.).
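The discovery order described above can be sketched as follows. This illustrates the documented behaviour and is not KnowledgeSDK's actual implementation. The helper names are hypothetical, and the sketch assumes that sitemap indexes follow the standard sitemaps.org XML schema.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Paths and child-sitemap limit as listed in the steps above.
STANDARD_PATHS = ("/sitemap.xml", "/sitemap_index.xml",
                  "/sitemap/sitemap.xml", "/wp-sitemap.xml")
MAX_CHILD_SITEMAPS = 10
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def alternate_host(host: str) -> str:
    """Return the www/non-www counterpart of a host name."""
    return host[4:] if host.startswith("www.") else f"www.{host}"


def candidate_sitemap_urls(page_url: str) -> list[str]:
    """List sitemap URLs to try: original host first, then the alternate host."""
    parts = urlsplit(page_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("expected a valid HTTP or HTTPS URL")
    hosts = [parts.netloc, alternate_host(parts.netloc)]
    # Any page on the site is accepted, so the page's own path is discarded.
    return [f"{parts.scheme}://{host}{path}"
            for host in hosts for path in STANDARD_PATHS]


def child_sitemaps(index_xml: str, limit: int = MAX_CHILD_SITEMAPS) -> list[str]:
    """Extract at most `limit` child sitemap locations from a sitemap index."""
    root = ET.fromstring(index_xml)
    locs = [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]
    return locs[:limit]
```

A caller would fetch each candidate in order, stop at the first one that yields URLs, and pass any sitemap index it finds through `child_sitemaps` before merging the results.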

The sitemap endpoint returns at most 5,000 URLs. Results for websites with larger sitemaps are truncated, and the count field reflects the number of URLs returned, not the total on the site.
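Since count reports only the URLs that were returned, a count that reaches the limit is the only hint a caller gets that results may have been cut off. A minimal check, with `may_be_truncated` as an illustrative name, might look like this:

```python
MAX_RETURNED_URLS = 5000  # limit stated in the documentation above


def may_be_truncated(response: dict) -> bool:
    """Return True when the response hit the URL limit and may be incomplete."""
    # A site with exactly 5,000 URLs also triggers this, so treat it as a hint only.
    return response["count"] >= MAX_RETURNED_URLS
```

When this returns True, the urls array should not be treated as a complete inventory of the site.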