Sitemap
Discover all URLs on a website by parsing its sitemap. KnowledgeSDK checks multiple common sitemap locations, handles sitemap indexes (nested sitemaps), and automatically tries www/non-www variations.
Endpoint: /v1/sitemap
Authentication: x-api-key header
Request body
url (string, required): The website URL to discover pages for. Must be a valid HTTP or HTTPS URL. You can pass any page on the site; KnowledgeSDK will automatically locate the sitemap from the domain root.
Response
url (string): The original URL that was provided.
urls (string[]): An array of all discovered URLs from the sitemap (up to 5,000 URLs).
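count (number): The number of URLs returned in urls. If the site's sitemaps contain more than 5,000 URLs, this is lower than the total on the site.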
Code examples
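As a minimal sketch of calling this endpoint from Python, the snippet below builds the request with the standard library only. The base URL https://api.example.com is a placeholder, and the POST method and JSON content type are assumptions (the endpoint takes a request body, but the method is not stated on this page). Only the /v1/sitemap path, the x-api-key header, and the url field come from the reference above. The function names are illustrative.

```python
import json
import urllib.request

# Assumption: placeholder host; replace it with the real API base URL.
BASE_URL = "https://api.example.com"


def build_sitemap_request(site_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a request to the /v1/sitemap endpoint."""
    body = json.dumps({"url": site_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/sitemap",
        data=body,
        method="POST",  # assumption: POST, because the endpoint takes a request body
        headers={
            "x-api-key": api_key,  # header name taken from the reference
            "Content-Type": "application/json",  # assumption: JSON request body
        },
    )


def fetch_sitemap(site_url: str, api_key: str) -> dict:
    """Send the request and decode the JSON response into a dict."""
    request = build_sitemap_request(site_url, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect or unit test the request without network access. Calling fetch_sitemap("https://example.com", api_key) would then return a dict with the url, urls, and count fields described under Response.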
Example response
{
  "url": "https://example.com",
  "urls": [
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/pricing",
    "https://example.com/docs",
    "https://example.com/blog",
    "https://example.com/blog/getting-started",
    "https://example.com/blog/api-reference"
  ],
  "count": 7
}
Sitemap discovery
KnowledgeSDK checks the following sitemap locations in order and stops as soon as it finds valid results:
1. KnowledgeSDK checks /sitemap.xml, /sitemap_index.xml, /sitemap/sitemap.xml, and /wp-sitemap.xml on the given domain.
2. If a sitemap index is found, KnowledgeSDK fetches up to 10 child sitemaps and merges all discovered URLs.
3. If no sitemap is found on the original domain, KnowledgeSDK tries the alternate version (adds or removes www.).
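To make the lookup order concrete, here is a rough sketch of how the candidate sitemap URLs could be enumerated for a given page URL. This is an illustration of the behavior described above, not KnowledgeSDK's actual implementation. The function names toggle_www and candidate_sitemap_urls are hypothetical, and the sketch leaves out fetching, sitemap index handling, and the stop-on-first-result logic.

```python
from urllib.parse import urlsplit

# Sitemap paths listed in the documentation, in the order they are checked.
SITEMAP_PATHS = [
    "/sitemap.xml",
    "/sitemap_index.xml",
    "/sitemap/sitemap.xml",
    "/wp-sitemap.xml",
]


def toggle_www(host: str) -> str:
    """Return the alternate host: strip a leading 'www.' or add one."""
    return host[4:] if host.startswith("www.") else f"www.{host}"


def candidate_sitemap_urls(page_url: str) -> list[str]:
    """List sitemap URLs to try: original host first, then the www/non-www alternate."""
    parts = urlsplit(page_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a valid HTTP(S) URL: {page_url!r}")
    # Only the scheme and host are used; the page path and query are ignored.
    hosts = [parts.netloc, toggle_www(parts.netloc)]
    return [f"{parts.scheme}://{host}{path}" for host in hosts for path in SITEMAP_PATHS]
```

For example, candidate_sitemap_urls("https://example.com/blog/post") yields the four paths on example.com followed by the same four on www.example.com, which mirrors the fallback to the alternate domain only after the original one has been exhausted.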
The sitemap endpoint returns up to 5,000 URLs. For websites with larger sitemaps, the results are truncated. The count field reflects the number of URLs returned, not the total on the site.
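Because count reports only what was returned, a client may want to flag results that reach the cap. The following is a small sketch of such a check; the helper name may_be_truncated is hypothetical, and the response is assumed to be the decoded JSON object shown in the example response. Reaching the cap does not prove truncation, since a site with exactly 5,000 URLs would look the same.

```python
MAX_URLS = 5000  # documented cap on URLs returned by the sitemap endpoint


def may_be_truncated(response: dict) -> bool:
    """Return True when the result reached the documented cap, so URLs may be missing."""
    return response["count"] >= MAX_URLS


# Usage with a response shaped like the documented example.
sample = {
    "url": "https://example.com",
    "urls": ["https://example.com/", "https://example.com/about"],
    "count": 2,
}
if may_be_truncated(sample):
    print("Result reached the 5,000-URL cap; the site may have more pages.")
```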