Web Scrape API
Discover every page on any website by crawling its sitemap.
Pass a domain name and get back up to 500 deduplicated page URLs. Sitemap index files are crawled recursively. Non-page resources are filtered out automatically.
Ideal for building content indexes, seeding crawlers, or auditing a competitor's full site structure in seconds.
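The filtering and deduplication described above can be illustrated with a short client-side sketch. It is not Brand.dev's implementation: the extension list in `NON_PAGE_EXTENSIONS` is an assumption (the page only names images and PDFs), and `is_page_url` and `clean_urls` are hypothetical helper names. The sketch drops URLs whose path ends in a non-page extension, removes exact duplicates while keeping first-seen order, and stops at 500 results.

```python
from urllib.parse import urlsplit

# Hypothetical list of non-page extensions; the API's actual filter rules are not published.
NON_PAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg", ".pdf")


def is_page_url(url: str) -> bool:
    """Treat a URL as a page unless its path (query string ignored) ends in a non-page extension."""
    path = urlsplit(url).path.lower()
    return not path.endswith(NON_PAGE_EXTENSIONS)


def clean_urls(urls: list[str], limit: int = 500) -> list[str]:
    """Drop non-page URLs and exact duplicates, keep first-seen order, and cap at `limit`."""
    seen: set[str] = set()
    result: list[str] = []
    for url in urls:
        url = url.strip()
        if not url or url in seen or not is_page_url(url):
            continue
        seen.add(url)
        result.append(url)
        if len(result) >= limit:
            break  # cap reached; remaining input is ignored
    return result
```

Deduplication here is by exact string, so `https://brand.dev/blog` and `https://brand.dev/blog/` would both survive; whether the API canonicalizes URLs further is not stated.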
What You Get
Each request crawls a domain's sitemaps and returns the page URLs it can discover, capped at 500.
- Up to 500 page URLs: deduplicated, page-only results with images and PDFs filtered out
- Sitemap index support: nested sitemap index files are crawled recursively
- Crawl metadata: counts of how many sitemaps were discovered, fetched, skipped, and errored
- Normalized domain input: pass just the domain name; protocol handling is automatic
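The API normalizes the domain on its side, and its exact rules are not documented here. As a sketch of what "pass just the domain name" means in practice, a client might reduce whatever a user pastes (a full URL, mixed case, a port) to a bare host before building the request. `normalize_domain` and `sitemap_request_path` are hypothetical helpers; the path and the `domain` query parameter come from the example request on this page, and no base URL or authentication is shown because neither is documented here.

```python
from urllib.parse import urlencode, urlsplit


def normalize_domain(raw: str) -> str:
    """Reduce input such as 'https://Brand.dev/pricing' to a bare host name (hypothetical helper)."""
    value = raw.strip()
    # urlsplit() only recognizes the host when a scheme (and thus a netloc) is present.
    if "://" not in value:
        value = "https://" + value
    host = urlsplit(value).hostname  # lowercased; scheme, port, path, and query are dropped
    if not host:
        raise ValueError(f"no host name found in {raw!r}")
    return host


def sitemap_request_path(raw: str) -> str:
    """Build the endpoint path from the example request with a normalized `domain` parameter."""
    return "/v1/web/scrape/sitemap?" + urlencode({"domain": normalize_domain(raw)})
```

For example, `sitemap_request_path("https://Brand.dev/pricing")` yields the same path as the example request below.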
How It Works
- 01 Send a domain: pass the domain name (e.g., “example.com”); no protocol is needed.
- 02 Sitemap files discovered: the API checks robots.txt and common sitemap paths, then recursively follows sitemap index files.
- 03 URLs extracted and deduplicated: non-page resources (images, PDFs) are filtered out and duplicate URLs are removed.
- 04 Clean URL list returned: up to 500 page URLs, plus metadata on how many sitemaps were crawled.
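Steps 01 and 02 can be sketched as a breadth-first walk over sitemap files. This is not Brand.dev's implementation, and several details are assumptions made so the sketch runs without network access: the injected `fetch` callable (returning a body, or `None` on error), the fallback paths (probed here only when robots.txt declares no sitemaps), the hypothetical `max_sitemaps` budget, and how the four meta counters are defined. The `Sitemap:` directive in robots.txt and the `<sitemapindex>`/`<urlset>` elements in the sitemaps.org 0.9 namespace are standard. The raw page URLs returned here would still need the filtering, deduplication, and 500-URL cap from steps 03 and 04.

```python
from collections import deque
from typing import Callable, Optional
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
# Assumed fallback locations; the API's list of "common sitemap paths" is not published.
COMMON_SITEMAP_PATHS = ("/sitemap.xml", "/sitemap_index.xml")


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return URLs from `Sitemap:` lines; the directive name is matched case-insensitively."""
    found = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def crawl_sitemaps(
    domain: str,
    fetch: Callable[[str], Optional[str]],
    max_sitemaps: int = 50,  # hypothetical budget, not a documented API limit
) -> tuple[list[str], dict[str, int]]:
    """Walk sitemap files breadth-first, following index files into nested sitemaps."""
    base = f"https://{domain}"
    meta = {"sitemapsDiscovered": 0, "sitemapsFetched": 0, "sitemapsSkipped": 0, "errors": 0}
    queue: deque[str] = deque()
    seen: set[str] = set()
    page_urls: list[str] = []

    def enqueue(url: str) -> None:
        # The seen set also guards against index files that reference each other in a cycle.
        if url not in seen:
            seen.add(url)
            queue.append(url)
            meta["sitemapsDiscovered"] += 1

    declared = sitemaps_from_robots(fetch(base + "/robots.txt") or "")
    for url in declared or [base + path for path in COMMON_SITEMAP_PATHS]:
        enqueue(url)

    attempts = 0
    while queue:
        url = queue.popleft()
        if attempts >= max_sitemaps:
            meta["sitemapsSkipped"] += 1  # discovered but never fetched (budget reached)
            continue
        attempts += 1
        body = fetch(url)
        try:
            root = ET.fromstring(body) if body is not None else None
        except ET.ParseError:
            root = None
        if root is None:
            meta["errors"] += 1  # fetch failed or the body was not well-formed XML
            continue
        meta["sitemapsFetched"] += 1
        locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
        if root.tag == SITEMAP_NS + "sitemapindex":
            for loc in locs:
                enqueue(loc)  # nested sitemap file: crawl it as well
        else:
            page_urls.extend(locs)  # otherwise treated as a <urlset> of page URLs
    return page_urls, meta
```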
API Response
GET /v1/web/scrape/sitemap?domain=brand.dev

{
  "success": true,
  "domain": "brand.dev",
  "urls": [
    "https://brand.dev/",
    "https://brand.dev/pricing",
    "https://brand.dev/blog",
    "https://brand.dev/data/logo-api",
    "https://brand.dev/use-cases/logo-link",
    "... up to 500 URLs"
  ],
  "meta": {
    "sitemapsDiscovered": 3,
    "sitemapsFetched": 3,
    "sitemapsSkipped": 0,
    "errors": 0
  }
}

Personalize at scale
Join 4,000+ businesses using Brand.dev to personalize their products.
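A client consuming the response shown under API Response might map it onto typed fields before use. This is a sketch: `SitemapResult`, `SitemapMeta`, `parse_sitemap_response`, and `may_be_incomplete` are hypothetical names, the field names are taken only from the example response, and the error payload is not documented here, so only the `success` flag is checked.

```python
import json
from dataclasses import dataclass


@dataclass
class SitemapMeta:
    sitemaps_discovered: int
    sitemaps_fetched: int
    sitemaps_skipped: int
    errors: int


@dataclass
class SitemapResult:
    domain: str
    urls: list[str]
    meta: SitemapMeta


def parse_sitemap_response(body: str) -> SitemapResult:
    """Map a JSON response body onto typed fields named after the example response."""
    data = json.loads(body)
    if not data.get("success"):
        # The error payload is not documented, so only the `success` flag is checked.
        raise RuntimeError(f"sitemap request failed: {body[:200]}")
    meta = data.get("meta", {})
    return SitemapResult(
        domain=data.get("domain", ""),
        urls=list(data.get("urls", [])),
        meta=SitemapMeta(
            sitemaps_discovered=int(meta.get("sitemapsDiscovered", 0)),
            sitemaps_fetched=int(meta.get("sitemapsFetched", 0)),
            sitemaps_skipped=int(meta.get("sitemapsSkipped", 0)),
            errors=int(meta.get("errors", 0)),
        ),
    )


def may_be_incomplete(result: SitemapResult, cap: int = 500) -> bool:
    """Heuristic (an assumption, not an API guarantee): skipped or failed sitemaps,
    or a list that reached the cap, suggest some pages may be missing from `urls`."""
    m = result.meta
    return m.sitemaps_skipped > 0 or m.errors > 0 or len(result.urls) >= cap
```

The `may_be_incomplete` check reflects one reading of the meta fields: since results stop at 500 URLs, a full list may mean the site has more pages than were returned.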