Site Metadata Extractor
Site Metadata Extractor pulls the main SEO-relevant metadata from multiple pages at once. Enter up to 10 URLs, one per line, view the results in a table, and download them as a CSV.
Each input URL is fetched server-side for HTML parsing. Connections to private IPs or localhost are rejected.
Each fetch times out after 8 seconds, and only the first 2 MB of the HTML body is read.
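The private-address rejection described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: the function names `is_blocked_ip`, `resolve_host`, and `is_url_allowed` are hypothetical, and the exact set of blocked ranges is an assumption. A production check would also need to pin the connection to the address it validated, since a hostname can resolve differently between the check and the fetch.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_blocked_ip(address: str) -> bool:
    """Return True for loopback, private, link-local and similar addresses."""
    ip = ipaddress.ip_address(address)
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


def resolve_host(host: str) -> list[str]:
    """Resolve a hostname to every address it maps to (performs a DNS lookup)."""
    infos = socket.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    return [info[4][0] for info in infos]


def is_url_allowed(url: str, resolver=resolve_host) -> bool:
    """Allow only http(s) URLs whose host resolves exclusively to public addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        addresses = resolver(parts.hostname)
    except OSError:  # includes socket.gaierror for unresolvable hosts
        return False
    return bool(addresses) and not any(is_blocked_ip(a) for a in addresses)
```

The resolver is passed in as a parameter so the check can be exercised without network access, for example `is_url_allowed("http://localhost/", resolver=lambda host: ["127.0.0.1"])`.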
About Site Metadata Extractor
Ten fields are extracted: title, description, robots, canonical URL, HTTP status, final URL (after redirects), lang, charset, viewport, and theme-color. This is handy for sweeping through an SEO checklist or for comparing metadata before and after a site refresh.
Pages are fetched and parsed on the server, so browser CORS restrictions do not apply. Any site that does not block crawlers should extract cleanly.
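As a rough illustration of the HTML-parsing step, the sketch below collects the fields that live in the markup using Python's standard `html.parser`. The class name `MetadataParser` and the function `extract_metadata` are hypothetical, and the parsing rules (first occurrence wins, only `<meta charset>` is checked for the charset) are assumptions rather than the tool's documented behavior. HTTP status and final URL come from the HTTP response, not the HTML, so they are not covered here.

```python
from html.parser import HTMLParser

# <meta name="..."> values that are copied into the result unchanged.
META_NAMES = {"description", "robots", "viewport", "theme-color"}


class MetadataParser(HTMLParser):
    """Collects metadata from the initial HTML; no JavaScript is executed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.fields = {}
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        attr = {key: (value or "") for key, value in attrs}
        if tag == "html" and attr.get("lang"):
            self.fields.setdefault("lang", attr["lang"])
        elif tag == "title" and "title" not in self.fields:
            self._in_title = True
        elif tag == "meta":
            name = attr.get("name", "").lower()
            if name in META_NAMES:
                self.fields.setdefault(name, attr.get("content", "").strip())
            elif attr.get("charset"):
                self.fields.setdefault("charset", attr["charset"])
        elif tag == "link":
            rels = attr.get("rel", "").lower().split()
            if "canonical" in rels and attr.get("href"):
                self.fields.setdefault("canonical", attr["href"])

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            # Collapse runs of whitespace inside the title text.
            self.fields["title"] = " ".join("".join(self._title_parts).split())


def extract_metadata(html: str) -> dict:
    """Parse an HTML string and return the metadata fields that were found."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

Fields missing from the page are simply absent from the returned dictionary, which keeps "not present" distinct from "present but empty".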
How to use
- Paste one URL per line into the input (up to 10).
- Click "Extract": each URL is fetched, its HTML is parsed, and the results appear in a table.
- Each table row shows the title, description, robots, canonical URL, and the other extracted values for one URL.
- Toggle "Show Japanese header names" to switch the table and CSV headers to Japanese.
- Click "Download CSV" to save in a spreadsheet-friendly format (UTF-8 + BOM).
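The CSV export in the steps above could look something like this sketch. The column list, the function name `rows_to_csv_bytes`, and the `labels` mapping used for alternate header names are all hypothetical; the only detail taken from the text is the UTF-8 + BOM encoding, which Python provides as the `utf-8-sig` codec.

```python
import csv
import io

# Hypothetical column order; the real tool exports all ten fields.
COLUMNS = ["url", "title", "description", "robots", "canonical"]


def rows_to_csv_bytes(rows, columns=COLUMNS, labels=None):
    """Serialize result rows to CSV bytes with a leading UTF-8 BOM.

    `labels` optionally maps a column key to the header text to display,
    for example a translated header name. Missing values become empty cells.
    """
    labels = labels or {}
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow([labels.get(column, column) for column in columns])
    for row in rows:
        writer.writerow([row.get(column, "") for column in columns])
    # "utf-8-sig" prepends the BOM so spreadsheet apps detect the encoding.
    return buffer.getvalue().encode("utf-8-sig")
```

For example, `rows_to_csv_bytes(rows, labels={"title": "タイトル"})` would swap only the header text while leaving the data cells unchanged.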
Use cases
- SEO leads sweeping the top, product, careers, and news pages of a corporate site for title and description hygiene.
- Web agencies comparing metadata before and after a site refresh.
- Marketers comparing competitor page titles in one view.
- Operators checking the final URL and HTTP status after redirects.
- SEO consultants auditing multiple pages for accidental noindex / nofollow.
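For the noindex / nofollow audit mentioned in the last use case, the extracted robots value can be checked with a small helper like the one below. The function name `robots_flags` is hypothetical. It inspects only the meta robots value and treats `none` as shorthand for `noindex, nofollow`; directives sent through HTTP headers are outside its scope.

```python
def robots_flags(robots_content):
    """Report whether a meta robots value blocks indexing or link following."""
    directives = {
        part.strip().lower() for part in (robots_content or "").split(",")
    }
    directives.discard("")
    blocks_all = "none" in directives  # "none" means noindex + nofollow
    return {
        "noindex": blocks_all or "noindex" in directives,
        "nofollow": blocks_all or "nofollow" in directives,
    }
```

Running this over every row of the results table makes an accidental `noindex` easy to spot, since matching is case-insensitive and tolerant of extra whitespace.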
Notes
- Up to 10 URLs per request.
- Connections to private IP addresses or localhost are refused for safety.
- Each URL has an 8-second fetch timeout, so slow servers may fail with an error.
- Only the first 2 MB of HTML is read; on very large pages, metadata that appears after that point is missed.
- For sites that rewrite metadata via JavaScript (such as SPAs), only the values present in the initial HTML are extracted, which may differ from what crawlers that render JavaScript see.
- Sites behind basic auth, bot blocking, or regional restrictions may not be fetchable.
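The timeout and size limits in these notes can be sketched with the standard library as follows. The names `read_capped` and `fetch_html`, the chunk size, and the User-Agent string are hypothetical. One caveat about this sketch: the `timeout` argument of `urllib.request.urlopen` applies to individual blocking socket operations, so enforcing a strict 8-second total would need an additional deadline check.

```python
import urllib.error
import urllib.request

FETCH_TIMEOUT_SECONDS = 8
MAX_BODY_BYTES = 2 * 1024 * 1024  # read at most the first 2 MB


def read_capped(stream, limit=MAX_BODY_BYTES, chunk_size=64 * 1024):
    """Read from a binary stream until it ends or `limit` bytes are collected."""
    chunks = []
    remaining = limit
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def fetch_html(url):
    """Return (status, final_url, body_bytes) for a URL, following redirects."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "metadata-extractor-sketch/0.1"}
    )
    try:
        with urllib.request.urlopen(request, timeout=FETCH_TIMEOUT_SECONDS) as response:
            body = read_capped(response)
            return response.status, response.geturl(), body
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses still carry a status and a final URL worth reporting.
        return error.code, error.geturl(), b""
```

Timeouts and connection failures raise `urllib.error.URLError` or `OSError` here; a caller would typically catch those and record an error for that row rather than abort the whole batch.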