Site Metadata Extractor

Site Metadata Extractor pulls the main SEO-relevant metadata from several pages at once. Enter up to 10 URLs, one per line, view the results in a table, and download them as a CSV.

About Site Metadata Extractor

The tool extracts 10 fields: title, description, robots, canonical URL, HTTP status, final URL (after redirects), lang, charset, viewport, and theme-color. This makes it handy for sweeping audits against an SEO checklist or for comparing metadata before and after a site refresh.
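The parsing step could look roughly like the following minimal sketch, which uses Python's standard-library `html.parser` to collect the eight fields that come from the markup itself (HTTP status and final URL come from the fetch, not the HTML). The names `MetaExtractor` and `extract_meta` are hypothetical, and the rule that the first occurrence of a duplicated tag wins is an assumption, not documented behavior of this tool.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Hypothetical sketch: collect head metadata from raw HTML."""

    # <meta name="..."> values copied straight into the output
    NAMED_META = {"description", "robots", "viewport", "theme-color"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.fields: dict[str, str] = {}
        self._in_title = False
        self._title_parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; valueless attributes arrive as None
        a = {k: (v or "") for k, v in attrs}
        if tag == "html" and "lang" in a:
            self.fields.setdefault("lang", a["lang"].strip())
        elif tag == "title" and "title" not in self.fields:
            self._in_title = True
        elif tag == "meta":
            name = a.get("name", "").lower()
            if name in self.NAMED_META:
                self.fields.setdefault(name, a.get("content", "").strip())
            if "charset" in a:
                self.fields.setdefault("charset", a["charset"].strip())
            elif a.get("http-equiv", "").lower() == "content-type":
                # Legacy form, e.g. content="text/html; charset=Shift_JIS"
                content = a.get("content", "")
                idx = content.lower().find("charset=")
                if idx != -1:
                    self.fields.setdefault("charset", content[idx + len("charset="):].strip())
        elif tag == "link":
            if "canonical" in a.get("rel", "").lower().split():
                self.fields.setdefault("canonical", a.get("href", "").strip())

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            # Collapse internal whitespace and newlines in the title text
            self.fields["title"] = " ".join("".join(self._title_parts).split())

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


def extract_meta(html: str) -> dict[str, str]:
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return parser.fields
```

Fields that are missing from the page are simply absent from the returned dict, which maps naturally onto empty cells in the result table.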

Pages are fetched and parsed on the server, so browser CORS restrictions do not apply. Most public pages that do not block crawlers can be extracted.
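A server-side fetch with an 8-second timeout and a 2 MB body cap might be sketched as below. It assumes Python's `urllib.request`, which follows HTTP redirects by default, so `geturl()` reports the final URL. `fetch_page`, `read_capped`, the `User-Agent` string, and reading "2 MB" as 2 × 1024 × 1024 bytes are illustrative assumptions. Note that `urlopen`'s `timeout` applies to each blocking socket operation rather than to the request as a whole, so a strict 8-second total deadline would need extra handling.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass

TIMEOUT_SECONDS = 8               # per-socket-operation timeout passed to urlopen
MAX_BODY_BYTES = 2 * 1024 * 1024  # assumption: "2 MB" read as 2 MiB


@dataclass
class FetchResult:
    status: int
    final_url: str
    body: bytes


def read_capped(stream, limit: int = MAX_BODY_BYTES, chunk_size: int = 64 * 1024) -> bytes:
    """Read at most `limit` bytes from any object with .read(n) -> bytes, then stop."""
    buf = bytearray()
    while len(buf) < limit:
        chunk = stream.read(min(chunk_size, limit - len(buf)))
        if not chunk:
            break
        buf.extend(chunk)
    return bytes(buf)


def fetch_page(url: str) -> FetchResult:
    # Hypothetical User-Agent; the tool's real request headers are not documented here
    req = urllib.request.Request(url, headers={"User-Agent": "metadata-extractor-sketch/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=TIMEOUT_SECONDS) as resp:
            return FetchResult(resp.status, resp.geturl(), read_capped(resp))
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError, which still carries code, URL, and body
        try:
            return FetchResult(err.code, err.geturl(), read_capped(err))
        finally:
            err.close()
```

Stopping the read at the cap keeps memory bounded per URL; since title and meta tags normally sit in `<head>`, a truncated body usually still contains them.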

How to use

Paste one URL per line into the input (up to 10).
Click "Extract". Each URL is fetched, its HTML is parsed, and the results appear in a table.
The table has one row per URL, with columns for title, description, robots, canonical, and the other extracted fields.
Toggle "Show Japanese header names" to switch the table and CSV headers to Japanese.
Click "Download CSV" to save in a spreadsheet-friendly format (UTF-8 + BOM).

Use cases

SEO leads checking title and description hygiene across a corporate site's home, product, careers, and news pages in one sweep.
Web agencies comparing metadata before and after a site refresh.
Marketers comparing competitor page titles in one view.
Operators checking the final URL and HTTP status after redirects.
SEO consultants auditing multiple pages for accidental noindex / nofollow.
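For the before/after comparison, two extraction runs can be diffed field by field. This sketch assumes each run is available as a dict keyed by field name (for example, one row read back from an exported CSV); `diff_metadata` is a hypothetical helper, not part of the tool.

```python
def diff_metadata(before: dict[str, str], after: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {field: (old, new)} for every field whose value changed.

    A field missing on one side is treated as an empty string.
    """
    changes: dict[str, tuple[str, str]] = {}
    for field in sorted(before.keys() | after.keys()):
        old, new = before.get(field, ""), after.get(field, "")
        if old != new:
            changes[field] = (old, new)
    return changes
```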

Notes

Up to 10 URLs per request.
Connections to private IP addresses or localhost are refused for safety.
Each URL has an 8-second fetch timeout; slow servers may return an error.
Only the first 2 MB of HTML is read; on very large pages, metadata that appears after that point is not captured.
Sites that rewrite metadata via JavaScript (SPAs) yield only the values present in the initial HTML, which may differ from what browsers and JavaScript-rendering crawlers see.
Sites behind basic auth, bot blocking, or regional restrictions may not be fetchable.
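The private-address refusal could be approximated as follows: resolve the host and reject it if any resulting address is not globally routable, using the `is_global` property from Python's `ipaddress` module. This is a sketch under assumptions: the helper names and the choice to refuse unresolvable hosts are illustrative. A production guard would also need to connect to the exact address it checked, to avoid DNS rebinding, and to repeat the check for every redirect target.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_blocked_host(host: str) -> bool:
    """True if the host is localhost or resolves to any non-global address."""
    if host.lower() == "localhost":
        return True
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return True  # assumption: refuse hosts that cannot be resolved
    for info in infos:
        addr = info[4][0].split("%", 1)[0]  # drop an IPv6 zone id such as "%eth0"
        # is_global is False for loopback, RFC 1918, link-local (169.254/16), etc.
        if not ipaddress.ip_address(addr).is_global:
            return True
    return False


def check_url(url: str) -> None:
    """Raise ValueError for non-HTTP(S) URLs or hosts that map to private/local addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported URL: {url!r}")
    if is_blocked_host(parts.hostname):
        raise ValueError(f"refusing private or local address: {parts.hostname}")
```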

FAQ

How many URLs can I process at once?

Up to 10. The limit is fixed at 10 to balance concurrent fetch load against responsiveness. For larger jobs, run multiple batches.
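Preparing batches for a longer list is easy to script. This sketch assumes blank lines and surrounding whitespace in the pasted input are ignored, which may not match the tool's exact input handling; `parse_url_lines` and `batches` are hypothetical helpers.

```python
MAX_URLS_PER_BATCH = 10  # the per-request limit stated on this page


def parse_url_lines(text: str) -> list[str]:
    """One URL per line; blank lines and surrounding whitespace are dropped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def batches(urls: list[str], size: int = MAX_URLS_PER_BATCH) -> list[list[str]]:
    """Split the URL list into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]
```

For example, 23 URLs produce three batches of 10, 10, and 3, each of which can be pasted into the input separately.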

What apps can open the CSV?

The output is UTF-8 with a BOM, so it opens without mojibake (garbled characters) in Microsoft Excel, Google Sheets, LibreOffice Calc, Numbers, and similar apps. The delimiter is a comma and the line ending is CRLF.

Can I extract metadata from pages behind a login?

No. The fetcher has no credentials and accesses every page as an anonymous visitor, so pages that require a login or an active session cannot be reached.

Are JavaScript-injected meta tags captured?

Only meta tags present in the initial HTML are captured. On SPAs that swap meta tags client-side, the captured values may differ from the final rendered ones. Note that Google's crawler executes JavaScript, so Google's view of the page can also differ from these results.

Can I detect noindex?

The robots column shows the meta robots value. Directives such as noindex, nofollow, and max-snippet are reported verbatim, which helps catch misconfigurations such as an accidental noindex.
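To turn the verbatim robots value into yes/no flags, split it on commas and read each `name` or `name:value` directive. This is a sketch; the helper names are hypothetical, and treating `none` as `noindex, nofollow` follows Google's documented meaning of that directive rather than anything this tool itself reports.

```python
def parse_robots(content: str) -> dict[str, str]:
    """Split a meta robots value such as 'noindex, max-snippet:50' into {directive: value}."""
    directives: dict[str, str] = {}
    for part in content.split(","):
        part = part.strip()
        if not part:
            continue
        # Only the first ":" separates name from value (values may contain colons)
        name, _, value = part.partition(":")
        directives[name.strip().lower()] = value.strip()
    return directives


def indexing_flags(content: str) -> dict[str, bool]:
    d = parse_robots(content)
    none = "none" in d  # "none" is equivalent to "noindex, nofollow"
    return {"noindex": none or "noindex" in d, "nofollow": none or "nofollow" in d}
```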