URL Extractor — Extract All Links from Text or HTML
Extract every URL (http, https, ftp, mailto) from pasted text. Dedupe, validate, classify by domain. Free, in-browser.
About URL Extractor
A URL extractor scans text or HTML and pulls out every URL it contains (http(s), ftp, mailto, tel, and custom schemes), producing a clean, deduplicated list ready for link-checking, archival, redirect testing, or competitive analysis. The ZTools URL Extractor runs entirely in the browser, recognises both bare URLs and HTML <a href="…"> links, deduplicates exact matches as well as URLs that become identical after normalisation (trailing slash, scheme), and exports plain text, CSV with domain classification, or JSON.
Use cases
- Link audit on a long article. Paste the full HTML or markdown of a published article and the extractor lists every outgoing link. Run the list through a link-checker tool to catch 404s before readers do.
- Backlink list assembly. Paste a forum thread, comments page, or directory listing that references many other sites, then extract the links for SEO research or partner outreach.
- Archiving research bookmarks. Paste a long email thread with embedded URLs and extract once for a flat reading list, rather than scrolling back through quoted replies.
- Cleaning markdown / converting to plain links. Convert mixed [text](url) and bare URLs into a single normalised list for migration to a different format.
How it works
- Paste source. Plain text, HTML, markdown, or JSON. The tool tokenises the input while preserving URL boundaries.
- Match URL patterns. A regex matches the general form scheme://host[:port][/path][?query][#fragment] (http, https, ftp, ws, wss), plus the mailto: and tel: schemes, which have no // part.
- Normalise. Optional: collapse trailing slashes, downcase the host, and strip default ports (:80 for http, :443 for https). Normalisation increases dedup hits.
- Classify by domain. Each URL tagged with its registrable domain (example.com from sub.example.com/path). Useful for grouping.
- Export. Plain list, CSV with domain + path + query columns, or JSON for downstream processing.
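The match, normalise, and dedupe steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool's actual implementation: the pattern `URL_RE` and the helper names `extract_urls`, `normalise`, and `dedupe` are hypothetical, and the regex is deliberately simpler than a production matcher.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical pattern: scheme://... URLs plus mailto: and tel: URIs.
# A URL ends at whitespace, quotes, angle brackets, ')' or ']'.
URL_RE = re.compile(
    r"""(?:(?:https?|ftp|wss?)://|mailto:|tel:)[^\s<>"')\]]+""",
    re.IGNORECASE,
)

# Default ports are scheme-specific: only strip the one that matches.
DEFAULT_PORTS = {"http": 80, "https": 443}


def extract_urls(text):
    """Return matches in order of appearance, minus trailing punctuation."""
    return [m.group(0).rstrip(".,;:!?") for m in URL_RE.finditer(text)]


def normalise(url):
    """Lowercase scheme and host, drop default ports and trailing slashes."""
    parts = urlsplit(url)
    if not parts.hostname:
        return url  # mailto:, tel: and similar have no host to normalise
    try:
        port = parts.port
    except ValueError:
        return url  # malformed port: leave the URL untouched
    scheme = parts.scheme.lower()
    host = parts.hostname  # urlsplit already lowercases this
    # Simplification: any user:password@ prefix is dropped here.
    if port is not None and DEFAULT_PORTS.get(scheme) != port:
        host = f"{host}:{port}"
    path = parts.path.rstrip("/")
    return urlunsplit((scheme, host, path, parts.query, parts.fragment))


def dedupe(urls):
    """Normalise every URL, then keep the first occurrence of each."""
    return list(dict.fromkeys(normalise(u) for u in urls))
```

With this sketch, `https://A.com:443/x/` and `https://a.com/x` both normalise to `https://a.com/x`, so `dedupe` keeps a single entry for them.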
Examples
Input: "See https://a.com and visit b.com/page or read www.c.org/article"
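Output (loose mode): 3 URLs: https://a.com, b.com/page, www.c.org/article. The last two have no scheme, so strict mode would return only https://a.com; optional auto-prefix rewrites them as https://b.com/page and https://www.c.org/article.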
Input: Markdown: "[click](https://x.com) or https://y.com"
Output: https://x.com, https://y.com.
Input: HTML: <a href="https://a.com">a</a> + plain "see https://a.com"
Output: https://a.com (deduplicated).
Frequently asked questions
Does it find bare domains without scheme?
Optionally. "Strict" mode requires an explicit scheme; "loose" mode also catches bare domains such as "example.com/path". Loose mode produces more false positives (e.g. file paths that match the domain pattern).
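The difference between the two modes can be illustrated with a pair of regular expressions. This is a rough sketch: `STRICT_RE`, `LOOSE_RE`, and `extract` are hypothetical names, the strict pattern only covers http, https, and ftp for brevity, and the tool's real patterns may differ.

```python
import re

# Strict: an explicit scheme is required.
STRICT_RE = re.compile(r"""(?:https?|ftp)://[^\s<>"')\]]+""", re.IGNORECASE)

# Loose: optional http(s) scheme, then a dotted host ending in an
# alphabetic label of 2+ letters, an optional port, and an optional path.
# The lookbehind stops matches from starting mid-word or inside an
# email address.
LOOSE_RE = re.compile(
    r"""(?<![\w@./-])(?:https?://)?(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}(?::\d+)?(?:/[^\s<>"')\]]*)?""",
    re.IGNORECASE,
)


def extract(text, mode="strict", auto_prefix=False):
    """Extract URLs in 'strict' or 'loose' mode.

    With auto_prefix=True, scheme-less matches get 'https://' prepended.
    """
    pattern = STRICT_RE if mode == "strict" else LOOSE_RE
    urls = [m.group(0).rstrip(".,;:!?") for m in pattern.finditer(text)]
    if auto_prefix:
        urls = [u if "://" in u else "https://" + u for u in urls]
    return urls
```

In this sketch, loose mode also reports `report.final.txt` as a URL when given the text `open report.final.txt`, which is the kind of false positive described above; strict mode returns nothing for the same input.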
How are tracking parameters handled?
UTM stripping is optional. When enabled, it drops utm_source, utm_medium, and similar parameters to canonicalise URLs. Useful for dedup and clean archives.
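As an illustration of UTM stripping, the sketch below removes every query parameter whose name starts with `utm_`. The function name `strip_utm` and the prefix rule are assumptions made for this example; the tool may use a fixed list of parameter names instead.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_utm(url):
    """Drop query parameters whose name starts with 'utm_' (case-insensitive)."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
    ]
    # urlencode re-encodes the remaining pairs, so unusual percent-encoding
    # in the original query may come back in a different but equivalent form.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

For example, `strip_utm("https://a.com/p?utm_source=x&id=7&utm_medium=email#top")` returns `https://a.com/p?id=7#top`, and a URL whose query consists only of UTM parameters loses its query string entirely.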
Is the input uploaded?
No — client-side only. Privacy by design.
Can I extract from a live URL?
No — extractor reads pasted text. To extract from a live page, save the page HTML and paste it.
Why do mailto: links show up?
They are URIs (mailto: is a URI scheme). Filter by scheme if you only want web URLs (http/https).
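Filtering by scheme is straightforward once the list is extracted. A minimal sketch, where `filter_by_scheme` is a hypothetical helper rather than part of the tool:

```python
from urllib.parse import urlsplit


def filter_by_scheme(urls, allowed=("http", "https")):
    """Keep only the URLs whose scheme is in the allowed set."""
    return [u for u in urls if urlsplit(u).scheme.lower() in allowed]
```

Passing `allowed=("mailto",)` instead would keep only the email links.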
How do I sort by domain frequency?
The CSV export includes a domain column; sort by it in a spreadsheet to group identical domains and find the most-linked sites.
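The same count can be done in code instead of a spreadsheet. The sketch below is an approximation: `naive_domain` simply takes the last two host labels, which is wrong for suffixes such as co.uk, so a real implementation would consult the Public Suffix List. Both function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def naive_domain(url):
    """Approximate the registrable domain as the last two host labels.

    Assumption for illustration only: this mislabels multi-part suffixes
    (e.g. example.co.uk becomes co.uk).
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def domain_frequency(urls):
    """Return (domain, count) pairs, most frequent first."""
    domains = (naive_domain(u) for u in urls if urlsplit(u).hostname)
    return Counter(domains).most_common()
```

URLs without a host, such as mailto: links, are skipped by `domain_frequency`.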
Pro tips
- Run extracted lists through a link-checker before publishing — broken outbound links hurt SEO and reader trust.
- Strip UTM parameters before archival; they obscure the canonical resource.
- For competitive analysis, extract URLs from competitor blog posts and find common backlink targets.
- When extracting from forums, dedupe before opening — many threads quote the same URL repeatedly.
- Combine with domain-extractor for high-level domain frequency analysis.
Reviewed by Ahsan Mahmood · Last updated 2026-05-05 · Part of ZTools.