Extracting Emails from HTML Source Code: A Complete Guide
The visible text on a webpage is only half the story. Email addresses frequently hide in HTML source code — in mailto: links, data attributes, JSON-LD blocks, and even HTML comments. Knowing where to look unlocks contacts that surface-level extraction misses entirely.
Where Emails Hide in HTML
1. Mailto Links
The most common hiding spot. A link that looks like “Contact Us” on the page often contains the actual email in its href attribute:
<a href="mailto:sales@company.com">Contact Us</a>
Some developers add subject lines and CC addresses:
<a href="mailto:sales@company.com?subject=Inquiry&cc=support@company.com">
Get in Touch
</a>
That’s two emails from a single link — and neither is visible as text on the page.
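A mailto: URL can be taken apart with Python's standard library to recover every address it carries. The following is a minimal sketch, not a definitive implementation: the helper name emails_from_mailto is hypothetical, and it assumes the href value has already been HTML-decoded (so any &amp; has become &).

```python
from urllib.parse import urlsplit, parse_qs, unquote

def emails_from_mailto(href):
    """Return every address found in a mailto: URL (hypothetical helper)."""
    parts = urlsplit(href.strip())
    if parts.scheme.lower() != "mailto":
        return []
    # The path holds the primary recipients, separated by commas.
    found = [unquote(a).strip() for a in parts.path.split(",") if a.strip()]
    # The to, cc and bcc query fields can each carry further addresses.
    for key, values in parse_qs(parts.query).items():
        if key.lower() in ("to", "cc", "bcc"):
            for value in values:
                found.extend(a.strip() for a in value.split(",") if a.strip())
    return found

print(emails_from_mailto(
    "mailto:sales@company.com?subject=Inquiry&cc=support@company.com"
))
```

The subject field is ignored on purpose, since it holds free text rather than recipients.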
2. Data Attributes
Modern web applications store emails in custom data attributes for JavaScript to reference:
<button data-email="john@company.com" data-team="sales">
Send Message
</button>
These attributes are invisible to users but fully present in the source code.
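One way to collect these is to walk the start tags with Python's built-in html.parser and keep the sibling data-* attributes as context. This is a sketch under assumptions: the class name DataAttrEmails and the simple EMAIL_RE pattern are illustrative choices, and the pattern is deliberately loose rather than a full address grammar.

```python
import re
from html.parser import HTMLParser

# Loose pattern for illustration; it is not a complete email grammar.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

class DataAttrEmails(HTMLParser):
    """Collects emails found in data-* attributes (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        # Keep only data-* attributes that actually have a value.
        data_attrs = {n: v for n, v in attrs if n.startswith("data-") and v}
        for name, value in data_attrs.items():
            for email in EMAIL_RE.findall(value):
                # The other data-* attributes often say who the address belongs to.
                context = {k: v for k, v in data_attrs.items() if k != name}
                self.records.append({"email": email, "tag": tag, "context": context})

parser = DataAttrEmails()
parser.feed('<button data-email="john@company.com" data-team="sales">Send Message</button>')
print(parser.records)
```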
3. JSON-LD Structured Data
Search engines encourage websites to include structured data. This often contains contact information:
<script type="application/ld+json">
{
"@type": "Organization",
"name": "Company Inc",
"email": "info@company.com",
"contactPoint": {
"@type": "ContactPoint",
"email": "support@company.com"
}
}
</script>
These blocks are embedded in the HTML <head> or <body> but never rendered visually.
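Because JSON-LD is ordinary JSON, it can be parsed rather than pattern-matched. The sketch below pulls each application/ld+json block out of the page and walks it for "email" keys. The names JsonLdCollector, walk_emails and jsonld_emails are hypothetical, and blocks that fail to parse are simply skipped.

```python
import json
from html.parser import HTMLParser

class JsonLdCollector(HTMLParser):
    """Gathers the raw text of every JSON-LD script block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buf))

def walk_emails(node):
    """Yield every string stored under an "email" key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "email" and isinstance(value, str):
                yield value
            else:
                yield from walk_emails(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk_emails(item)

def jsonld_emails(html_source):
    collector = JsonLdCollector()
    collector.feed(html_source)
    emails = []
    for block in collector.blocks:
        try:
            emails.extend(walk_emails(json.loads(block)))
        except json.JSONDecodeError:
            continue  # malformed block: skip it rather than guess
    return emails
```

Calling jsonld_emails on the page source returns the addresses in document order, including ones nested inside contactPoint.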
4. HTML Comments
Developers sometimes leave contact info in comments during development:
<!-- Contact: webmaster@company.com for site issues -->
Comments are never rendered on the page, but they remain in the source code.
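Python's html.parser exposes comments through a dedicated callback, so they can be scanned separately from visible text. A minimal sketch follows; the class name CommentEmails and the loose EMAIL_RE pattern are illustrative assumptions.

```python
import re
from html.parser import HTMLParser

# Loose pattern for illustration; it is not a complete email grammar.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

class CommentEmails(HTMLParser):
    """Collects emails that appear inside HTML comments only."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_comment(self, data):
        # data is the comment body without the <!-- and --> delimiters.
        self.found.extend(EMAIL_RE.findall(data))

parser = CommentEmails()
parser.feed("<p>Hello</p><!-- Contact: webmaster@company.com for site issues -->")
parser.close()
print(parser.found)
```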
5. Form Action URLs
Contact forms sometimes reveal email endpoints:
<form action="https://formspree.io/f/contact@company.com" method="POST">
6. Meta Tags
Some pages include contact emails in meta tags:
<meta name="author" content="John Smith (john@company.com)">
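Form actions and meta tags can be covered by one pass over every attribute value. Addresses inside URLs are sometimes percent-encoded (%40 in place of @), so the sketch below decodes each value before matching. AttributeEmails and EMAIL_RE are hypothetical names, and the pattern is intentionally simple.

```python
import re
from html.parser import HTMLParser
from urllib.parse import unquote

# Loose pattern for illustration; it is not a complete email grammar.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

class AttributeEmails(HTMLParser):
    """Scans every attribute value and records (tag, attribute, email)."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            # Decode %40 and similar escapes before looking for addresses.
            for email in EMAIL_RE.findall(unquote(value)):
                self.found.append((tag, name, email))

parser = AttributeEmails()
parser.feed(
    '<form action="https://formspree.io/f/contact%40company.com" method="POST"></form>'
    '<meta name="author" content="John Smith (john@company.com)">'
)
print(parser.found)
```

Recording the tag and attribute alongside each address makes it easier to judge later whether a match is a real contact or an incidental string.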
How to Extract Emails from HTML
Manual Method
- Right-click the page → View Page Source (or press Ctrl+U)
- Press Ctrl+F and search for @, which finds most email addresses
- Search for mailto: to find linked emails
- Search for "email" to find JSON-LD and data attributes
Limitation: Tedious for large pages, easy to miss obfuscated addresses.
Automated Method
Paste the entire HTML source into extractor.email’s extraction tool. The engine scans:
- Visible text content
- All HTML attributes (href, data-*, content, etc.)
- Embedded JSON and script blocks
- Comments and meta tags
Results are deduplicated and validated automatically.
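How the tool deduplicates internally is not documented here, but the general idea can be sketched in a few lines. The version below is an assumption, not the tool's actual logic: it treats addresses as case-insensitive (which matches how most mail providers behave, even though the local part is technically case-sensitive) and keeps the first occurrence of each.

```python
def dedupe_emails(emails):
    """Return addresses lowercased, stripped and in first-seen order (hypothetical helper)."""
    seen = set()
    unique = []
    for email in emails:
        key = email.strip().lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique

print(dedupe_emails(["Info@Company.com", "info@company.com ", "support@company.com"]))
```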
For multiple pages, use URL fetching — the tool fetches the raw HTML server-side and processes it in your browser, extracting from all hidden locations.
Common Obfuscation Techniques
Website owners sometimes obfuscate emails to prevent automated collection. Understanding these techniques helps you identify whether addresses are being hidden:
HTML Entity Encoding
&#106;&#111;&#104;&#110;&#64;&#99;&#111;&#109;&#112;&#97;&#110;&#121;&#46;&#99;&#111;&#109;
This renders as john@company.com in the browser but looks like random numbers in the source.
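Entity encoding is straightforward to reverse, because Python's html.unescape resolves both numeric and named character references. A small sketch, with emails_from_encoded and EMAIL_RE as hypothetical names:

```python
import html
import re

# Loose pattern for illustration; it is not a complete email grammar.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def emails_from_encoded(source):
    """Decode character references first, then search the decoded text."""
    return EMAIL_RE.findall(html.unescape(source))

# Partially encoded address: only "john", "@" and "." are written as entities.
print(emails_from_encoded("&#106;&#111;&#104;&#110;&#64;company&#46;com"))
```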
Text Direction Tricks
<span style="unicode-bidi:bidi-override;direction:rtl">moc.ynapmoc@nhoj</span>
The text is reversed in the source code but displays correctly due to CSS direction override.
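To undo this trick, an extractor can watch for elements whose inline style overrides the text direction and reverse their text before matching. The sketch below handles only inline style attributes (not rules in external stylesheets), and the names ReversedTextEmails and EMAIL_RE are hypothetical.

```python
import re
from html.parser import HTMLParser

# Loose pattern for illustration; it is not a complete email grammar.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

class ReversedTextEmails(HTMLParser):
    """Finds emails written backwards inside direction-override elements."""

    def __init__(self):
        super().__init__()
        self._tag = None   # tag name of the element that set the override
        self._depth = 0    # nesting depth of that same tag name
        self.found = []

    def handle_starttag(self, tag, attrs):
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        overrides = "unicode-bidi:bidi-override" in style and "direction:rtl" in style
        if self._tag is None and overrides:
            self._tag, self._depth = tag, 1
        elif tag == self._tag:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                self._tag = None

    def handle_data(self, data):
        if self._tag is not None:
            # Reverse the characters to recover the address as it is displayed.
            self.found.extend(EMAIL_RE.findall(data[::-1]))

parser = ReversedTextEmails()
parser.feed('<span style="unicode-bidi:bidi-override;direction:rtl">moc.ynapmoc@nhoj</span>')
print(parser.found)
```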
JavaScript Assembly
<script>
var user = "john";
var domain = "company.com";
document.write(user + "@" + domain);
</script>
The complete address never appears as a single string in the HTML source; JavaScript assembles it at runtime.
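Reliably recovering such addresses generally requires executing the script, for example in a headless browser. For the simplest cases, though, a static heuristic can resolve plain string assignments and the concatenation passed to document.write. The sketch below is deliberately narrow: it assumes quoted string literals, simple variable names and no + characters inside the literals, and the function name resolve_concatenations is hypothetical.

```python
import re

# Loose pattern for illustration; it is not a complete email grammar.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
ASSIGN_RE = re.compile(r"""\b(?:var|let|const)\s+(\w+)\s*=\s*(["'])(.*?)\2""")
WRITE_RE = re.compile(r"document\.write\((.*?)\)")
LITERAL_RE = re.compile(r"""^(["'])(.*)\1$""")

def resolve_concatenations(script):
    """Rebuild emails from simple 'a + "@" + b' expressions (narrow heuristic)."""
    variables = {m.group(1): m.group(3) for m in ASSIGN_RE.finditer(script)}
    emails = []
    for call in WRITE_RE.finditer(script):
        pieces = []
        for token in call.group(1).split("+"):
            token = token.strip()
            literal = LITERAL_RE.match(token)
            if literal:
                pieces.append(literal.group(2))
            elif token in variables:
                pieces.append(variables[token])
            else:
                pieces = None  # unknown token: give up on this call
                break
        if pieces is not None:
            candidate = "".join(pieces)
            if EMAIL_RE.fullmatch(candidate):
                emails.append(candidate)
    return emails

script = '''
var user = "john";
var domain = "company.com";
document.write(user + "@" + domain);
'''
print(resolve_concatenations(script))
```

Anything more elaborate (function calls, arrays, character-code arithmetic) falls outside what this heuristic can see.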
At-Symbol Replacement
john [at] company [dot] com
john(at)company(dot)com
john AT company DOT com
These are human-readable substitutions that prevent simple regex matching. Advanced extraction tools can detect and normalize these patterns.
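Normalizing these substitutions can be sketched with a single pattern that only fires when an "at" token and at least one "dot" token appear together, which avoids rewriting ordinary prose. The pattern and the function name deobfuscate are illustrative assumptions; bracketed forms are matched case-insensitively, while bare words are accepted only in upper case to limit false positives.

```python
import re

# Bracketed or parenthesised tokens (any case), or bare upper-case words.
AT = r"(?:\s*(?i:\[\s*at\s*\]|\(\s*at\s*\))\s*|\s+AT\s+)"
DOT = r"(?:\s*(?i:\[\s*dot\s*\]|\(\s*dot\s*\))\s*|\s+DOT\s+)"

OBFUSCATED_RE = re.compile(
    rf"([A-Za-z0-9._%+-]+){AT}([A-Za-z0-9-]+(?:{DOT}[A-Za-z0-9-]+)+)"
)

def deobfuscate(text):
    """Return normalized addresses for 'user [at] domain [dot] tld' patterns."""
    emails = []
    for match in OBFUSCATED_RE.finditer(text):
        domain = re.sub(DOT, ".", match.group(2))
        emails.append(f"{match.group(1)}@{domain}")
    return emails

sample = """john [at] company [dot] com
john(at)company(dot)com
john AT company DOT com"""
print(deobfuscate(sample))
```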
Best Practices for HTML Extraction
- Always check the full source: don’t just copy visible text
- Look for structured data: JSON-LD blocks are goldmines for contact info
- Check multiple pages: team pages, About pages, and footer partials often contain different addresses
- Use the right tool: manual Ctrl+F works for one page; automated tools scale to hundreds
- Validate results: not every @ match is a real email address
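As an example of the last point, retina image names such as logo@2x.png match a naive email pattern. A hedged sketch of a post-filter follows; the name looks_like_email and the ASSET_EXTENSIONS list are assumptions chosen for illustration, and a syntactic check like this says nothing about whether a mailbox actually exists.

```python
import re

# Loose pattern for illustration; it is not a complete email grammar.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# File extensions that commonly follow an @ in asset names (illustrative list).
ASSET_EXTENSIONS = {"png", "jpg", "jpeg", "gif", "svg", "webp", "css", "js"}

def looks_like_email(candidate):
    """Reject common false positives among regex matches (heuristic only)."""
    if not EMAIL_RE.fullmatch(candidate):
        return False
    local, _, domain = candidate.rpartition("@")
    # Consecutive dots or a dot at either end of the local part are invalid.
    if ".." in candidate or local.startswith(".") or local.endswith("."):
        return False
    return domain.rsplit(".", 1)[-1].lower() not in ASSET_EXTENSIONS

matches = ["info@company.com", "logo@2x.png", "john..doe@company.com"]
print([m for m in matches if looks_like_email(m)])
```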
Try It Now
Grab any page’s HTML source and paste it into the free email extractor. The engine finds emails in all the hidden locations described above — mailto: links, data attributes, JSON-LD, comments, and more.
No sign-up required. Your HTML never leaves your browser.