Extracting Emails from HTML Source Code: A Complete Guide

by extractor.email

The visible text on a webpage is only half the story. Email addresses frequently hide in HTML source code — in mailto: links, data attributes, JSON-LD blocks, and even HTML comments. Knowing where to look unlocks contacts that surface-level extraction misses entirely.

Where Emails Hide in HTML

1. Mailto Links

The most common hiding spot. A link that reads “Contact Us” on the page often contains the actual email in its href attribute:

<a href="mailto:sales@company.com">Contact Us</a>

Some developers add subject lines and CC addresses:

<a href="mailto:sales@company.com?subject=Inquiry&cc=support@company.com">
  Get in Touch
</a>

That’s two emails from a single link — and neither is visible as text on the page.
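To pull every address out of a mailto: link, separate the part before ? (the To list) from the query string, then read the to, cc, and bcc fields. The sketch below uses Python's standard urllib.parse. The function name emails_from_mailto is hypothetical. It also assumes the href has already been HTML-unescaped, which an HTML parser does for you, so &amp;cc= arrives as &cc=.

```python
from urllib.parse import parse_qs, unquote

def emails_from_mailto(href: str) -> list[str]:
    """Return every address in a mailto: link: the To part plus to/cc/bcc fields."""
    if not href.lower().startswith("mailto:"):
        return []
    # Everything before "?" is the To list; everything after holds header fields
    to_part, _, query = href[len("mailto:"):].partition("?")
    addresses = [unquote(a).strip() for a in to_part.split(",") if a.strip()]
    # parse_qs percent-decodes values, e.g. cc=support%40company.com
    for field, values in parse_qs(query).items():
        if field.lower() in ("to", "cc", "bcc"):
            for value in values:
                addresses.extend(a.strip() for a in value.split(",") if a.strip())
    return addresses

print(emails_from_mailto("mailto:sales@company.com?subject=Inquiry&cc=support@company.com"))
# ['sales@company.com', 'support@company.com']
```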

2. Data Attributes

Modern web applications store emails in custom data attributes for JavaScript to reference:

<button data-email="john@company.com" data-team="sales">
  Send Message
</button>

These attributes are invisible to users but fully present in the source code.
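Data attributes are ordinary attribute values, so an HTML parser can visit every attribute and run an address pattern over it. Below is a minimal Python sketch built on the standard library's html.parser. The class name and the simplified EMAIL_RE pattern are illustrative assumptions, not how any particular tool works. As a side effect, it also catches mailto: hrefs, because those are attribute values too.

```python
import re
from html.parser import HTMLParser

# A simplified address pattern; real-world addresses can be more exotic
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

class AttributeEmailScanner(HTMLParser):
    """Collects email-like strings from every attribute value on every tag."""

    def __init__(self):
        super().__init__()
        self.found = []  # (tag, attribute, email) tuples in document order

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Valueless attributes such as `disabled` arrive with value None
            if value:
                for email in EMAIL_RE.findall(value):
                    self.found.append((tag, name, email))

scanner = AttributeEmailScanner()
scanner.feed('<button data-email="john@company.com" data-team="sales">Send Message</button>')
scanner.close()
print(scanner.found)  # [('button', 'data-email', 'john@company.com')]
```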

3. JSON-LD Structured Data

Search engines encourage websites to include structured data. This often contains contact information:

<script type="application/ld+json">
{
  "@type": "Organization",
  "name": "Company Inc",
  "email": "info@company.com",
  "contactPoint": {
    "@type": "ContactPoint",
    "email": "support@company.com"
  }
}
</script>

These blocks are embedded in the HTML <head> or <body> but never rendered visually.
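JSON-LD can be parsed as JSON and walked recursively for "email" keys at any depth. This covers nested objects such as contactPoint. The minimal sketch below makes three assumptions. First, a regex is good enough to locate the script blocks, although a full HTML parser is more robust. Second, some sites write the value as a mailto: URL, so the sketch strips that prefix. Third, the function names are hypothetical.

```python
import json
import re

# Locate the body of each <script type="application/ld+json"> block
LD_JSON_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)

def walk_emails(node, out: list[str]) -> None:
    """Recursively collect values stored under an "email" key at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "email":
                values = value if isinstance(value, list) else [value]
                out.extend(v.removeprefix("mailto:") for v in values if isinstance(v, str))
            else:
                walk_emails(value, out)
    elif isinstance(node, list):
        for item in node:
            walk_emails(item, out)

def emails_from_json_ld(html_source: str) -> list[str]:
    found: list[str] = []
    for block in LD_JSON_RE.findall(html_source):
        try:
            walk_emails(json.loads(block), found)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole page
    return found
```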

4. HTML Comments

Developers sometimes leave contact info in comments during development:

<!-- Contact: webmaster@company.com for site issues -->

Comments are stripped from the rendered page but remain in the source code.
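Python's html.parser passes comment text to a handle_comment callback, so a small subclass is enough to scan comments. The sketch below uses a hypothetical class name and the same simplified address pattern:

```python
import re
from html.parser import HTMLParser

# Simplified address pattern, good enough for a sketch
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

class CommentEmailScanner(HTMLParser):
    """Collects addresses that appear inside <!-- ... --> comments."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_comment(self, data):
        # data is the comment body, without the <!-- and --> markers
        self.found.extend(EMAIL_RE.findall(data))

scanner = CommentEmailScanner()
scanner.feed("<p>Welcome</p><!-- Contact: webmaster@company.com for site issues -->")
scanner.close()
print(scanner.found)  # ['webmaster@company.com']
```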

5. Form Action URLs

Contact forms that submit to a third-party form service sometimes expose the recipient address in the action URL:

<form action="https://formspree.io/f/contact@company.com" method="POST">

6. Meta Tags

Some pages include contact emails in meta tags:

<meta name="author" content="John Smith (john@company.com)">
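In both of these cases the address sits inside an attribute value, but the two values differ. A form action is a URL and may be percent-encoded (%40 for @), while a meta tag's content is free text. The sketch below handles each accordingly: it decodes the action URL with urllib.parse.unquote before matching and scans meta content as-is. The class name and the pattern are illustrative assumptions.

```python
import re
from html.parser import HTMLParser
from urllib.parse import unquote

# Simplified address pattern, good enough for a sketch
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

class FormMetaScanner(HTMLParser):
    """Collects addresses from <form action> URLs and <meta content> text."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("action"):
            # Decode %40 and other escapes before matching the URL
            self.found.extend(EMAIL_RE.findall(unquote(attrs["action"])))
        elif tag == "meta" and attrs.get("content"):
            self.found.extend(EMAIL_RE.findall(attrs["content"]))

scanner = FormMetaScanner()
scanner.feed('<form action="https://formspree.io/f/contact@company.com" method="POST"></form>')
scanner.feed('<meta name="author" content="John Smith (john@company.com)">')
scanner.close()
print(scanner.found)  # ['contact@company.com', 'john@company.com']
```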

How to Extract Emails from HTML

Manual Method

  1. Right-click the page → View Page Source (or press Ctrl+U)
  2. Press Ctrl+F and search for @ — this finds most email addresses
  3. Search for mailto: to find linked emails
  4. Search for "email" to find JSON-LD and data attributes

Limitation: tedious for large pages, and a plain search for @ easily misses obfuscated addresses.
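The manual search can also be scripted. The sketch below scans the raw source line by line for address-shaped strings and prints each hit with a little context, so you can judge it by eye. It is a minimal sketch: scan_source and its context parameter are hypothetical, and the pattern is deliberately simple. It shares the manual method's blind spot for obfuscated addresses.

```python
import re

# Simplified address pattern, good enough for a sketch
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def scan_source(source: str, context: int = 30) -> list[tuple[int, str, str]]:
    """Scripted version of Ctrl+F for "@" that keeps only address-shaped matches.

    Returns (line number, address, surrounding snippet) for each hit.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in EMAIL_RE.finditer(line):
            start = max(match.start() - context, 0)
            snippet = line[start:match.end() + context].strip()
            hits.append((lineno, match.group(), snippet))
    return hits

page = '<a href="mailto:sales@company.com">Contact Us</a>\n<!-- Contact: webmaster@company.com -->'
for lineno, email, snippet in scan_source(page):
    print(f"line {lineno}: {email}  |  {snippet}")
```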

Automated Method

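Paste the entire HTML source into extractor.email’s extraction tool. The engine scans all the hidden locations described above, including mailto: links, data attributes, JSON-LD blocks, and HTML comments.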

Results are deduplicated and validated automatically.
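The engine's exact deduplication and validation logic isn't spelled out here, so what follows is one plausible approach rather than its actual method. It trims stray punctuation, lowercases, runs a basic syntax check, and keeps the first occurrence of each address. The function name and the VALID_RE pattern are assumptions. Lowercasing the whole address also assumes mailboxes are case-insensitive. That is common in practice, even though RFC 5321 allows case-sensitive local parts.

```python
import re

# Basic syntax check only; full RFC 5322 validation is far more complicated
VALID_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}$")

def dedupe_and_validate(candidates: list[str]) -> list[str]:
    """Return unique, syntactically plausible addresses in first-seen order."""
    seen = set()
    result = []
    for raw in candidates:
        # Trim whitespace and punctuation that often trails an address in prose
        email = raw.strip().strip(".,;:").lower()
        if not VALID_RE.match(email) or email in seen:
            continue
        seen.add(email)
        result.append(email)
    return result

print(dedupe_and_validate(["Info@Company.com", "info@company.com.", "not-an-email", "support@company.com"]))
# ['info@company.com', 'support@company.com']
```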

For multiple pages, use URL fetching — the tool fetches the raw HTML server-side and processes it in your browser, extracting from all hidden locations.

Common Obfuscation Techniques

Website owners sometimes obfuscate emails to prevent automated collection. Understanding these techniques helps you identify whether addresses are being hidden:

HTML Entity Encoding

&#106;&#111;&#104;&#110;&#64;&#99;&#111;&#109;&#112;&#97;&#110;&#121;&#46;&#99;&#111;&#109;

This renders as john@company.com in the browser, but the source contains only decimal character references (&#64; is the @ sign), so a plain search for @ finds nothing.
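The fix is to decode character references before matching. Python's html.unescape handles decimal (&#64;) and hexadecimal (&#x40;) forms alike. The wrapper name below is hypothetical, and the pattern is simplified.

```python
import html
import re

# Simplified address pattern, good enough for a sketch
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def emails_after_unescape(source: str) -> list[str]:
    """Decode character references such as &#64; or &#x40; first, then match."""
    return EMAIL_RE.findall(html.unescape(source))

encoded = "&#106;&#111;&#104;&#110;&#64;&#99;&#111;&#109;&#112;&#97;&#110;&#121;&#46;&#99;&#111;&#109;"
print(emails_after_unescape(encoded))  # ['john@company.com']
```

This also catches partial encoding, such as john&#64;company.com, where only the @ is disguised.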

Text Direction Tricks

<span style="unicode-bidi:bidi-override;direction:rtl">moc.ynapmoc@nhoj</span>

The text is reversed in the source code but displays correctly due to CSS direction override.
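One way to undo this trick is to find elements whose style attribute contains direction:rtl, reverse their text, and then match the result. The sketch below assumes the override is written as an inline style; a stylesheet class would need real CSS resolution, which this skips. The regex approach and the function name are illustrative assumptions, and nested tags inside the element are not handled.

```python
import re

# Simplified address pattern, good enough for a sketch
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# An element whose inline style sets direction: rtl; captures tag name and inner text
RTL_ELEMENT_RE = re.compile(
    r"<(\w+)[^>]*style=[\"'][^\"']*direction\s*:\s*rtl[^\"']*[\"'][^>]*>(.*?)</\1>",
    re.IGNORECASE | re.DOTALL,
)

def emails_from_reversed_text(source: str) -> list[str]:
    found = []
    for _tag, inner in RTL_ELEMENT_RE.findall(source):
        # The browser shows these characters right-to-left, so reverse them back
        found.extend(EMAIL_RE.findall(inner[::-1]))
    return found

print(emails_from_reversed_text(
    '<span style="unicode-bidi:bidi-override;direction:rtl">moc.ynapmoc@nhoj</span>'
))  # ['john@company.com']
```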

JavaScript Assembly

<script>
  var user = "john";
  var domain = "company.com";
  document.write(user + "@" + domain);
</script>

The email doesn’t exist in the HTML at all. It’s assembled at runtime by JavaScript, so a tool that reads only the raw source won’t see it unless it executes the script.

At-Symbol Replacement

john [at] company [dot] com
john(at)company(dot)com
john AT company DOT com

These are human-readable substitutions that prevent simple regex matching. Advanced extraction tools can detect and normalize these patterns.
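Normalizing these patterns means substituting @ and . back in before running the usual address pattern. The sketch below uses hypothetical names. It treats bracketed forms like [at] and (dot) case-insensitively. It only matches the bare-word form when it is uppercase (AT, DOT), on the assumption that a lowercase " at " is far more likely to be ordinary prose.

```python
import re

# Simplified address pattern, good enough for a sketch
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Bracketed forms, matched case-insensitively: [at], (at), {AT}, [dot], ...
BRACKET_AT = re.compile(r"\s*[\[({]\s*at\s*[\])}]\s*", re.IGNORECASE)
BRACKET_DOT = re.compile(r"\s*[\[({]\s*dot\s*[\])}]\s*", re.IGNORECASE)
# Bare words only when uppercase; lowercase " at " is too common in ordinary text
BARE_AT = re.compile(r"\s+AT\s+")
BARE_DOT = re.compile(r"\s+DOT\s+")

def deobfuscate(text: str) -> str:
    text = BARE_AT.sub("@", BRACKET_AT.sub("@", text))
    return BARE_DOT.sub(".", BRACKET_DOT.sub(".", text))

def find_obfuscated_emails(text: str) -> list[str]:
    return EMAIL_RE.findall(deobfuscate(text))

for sample in ["john [at] company [dot] com", "john(at)company(dot)com", "john AT company DOT com"]:
    print(find_obfuscated_emails(sample))  # ['john@company.com'] for each sample
```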

Best Practices for HTML Extraction

  1. Always check the full source — Don’t just copy visible text
  2. Look for structured data — JSON-LD blocks are goldmines for contact info
  3. Check multiple pages — Team pages, About pages, and footer partials often contain different addresses
  4. Use the right tool — Manual Ctrl+F works for one page; automated tools scale to hundreds
  5. Validate results — Not every @ match is a real email address
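For point 5, a couple of cheap filters remove the most common false positives. Retina image filenames like logo@2x.png fit the address pattern. In addition, example.com, example.org, and example.net are placeholder domains reserved by RFC 2606, and they often appear in templates. The suffix list and the function name below are assumptions to tune for your own sources.

```python
import re

# Simplified address pattern, good enough for a sketch
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Asset extensions that the pattern mistakes for top-level domains (e.g. logo@2x.png)
ASSET_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".css", ".js")
# Placeholder domains reserved by RFC 2606
RESERVED_DOMAINS = {"example.com", "example.org", "example.net"}

def plausible_emails(source: str) -> list[str]:
    keep = []
    for candidate in EMAIL_RE.findall(source):
        lowered = candidate.lower()
        if lowered.endswith(ASSET_SUFFIXES):
            continue  # an image or asset filename, not a mailbox
        if lowered.partition("@")[2] in RESERVED_DOMAINS:
            continue  # template placeholder
        keep.append(candidate)
    return keep

print(plausible_emails('<img src="logo@2x.png"> user@example.com sales@company.com'))
# ['sales@company.com']
```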

Try It Now

Grab any page’s HTML source and paste it into the free email extractor. The engine finds emails in all the hidden locations described above — mailto: links, data attributes, JSON-LD, comments, and more.

No sign-up required. Your HTML never leaves your browser.

