API Reference v1

API Documentation

Fetch HTML content from any public URL through Cloudflare's edge network. Built-in rate limiting, SSRF protection, and response size controls.

1. Quick Start

POST https://extractor.email/api/fetch-page
fetch('https://extractor.email/api/fetch-page', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ url: 'https://example.com/contact' })
})
  .then(res => res.json())
  .then(data => console.log(data))
  // Network failures and non-JSON bodies end up here
  .catch(err => console.error('Request failed:', err));

Currently available for extractor.email only

This API is used internally by our extraction tool. Authenticated API access for external integrations is planned for the Pro and Business plans.

2. Base URL

https://extractor.email/api

All API endpoints are served over HTTPS via Cloudflare's edge network. Unencrypted HTTP requests are rejected.

3. Authentication (Coming Soon)

The API currently validates requests by origin. Authenticated access via API keys will be available in Pro and Business plans.

Planned Authentication Header
Authorization: Bearer ex_live_abc123...
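
Because the Authorization header is planned rather than live, a client can treat the key as optional. The following is a minimal sketch, assuming the Bearer format shown above ships unchanged. `buildHeaders` and its `apiKey` parameter are hypothetical names used for illustration, not part of the API.

```javascript
// Hypothetical helper: builds request headers for POST /api/fetch-page.
// Assumes the planned "Authorization: Bearer <key>" format is released as documented.
function buildHeaders(apiKey) {
  const headers = { 'Content-Type': 'application/json' };
  // Only attach the header when a non-empty key is supplied,
  // so the same code keeps working for origin-only access.
  if (typeof apiKey === 'string' && apiKey.length > 0) {
    headers['Authorization'] = `Bearer ${apiKey}`;
  }
  return headers;
}
```

Calling `buildHeaders()` without a key returns only the Content-Type header, which matches what the API accepts today.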

Plan | Authentication | Limit
--- | --- | ---
Free | Origin-only | Internal use via extractor.email
Pro ($9/mo) | API Key | 100 requests/day
Business ($29/mo) | API Key | 1,000 requests/day

4. Endpoint Reference

Currently one endpoint is available. More endpoints will be added as the platform grows.

POST /api/fetch-page

Fetches the HTML content of a given URL through Cloudflare's edge network. The response contains the raw HTML that you can parse client-side to extract emails, links, or any other data.

Request

Field | Type | Required | Description
--- | --- | --- | ---
url | string | Yes | The URL to fetch. Must be a public HTTP or HTTPS address. If the protocol is omitted, https:// is prepended automatically.
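
The protocol rule can be mirrored on the client so the URL you log matches the URL the API fetches. This is a sketch only: the server's exact test for a "missing protocol" is not documented, so the scheme check below is an assumption, and `normalizeUrl` and `buildRequestBody` are hypothetical helper names.

```javascript
// Hypothetical client-side mirror of the documented behaviour:
// a URL without a protocol gets "https://" prepended.
function normalizeUrl(input) {
  const trimmed = String(input).trim();
  // Assumption: a protocol is "present" when the string starts with "<scheme>://".
  const hasScheme = /^[a-zA-Z][a-zA-Z0-9+.-]*:\/\//.test(trimmed);
  return hasScheme ? trimmed : `https://${trimmed}`;
}

// Builds the JSON request body expected by POST /api/fetch-page.
function buildRequestBody(input) {
  return JSON.stringify({ url: normalizeUrl(input) });
}
```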

Request Headers

Header | Required | Description
--- | --- | ---
Content-Type | Yes | Must be application/json
Origin | Soft | Validated against allowed origins. Requests from unrecognized origins receive a restricted CORS policy.
Authorization | Future | API key authentication for Pro/Business plans (not yet required).

Success Response (200)

JSON Response
{
  "html": "<!DOCTYPE html><html>...",
  "url": "https://example.com/contact",
  "success": true,
  "truncated": false
}

Field | Type | Description
--- | --- | ---
html | string | The raw HTML content of the fetched page. Maximum 2 MB.
url | string | The final URL after any redirects were followed.
success | boolean | true if the page was fetched successfully.
truncated | boolean | true if the response was larger than 2 MB and was truncated.

Error Response

Error JSON
{
  "error": "Page not found (404)",
  "success": false
}
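
Both response shapes carry a `success` flag, so a caller can branch on it once and work with a single return type afterwards. The sketch below assumes only the fields documented above. `unwrapFetchPageResult` is a hypothetical helper name, and the `finalUrl` and `status` property names are illustrative choices.

```javascript
// Hypothetical helper: converts the documented JSON body into a value or an Error.
// `status` is the HTTP status code of the API response.
function unwrapFetchPageResult(status, data) {
  if (!data || data.success !== true) {
    // Prefer the API's own message; fall back to the status code.
    const message = (data && data.error) || `Request failed with status ${status}`;
    const err = new Error(message);
    err.status = status;
    throw err;
  }
  return {
    html: data.html,
    finalUrl: data.url, // URL after redirects, as reported by the API
    truncated: data.truncated === true, // true when the 2 MB cap was hit
  };
}
```

Callers that extract data from `html` may want to check `truncated` first, since matches past the cap will be missing.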

5. Error Codes & Messages

HTTP Status | Error Message | Cause
--- | --- | ---
400 | Missing or invalid URL | No url field in the request body, or the value is not a string.
400 | URL too long | The URL exceeds 2,048 characters.
400 | Invalid URL format | The URL cannot be parsed as a valid address.
400 | Internal addresses are not allowed | The URL points to a private IP range, localhost, or cloud metadata endpoint (SSRF protection).
400 | Only HTTP and HTTPS protocols are allowed | The URL uses an unsupported protocol (e.g., ftp://).
400 | Skipped binary or media file type | The URL points to a non-text file (images, videos, archives, etc.).
403 | Unauthorized request origin | The request came from an unrecognized origin or referrer.
429 | Rate limit exceeded | More than 100 requests in 10 minutes from the same IP. Includes a Retry-After header.
500 | Request timed out / DNS failure / Connection refused | The target server could not be reached within 10 seconds, or the domain does not exist.
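
Of the errors above, 429 is the one a client can usually recover from by waiting. The sketch below shows one way to honour the Retry-After header. The documentation does not say whether the header carries seconds or an HTTP date, so `retryAfterMs` accepts both forms allowed by the HTTP specification. `fetchPageWithRetry`, its options, and the 60-second fallback are assumptions made for illustration.

```javascript
// Converts a Retry-After header value into milliseconds to wait.
// Accepts delay-seconds ("120") or an HTTP-date; falls back when absent or unparseable.
function retryAfterMs(headerValue, now = Date.now(), fallbackMs = 60000) {
  if (headerValue == null || headerValue === '') return fallbackMs;
  const text = String(headerValue).trim();
  if (/^\d+$/.test(text)) return Number(text) * 1000;
  const date = Date.parse(text);
  if (Number.isNaN(date)) return fallbackMs;
  return Math.max(0, date - now);
}

// Hypothetical wrapper: retries while the API answers 429, up to maxAttempts.
// fetchImpl and sleep are injectable so the logic can be exercised offline.
async function fetchPageWithRetry(url, options = {}) {
  const {
    fetchImpl = globalThis.fetch,
    maxAttempts = 3,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;
  for (let attempt = 1; ; attempt += 1) {
    const response = await fetchImpl('https://extractor.email/api/fetch-page', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ url }),
    });
    // Return the parsed body for any non-429 status, or once attempts run out.
    if (response.status !== 429 || attempt >= maxAttempts) {
      return response.json();
    }
    await sleep(retryAfterMs(response.headers.get('Retry-After')));
  }
}
```

The 400 and 403 errors describe problems with the request itself, so retrying them unchanged is unlikely to help.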

6. Rate Limits & Security

Rate Limiting

  • 100 requests per IP per 10-minute window
  • In-memory counter per Worker instance
  • Returns 429 with Retry-After header
  • Pro/Business plans will offer higher limits
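
To make the limiting behaviour concrete, here is a minimal sketch of an in-memory, per-IP counter. It is not the service's implementation: the 100-request and 10-minute figures come from the list above, while the fixed-window strategy and the names `createRateLimiter` and `check` are assumptions.

```javascript
// Sketch of a fixed-window, in-memory rate limiter keyed by client IP.
// Limits match the documented values; the windowing strategy is assumed.
const WINDOW_MS = 10 * 60 * 1000;
const MAX_REQUESTS = 100;

function createRateLimiter(limit = MAX_REQUESTS, windowMs = WINDOW_MS) {
  const counters = new Map(); // ip -> { windowStart, count }

  return function check(ip, now = Date.now()) {
    let entry = counters.get(ip);
    // Start a fresh window for new IPs or when the old window has expired.
    if (!entry || now - entry.windowStart >= windowMs) {
      entry = { windowStart: now, count: 0 };
      counters.set(ip, entry);
    }
    if (entry.count >= limit) {
      // Seconds until the current window ends, suitable for a Retry-After header.
      const retryAfterSeconds = Math.ceil((entry.windowStart + windowMs - now) / 1000);
      return { allowed: false, retryAfterSeconds };
    }
    entry.count += 1;
    return { allowed: true, retryAfterSeconds: 0 };
  };
}
```

A counter held in one instance's memory is not shared with other instances, so the limit a client observes may not be exactly 100 when requests are spread across several of them.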

Security Controls

  • SSRF protection — blocks private IPs, localhost, cloud metadata
  • URL length cap — max 2,048 characters
  • Response cap — max 2 MB, truncated if larger
  • Protocol filter — only HTTP/HTTPS allowed
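
Several of these controls can be checked on the client before a request is sent, which avoids spending rate limit on URLs that would be rejected anyway. The sketch below is an approximation: it covers the length cap, the protocol filter, and obvious internal hosts, but not IPv6 private ranges or hostnames that resolve to private addresses, and the server's check remains authoritative. `validateUrl` and `isPrivateIPv4` are hypothetical helpers; the returned strings reuse the documented error messages.

```javascript
const MAX_URL_LENGTH = 2048;

// Returns true for IPv4 literals in loopback, private, or link-local ranges.
function isPrivateIPv4(host) {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 || a === 10 || a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254) // link-local, includes 169.254.169.254 (cloud metadata)
  );
}

// Returns one of the documented error messages, or null when the URL passes.
function validateUrl(value) {
  if (typeof value !== 'string') return 'Missing or invalid URL';
  if (value.length > MAX_URL_LENGTH) return 'URL too long';
  let parsed;
  try {
    parsed = new URL(value);
  } catch {
    return 'Invalid URL format';
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return 'Only HTTP and HTTPS protocols are allowed';
  }
  const host = parsed.hostname.toLowerCase();
  if (host === 'localhost' || host.endsWith('.localhost') || host === '[::1]' || isPrivateIPv4(host)) {
    return 'Internal addresses are not allowed';
  }
  return null;
}
```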

Response Headers

Header | Value
--- | ---
Content-Type | application/json
Access-Control-Allow-Origin | Restricted to https://extractor.email
Cache-Control | no-store, no-cache, must-revalidate
X-Robots-Tag | noindex, nofollow
X-Content-Type-Options | nosniff

7. Code Examples

JavaScript (Fetch)

async function fetchPage(url) {
  const response = await fetch('https://extractor.email/api/fetch-page', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url })
  });

  // Error responses are also JSON: { error, success: false }
  const data = await response.json();

  if (!data.success) {
    throw new Error(data.error || `Request failed with status ${response.status}`);
  }

  // Extract unique email addresses from the returned HTML
  const emailRegex = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;
  const emails = [...new Set(data.html.match(emailRegex) || [])];

  // truncated is true when the page exceeded the 2 MB cap
  return { emails, url: data.url, truncated: data.truncated };
}

cURL

curl -X POST https://extractor.email/api/fetch-page \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/contact"}'

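Python (requests)

import re

import requests

response = requests.post(
    'https://extractor.email/api/fetch-page',
    json={'url': 'https://example.com/contact'},
    timeout=30,  # client-side choice; requests has no default timeout
)

# Error responses are also JSON: {"error": ..., "success": false}
data = response.json()

if data.get('success'):
    # Collect unique email addresses from the returned HTML
    emails = set(re.findall(
        r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}',
        data['html']
    ))
    print(f'Found {len(emails)} emails')
else:
    print(f'Error: {data.get("error")}')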

8. Try It (Coming Soon)

An interactive API playground will be available for authenticated users.

Roadmap

JavaScript SDK (Planned)

An npm package wrapping the API with typed responses, retries, and email parsing built in.

POST /api/extract-emails (Planned)

A combined endpoint that fetches, parses, validates, and returns structured email data, all server-side.

POST /api/monitor (Planned)

Schedule recurring extractions and receive webhooks when new emails are detected on monitored URLs.

Batch Endpoint (Planned)

Submit multiple URLs in a single request and receive results as they complete via streaming or polling.
