API Documentation
Fetch HTML content from any public URL through Cloudflare's edge network. Built-in rate limiting, SSRF protection, and response size controls.
1
Quick Start
https://extractor.email/api/fetch-page
fetch('https://extractor.email/api/fetch-page', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ url: 'https://example.com/contact' })
})
  .then(res => res.json())
  .then(data => console.log(data));
Currently available for extractor.email only
This API is used internally by our extraction tool. Authenticated API access for external integrations is coming in the Pro and Business plans.
2
Base URL
https://extractor.email/api
All API endpoints are served over HTTPS via Cloudflare's edge network. HTTP requests are rejected.
3
Authentication
Coming Soon
The API currently validates requests by origin. Authenticated access via API keys will be available in Pro and Business plans.
Authorization: Bearer ex_live_abc123...
| Plan | Authentication | Usage |
|---|---|---|
| Free | Origin-only | Internal use via extractor.email |
| Pro ($9/mo) | API Key | 100 requests/day |
| Business ($29/mo) | API Key | 1,000 requests/day |
4
Endpoint Reference
Currently one endpoint is available. More endpoints will be added as the platform grows.
POST /api/fetch-page
Fetches the HTML content of a given URL through Cloudflare's edge network. The response contains the raw HTML, which you can parse client-side to extract emails, links, or any other data.
Request
| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL to fetch. Must be a public HTTP or HTTPS address. If the protocol is omitted, https:// is prepended automatically. |
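The rules in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not the service's actual validation: `normalize_url` and `MAX_URL_LENGTH` are hypothetical names, and the 2,048-character limit is taken from the error table in this document.

```python
from urllib.parse import urlsplit

# Assumption: mirrors the 2,048-character cap listed in the error table.
MAX_URL_LENGTH = 2048


def normalize_url(raw: str) -> str:
    """Hypothetical helper: prepend https:// when no protocol is given,
    then apply the documented length and protocol checks."""
    if not isinstance(raw, str) or not raw.strip():
        raise ValueError("Missing or invalid URL")
    url = raw.strip()
    if "://" not in url:
        # The API is documented to prepend https:// itself; doing it here
        # simply makes the final URL visible to the caller.
        url = "https://" + url
    if len(url) > MAX_URL_LENGTH:
        raise ValueError("URL too long")
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError("Only HTTP and HTTPS protocols are allowed")
    if not parts.hostname:
        raise ValueError("Invalid URL format")
    return url
```

Calling `normalize_url("example.com/contact")` returns `https://example.com/contact`. The server remains the authority on what it accepts, so treat this only as an early filter.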
Request Headers
| Header | Required | Description |
|---|---|---|
| Content-Type | Yes | Must be application/json |
| Origin | Soft | Validated against allowed origins. Requests from unrecognized origins receive a restricted CORS policy. |
| Authorization | Future | API key authentication for Pro/Business plans (not yet required). |
Success Response (200)
{
  "html": "<!DOCTYPE html><html>...",
  "url": "https://example.com/contact",
  "success": true,
  "truncated": false
}
| Field | Type | Description |
|---|---|---|
| html | string | The raw HTML content of the fetched page. Maximum 2 MB. |
| url | string | The final URL after any redirects were followed. |
| success | boolean | true if the page was fetched successfully. |
| truncated | boolean | true if the response was larger than 2 MB and was truncated. |
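As an illustration of how a client might consume these fields, here is a small sketch that turns a decoded JSON payload into a typed object. `FetchResult` and `parse_fetch_response` are hypothetical names introduced for this example; only the field names come from the table above.

```python
from dataclasses import dataclass


@dataclass
class FetchResult:
    """Hypothetical container for the documented success fields."""
    html: str
    url: str         # final URL after redirects
    truncated: bool  # True when the page exceeded the 2 MB cap


def parse_fetch_response(payload: dict) -> FetchResult:
    """Convert a decoded JSON body into a FetchResult, or raise on failure."""
    if not payload.get("success"):
        # Failure payloads carry an "error" message instead of "html".
        raise RuntimeError(payload.get("error", "Unknown error"))
    return FetchResult(
        html=payload.get("html", ""),
        url=payload.get("url", ""),
        truncated=bool(payload.get("truncated", False)),
    )
```

Checking `truncated` before parsing matters because a cut-off document may be missing its footer, which is where contact details often appear.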
Error Response
{
  "error": "Page not found (404)",
  "success": false
}
5
Error Codes & Messages
| HTTP Status | Error Message | Cause |
|---|---|---|
| 400 | Missing or invalid URL | No url field in the request body, or the value is not a string. |
| 400 | URL too long | The URL exceeds 2,048 characters. |
| 400 | Invalid URL format | The URL cannot be parsed as a valid address. |
| 400 | Internal addresses are not allowed | The URL points to a private IP range, localhost, or cloud metadata endpoint (SSRF protection). |
| 400 | Only HTTP and HTTPS protocols are allowed | The URL uses an unsupported protocol (e.g., ftp://). |
| 400 | Skipped binary or media file type | The URL points to a non-text file (images, videos, archives, etc.). |
| 403 | Unauthorized request origin | The request came from an unrecognized origin or referrer. |
| 429 | Rate limit exceeded | More than 100 requests in 10 minutes from the same IP. Includes a Retry-After header. |
| 500 | Request timed out / DNS failure / Connection refused | The target server could not be reached within 10 seconds, or the domain does not exist. |
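Because a 429 response includes a Retry-After header, a client can wait and try again instead of failing immediately. The sketch below shows one way to do that. It assumes the header carries a number of seconds (the format is not specified in this document), and `call_with_retry` and its `send` callback are hypothetical names. Passing the transport in as a callback keeps the example independent of any HTTP library.

```python
import time
from typing import Callable, Mapping, Tuple

# (HTTP status, response headers, decoded JSON body)
Response = Tuple[int, Mapping[str, str], dict]


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    default_wait: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Hypothetical helper: retry on 429, raise on any other error."""
    attempt = 0
    while True:
        attempt += 1
        status, headers, body = send()
        if status == 429 and attempt < max_attempts:
            # Assumption: Retry-After is expressed in whole seconds.
            value = headers.get("Retry-After", "")
            sleep(float(value) if value.isdigit() else default_wait)
            continue
        if status != 200 or not body.get("success"):
            raise RuntimeError(f"{status}: {body.get('error', 'Unknown error')}")
        return body
```

Errors in the 400 range other than 429 describe a problem with the request itself, so the sketch raises rather than retrying them.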
6
Rate Limits & Security
Rate Limiting
- 100 requests per IP per 10-minute window
- In-memory counter per Worker instance
- Returns 429 with Retry-After header
- Pro/Business plans will offer higher limits
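To make the idea of an in-memory per-IP counter concrete, here is a sketch of a fixed-window limiter. It is an illustration only: this document does not say whether the service uses a fixed or sliding window, and `FixedWindowLimiter` is a hypothetical class, not the Worker's code.

```python
import math
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Hypothetical in-memory limiter: `limit` requests per window per IP."""

    def __init__(
        self,
        limit: int = 100,
        window_seconds: int = 600,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        # ip -> (window start time, requests counted in that window)
        self._counters: Dict[str, Tuple[float, int]] = {}

    def check(self, ip: str) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds) and count the request."""
        now = self.clock()
        start, count = self._counters.get(ip, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # the previous window has expired
        if count >= self.limit:
            # Seconds until the current window ends, for a Retry-After header.
            return False, math.ceil(start + self.window - now)
        self._counters[ip] = (start, count + 1)
        return True, 0
```

State held in memory like this is local to one process, so a counter kept per instance only sees the traffic that instance handles.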
Security Controls
- SSRF protection: blocks private IPs, localhost, cloud metadata
- URL length cap: max 2,048 characters
- Response cap: max 2 MB, truncated if larger
- Protocol filter: only HTTP/HTTPS allowed
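The SSRF control above can be approximated with Python's standard `ipaddress` module. This is a sketch under stated assumptions, not the service's filter: `is_internal_target` and the `BLOCKED_HOSTNAMES` set are hypothetical, and the check inspects only the literal host in the URL.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumption: an illustrative, incomplete list of hostnames to refuse.
BLOCKED_HOSTNAMES = {"localhost", "metadata.google.internal"}


def is_internal_target(url: str) -> bool:
    """Hypothetical check: True if the URL's host looks private or local."""
    host = (urlsplit(url).hostname or "").lower()
    if not host or host in BLOCKED_HOSTNAMES or host.endswith(".localhost"):
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. A complete defense would also resolve the
        # hostname and test every resulting address.
        return False
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local  # covers 169.254.169.254, used by cloud metadata services
        or ip.is_reserved
        or ip.is_unspecified
    )
```

A public hostname can still resolve to a private address, which is why the comment in the sketch points at resolution-time checks as the missing piece.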
Response Headers
| Header | Value |
|---|---|
| Content-Type | application/json |
| Access-Control-Allow-Origin | Restricted to https://extractor.email |
| Cache-Control | no-store, no-cache, must-revalidate |
| X-Robots-Tag | noindex, nofollow |
| X-Content-Type-Options | nosniff |
7
Code Examples
JavaScript
async function fetchPage(url) {
  const response = await fetch('https://extractor.email/api/fetch-page', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url })
  });
  const data = await response.json();
  if (!data.success) {
    throw new Error(data.error);
  }
  // Extract emails from HTML and remove duplicates
  const emailRegex = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;
  const emails = [...new Set(data.html.match(emailRegex) || [])];
  return { emails, url: data.url, truncated: data.truncated };
}
cURL
curl -X POST https://extractor.email/api/fetch-page \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/contact"}'
Python
import re

import requests

response = requests.post(
    'https://extractor.email/api/fetch-page',
    json={'url': 'https://example.com/contact'},
    timeout=30,  # avoid hanging indefinitely if the API does not respond
)
data = response.json()
if data.get('success'):
    # Extract emails from HTML and remove duplicates
    emails = set(re.findall(
        r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}',
        data['html']
    ))
    print(f'Found {len(emails)} emails')
else:
    print(f'Error: {data.get("error")}')
8
Try It
Coming Soon
An interactive API playground will be available for authenticated users.
Roadmap
JavaScript SDK (Planned)
An npm package wrapping the API with typed responses, retries, and email parsing built in.
POST /api/extract-emails (Planned)
A combined endpoint that fetches, parses, validates, and returns structured email data, all server-side.
POST /api/monitor (Planned)
Schedule recurring extractions and receive webhooks when new emails are detected on monitored URLs.
Batch Endpoint (Planned)
Submit multiple URLs in a single request and receive results as they complete via streaming or polling.
Ready to Extract Emails?
Use the full-featured extraction tool right now. No API key is required.