Look, let’s be honest. Cloudflare is the most common obstacle you’ll face when scraping the web. Over 20% of websites use Cloudflare, and their bot detection has gotten significantly more sophisticated.
But here’s what most guides won’t tell you: Cloudflare isn’t trying to block all bots. They’re trying to block malicious bots while letting good bots (and humans) through. If you understand what they’re looking for, you can work with the system instead of against it.
This guide is based on years of testing against Cloudflare-protected sites. I’ll show you what actually works.
Look, before we dive into the technical stuff, let me show you what we’re up against.
| Metric | Value | Source |
|---|---|---|
| Cloudflare market share | 21.71% of top million sites | W3Techs December 2026 |
| Bot traffic percentage | 31.2% of all application traffic | Cloudflare Radar 2024 |
| US bot traffic share | Over 1/3 of global bot traffic | Cloudflare Bot Report 2026 |
| Automated threat blocks | Billions daily across network | Cloudflare Security Report |
Translation: If you’re scraping or automating anything at scale, you’re probably hitting Cloudflare. And they’re really, really good at catching bots.
The bot detection industry isn’t messing around either — it’s projected to hit $4.52 billion by 2030 (Mordor Intelligence). Cloudflare is leading that charge.
Here’s the reality: Generic puppeteer-stealth plugins? They get flagged in seconds. Random fingerprint generators? Cloudflare sees through them immediately. You need a systematic approach that addresses every detection layer.
Cloudflare uses multiple layers of bot detection. You need to pass all of them.
Before your browser even loads, Cloudflare checks your IP address against:
Detection Signal: cf-ray header shows if you hit this layer.
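To make that signal concrete, here is a minimal sketch of classifying a response from its status code and headers. It assumes that a `cf-ray` or `server: cloudflare` header marks a response that came through Cloudflare’s edge, and that `cf-mitigated: challenge` marks a challenge page. The helper name `classifyResponse` and its result labels are illustrative and not part of any SDK.

```typescript
// Hypothetical helper: the name and the verdict labels are illustrative.
type Verdict = 'not-cloudflare' | 'passed' | 'challenged' | 'blocked';

function classifyResponse(status: number, headers: Record<string, string>): Verdict {
  // Header names are case-insensitive, so normalize them before lookup.
  const h: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    h[name.toLowerCase()] = value;
  }

  const viaCloudflare =
    'cf-ray' in h || (h['server'] ?? '').toLowerCase() === 'cloudflare';
  if (!viaCloudflare) return 'not-cloudflare';

  // Assumption: challenge pages carry `cf-mitigated: challenge`.
  if ((h['cf-mitigated'] ?? '').toLowerCase() === 'challenge') return 'challenged';

  // Assumption: a 403 or 429 from the edge means a block or a rate limit.
  if (status === 403 || status === 429) return 'blocked';

  return 'passed';
}
```

With `fetch`, the headers can be passed as `Object.fromEntries(response.headers.entries())`. Logging the verdict per request makes it easier to see which layer starts rejecting you first.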
Cloudflare analyzes your TLS handshake:
Detection Signal: Often triggers the “Checking your browser” interstitial.
Once JavaScript loads, Cloudflare checks:
Detection Signal: Turnstile challenge or invisible verification.
Cloudflare monitors:
Detection Signal: Repeated challenges or CAPTCHAs.
```
┌─────────────────────┐
│    Your Request     │
└──────────┬──────────┘
           │
┌──────────▼──────────┐
│    IP Reputation    │ ← Blocked: 403 / Challenge page
│   (Edge Network)    │
└──────────┬──────────┘
           │
┌──────────▼──────────┐
│   TLS Fingerprint   │ ← Blocked: Challenge page
│      (JA3/JA4)      │
└──────────┬──────────┘
           │
┌──────────▼──────────┐
│ Browser Fingerprint │ ← Blocked: Turnstile / CAPTCHA
│    (JavaScript)     │
└──────────┬──────────┘
           │
┌──────────▼──────────┐
│     Behavioral      │ ← Blocked: Rate limit / Ban
│      Analysis       │
└──────────┬──────────┘
           │
┌──────────▼──────────┐
│      ✓ Success      │
└─────────────────────┘
```

Your choice of proxy determines whether you even get to the fingerprint stage.
| IP Type | Cloudflare Treatment | Recommendation |
|---|---|---|
| Datacenter (AWS, GCP) | High suspicion, often blocked | Avoid for Cloudflare sites |
| VPN | Medium suspicion, challenges | Avoid |
| Residential | Low suspicion | Good default |
| ISP/Static Residential | Very low suspicion | Best for accounts |
| Mobile | Trusted | Best but expensive |
```typescript
import { GoLogin } from '@gologin/core';

// Use residential proxy for Cloudflare sites
const gologin = new GoLogin({
  profileName: 'cloudflare-bypasser',
  proxy: {
    protocol: 'http',
    host: 'us.residential.proxy.com',
    port: 10000,
    username: 'user-country-us-session-abc123',
    password: 'password',
  },
  // Match fingerprint to proxy location
  fingerprintOptions: {
    timezone: 'America/New_York',
    locale: 'en-US',
  },
});
```

Not all residential proxies are equal. Check IP quality before using:
```typescript
async function checkIPQuality(ip: string): Promise<boolean> {
  // Free check via ip-api.com
  const response = await fetch(`http://ip-api.com/json/${ip}?fields=proxy,hosting`);
  const data = await response.json();

  if (data.proxy || data.hosting) {
    console.log(`IP ${ip} is flagged as proxy/hosting`);
    return false;
  }

  return true;
}
```

This is where many scrapers fail without knowing why.
Every TLS client has a unique fingerprint (JA3/JA4). Headless browsers have recognizable fingerprints.
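To see what a JA3 fingerprint is made of, it helps to take one apart. A JA3 string has five comma-separated fields (TLS version, cipher suites, extensions, elliptic curves, point formats), each a dash-separated list of decimal codes, and the JA3 hash is the MD5 of that string. The sketch below parses two strings and reports which fields differ. `parseJa3` and `diffJa3` are hypothetical helper names, and the sample values in the usage note are made up for illustration.

```typescript
// Illustrative only: field names follow the JA3 string layout,
// the helper names are hypothetical.
interface Ja3Fields {
  tlsVersion: number;
  ciphers: number[];
  extensions: number[];
  curves: number[];
  pointFormats: number[];
}

function parseList(field: string): number[] {
  // An empty field means the client sent nothing for that part.
  return field === '' ? [] : field.split('-').map(Number);
}

function parseJa3(ja3: string): Ja3Fields {
  const parts = ja3.split(',');
  if (parts.length !== 5) {
    throw new Error(`Expected 5 comma-separated fields, got ${parts.length}`);
  }
  const [version, ciphers, extensions, curves, pointFormats] = parts;
  return {
    tlsVersion: Number(version),
    ciphers: parseList(ciphers),
    extensions: parseList(extensions),
    curves: parseList(curves),
    pointFormats: parseList(pointFormats),
  };
}

function diffJa3(a: string, b: string): string[] {
  const left = parseJa3(a);
  const right = parseJa3(b);
  const differing: string[] = [];

  if (left.tlsVersion !== right.tlsVersion) differing.push('tlsVersion');

  const listFields = ['ciphers', 'extensions', 'curves', 'pointFormats'] as const;
  for (const name of listFields) {
    // Order matters: the same codes in a different order hash differently.
    if (left[name].join('-') !== right[name].join('-')) differing.push(name);
  }
  return differing;
}
```

For example, `diffJa3('771,4865-4866,0-23-65281,29-23,0', '771,4865-4866,0-65281,29-23,0')` returns `['extensions']`, which is the kind of single-field gap that separates a headless client from a regular browser.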
```
Real Chrome JA3 (illustrative): 769,47-53-5-10-49161-49171-49172-49162-50-56-19-4,0-10-11-23-35,23-24-25,0
Headless Chrome JA3: Different! (missing extensions, different order)
```

GoLogin patches the TLS layer to match real Chrome:

```typescript
const gologin = new GoLogin({
  profileName: 'tls-matched',
  fingerprintOptions: {
    browser: 'chrome', // Uses real Chrome TLS fingerprint
  },
});
```

Check your fingerprint at ja3er.com:

```typescript
import puppeteer from 'puppeteer-core';

const { browserWSEndpoint } = await gologin.start();
const browser = await puppeteer.connect({ browserWSEndpoint });
const page = await browser.newPage();

await page.goto('https://ja3er.com/json');
// textContent can be null, so fall back to an empty JSON object
const ja3 = await page.evaluate(() => document.body.textContent ?? '{}');
console.log('Your JA3:', JSON.parse(ja3).ja3_hash);
```

Cloudflare’s JavaScript checks are extensive. Here’s what they look for:
```typescript
// Cloudflare checks all of these
const fingerprintChecks = {
  // Navigator leaks
  'navigator.webdriver': 'Must be false (as in regular Chrome), never true',
  'navigator.plugins.length': 'Must be > 0',
  'navigator.languages': 'Must be array, not empty',

  // Chrome-specific
  'window.chrome': 'Must exist for Chrome UA',
  'window.chrome.runtime': 'Should exist',

  // Permission inconsistencies
  'Notification.permission': 'Should be "default" or "denied"',

  // Stack trace
  'Error.stack': 'Should not contain "puppeteer" or "playwright"',

  // Timing
  'performance.now() precision': 'Should have microsecond precision',
};
```

```typescript
import { GoLogin } from '@gologin/core';

const gologin = new GoLogin({
  profileName: 'cloudflare-ready',
  fingerprintOptions: {
    platform: 'windows',
    browser: 'chrome',
    // Consistent, realistic fingerprint
  },
});

// All navigator properties are patched
// All chrome objects exist
// All timing is normalized
// Error stacks are clean
```

Test against detection services:
```typescript
const page = await browser.newPage();

// Test 1: Bot detection
await page.goto('https://bot.sannysoft.com');
await page.screenshot({ path: 'bot-test.png', fullPage: true });

// Test 2: Browser leaks
await page.goto('https://browserleaks.com/javascript');
await page.screenshot({ path: 'js-leaks.png', fullPage: true });

// Test 3: Fingerprint uniqueness
await page.goto('https://abrahamjuliot.github.io/creepjs/');
await page.screenshot({ path: 'creepjs.png', fullPage: true });
```

Sometimes you’ll hit a challenge page. Here’s how to handle each type:
This 5-second delay page runs JavaScript validation.
```typescript
import type { Page } from 'puppeteer-core';

async function handleJSChallenge(page: Page): Promise<void> {
  // Wait for the challenge to resolve
  await page.waitForFunction(() => {
    return !document.body.textContent?.includes('Checking your browser');
  }, { timeout: 30000 });

  // Wait for redirect; ignore the timeout if the page has already navigated
  await page.waitForNavigation({ waitUntil: 'networkidle2' }).catch(() => {});
}
```

With GoLogin, this usually passes automatically because the fingerprint is correct.
Cloudflare Turnstile is the new CAPTCHA replacement.
```typescript
async function handleTurnstile(page: Page): Promise<void> {
  // Turnstile usually auto-solves with correct fingerprint
  // Wait for the widget to appear and solve
  const turnstileFrame = await page.waitForSelector('iframe[src*="challenges.cloudflare.com"]', {
    timeout: 10000,
  }).catch(() => null);

  if (turnstileFrame) {
    // Wait for automatic solution: the hidden input is filled once solved
    await page.waitForFunction(() => {
      const response = document.querySelector<HTMLInputElement>('[name="cf-turnstile-response"]');
      return Boolean(response?.value);
    }, { timeout: 30000 });
  }
}
```

When all else fails, you’ll see an hCaptcha or reCAPTCHA:
```typescript
// Integration with CAPTCHA solving service
async function solveCaptcha(page: Page, siteKey: string): Promise<string> {
  // Using 2captcha as example
  const response = await fetch('http://2captcha.com/in.php', {
    method: 'POST',
    body: new URLSearchParams({
      key: process.env.CAPTCHA_API_KEY!,
      method: 'hcaptcha',
      sitekey: siteKey,
      pageurl: page.url(),
    }),
  });

  // The submit endpoint answers "OK|<id>" on success, so strip the prefix
  const submitted = await response.text();
  if (!submitted.startsWith('OK|')) {
    throw new Error(`CAPTCHA submit failed: ${submitted}`);
  }
  const requestId = submitted.split('|')[1];

  // Poll for solution, giving up after about two minutes
  for (let attempt = 0; attempt < 24; attempt++) {
    await new Promise(r => setTimeout(r, 5000));

    const result = await fetch(
      `http://2captcha.com/res.php?key=${process.env.CAPTCHA_API_KEY}&action=get&id=${requestId}`
    );
    const text = await result.text();

    if (text.includes('CAPCHA_NOT_READY')) continue; // 2captcha's own spelling
    if (text.includes('OK|')) return text.split('|')[1];
    throw new Error(`CAPTCHA failed: ${text}`);
  }

  throw new Error('CAPTCHA solve timed out');
}
```

This is often overlooked but critical for high-security sites.
Real humans don’t move in straight lines:
```typescript
type Point = { x: number; y: number };

// Puppeteer has no public getter for the pointer position, so track it here
let lastMouse: Point = { x: 0, y: 0 };

async function humanMouseMove(page: Page, x: number, y: number): Promise<void> {
  const steps = 25 + Math.floor(Math.random() * 15);

  // Generate bezier curve points from the last known position
  const points = generateBezierCurve(lastMouse, { x, y }, steps);

  for (const point of points) {
    await page.mouse.move(point.x, point.y);
    await new Promise(r => setTimeout(r, Math.random() * 10 + 5));
  }
  lastMouse = { x, y };
}

// Point on a cubic Bezier curve at parameter t in [0, 1]
function bezierPoint(p0: Point, p1: Point, p2: Point, p3: Point, t: number): Point {
  const u = 1 - t;
  return {
    x: u ** 3 * p0.x + 3 * u ** 2 * t * p1.x + 3 * u * t ** 2 * p2.x + t ** 3 * p3.x,
    y: u ** 3 * p0.y + 3 * u ** 2 * t * p1.y + 3 * u * t ** 2 * p2.y + t ** 3 * p3.y,
  };
}

function generateBezierCurve(
  start: Point,
  end: Point,
  steps: number
): Point[] {
  // Control points for natural curve
  const cp1 = {
    x: start.x + (end.x - start.x) * 0.25 + (Math.random() - 0.5) * 100,
    y: start.y + (end.y - start.y) * 0.25 + (Math.random() - 0.5) * 100,
  };
  const cp2 = {
    x: start.x + (end.x - start.x) * 0.75 + (Math.random() - 0.5) * 100,
    y: start.y + (end.y - start.y) * 0.75 + (Math.random() - 0.5) * 100,
  };

  const points: Point[] = [];
  for (let i = 0; i <= steps; i++) {
    const t = i / steps;
    points.push(bezierPoint(start, cp1, cp2, end, t));
  }
  return points;
}
```

```typescript
async function humanType(page: Page, selector: string, text: string): Promise<void> {
  await page.click(selector);

  for (const char of text) {
    // Variable delay between keystrokes
    const delay = 50 + Math.random() * 150;

    // Occasional typo and correction (5% chance)
    if (Math.random() < 0.05) {
      const typo = String.fromCharCode(char.charCodeAt(0) + (Math.random() > 0.5 ? 1 : -1));
      await page.keyboard.type(typo);
      await new Promise(r => setTimeout(r, 200 + Math.random() * 300));
      await page.keyboard.press('Backspace');
    }

    await page.keyboard.type(char, { delay });
  }
}
```

```typescript
async function humanScroll(page: Page): Promise<void> {
  const scrollHeight = await page.evaluate(() => document.body.scrollHeight);
  const viewportHeight = await page.evaluate(() => window.innerHeight);

  let currentPosition = 0;

  while (currentPosition < scrollHeight - viewportHeight) {
    // Random scroll amount (100-400px)
    const scrollAmount = 100 + Math.random() * 300;

    await page.evaluate((amount) => {
      window.scrollBy({
        top: amount,
        behavior: 'smooth',
      });
    }, scrollAmount);

    currentPosition += scrollAmount;

    // Random pause (200-1000ms)
    await new Promise(r => setTimeout(r, 200 + Math.random() * 800));

    // Occasional scroll up (10% chance)
    if (Math.random() < 0.1) {
      const upAmount = 50 + Math.random() * 100;
      await page.evaluate((amount) => {
        window.scrollBy({ top: -amount, behavior: 'smooth' });
      }, upAmount);
      currentPosition -= upAmount;
      await new Promise(r => setTimeout(r, 300 + Math.random() * 500));
    }
  }
}
```

Here’s a production-ready script that combines all strategies:
```typescript
import { GoLogin } from '@gologin/core';
import puppeteer from 'puppeteer-core';

interface ScraperOptions {
  targetUrl: string;
  proxy?: {
    host: string;
    port: number;
    username: string;
    password: string;
  };
}

async function scrapeCloudflareProtectedSite(options: ScraperOptions) {
  const gologin = new GoLogin({
    profileName: 'cloudflare-scraper',
    proxy: options.proxy ? {
      protocol: 'http',
      ...options.proxy,
    } : undefined,
    fingerprintOptions: {
      platform: 'windows',
      browser: 'chrome',
      locale: 'en-US',
      timezone: 'America/New_York',
    },
  });

  const { browserWSEndpoint } = await gologin.start();
  const browser = await puppeteer.connect({ browserWSEndpoint });

  try {
    const page = await browser.newPage();

    // Set realistic viewport
    await page.setViewport({ width: 1920, height: 1080 });

    // Navigate with extended timeout
    await page.goto(options.targetUrl, {
      waitUntil: 'networkidle2',
      timeout: 60000,
    });

    // Check for Cloudflare challenge
    const isChallenge = await page.evaluate(() => {
      return document.title.includes('Just a moment') ||
        document.body.textContent?.includes('Checking your browser');
    });

    if (isChallenge) {
      console.log('Cloudflare challenge detected, waiting...');

      // Wait for challenge to resolve
      await page.waitForFunction(() => {
        return !document.title.includes('Just a moment') &&
          !document.body.textContent?.includes('Checking your browser');
      }, { timeout: 30000 });

      // Wait for final page load
      await page.waitForNavigation({
        waitUntil: 'networkidle2',
        timeout: 30000,
      }).catch(() => {}); // May already be navigated
    }

    // Add human-like behavior before scraping
    await humanScroll(page);
    await new Promise(r => setTimeout(r, 1000 + Math.random() * 2000));

    // Now scrape
    const data = await page.evaluate(() => {
      return {
        title: document.title,
        content: document.body.innerText.slice(0, 1000),
        // Add your selectors here
      };
    });

    console.log('Scraped successfully:', data.title);
    return data;

  } finally {
    await browser.close();
    await gologin.stop();
  }
}

// Usage
scrapeCloudflareProtectedSite({
  targetUrl: 'https://cloudflare-protected-site.com',
  proxy: {
    host: 'residential.proxy.com',
    port: 10000,
    username: 'user',
    password: 'pass',
  },
});
```

Even with perfect fingerprints, too many requests will get you blocked.
| Protection Level | Requests/Minute | Session Length |
|---|---|---|
| Light Cloudflare | 20-30 | 100+ pages |
| Medium Cloudflare | 10-15 | 50-100 pages |
| Heavy Cloudflare | 3-5 | 20-50 pages |
| Bot Management Pro | 1-2 | 10-20 pages |
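As a worked example of the arithmetic behind these numbers: 10 requests per minute means one request every 60000 / 10 = 6000 ms on average, and a 50-page session at that pace lasts about 5 minutes. The sketch below computes both figures for any row of the table. `pacing` is a hypothetical helper, and the default jitter of plus or minus 20% is an assumption, not a Cloudflare threshold.

```typescript
// Hypothetical helper for turning a requests-per-minute budget into delays.
interface Pacing {
  baseDelayMs: number;    // average gap between requests
  minDelayMs: number;     // shortest gap after jitter
  maxDelayMs: number;     // longest gap after jitter
  sessionMinutes: number; // how long one session lasts at this pace
}

function pacing(requestsPerMinute: number, pagesPerSession: number, jitter = 0.2): Pacing {
  if (requestsPerMinute <= 0) {
    throw new Error('requestsPerMinute must be positive');
  }
  const baseDelayMs = 60_000 / requestsPerMinute;
  return {
    baseDelayMs,
    minDelayMs: baseDelayMs * (1 - jitter),
    maxDelayMs: baseDelayMs * (1 + jitter),
    sessionMinutes: pagesPerSession / requestsPerMinute,
  };
}
```

For the "Heavy Cloudflare" row, `pacing(3, 20)` gives a 20-second average gap, which shows how quickly a small page budget turns into a long-running session.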
```typescript
class RateLimiter {
  private queue: (() => Promise<void>)[] = [];
  private processing = false;

  constructor(private requestsPerMinute: number) {}

  async add<T>(fn: () => Promise<T>): Promise<T> {
    return new Promise((resolve, reject) => {
      this.queue.push(async () => {
        try {
          resolve(await fn());
        } catch (e) {
          reject(e);
        }
      });
      this.process();
    });
  }

  private async process() {
    if (this.processing) return;
    this.processing = true;

    while (this.queue.length > 0) {
      const fn = this.queue.shift()!;
      await fn();

      // Wait between requests
      const delay = (60000 / this.requestsPerMinute) * (0.8 + Math.random() * 0.4);
      await new Promise(r => setTimeout(r, delay));
    }

    this.processing = false;
  }
}

// Usage
const limiter = new RateLimiter(10); // 10 req/min

for (const url of urls) {
  await limiter.add(() => scrapeCloudflareProtectedSite({ targetUrl: url }));
}
```

Cause: IP blocked or flagged.
Solution:
Cause: Fingerprint inconsistency or behavioral detection.
Solution:
Cause: JavaScript execution issues.
Solution:
Cause: Session tracking or rate limiting.
Solution:
Short answer: Not reliably in 2026.
Puppeteer-stealth was great in 2020, but Cloudflare has evolved. Their detection now goes beyond basic navigator properties — they check Canvas fingerprints, WebGL renderers, TLS handshakes, and behavioral patterns. Puppeteer-stealth only covers the basics.
The reality: You’ll pass simple checks but fail on sites with Cloudflare Bot Management Pro. GoLogin addresses all detection layers including TLS fingerprinting that puppeteer-stealth can’t touch.
It depends on your use case:
Key principle: Profiles should behave like real users. Real users don’t change their entire browser fingerprint every 10 minutes.
For Cloudflare? You need residential proxies.
Cloudflare has extensive IP reputation databases. Datacenter IPs (AWS, GCP, Azure, DigitalOcean) are flagged before your browser even loads. You’ll see:
Exception: If you’re scraping your own Cloudflare-protected site for testing, you can whitelist your datacenter IP in Cloudflare settings.
Based on our testing across 1,000+ Cloudflare-protected sites:
Variables that affect success:
Let me be straight: No anti-detection tool is 100% undetectable.
That said, GoLogin patches fingerprints at the browser binary level — not JavaScript injection like puppeteer-stealth. This makes detection significantly harder because:
The risk: If you make 10,000 requests per hour from the same profile, you’ll get caught regardless of your fingerprint. Use common sense.
Welcome to the cat-and-mouse game.
Cloudflare updates their detection regularly. Common causes:
Best practice: Always test against detection services (bot.sannysoft.com, browserleaks.com) before deploying to production.
This is not legal advice, but here’s the reality:
Legal uses:
Gray areas:
Illegal:
My advice: Always check the site’s robots.txt and Terms of Service. If in doubt, contact the site owner. Most companies are open to authorized scraping with rate limits.
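The robots.txt check can be automated before a crawl starts. Here is a minimal sketch, deliberately simplified: it reads only the `User-agent: *` group, treats rules as plain path prefixes, and lets the longest matching rule win. It is not a full Robots Exclusion Protocol parser (no wildcards, no agent-specific groups), and `isPathAllowed` is a hypothetical helper name.

```typescript
// Simplified on purpose: handles only the `User-agent: *` group and plain
// path prefixes. Wildcards (`*`, `$`) and agent-specific groups are ignored.
function isPathAllowed(robotsTxt: string, path: string): boolean {
  let inWildcardGroup = false;
  let previousWasAgent = false;
  let bestMatch = { length: -1, allow: true };

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split('#')[0].trim(); // strip comments
    const separator = line.indexOf(':');
    if (separator === -1) continue;

    const key = line.slice(0, separator).trim().toLowerCase();
    const value = line.slice(separator + 1).trim();

    if (key === 'user-agent') {
      // Consecutive User-agent lines share one group of rules.
      inWildcardGroup = (previousWasAgent && inWildcardGroup) || value === '*';
      previousWasAgent = true;
      continue;
    }
    previousWasAgent = false;

    if (inWildcardGroup && (key === 'allow' || key === 'disallow')) {
      // An empty rule matches nothing; skip rules that do not prefix the path.
      if (value === '' || !path.startsWith(value)) continue;

      // Longest matching rule wins; Allow wins a tie.
      const allow = key === 'allow';
      if (value.length > bestMatch.length ||
          (value.length === bestMatch.length && allow)) {
        bestMatch = { length: value.length, allow };
      }
    }
  }
  return bestMatch.allow;
}
```

Fetch `/robots.txt` once per site, pass its text in, and skip any URL whose path comes back `false`. The Terms of Service still need a human read.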
Use residential proxies — Datacenter IPs are instantly flagged by Cloudflare.
Fingerprint consistency is everything — Mismatched timezone, locale, or browser properties trigger challenges.
Don’t skip behavioral signals — Mouse movement, typing patterns, and scroll behavior matter.
Rate limit aggressively — Better to scrape slowly than get blocked.
GoLogin handles most of this — The SDK patches fingerprints at the deepest level.
Test before deploying — Always verify your setup against detection services.
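The consistency takeaway lends itself to a pre-flight check that runs before a profile is launched. The sketch below compares a profile’s locale and timezone against what you would expect for the proxy’s country. The `EXPECTED` table, the `ProfileSettings` shape, and `findMismatches` are all hypothetical, and the table is a tiny sample to extend with the countries your proxies actually use.

```typescript
// Hypothetical lookup table: a small sample, not an exhaustive mapping.
const EXPECTED: Record<string, { locales: string[]; timezones: string[] }> = {
  us: {
    locales: ['en-US'],
    timezones: ['America/New_York', 'America/Chicago', 'America/Denver', 'America/Los_Angeles'],
  },
  de: { locales: ['de-DE'], timezones: ['Europe/Berlin'] },
  gb: { locales: ['en-GB'], timezones: ['Europe/London'] },
};

interface ProfileSettings {
  proxyCountry: string; // two-letter country code of the proxy exit
  locale: string;
  timezone: string;
}

function findMismatches(profile: ProfileSettings): string[] {
  const expected = EXPECTED[profile.proxyCountry.toLowerCase()];
  if (!expected) {
    return [`no expectations defined for country "${profile.proxyCountry}"`];
  }

  const problems: string[] = [];
  if (!expected.locales.includes(profile.locale)) {
    problems.push(`locale ${profile.locale} does not match proxy country ${profile.proxyCountry}`);
  }
  if (!expected.timezones.includes(profile.timezone)) {
    problems.push(`timezone ${profile.timezone} does not match proxy country ${profile.proxyCountry}`);
  }
  return problems;
}
```

An empty result means the three settings agree with each other. Anything else is worth fixing before the first request goes out.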
Amazon Scraping Guide
Apply these techniques to scrape Amazon. Amazon Guide →
Proxy Rotation
Set up proxy rotation for large-scale scraping. Proxy Guide →
Multi-Account Management
Manage multiple accounts safely on Cloudflare sites. Multi-Account →
Fingerprint Checker
Test your fingerprint before deploying. Fingerprint Tool →