Amazon is the holy grail of web scraping targets. Hundreds of millions of products. Real-time pricing. Millions of reviews. Seller data. Everything an e-commerce business needs.
It’s also one of the hardest sites to scrape.
Amazon runs one of the most sophisticated bot detection systems on the internet, blocking millions of bot requests daily. Its defenses include fingerprinting, behavioral analysis, machine learning, and dedicated anti-fraud teams.
But here’s the thing: legitimate Amazon scraping is possible. Millions of price comparison sites, market research tools, and analytics platforms scrape Amazon every day.
This guide shows you how to do it right.
Here’s why Amazon is worth the effort to scrape:
| Metric | Value | Source |
|---|---|---|
| Total products | 600+ million | Digital Commerce 360 |
| Active sellers | 9.7 million worldwide | MobiLoud Stats 2025 |
| New sellers (2026) | 900,000 (2,000/day) | EcommerceDB |
| Third-party seller GMV | $480 billion (2023) | Analyzer Tools |
| US marketplace GMV | $362.7 billion | FinancesOnline |
| Total revenue (2026) | $638 billion | SalesDuo |
The opportunity: With 60% of Amazon’s sales coming from third-party sellers, price monitoring, competitor research, and market analysis are essential for any e-commerce business. Amazon doesn’t provide this data freely — you have to extract it.
Before we dive into solutions, understand the problem:
```
Amazon monitors:
├── IP reputation scores
├── Datacenter IP detection
├── Request volume per IP
├── Geographic consistency
└── Known proxy/VPN detection

Amazon checks:
├── User agent consistency
├── JavaScript execution timing
├── Canvas fingerprint
├── WebGL fingerprint
├── Audio fingerprint
├── Screen/window properties
└── Chrome/Firefox specific objects

Amazon analyzes:
├── Mouse movement patterns
├── Scroll behavior
├── Click patterns
├── Request timing
├── Page dwell time
├── Navigation patterns
└── Cart/search behavior

Amazon tracks:
├── Cookie consistency
├── Session duration
├── Cross-page patterns
├── Device fingerprint persistence
└── Account behavior history
```

| Page Type | Difficulty | Detection Level |
|---|---|---|
| Homepage | Easy | Low |
| Search results | Medium | Medium |
| Product pages | Medium | Medium |
| Product reviews | Hard | High |
| Seller pages | Hard | High |
| Deals/Lightning | Very Hard | Very High |
| Cart/Checkout | Extremely Hard | Maximum |
Focus your scraping on product pages and search results. Avoid cart and checkout flows unless absolutely necessary.
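A simple guard helps keep a crawler on the low-risk page types. Here is a minimal sketch that only builds product and search URLs, rejecting anything that isn't a well-formed ASIN (ASINs are 10 alphanumeric characters; the regex is a rule of thumb, not an Amazon specification):

```typescript
// Hypothetical helpers: constrain the crawler to product pages and search
// results, the two page types recommended above.
const ASIN_RE = /^[A-Z0-9]{10}$/;

function productUrl(asin: string): string | null {
  // Reject malformed ASINs instead of requesting a guaranteed-404 URL
  return ASIN_RE.test(asin) ? `https://www.amazon.com/dp/${asin}` : null;
}

function searchUrl(query: string, page = 1): string {
  const base = `https://www.amazon.com/s?k=${encodeURIComponent(query)}`;
  return page > 1 ? `${base}&page=${page}` : base;
}
```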
```typescript
import { GoLogin } from '@gologin/core';

const gologin = new GoLogin({
  profileName: 'amazon-scraper',
  fingerprintOptions: {
    platform: 'windows',
    browser: 'chrome',
    locale: 'en-US',
    timezone: 'America/New_York',
  },
  // CRITICAL: use a residential proxy for Amazon
  proxy: {
    protocol: 'http',
    host: 'residential.proxy.com',
    port: 10000,
    username: 'user-country-us',
    password: 'password',
  },
  geolocation: {
    latitude: 40.7128,
    longitude: -74.0060,
    timezone: 'America/New_York',
  },
});
```

Amazon tracks new sessions closely. Warm up your profile:
```typescript
async function warmupProfile(page: Page): Promise<void> {
  // Visit the homepage first
  await page.goto('https://www.amazon.com', {
    waitUntil: 'networkidle2',
  });

  await humanDelay(2000, 4000);

  // Do a generic search
  await page.type('#twotabsearchtextbox', 'laptop', { delay: 100 });
  await page.click('#nav-search-submit-button');

  await page.waitForNavigation({ waitUntil: 'networkidle2' });
  await humanDelay(3000, 5000);

  // Scroll through results
  await humanScroll(page);

  // Click on a random product
  const products = await page.$$('[data-asin]:not([data-asin=""])');
  if (products.length > 0) {
    const randomIndex = Math.floor(Math.random() * Math.min(5, products.length));
    await products[randomIndex].click();
    await page.waitForNavigation({ waitUntil: 'networkidle2' });
  }

  await humanDelay(2000, 4000);

  console.log('Profile warmed up');
}
```
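`warmupProfile` and the scrapers below call a `humanScroll` helper that isn't defined anywhere in this guide. A minimal sketch, assuming Puppeteer's `page.evaluate` API (the `PageLike` type is a stand-in so the snippet compiles without `puppeteer-core` installed):

```typescript
// Stand-in for Puppeteer's Page; in the real scraper import Page from 'puppeteer-core'
type PageLike = { evaluate: (fn: any, ...args: any[]) => Promise<any> };

// Plan randomized scroll stops: each step is 40-90% of the viewport height,
// the way a reader skims rather than jumping straight to the bottom
function scrollPlan(totalHeight: number, viewport: number): number[] {
  const stops: number[] = [];
  let y = 0;
  while (y < totalHeight - viewport) {
    y += Math.floor(viewport * (0.4 + Math.random() * 0.5));
    stops.push(Math.min(y, totalHeight - viewport));
  }
  return stops;
}

async function humanScroll(page: PageLike): Promise<void> {
  const { height, viewport } = await page.evaluate(() => ({
    height: document.body.scrollHeight,
    viewport: window.innerHeight,
  }));

  for (const y of scrollPlan(height, viewport)) {
    await page.evaluate((top: number) => window.scrollTo({ top, behavior: 'smooth' }), y);
    // Pause 300-1200ms between scroll steps
    await new Promise(r => setTimeout(r, 300 + Math.random() * 900));
  }
}
```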
```typescript
function humanDelay(min: number, max: number): Promise<void> {
  const delay = min + Math.random() * (max - min);
  return new Promise(r => setTimeout(r, delay));
}
```

Block unnecessary resources to speed up scraping:
```typescript
async function setupRequestInterception(page: Page): Promise<void> {
  await page.setRequestInterception(true);

  page.on('request', (req) => {
    const resourceType = req.resourceType();
    const url = req.url();

    // Block tracking and non-essential resources
    if (
      resourceType === 'image' ||
      resourceType === 'font' ||
      url.includes('amazon-adsystem') ||
      url.includes('fls-na.amazon') ||
      url.includes('cloudfront.net/s/') ||
      url.includes('analytics') ||
      url.includes('doubleclick')
    ) {
      req.abort();
    } else {
      req.continue();
    }
  });
}
```

Amazon product pages have a complex but predictable structure:
```typescript
interface AmazonProduct {
  asin: string;
  title: string;
  price: {
    current: number;
    currency: string;
    original?: number;
    discount?: number;
  };
  rating: {
    average: number;
    count: number;
  };
  features: string[];
  images: string[];
  availability: string;
  seller: string;
  brand?: string;
  category?: string;
}

async function scrapeProductPage(page: Page, asin: string): Promise<AmazonProduct | null> {
  const url = `https://www.amazon.com/dp/${asin}`;

  await page.goto(url, {
    waitUntil: 'domcontentloaded',
    timeout: 30000,
  });

  // Check for CAPTCHA
  const hasCaptcha = await checkForCaptcha(page);
  if (hasCaptcha) {
    console.log('CAPTCHA detected, handling...');
    await handleCaptcha(page);
  }

  // Check if the product exists (verify this selector against the current
  // "page not found" markup; it changes)
  const notFound = await page.$('#d');
  if (notFound) {
    console.log(`Product ${asin} not found`);
    return null;
  }

  // Wait for critical elements
  await page.waitForSelector('#productTitle', { timeout: 10000 }).catch(() => null);

  // Extract data
  const product = await page.evaluate(() => {
    const getText = (selector: string): string => {
      const el = document.querySelector(selector);
      return el?.textContent?.trim() || '';
    };

    const getPrice = (): { current: number; currency: string; original?: number } => {
      // Try different price selectors
      const priceSelectors = [
        '.a-price .a-offscreen',
        '#priceblock_ourprice',
        '#priceblock_dealprice',
        '#priceblock_saleprice',
        '.a-price-whole',
      ];

      for (const selector of priceSelectors) {
        const el = document.querySelector(selector);
        if (el) {
          const text = el.textContent || '';
          const match = text.match(/[\d,.]+/);
          if (match) {
            return {
              current: parseFloat(match[0].replace(',', '')),
              currency: 'USD',
            };
          }
        }
      }

      return { current: 0, currency: 'USD' };
    };

    const getRating = (): { average: number; count: number } => {
      const ratingEl = document.querySelector('#acrPopover');
      const countEl = document.querySelector('#acrCustomerReviewText');

      const ratingText = ratingEl?.getAttribute('title') || '';
      const ratingMatch = ratingText.match(/([\d.]+)/);

      const countText = countEl?.textContent || '';
      const countMatch = countText.match(/([\d,]+)/);

      return {
        average: ratingMatch ? parseFloat(ratingMatch[1]) : 0,
        count: countMatch ? parseInt(countMatch[1].replace(',', ''), 10) : 0,
      };
    };

    const getFeatures = (): string[] => {
      const features: string[] = [];
      document.querySelectorAll('#feature-bullets li span').forEach(el => {
        const text = el.textContent?.trim();
        if (text && !text.includes('Make sure this fits')) {
          features.push(text);
        }
      });
      return features;
    };

    const getImages = (): string[] => {
      const images: string[] = [];
      document.querySelectorAll('#altImages img').forEach(img => {
        const src = img.getAttribute('src');
        if (src && !src.includes('play-icon')) {
          // Convert thumbnail to full size
          const fullSize = src.replace(/\._[A-Z]+\d+_\./, '.');
          images.push(fullSize);
        }
      });
      return images;
    };

    const getAvailability = (): string => {
      const el = document.querySelector('#availability span');
      return el?.textContent?.trim() || 'Unknown';
    };

    return {
      asin: window.location.pathname.split('/dp/')[1]?.split('/')[0] || '',
      title: getText('#productTitle'),
      price: getPrice(),
      rating: getRating(),
      features: getFeatures(),
      images: getImages(),
      availability: getAvailability(),
      seller: getText('#sellerProfileTriggerId') || 'Amazon',
      brand: getText('#bylineInfo')
        ?.replace('Brand: ', '')
        .replace('Visit the ', '')
        .replace(' Store', ''),
    };
  });

  return product as AmazonProduct;
}
```

```typescript
interface SearchResult {
  asin: string;
  title: string;
  price: number;
  rating: number;
  reviewCount: number;
  sponsored: boolean;
  url: string;
}
```
```typescript
async function scrapeSearchResults(
  page: Page,
  query: string,
  maxPages: number = 3
): Promise<SearchResult[]> {
  const results: SearchResult[] = [];
  let currentPage = 1;

  // Encode the query
  const encodedQuery = encodeURIComponent(query);
  let url = `https://www.amazon.com/s?k=${encodedQuery}`;

  while (currentPage <= maxPages) {
    console.log(`Scraping page ${currentPage} for "${query}"`);

    await page.goto(url, {
      waitUntil: 'domcontentloaded',
      timeout: 30000,
    });

    // Check for CAPTCHA
    if (await checkForCaptcha(page)) {
      await handleCaptcha(page);
      continue; // Retry the same page
    }

    // Wait for results
    await page.waitForSelector('[data-asin]', { timeout: 10000 }).catch(() => null);

    // Human-like delay
    await humanDelay(1000, 2000);
    await humanScroll(page);

    // Extract results
    const pageResults = await page.evaluate(() => {
      const items: SearchResult[] = [];

      document.querySelectorAll('[data-asin]:not([data-asin=""])').forEach(el => {
        const asin = el.getAttribute('data-asin');
        if (!asin) return;

        const titleEl = el.querySelector('h2 a span');
        const priceEl = el.querySelector('.a-price .a-offscreen');
        const ratingEl = el.querySelector('.a-icon-star-small');
        const reviewEl = el.querySelector('[aria-label*="stars"] + span');

        // Check if sponsored
        const sponsored = !!el.querySelector('[data-component-type="sp-sponsored-result"]');

        const priceText = priceEl?.textContent || '';
        const priceMatch = priceText.match(/([\d,.]+)/);

        const ratingText = ratingEl?.getAttribute('aria-label') || '';
        const ratingMatch = ratingText.match(/([\d.]+)/);

        const reviewText = reviewEl?.textContent || '';
        const reviewMatch = reviewText.match(/([\d,]+)/);

        items.push({
          asin,
          title: titleEl?.textContent?.trim() || '',
          price: priceMatch ? parseFloat(priceMatch[1].replace(',', '')) : 0,
          rating: ratingMatch ? parseFloat(ratingMatch[1]) : 0,
          reviewCount: reviewMatch ? parseInt(reviewMatch[1].replace(',', ''), 10) : 0,
          sponsored,
          url: `https://www.amazon.com/dp/${asin}`,
        });
      });

      return items;
    });

    results.push(...pageResults);
    console.log(`Found ${pageResults.length} products on page ${currentPage}`);

    // Check for the next page
    const nextButton = await page.$('.s-pagination-next:not(.s-pagination-disabled)');
    if (!nextButton || currentPage >= maxPages) break;

    // Click through to the next page
    await nextButton.click();
    await page.waitForNavigation({ waitUntil: 'domcontentloaded' });

    // Keep url in sync so the goto at the top of the loop (and CAPTCHA
    // retries) reload the current results page instead of page 1
    url = page.url();

    currentPage++;
    await humanDelay(2000, 4000);
  }

  return results;
}
```

Amazon uses its own CAPTCHA system. Here's how to handle it:
```typescript
async function checkForCaptcha(page: Page): Promise<boolean> {
  const captchaIndicators = [
    'input[name="captcha"]',
    '#captchacharacters',
    'img[src*="captcha"]',
    '.a-box-inner h4', // "Enter the characters you see below"
  ];

  for (const selector of captchaIndicators) {
    const element = await page.$(selector);
    if (element) return true;
  }

  // Check page content
  const content = await page.content();
  return (
    content.includes('Type the characters you see') ||
    content.includes('Enter the characters you see below')
  );
}

// Strategy 1: rotate the session
async function handleCaptchaRetry(gologin: GoLogin, page: Page): Promise<void> {
  console.log('CAPTCHA detected, rotating session...');

  // Close the current browser
  await page.browser().close();
  await gologin.stop();

  // Generate a new fingerprint
  await gologin.regenerateFingerprint();

  // Wait before retry
  await new Promise(r => setTimeout(r, 5000));

  // Restart
  const { browserWSEndpoint } = await gologin.start();
  // Continue with the new session...
}

// Strategy 2: solve via a service (2captcha example)
async function handleCaptchaWithService(page: Page): Promise<void> {
  // Get the CAPTCHA image
  const imgSrc = await page.$eval(
    'img[src*="captcha"]',
    el => el.getAttribute('src')
  );

  if (!imgSrc) throw new Error('CAPTCHA image not found');

  // Send to the solving service
  const response = await fetch('http://2captcha.com/in.php', {
    method: 'POST',
    body: new URLSearchParams({
      key: process.env.CAPTCHA_API_KEY!,
      method: 'base64',
      body: await imageToBase64(imgSrc),
      json: '1',
    }),
  });

  const { request: requestId } = await response.json();

  // Poll for the solution
  let solution = '';
  while (!solution) {
    await new Promise(r => setTimeout(r, 5000));

    const result = await fetch(
      `http://2captcha.com/res.php?key=${process.env.CAPTCHA_API_KEY}&action=get&id=${requestId}&json=1`
    );
    const data = await result.json();

    if (data.status === 1) {
      solution = data.request;
    }
  }

  // Enter the solution
  await page.type('#captchacharacters', solution);
  await page.click('button[type="submit"]');
  await page.waitForNavigation({ waitUntil: 'networkidle2' });
}

// Helper used above (a sketch; Node 18+): fetch the image and base64-encode it
async function imageToBase64(src: string): Promise<string> {
  const res = await fetch(src);
  return Buffer.from(await res.arrayBuffer()).toString('base64');
}

// Called by the scrapers above; defaults to the solving-service strategy
async function handleCaptcha(page: Page): Promise<void> {
  await handleCaptchaWithService(page);
}
```

| Action | Requests/Hour | Sessions/Day |
|---|---|---|
| Search results | 30-50 | 5-10 |
| Product pages | 60-100 | 10-20 |
| Reviews | 20-30 | 3-5 |
```typescript
class AmazonRateLimiter {
  private lastRequest = 0;

  constructor(private requestsPerMinute: number = 10) {}

  async throttle(): Promise<void> {
    const now = Date.now();
    const minInterval = (60 * 1000) / this.requestsPerMinute;
    const timeSinceLastRequest = now - this.lastRequest;

    if (timeSinceLastRequest < minInterval) {
      const waitTime = minInterval - timeSinceLastRequest;
      // Add randomization: wait 80-120% of the nominal interval
      const randomizedWait = waitTime * (0.8 + Math.random() * 0.4);
      await new Promise(r => setTimeout(r, randomizedWait));
    }

    this.lastRequest = Date.now();
  }
}

// Usage
const limiter = new AmazonRateLimiter(10);

for (const asin of asins) {
  await limiter.throttle();
  const product = await scrapeProductPage(page, asin);
  // Process product...
}
```

```typescript
async function scrapeWithSessionRotation(
  asins: string[],
  productsPerSession: number = 50
): Promise<AmazonProduct[]> {
  const results: AmazonProduct[] = [];
  let currentIndex = 0;

  while (currentIndex < asins.length) {
    // Create a new session
    const gologin = new GoLogin({
      profileName: `amazon-session-${Date.now()}`,
      // ... config
    });

    const { browserWSEndpoint } = await gologin.start();
    const browser = await puppeteer.connect({ browserWSEndpoint });
    const page = await browser.newPage();

    try {
      // Warm up the session
      await warmupProfile(page);

      // Scrape a batch
      const batchEnd = Math.min(currentIndex + productsPerSession, asins.length);

      for (let i = currentIndex; i < batchEnd; i++) {
        try {
          const product = await scrapeProductPage(page, asins[i]);
          if (product) results.push(product);
        } catch (error) {
          console.error(`Failed to scrape ${asins[i]}:`, error);
        }

        await humanDelay(3000, 6000);
      }

      currentIndex = batchEnd;
    } finally {
      await browser.close();
      await gologin.stop();
    }

    // Wait between sessions
    if (currentIndex < asins.length) {
      console.log('Waiting between sessions...');
      await new Promise(r => setTimeout(r, 30000 + Math.random() * 30000));
    }
  }

  return results;
}
```

```typescript
import { GoLogin } from '@gologin/core';
import puppeteer, { Page, Browser } from 'puppeteer-core';
import { promises as fs } from 'fs';

interface AmazonScraperConfig {
  proxy: {
    host: string;
    port: number;
    username: string;
    password: string;
  };
  productsPerSession: number;
  requestsPerMinute: number;
}

class AmazonScraper {
  private gologin: GoLogin | null = null;
  private browser: Browser | null = null;
  private page: Page | null = null;
  private limiter: AmazonRateLimiter;

  constructor(private config: AmazonScraperConfig) {
    this.limiter = new AmazonRateLimiter(config.requestsPerMinute);
  }

  async init(): Promise<void> {
    this.gologin = new GoLogin({
      profileName: `amazon-${Date.now()}`,
      fingerprintOptions: {
        platform: 'windows',
        browser: 'chrome',
        locale: 'en-US',
        timezone: 'America/New_York',
      },
      proxy: {
        protocol: 'http',
        ...this.config.proxy,
      },
    });

    const { browserWSEndpoint } = await this.gologin.start();
    this.browser = await puppeteer.connect({ browserWSEndpoint });
    this.page = await this.browser.newPage();

    await setupRequestInterception(this.page);
    await warmupProfile(this.page);
  }

  async scrapeProducts(asins: string[]): Promise<AmazonProduct[]> {
    if (!this.page) throw new Error('Scraper not initialized');

    const results: AmazonProduct[] = [];

    for (const asin of asins) {
      await this.limiter.throttle();

      try {
        const product = await scrapeProductPage(this.page, asin);
        if (product) {
          results.push(product);
          console.log(`Scraped: ${product.title.slice(0, 50)}...`);
        }
      } catch (error) {
        console.error(`Failed to scrape ${asin}:`, error);

        // Check if we need to rotate the session
        if (await checkForCaptcha(this.page)) {
          console.log('Session compromised, rotating...');
          await this.rotateSession();
        }
      }
    }

    return results;
  }

  async rotateSession(): Promise<void> {
    await this.close();
    await new Promise(r => setTimeout(r, 10000));
    await this.init();
  }

  async close(): Promise<void> {
    if (this.browser) await this.browser.close();
    if (this.gologin) await this.gologin.stop();
  }
}

// Usage
async function main() {
  const scraper = new AmazonScraper({
    proxy: {
      host: 'residential.proxy.com',
      port: 10000,
      username: 'user',
      password: 'pass',
    },
    productsPerSession: 50,
    requestsPerMinute: 10,
  });

  await scraper.init();

  try {
    const products = await scraper.scrapeProducts([
      'B09V3KXJPB', // Product ASINs
      'B0BSHF7WHW', // ... more ASINs
    ]);

    console.log(`Scraped ${products.length} products`);

    // Save results
    await fs.writeFile(
      'products.json',
      JSON.stringify(products, null, 2)
    );
  } finally {
    await scraper.close();
  }
}

main().catch(console.error);
```

Let me be direct: Amazon's Terms of Service prohibit automated scraping. That said, the legality is nuanced.
Generally legal:
Gray areas:
Illegal:
My advice: Use Amazon’s Product Advertising API for commercial applications. For legitimate research, scrape responsibly and respect rate limits. Consult legal counsel for commercial use cases.
Based on testing with thousands of scraping sessions:
Conservative (safest):
Moderate (balanced):
Aggressive (risky):
The key factor: It’s not just volume — it’s behavioral consistency. Scraping 100 products with proper warming, human delays, and realistic behavior is safer than blasting through 50 products like a robot.
When to use Amazon Product Advertising API:
When direct scraping makes sense:
The reality: Many legitimate businesses scrape Amazon because the API has severe limitations — no review content, limited historical data, and requires affiliate participation. Just do it responsibly.
Let’s troubleshoot:
Common causes:
Quick fix: If you're hitting CAPTCHAs on your first request, your fingerprint is burned before you even start. The problem is likely your IP or a browser fingerprint mismatch.
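One cheap mismatch to catch before burning requests: the timezone the browser reports versus the timezone you configured in the profile (which should match your proxy's location). A minimal preflight sketch, assuming Puppeteer's `page.evaluate`; `PageLike` is a stand-in type so the snippet compiles without `puppeteer-core`:

```typescript
// Pure comparison, testable in isolation
function tzMatches(profileTz: string, reportedTz: string): boolean {
  return profileTz.trim() === reportedTz.trim();
}

// Stand-in for Puppeteer's Page
type PageLike = { evaluate: (fn: () => string) => Promise<string> };

async function preflightCheck(page: PageLike, profileTz: string): Promise<boolean> {
  // The timezone the page's JS environment reports, i.e. what Amazon's scripts see
  const reported = await page.evaluate(
    () => Intl.DateTimeFormat().resolvedOptions().timeZone
  );
  if (!tzMatches(profileTz, reported)) {
    console.warn(`Timezone mismatch: profile=${profileTz}, browser=${reported}`);
    return false;
  }
  return true;
}
```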
Raw Puppeteer: You’ll get blocked on request #1-5. Amazon detects headless Chrome instantly.
Puppeteer + puppeteer-extra-plugin-stealth: You’ll get 20-50 requests before CAPTCHAs. Better, but not reliable.
Puppeteer + GoLogin: You’ll get 50-200+ requests per session with proper rate limiting. This is what actually works.
Why GoLogin makes the difference:
The math: If you’re scraping 1,000+ products, the time saved avoiding CAPTCHAs pays for GoLogin in the first day.
Peak hours (avoid):
Off-peak hours (better):
Here’s the truth: Time matters less than behavior. A well-configured scraper with proper fingerprints works 24/7. Bad fingerprints get blocked at 3am just as fast as 3pm.
Better strategy: Focus on behavioral realism (delays, scrolling, warmup) instead of trying to scrape when Amazon’s “asleep” — they’re never asleep.
Reviews are harder than product pages. Amazon protects them more aggressively because review manipulation is a bigger problem.
Best approach:
Selectors (as of December 2026):
```typescript
const reviews = await page.$$eval('[data-hook="review"]', elements => {
  return elements.map(el => ({
    author: el.querySelector('[data-hook="review-author"]')?.textContent,
    rating: el.querySelector('[data-hook="review-star-rating"]')?.textContent,
    title: el.querySelector('[data-hook="review-title"]')?.textContent,
    text: el.querySelector('[data-hook="review-body"]')?.textContent,
    date: el.querySelector('[data-hook="review-date"]')?.textContent,
  }));
});
```

Warning: Amazon frequently updates the review page structure. Expect selectors to break every 2-3 months.
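Since selectors break regularly, one mitigation is to query a list of candidate selectors in order and take the first hit, so a single markup change degrades gracefully instead of silently returning empty strings. A sketch; the lookup function is injected, which also makes the helper testable without a DOM, and the fallback selectors in the usage comment are illustrative, not verified:

```typescript
// Return the first non-null result across a prioritized list of selectors
function firstMatch<T>(
  lookup: (selector: string) => T | null,
  selectors: string[]
): T | null {
  for (const sel of selectors) {
    const hit = lookup(sel);
    if (hit !== null && hit !== undefined) return hit;
  }
  return null;
}

// Usage inside the browser context (example selectors, verify them live):
// const title = firstMatch(
//   s => document.querySelector(s)?.textContent?.trim() ?? null,
//   ['[data-hook="review-title"]', '.review-title']
// );
```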
Always use residential proxies — Amazon blocks datacenter IPs aggressively.
Warm up sessions — Visit homepage and search before scraping products.
Respect rate limits — 10 requests/minute is a safe baseline.
Rotate sessions — Don’t scrape more than 50-100 products per session.
Handle CAPTCHAs gracefully — Either rotate or use solving services.
Match fingerprint to proxy — Geographic consistency is crucial.
Block unnecessary resources — Faster scraping, less detection.
Multi-Account Management
Manage multiple Amazon seller accounts. Multi-Account →
Proxy Rotation
Set up proxy rotation for scale. Proxy Guide →
Bypass Cloudflare
Apply these techniques to other protected sites. Cloudflare Guide →
Fingerprint Checker
Verify your setup before scraping. Fingerprint Tool →