Amazon Scraping Guide: Extract Product Data Without Getting Blocked

Amazon is the holy grail of web scraping targets. Over 600 million products. Real-time pricing. Millions of reviews. Seller data. Everything an e-commerce business needs.

It’s also one of the hardest sites to scrape.

Amazon runs one of the most sophisticated bot detection systems on the internet. They block millions of bots daily. Their detection includes fingerprinting, behavioral analysis, machine learning, and dedicated anti-fraud teams.

But here’s the thing: legitimate Amazon scraping is possible. Thousands of price comparison sites, market research tools, and analytics platforms scrape Amazon every day.

This guide shows you how to do it right.

Amazon by the Numbers (2024-2025)

Here’s why Amazon is worth the effort to scrape:

| Metric | Value | Source |
|---|---|---|
| Total products | 600+ million | Digital Commerce 360 |
| Active sellers | 9.7 million worldwide | MobiLoud Stats 2025 |
| New sellers (2026) | 900,000 (2,000/day) | EcommerceDB |
| Third-party seller GMV | $480 billion (2023) | Analyzer Tools |
| US marketplace GMV | $362.7 billion | FinancesOnline |
| Total revenue (2026) | $638 billion | SalesDuo |

The opportunity: With 60% of Amazon’s sales coming from third-party sellers, price monitoring, competitor research, and market analysis are essential for any e-commerce business. Amazon doesn’t provide this data freely — you have to extract it.

Why Amazon Is Hard to Scrape

Before we dive into solutions, understand the problem:

Detection Layer 1: IP Analysis

Amazon monitors:
├── IP reputation scores
├── Datacenter IP detection
├── Request volume per IP
├── Geographic consistency
└── Known proxy/VPN detection

Detection Layer 2: Browser Fingerprinting

Amazon checks:
├── User agent consistency
├── JavaScript execution timing
├── Canvas fingerprint
├── WebGL fingerprint
├── Audio fingerprint
├── Screen/window properties
└── Chrome/Firefox specific objects

Detection Layer 3: Behavioral Analysis

Amazon analyzes:
├── Mouse movement patterns
├── Scroll behavior
├── Click patterns
├── Request timing
├── Page dwell time
├── Navigation patterns
└── Cart/search behavior

Detection Layer 4: Session Tracking

Amazon tracks:
├── Cookie consistency
├── Session duration
├── Cross-page patterns
├── Device fingerprint persistence
└── Account behavior history

Amazon Page Types and Difficulty

| Page Type | Difficulty | Detection Level |
|---|---|---|
| Homepage | Easy | Low |
| Search results | Medium | Medium |
| Product pages | Medium | Medium |
| Product reviews | Hard | High |
| Seller pages | Hard | High |
| Deals/Lightning | Very Hard | Very High |
| Cart/Checkout | Extremely Hard | Maximum |

Focus your scraping on product pages and search results. Avoid cart and checkout flows unless absolutely necessary.

Setting Up Your Scraper

Step 1: Configure GoLogin Profile

import { GoLogin } from '@gologin/core';

const gologin = new GoLogin({
  profileName: 'amazon-scraper',
  fingerprintOptions: {
    platform: 'windows',
    browser: 'chrome',
    locale: 'en-US',
    timezone: 'America/New_York',
  },
  // CRITICAL: Use a residential proxy for Amazon
  proxy: {
    protocol: 'http',
    host: 'residential.proxy.com',
    port: 10000,
    username: 'user-country-us',
    password: 'password',
  },
  geolocation: {
    latitude: 40.7128,
    longitude: -74.0060,
    timezone: 'America/New_York',
  },
});

Step 2: Profile Warming

Amazon tracks new sessions closely. Warm up your profile:

async function warmupProfile(page: Page): Promise<void> {
  // Visit homepage first
  await page.goto('https://www.amazon.com', {
    waitUntil: 'networkidle2',
  });
  await humanDelay(2000, 4000);

  // Do a generic search; start waiting for navigation before the click
  // so the navigation can't finish before the listener attaches
  await page.type('#twotabsearchtextbox', 'laptop', { delay: 100 });
  await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle2' }),
    page.click('#nav-search-submit-button'),
  ]);
  await humanDelay(3000, 5000);

  // Scroll through results
  await humanScroll(page);

  // Click on a random product among the first few results
  const products = await page.$$('[data-asin]:not([data-asin=""])');
  if (products.length > 0) {
    const randomIndex = Math.floor(Math.random() * Math.min(5, products.length));
    await Promise.all([
      page.waitForNavigation({ waitUntil: 'networkidle2' }),
      products[randomIndex].click(),
    ]);
  }
  await humanDelay(2000, 4000);
  console.log('Profile warmed up');
}

function humanDelay(min: number, max: number): Promise<void> {
  const delay = min + Math.random() * (max - min);
  return new Promise(r => setTimeout(r, delay));
}
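
The warmup routine also calls humanScroll, which isn't defined in this guide. Here is a minimal sketch, under the assumption that all it needs to do is scroll in randomized increments with short pauses; the `Scrollable` interface is a hypothetical stand-in for the single Puppeteer `Page` method used, so the snippet compiles on its own:

```typescript
// Hypothetical stand-in for the one Page method humanScroll uses.
interface Scrollable {
  evaluate(fn: (y: number) => void, arg: number): Promise<void>;
}

// Split a total scroll distance into randomized steps within [minStep, maxStep].
function planScrollSteps(total: number, minStep = 200, maxStep = 500): number[] {
  const steps: number[] = [];
  let remaining = total;
  while (remaining > 0) {
    const step = Math.min(
      remaining,
      minStep + Math.floor(Math.random() * (maxStep - minStep + 1))
    );
    steps.push(step);
    remaining -= step;
  }
  return steps;
}

// Scroll down in human-ish bursts with 150-500ms pauses between them.
async function humanScroll(page: Scrollable, totalHeight = 3000): Promise<void> {
  for (const step of planScrollSteps(totalHeight)) {
    // scrollBy exists in the browser context where evaluate runs
    await page.evaluate((y) => { (globalThis as any).scrollBy?.(0, y); }, step);
    await new Promise(r => setTimeout(r, 150 + Math.random() * 350));
  }
}
```

The total height and step bounds here are arbitrary; tune them to the pages you scrape.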

Step 3: Set Up Request Interception

Block unnecessary resources to speed up scraping:

async function setupRequestInterception(page: Page): Promise<void> {
  await page.setRequestInterception(true);
  page.on('request', (req) => {
    const resourceType = req.resourceType();
    const url = req.url();
    // Block tracking and non-essential resources
    if (
      resourceType === 'image' ||
      resourceType === 'font' ||
      url.includes('amazon-adsystem') ||
      url.includes('fls-na.amazon') ||
      url.includes('cloudfront.net/s/') ||
      url.includes('analytics') ||
      url.includes('doubleclick')
    ) {
      req.abort();
    } else {
      req.continue();
    }
  });
}
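
If you want the blocking rules to be unit-testable, you can lift the decision into a pure predicate and call it from the request handler. A sketch (the block list mirrors the one above; `shouldBlock` is a name introduced here, not part of any library):

```typescript
// Pure blocking predicate, testable without launching a browser.
// The lists mirror the interception handler in this guide.
const BLOCKED_TYPES = new Set(['image', 'font']);
const BLOCKED_URL_PARTS = [
  'amazon-adsystem',
  'fls-na.amazon',
  'cloudfront.net/s/',
  'analytics',
  'doubleclick',
];

function shouldBlock(resourceType: string, url: string): boolean {
  if (BLOCKED_TYPES.has(resourceType)) return true;
  return BLOCKED_URL_PARTS.some(part => url.includes(part));
}
```

Inside the handler this becomes `shouldBlock(req.resourceType(), req.url()) ? req.abort() : req.continue()`.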

Scraping Product Data

Product Page Structure

Amazon product pages have a complex but predictable structure:

interface AmazonProduct {
  asin: string;
  title: string;
  price: {
    current: number;
    currency: string;
    original?: number;
    discount?: number;
  };
  rating: {
    average: number;
    count: number;
  };
  features: string[];
  images: string[];
  availability: string;
  seller: string;
  brand?: string;
  category?: string;
}

Product Page Scraper

async function scrapeProductPage(page: Page, asin: string): Promise<AmazonProduct | null> {
  const url = `https://www.amazon.com/dp/${asin}`;
  await page.goto(url, {
    waitUntil: 'domcontentloaded',
    timeout: 30000,
  });

  // Check for CAPTCHA
  const hasCaptcha = await checkForCaptcha(page);
  if (hasCaptcha) {
    console.log('CAPTCHA detected, handling...');
    await handleCaptcha(page);
  }

  // Check if the product exists (Amazon titles its 404 page "Page Not Found")
  const pageTitle = await page.title();
  if (pageTitle.includes('Page Not Found')) {
    console.log(`Product ${asin} not found`);
    return null;
  }

  // Wait for critical elements
  await page.waitForSelector('#productTitle', { timeout: 10000 }).catch(() => null);

  // Extract data
  const product = await page.evaluate(() => {
    const getText = (selector: string): string => {
      const el = document.querySelector(selector);
      return el?.textContent?.trim() || '';
    };

    const getPrice = (): { current: number; currency: string; original?: number } => {
      // Try different price selectors
      const priceSelectors = [
        '.a-price .a-offscreen',
        '#priceblock_ourprice',
        '#priceblock_dealprice',
        '#priceblock_saleprice',
        '.a-price-whole',
      ];
      for (const selector of priceSelectors) {
        const el = document.querySelector(selector);
        if (el) {
          const text = el.textContent || '';
          const match = text.match(/[\d,.]+/);
          if (match) {
            return {
              // Strip all thousands separators, not just the first
              current: parseFloat(match[0].replace(/,/g, '')),
              currency: 'USD',
            };
          }
        }
      }
      return { current: 0, currency: 'USD' };
    };

    const getRating = (): { average: number; count: number } => {
      const ratingEl = document.querySelector('#acrPopover');
      const countEl = document.querySelector('#acrCustomerReviewText');
      const ratingText = ratingEl?.getAttribute('title') || '';
      const ratingMatch = ratingText.match(/([\d.]+)/);
      const countText = countEl?.textContent || '';
      const countMatch = countText.match(/([\d,]+)/);
      return {
        average: ratingMatch ? parseFloat(ratingMatch[1]) : 0,
        count: countMatch ? parseInt(countMatch[1].replace(/,/g, ''), 10) : 0,
      };
    };

    const getFeatures = (): string[] => {
      const features: string[] = [];
      document.querySelectorAll('#feature-bullets li span').forEach(el => {
        const text = el.textContent?.trim();
        if (text && !text.includes('Make sure this fits')) {
          features.push(text);
        }
      });
      return features;
    };

    const getImages = (): string[] => {
      const images: string[] = [];
      document.querySelectorAll('#altImages img').forEach(img => {
        const src = img.getAttribute('src');
        if (src && !src.includes('play-icon')) {
          // Convert thumbnail to full size by stripping the size modifier
          const fullSize = src.replace(/\._[A-Z]+\d+_\./, '.');
          images.push(fullSize);
        }
      });
      return images;
    };

    const getAvailability = (): string => {
      const el = document.querySelector('#availability span');
      return el?.textContent?.trim() || 'Unknown';
    };

    return {
      asin: window.location.pathname.split('/dp/')[1]?.split('/')[0] || '',
      title: getText('#productTitle'),
      price: getPrice(),
      rating: getRating(),
      features: getFeatures(),
      images: getImages(),
      availability: getAvailability(),
      seller: getText('#sellerProfileTriggerId') || 'Amazon',
      brand: getText('#bylineInfo')
        .replace('Brand: ', '')
        .replace('Visit the ', '')
        .replace(' Store', ''),
    };
  });

  return product as AmazonProduct;
}
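
Price text on Amazon comes in several shapes ("$1,299.99", "1,299", "Price: $9.99"). Pulling the parsing out of page.evaluate makes it unit-testable; a sketch for US-formatted prices only (locales that use comma decimals would need different handling, and `parsePriceText` is a name introduced here):

```typescript
// Parse a US-formatted price string into a number, or null if no price found.
// Assumes "," is a thousands separator and "." the decimal point.
function parsePriceText(text: string): number | null {
  const match = text.match(/[\d,]+(?:\.\d+)?/);
  if (!match) return null;
  return parseFloat(match[0].replace(/,/g, ''));
}
```

Returning null instead of 0 for missing prices lets callers distinguish "free" from "not found".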

Scraping Search Results

Search Results Scraper

interface SearchResult {
  asin: string;
  title: string;
  price: number;
  rating: number;
  reviewCount: number;
  sponsored: boolean;
  url: string;
}

async function scrapeSearchResults(
  page: Page,
  query: string,
  maxPages: number = 3
): Promise<SearchResult[]> {
  const results: SearchResult[] = [];
  let currentPage = 1;

  // Encode the query and open the first results page once; later pages are
  // reached via the next button, so re-running goto here would reset to page 1
  const encodedQuery = encodeURIComponent(query);
  await page.goto(`https://www.amazon.com/s?k=${encodedQuery}`, {
    waitUntil: 'domcontentloaded',
    timeout: 30000,
  });

  while (currentPage <= maxPages) {
    console.log(`Scraping page ${currentPage} for "${query}"`);

    // Check for CAPTCHA
    if (await checkForCaptcha(page)) {
      await handleCaptcha(page);
      await page.reload({ waitUntil: 'domcontentloaded' });
      continue; // Retry the same page
    }

    // Wait for results
    await page.waitForSelector('[data-asin]', { timeout: 10000 }).catch(() => null);

    // Human-like delay
    await humanDelay(1000, 2000);
    await humanScroll(page);

    // Extract results
    const pageResults = await page.evaluate(() => {
      const items: SearchResult[] = [];
      document.querySelectorAll('[data-asin]:not([data-asin=""])').forEach(el => {
        const asin = el.getAttribute('data-asin');
        if (!asin) return;

        const titleEl = el.querySelector('h2 a span');
        const priceEl = el.querySelector('.a-price .a-offscreen');
        const ratingEl = el.querySelector('.a-icon-star-small');
        const reviewEl = el.querySelector('[aria-label*="stars"] + span');

        // Check if sponsored
        const sponsored = !!el.querySelector('[data-component-type="sp-sponsored-result"]');

        const priceText = priceEl?.textContent || '';
        const priceMatch = priceText.match(/([\d,.]+)/);
        const ratingText = ratingEl?.getAttribute('aria-label') || '';
        const ratingMatch = ratingText.match(/([\d.]+)/);
        const reviewText = reviewEl?.textContent || '';
        const reviewMatch = reviewText.match(/([\d,]+)/);

        items.push({
          asin,
          title: titleEl?.textContent?.trim() || '',
          price: priceMatch ? parseFloat(priceMatch[1].replace(/,/g, '')) : 0,
          rating: ratingMatch ? parseFloat(ratingMatch[1]) : 0,
          reviewCount: reviewMatch ? parseInt(reviewMatch[1].replace(/,/g, ''), 10) : 0,
          sponsored,
          url: `https://www.amazon.com/dp/${asin}`,
        });
      });
      return items;
    });
    results.push(...pageResults);
    console.log(`Found ${pageResults.length} products on page ${currentPage}`);

    // Check for next page
    const nextButton = await page.$('.s-pagination-next:not(.s-pagination-disabled)');
    if (!nextButton || currentPage >= maxPages) break;

    // Click next page; start waiting for navigation before the click
    await Promise.all([
      page.waitForNavigation({ waitUntil: 'domcontentloaded' }),
      nextButton.click(),
    ]);
    currentPage++;
    await humanDelay(2000, 4000);
  }
  return results;
}
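
Search pages can list the same ASIN more than once, typically a sponsored placement plus the organic result. A small dedupe pass that prefers the organic entry keeps downstream counts honest; a sketch against a minimal local type that matches the SearchResult shape above (`dedupeByAsin` is a name introduced here):

```typescript
// Deduplicate search results by ASIN, preferring organic over sponsored entries.
// Minimal local type; real code would use the full SearchResult interface.
type Result = { asin: string; sponsored: boolean };

function dedupeByAsin<T extends Result>(results: T[]): T[] {
  const seen = new Map<string, T>();
  for (const r of results) {
    const existing = seen.get(r.asin);
    // Keep the first entry, unless it was sponsored and an organic one appears
    if (!existing || (existing.sponsored && !r.sponsored)) {
      seen.set(r.asin, r);
    }
  }
  return [...seen.values()];
}
```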

Handling Amazon CAPTCHAs

Amazon uses their own CAPTCHA system. Here’s how to handle it:

CAPTCHA Detection

async function checkForCaptcha(page: Page): Promise<boolean> {
  const captchaIndicators = [
    'input[name="captcha"]',
    '#captchacharacters',
    'img[src*="captcha"]',
    '.a-box-inner h4', // "Enter the characters you see below"
  ];
  for (const selector of captchaIndicators) {
    const element = await page.$(selector);
    if (element) return true;
  }
  // Check page content
  const content = await page.content();
  return content.includes('Type the characters you see') ||
    content.includes('Enter the characters you see below');
}

CAPTCHA Handling Options

async function handleCaptchaRetry(gologin: GoLogin, page: Page): Promise<void> {
  console.log('CAPTCHA detected, rotating session...');

  // Close current browser
  await page.browser().close();
  await gologin.stop();

  // Generate new fingerprint
  await gologin.regenerateFingerprint();

  // Wait before retry
  await new Promise(r => setTimeout(r, 5000));

  // Restart
  const { browserWSEndpoint } = await gologin.start();
  // Continue with new session...
}
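
Rotating immediately after every CAPTCHA can itself look bot-like if CAPTCHAs cluster. A common complement is exponential backoff with jitter between retries; a minimal sketch where the base delay, cap, and attempt limit are arbitrary assumptions to tune:

```typescript
// Exponential backoff with full jitter: the ceiling doubles per attempt
// (capped), and a random fraction of it is used so retries don't synchronize.
function backoffDelayMs(attempt: number, baseMs = 5000, capMs = 120000): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * ceiling;
}

// Retry an async operation, sleeping with backoff between failures.
async function withBackoff<T>(fn: () => Promise<T>, maxAttempts = 4): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      await new Promise(r => setTimeout(r, backoffDelayMs(attempt)));
    }
  }
  throw lastErr;
}
```

Wrapping a scrape call looks like `await withBackoff(() => scrapeProductPage(page, asin))`.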

Rate Limiting and Session Management

| Action | Requests/Hour | Sessions/Day |
|---|---|---|
| Search results | 30-50 | 5-10 |
| Product pages | 60-100 | 10-20 |
| Reviews | 20-30 | 3-5 |

Rate Limiter Implementation

class AmazonRateLimiter {
  private lastRequest = 0;

  constructor(private requestsPerMinute: number = 10) {}

  async throttle(): Promise<void> {
    const now = Date.now();
    const minInterval = (60 * 1000) / this.requestsPerMinute;
    const timeSinceLastRequest = now - this.lastRequest;
    if (timeSinceLastRequest < minInterval) {
      const waitTime = minInterval - timeSinceLastRequest;
      // Add ±20% randomization so the interval isn't perfectly regular
      const randomizedWait = waitTime * (0.8 + Math.random() * 0.4);
      await new Promise(r => setTimeout(r, randomizedWait));
    }
    this.lastRequest = Date.now();
  }
}

// Usage
const limiter = new AmazonRateLimiter(10);
for (const asin of asins) {
  await limiter.throttle();
  const product = await scrapeProductPage(page, asin);
  // Process product...
}

Session Rotation

async function scrapeWithSessionRotation(
  asins: string[],
  productsPerSession: number = 50
): Promise<AmazonProduct[]> {
  const results: AmazonProduct[] = [];
  let currentIndex = 0;

  while (currentIndex < asins.length) {
    // Create new session
    const gologin = new GoLogin({
      profileName: `amazon-session-${Date.now()}`,
      // ... config
    });
    const { browserWSEndpoint } = await gologin.start();
    const browser = await puppeteer.connect({ browserWSEndpoint });
    const page = await browser.newPage();

    try {
      // Warm up session
      await warmupProfile(page);

      // Scrape batch
      const batchEnd = Math.min(currentIndex + productsPerSession, asins.length);
      for (let i = currentIndex; i < batchEnd; i++) {
        try {
          const product = await scrapeProductPage(page, asins[i]);
          if (product) results.push(product);
        } catch (error) {
          console.error(`Failed to scrape ${asins[i]}:`, error);
        }
        await humanDelay(3000, 6000);
      }
      currentIndex = batchEnd;
    } finally {
      await browser.close();
      await gologin.stop();
    }

    // Wait between sessions
    if (currentIndex < asins.length) {
      console.log('Waiting between sessions...');
      await new Promise(r => setTimeout(r, 30000 + Math.random() * 30000));
    }
  }
  return results;
}

Complete Production Example

import { promises as fs } from 'fs';
import { GoLogin } from '@gologin/core';
import puppeteer, { Page, Browser } from 'puppeteer-core';

interface AmazonScraperConfig {
  proxy: {
    host: string;
    port: number;
    username: string;
    password: string;
  };
  productsPerSession: number;
  requestsPerMinute: number;
}

class AmazonScraper {
  private gologin: GoLogin | null = null;
  private browser: Browser | null = null;
  private page: Page | null = null;
  private limiter: AmazonRateLimiter;

  constructor(private config: AmazonScraperConfig) {
    this.limiter = new AmazonRateLimiter(config.requestsPerMinute);
  }

  async init(): Promise<void> {
    this.gologin = new GoLogin({
      profileName: `amazon-${Date.now()}`,
      fingerprintOptions: {
        platform: 'windows',
        browser: 'chrome',
        locale: 'en-US',
        timezone: 'America/New_York',
      },
      proxy: {
        protocol: 'http',
        ...this.config.proxy,
      },
    });
    const { browserWSEndpoint } = await this.gologin.start();
    this.browser = await puppeteer.connect({ browserWSEndpoint });
    this.page = await this.browser.newPage();
    await setupRequestInterception(this.page);
    await warmupProfile(this.page);
  }

  async scrapeProducts(asins: string[]): Promise<AmazonProduct[]> {
    if (!this.page) throw new Error('Scraper not initialized');
    const results: AmazonProduct[] = [];
    for (const asin of asins) {
      await this.limiter.throttle();
      try {
        const product = await scrapeProductPage(this.page, asin);
        if (product) {
          results.push(product);
          console.log(`Scraped: ${product.title.slice(0, 50)}...`);
        }
      } catch (error) {
        console.error(`Failed to scrape ${asin}:`, error);
        // Check if we need to rotate session
        if (await checkForCaptcha(this.page)) {
          console.log('Session compromised, rotating...');
          await this.rotateSession();
        }
      }
    }
    return results;
  }

  async rotateSession(): Promise<void> {
    await this.close();
    await new Promise(r => setTimeout(r, 10000));
    await this.init();
  }

  async close(): Promise<void> {
    if (this.browser) await this.browser.close();
    if (this.gologin) await this.gologin.stop();
  }
}

// Usage
async function main() {
  const scraper = new AmazonScraper({
    proxy: {
      host: 'residential.proxy.com',
      port: 10000,
      username: 'user',
      password: 'pass',
    },
    productsPerSession: 50,
    requestsPerMinute: 10,
  });
  await scraper.init();
  try {
    const products = await scraper.scrapeProducts([
      'B09V3KXJPB', // Product ASINs
      'B0BSHF7WHW',
      // ... more ASINs
    ]);
    console.log(`Scraped ${products.length} products`);

    // Save results
    await fs.writeFile('products.json', JSON.stringify(products, null, 2));
  } finally {
    await scraper.close();
  }
}

main().catch(console.error);

Frequently Asked Questions

Is it legal to scrape Amazon?

Let me be direct: Amazon’s Terms of Service prohibit automated scraping. That said, the legality is nuanced.

Generally legal:

  • Scraping publicly available data for personal research
  • Price comparison and market research (established by case law)
  • Academic research
  • Testing your own listings

Gray areas:

  • Commercial price monitoring (common industry practice, but ToS violation)
  • Competitor analysis at scale

Illegal:

  • Accessing password-protected seller accounts without authorization
  • Scraping to clone Amazon’s database for a competing marketplace
  • Circumventing technical protection measures (some jurisdictions)

My advice: Use Amazon’s Product Advertising API for commercial applications. For legitimate research, scrape responsibly and respect rate limits. Consult legal counsel for commercial use cases.

How many products can I scrape per day without getting banned?

Based on testing with thousands of scraping sessions:

Conservative (safest):

  • 200-300 product pages per day
  • 10-15 requests per minute
  • 3-5 sessions per day
  • Success rate: ~95%

Moderate (balanced):

  • 500-800 product pages per day
  • 15-20 requests per minute
  • 5-10 sessions per day
  • Success rate: ~80-85%

Aggressive (risky):

  • 1,000+ product pages per day
  • 20-30 requests per minute
  • 10+ sessions per day
  • Success rate: ~60-70%, high CAPTCHA rate

The key factor: It’s not just volume — it’s behavioral consistency. Scraping 100 products with proper warming, human delays, and realistic behavior is safer than blasting through 50 products like a robot.
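
Turning those tiers into concrete pacing is just arithmetic. A hypothetical helper (the name `planDay` and its rounding choices are mine) that spreads a daily page budget across sessions and derives the request interval:

```typescript
// Spread a daily page budget across N sessions and derive pacing.
// Purely illustrative arithmetic based on the tiers above.
function planDay(pagesPerDay: number, sessions: number, requestsPerMinute: number) {
  const pagesPerSession = Math.ceil(pagesPerDay / sessions);
  const minutesPerSession = pagesPerSession / requestsPerMinute;
  const secondsBetweenRequests = 60 / requestsPerMinute;
  return { pagesPerSession, minutesPerSession, secondsBetweenRequests };
}
```

For example, the conservative tier at 300 pages over 5 sessions at 12 requests/minute works out to 60 pages per session, about 5 minutes of active scraping per session, one request every 5 seconds.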

Do I need to use Amazon API or can I scrape directly?

When to use Amazon Product Advertising API:

  • Commercial product (you’re selling this data)
  • Need real-time pricing for affiliate commissions
  • Want Amazon’s official support and no legal gray areas
  • Can afford API costs (5% commission or $0.01+ per request)

When direct scraping makes sense:

  • Personal price monitoring
  • Market research not tied to affiliate program
  • Need data Amazon API doesn’t provide (seller names, review content, historical pricing)
  • Monitoring your own listings
  • Academic research

The reality: Many legitimate businesses scrape Amazon because the API has severe limitations — no review content, limited historical data, and requires affiliate participation. Just do it responsibly.

Why do I keep hitting CAPTCHAs?

Let’s troubleshoot:

Common causes:

  1. Datacenter proxy — Switch to residential immediately. This is the #1 cause.
  2. Too fast — Slow down to 10-15 req/min max.
  3. No session warmup — Visit homepage and search before scraping.
  4. Inconsistent fingerprint — US proxy with Chinese timezone triggers CAPTCHAs instantly.
  5. Same session too long — Rotate after 50-100 products.
  6. Suspicious patterns — Going directly to product pages (via ASIN) without search is suspicious.

Quick fix: If you’re hitting CAPTCHAs on your first request, your fingerprint is burned before you even start. Problem is likely IP or browser fingerprint mismatch.

Can I scrape Amazon with just Puppeteer or do I need GoLogin?

Raw Puppeteer: You’ll get blocked on request #1-5. Amazon detects headless Chrome instantly.

Puppeteer + puppeteer-extra-plugin-stealth: You’ll get 20-50 requests before CAPTCHAs. Better, but not reliable.

Puppeteer + GoLogin: You’ll get 50-200+ requests per session with proper rate limiting. This is what actually works.

Why GoLogin makes the difference:

  • Real TLS fingerprints (Amazon checks your SSL handshake)
  • Consistent Canvas/WebGL fingerprints (not random garbage)
  • No JavaScript injection artifacts
  • Profile persistence across sessions

The math: If you’re scraping 1,000+ products, the time saved avoiding CAPTCHAs pays for GoLogin in the first day.

What’s the best time to scrape Amazon?

Peak hours (avoid):

  • 9am-5pm EST on weekdays
  • Amazon’s fraud detection is most active
  • More human users = more aggressive bot detection
  • CAPTCHA rate: ~30-40%

Off-peak hours (better):

  • 11pm-6am EST
  • Weekend mornings
  • Detection still active but less aggressive
  • CAPTCHA rate: ~15-20%

Here’s the truth: Time matters less than behavior. A well-configured scraper with proper fingerprints works 24/7. Bad fingerprints get blocked at 3am just as fast as 3pm.

Better strategy: Focus on behavioral realism (delays, scrolling, warmup) instead of trying to scrape when Amazon’s “asleep” — they’re never asleep.

How do I scrape Amazon reviews specifically?

Reviews are harder than product pages. Amazon protects them more aggressively because review manipulation is a bigger problem.

Best approach:

  1. Start from product page — Don’t go directly to review page
  2. Click “See all reviews” — Simulate human navigation
  3. Scroll slowly — Reviews load dynamically
  4. Limit to 50-100 reviews per product — Going deeper triggers detection
  5. Use pagination carefully — Going to page 50 of reviews looks suspicious

Selectors (as of December 2026):

const reviews = await page.$$eval('[data-hook="review"]', elements => {
  return elements.map(el => ({
    author: el.querySelector('[data-hook="review-author"]')?.textContent?.trim(),
    rating: el.querySelector('[data-hook="review-star-rating"]')?.textContent?.trim(),
    title: el.querySelector('[data-hook="review-title"]')?.textContent?.trim(),
    text: el.querySelector('[data-hook="review-body"]')?.textContent?.trim(),
    date: el.querySelector('[data-hook="review-date"]')?.textContent?.trim(),
  }));
});

Warning: Amazon frequently updates review page structure. Expect selectors to break every 2-3 months.
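
If you do paginate reviews, Amazon review pages have historically followed a /product-reviews/<ASIN> path with a pageNumber query parameter. A tiny builder (the function name is mine, and the URL pattern is an assumption you should verify against the live site) keeps the page limit in one place:

```typescript
// Build review-page URLs for an ASIN. The /product-reviews/ path and the
// pageNumber parameter reflect Amazon's historical URL scheme; verify first.
function reviewPageUrls(asin: string, maxPages: number): string[] {
  const urls: string[] = [];
  // Keep maxPages small: deep review pagination is a known detection trigger.
  for (let pageNum = 1; pageNum <= maxPages; pageNum++) {
    urls.push(`https://www.amazon.com/product-reviews/${asin}/?pageNumber=${pageNum}`);
  }
  return urls;
}
```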

Key Takeaways

  1. Always use residential proxies — Amazon blocks datacenter IPs aggressively.

  2. Warm up sessions — Visit homepage and search before scraping products.

  3. Respect rate limits — 10 requests/minute is a safe baseline.

  4. Rotate sessions — Don’t scrape more than 50-100 products per session.

  5. Handle CAPTCHAs gracefully — Either rotate or use solving services.

  6. Match fingerprint to proxy — Geographic consistency is crucial.

  7. Block unnecessary resources — Faster scraping, less detection.

Next Steps