
The era of simple requests.get() web scraping is dead. A developer's journey building a Vinted scraper for Pokémon card resellers reveals why modern anti-bot systems like Datadome have fundamentally changed the scraping landscape—and what techniques actually work in 2024.
The Real Challenge: 85,000 ML Models Per Site
When tasked with monitoring Pokémon card drops on Vinted Spain, the developer quickly discovered that basic scrapers fail after ~10 requests. The culprit? Datadome's per-site machine learning models—85,000+ of them—that analyze everything from TLS fingerprints to mouse movement patterns.
<> "Datadome uses JA3/TLS fingerprinting, HTTP header analysis, and behavioral detection. It's not just blocking bots—it's learning from them."/>
This represents a seismic shift in web scraping. Sites like Vinted, Leboncoin, and others now deploy enterprise-grade defenses that treat scraping as an adversarial ML problem, not just a rate-limiting exercise.
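To make the JA3 fingerprinting concrete: JA3 hashes five TLS ClientHello fields into an MD5 digest, so an HTTP library whose TLS stack orders ciphers or extensions differently from a real browser produces a different hash and is trivially distinguishable. A minimal sketch of the computation (the field values in the comment are illustrative, not Chrome's real ones):

```python
import hashlib

def ja3_fingerprint(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 string (comma-separated fields, dash-joined lists)
    from ClientHello values and return its MD5 hex digest."""
    ja3_str = ",".join([
        str(version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ])
    return hashlib.md5(ja3_str.encode()).hexdigest()

# Reordering the cipher list alone changes the fingerprint:
# ja3_fingerprint(771, [4865, 4866], ...) != ja3_fingerprint(771, [4866, 4865], ...)
```

This is why swapping user-agent strings is not enough: the fingerprint is computed below the HTTP layer, before any header is sent.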
The Multi-Layer Evasion Stack
Successful modern scraping requires a sophisticated approach combining multiple evasion techniques:
1. Residential Proxy Rotation
Static datacenter IPs are instantly flagged. The solution involves rotating residential or mobile proxies every 5-10 requests:
```python
from seleniumbase import BaseCase

class VintedScraper(BaseCase):
    def setUp(self):
        super().setUp()
        # Rotate residential proxies
        self.proxy_list = ['residential_ip_1:port', 'residential_ip_2:port']
        self.current_proxy = 0
        self.request_count = 0  # Track requests so rotation can trigger

    def get_with_rotation(self, url):
        self.request_count += 1
        if self.request_count % 5 == 0:  # Rotate every 5 requests
            self.switch_proxy(self.proxy_list[self.current_proxy])
            self.current_proxy = (self.current_proxy + 1) % len(self.proxy_list)
        self.open(url)
```
2. Headless Browser Fingerprint Matching
Pure HTTP libraries expose themselves through TLS signatures. Headless browsers with proper configuration survive longer:
```python
import nodriver as uc

async def scrape_vinted():
    browser = await uc.start(
        user_data_dir="./browser_profile",
        headless=False,  # Sometimes non-headless appears more human
        no_sandbox=True,
    )
```
3. The "Cookie Factory" Approach
Rather than scraping web pages directly, successful scrapers target less-protected app APIs using a "cookie factory"—browser automation that generates authentication tokens:
```python
# Assumes the undetected_chromedriver package, whose API is uc.Chrome()
import undetected_chromedriver as uc

def create_cookie_factory():
    """Generate tokens for API access"""
    driver = uc.Chrome()
    driver.get("https://www.vinted.es/member/login")

    # Automated login process goes here
    # Extract: access_token_web, refresh_token_web, datadome cookies
    wanted = {"access_token_web", "refresh_token_web", "datadome"}
    tokens = {c["name"]: c["value"]
              for c in driver.get_cookies() if c["name"] in wanted}
    return tokens
```
Geographic Complexity: 26-Country Redirect Maze
Vinted's challenge extends beyond bot detection. The platform operates across 26 countries with aggressive geo-redirects that can trap scrapers in redirect loops. Successful scrapers must:
- Detect target country via headers and cookies
- Route requests through matching proxy locations (Spanish proxy for Vinted Spain)
- Handle multiple TLDs (.es, .fr, .de, etc.) with different API endpoints
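The three requirements above can be centralized in a small routing table so the proxy geography always matches the TLD being scraped. A minimal sketch (the `VINTED_SITES` mapping and `route_request` helper are hypothetical names, not part of any Vinted API):

```python
# Hypothetical mapping from Vinted TLD to site host and proxy geography.
VINTED_SITES = {
    "es": {"host": "www.vinted.es", "proxy_country": "ES"},
    "fr": {"host": "www.vinted.fr", "proxy_country": "FR"},
    "de": {"host": "www.vinted.de", "proxy_country": "DE"},
}

def route_request(tld: str, path: str) -> dict:
    """Pick the matching host and proxy geo so cookies, endpoint,
    and exit IP all agree on one country."""
    site = VINTED_SITES[tld]
    return {
        "url": f"https://{site['host']}{path}",
        "proxy_country": site["proxy_country"],
    }
```

Keeping this in one table avoids the redirect loops that occur when, say, a French exit IP requests the .es site and gets bounced between country domains.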
The No-Code Alternative
The complexity has spawned a new generation of no-code scraping tools. Apify's Vinted Turbo Scraper, for example, promises "zero bans via auto-proxy rotation" and covers 19+ European sites. This reflects a broader trend: scraping expertise is moving from individual developers to specialized platforms.
```shell
# Quick-start with existing tools
npx apify run vinted-scraper \
  --build-tag latest \
  --input '{"searchUrls": ["https://www.vinted.es/catalog?search_text=pokemon"], "maxItems": 100}'
```
The Ethical and Legal Dimension
Modern scraping operates in a gray area. While gathering public data isn't inherently illegal, bypassing technical measures like Datadome may violate terms of service. The safest approaches:
- Use official APIs when available
- Rate-limit aggressively (<1 request/second per IP)
- Consider third-party data services like Lobstr.io for maintenance-free access
- Focus on public, non-personal data
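The aggressive per-IP rate limit above can be enforced client-side with a tiny monotonic-clock limiter. This is a generic sketch, not tied to any particular HTTP client:

```python
import time

class RateLimiter:
    """Allow at most one request per `interval` seconds, sleeping as needed."""
    def __init__(self, interval: float = 1.0):
        self.interval = interval
        self._last = 0.0  # monotonic timestamp of the previous request

    def wait(self):
        now = time.monotonic()
        delay = self._last + self.interval - now
        if delay > 0:
            time.sleep(delay)  # Pause until the interval has elapsed
        self._last = time.monotonic()

# Usage: call limiter.wait() immediately before each request
# limiter = RateLimiter(interval=1.0)  # <1 request/second per IP
```

Using `time.monotonic()` rather than `time.time()` keeps the limiter correct even if the system clock is adjusted mid-run.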
Performance Reality Check
Even with sophisticated evasion, expect:
- Success rates around 60-80% (not 100%)
- Significant infrastructure costs (residential proxies aren't cheap)
- Constant maintenance as detection evolves
- Rate limits of 100-500 items/hour for sustainable scraping
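Those success rates compound with the request budget: at a success rate p, each delivered item costs 1/p requests on average, so proxy spend scales faster than item counts suggest. An illustrative helper (not from the original scraper) makes the overhead explicit:

```python
import math

def requests_needed(items: int, success_rate: float) -> int:
    """Expected number of requests to obtain `items` successful responses,
    assuming independent failures at rate (1 - success_rate)."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return math.ceil(items / success_rate)

# At the 60-80% success rates above, 500 items costs roughly 625-834 requests,
# each of which is billed residential-proxy bandwidth.
```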
Why This Matters
This isn't just about scraping Vinted. The techniques here apply to any site with sophisticated anti-bot defenses:
- E-commerce monitoring (price tracking, inventory alerts)
- Market research (competitor analysis, trend detection)
- Real estate (listing aggregation, price analysis)
- Job boards (automated application systems)
The key insight: modern web scraping is becoming an adversarial ML problem. Simple automation fails against systems designed to detect and adapt to bot behavior.
For developers, this means investing in proper tooling, understanding the legal implications, and often reconsidering whether scraping is the right approach compared to official APIs or third-party data services. The "move fast and scrape things" era is over—replaced by sophisticated, expensive, and legally complex solutions that require enterprise-grade planning.
