Scrapling: An Adaptive Web Scraping Framework That Handles Everything from Single Requests to Full-Scale Crawls
A Python framework that combines anti-bot bypass, adaptive element tracking, a Scrapy-like spider API, and an MCP server for AI-assisted scraping — all in one library. Built by web scrapers for web scrapers, with 10.6k stars and 38 releases.
The Python web scraping ecosystem is fragmented. You use BeautifulSoup or Parsel for parsing, httpx or requests for HTTP, Playwright or Selenium for dynamic pages, and then you wire it all together yourself. Anti-bot bypass? That's another library. Adaptive element tracking when websites change? Write it yourself. Full crawling with concurrency, pause/resume, and proxy rotation? Time for Scrapy.
Scrapling tries to be the single library that covers the entire scraping pipeline. Created by Karim Shoair (D4Vinci), it bundles an adaptive parser, multiple fetcher backends (HTTP, headless browser, stealth browser), a full spider framework, CLI tools, and an MCP server for AI-assisted scraping — all under one pip install. With 10.6k stars and 1,124 commits across 38 releases, it's one of the more mature entries in the space.
// The Four Layers
// Fetcher Architecture
Scrapling's fetcher system is one of its strongest differentiators. Instead of picking a single approach, it offers three fetcher classes that share the same response interface but use different backends:
| Fetcher | Backend | Use Case |
|---|---|---|
| Fetcher | HTTP (httpx-based) | Fast requests with TLS fingerprint impersonation, HTTP/3 support |
| DynamicFetcher | Playwright (Chromium/Chrome) | JavaScript-rendered pages, full browser automation |
| StealthyFetcher | Stealth browser | Anti-bot bypass, Cloudflare Turnstile/Interstitial, fingerprint spoofing |
All three support persistent sessions (FetcherSession, DynamicSession, StealthySession), async variants, proxy rotation via the built-in ProxyRotator, and domain blocking for browser-based fetchers. The spider framework can mix multiple session types in a single crawl — route protected pages through the stealth session while fast-tracking everything else through HTTP.
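The shared-interface and proxy-rotation ideas can be sketched in plain Python. The names below (`Response`, `RoundRobinProxies`) are illustrative stand-ins for the concept, not Scrapling's actual classes:

```python
from itertools import cycle
from typing import Protocol


class Response(Protocol):
    """Minimal shared response surface all fetchers would return
    (illustrative protocol, not Scrapling's real response class)."""
    status: int
    def css(self, selector: str) -> list: ...


class RoundRobinProxies:
    """Toy stand-in for a proxy rotator: hand out proxies in a loop."""
    def __init__(self, proxies):
        self._pool = cycle(proxies)

    def next(self) -> str:
        return next(self._pool)


rotator = RoundRobinProxies(["http://p1:8080", "http://p2:8080"])
print(rotator.next())  # http://p1:8080
print(rotator.next())  # http://p2:8080
print(rotator.next())  # back to http://p1:8080
```

Because every fetcher hands back the same response shape, swapping the HTTP backend for the stealth browser on a protected page doesn't change any parsing code downstream.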
// Adaptive Element Tracking
This is probably Scrapling's most distinctive feature. Traditional scrapers break when a website changes its class names, restructures its DOM, or moves elements around. Scrapling's adaptive mode takes a different approach: pass auto_save=True on the first scrape to fingerprint the selected elements. Later, if the website changes its HTML structure, pass adaptive=True and Scrapling relocates those elements using similarity algorithms, with no manual selector updates needed. It's not foolproof (major redesigns will still break things), but for the common case of incremental website changes, it means significantly less scraper maintenance.
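The fingerprint-and-relocate idea can be approximated in a few lines. This toy scorer illustrates the general technique only; it is an assumption about how such matching could work, not Scrapling's actual similarity algorithm:

```python
def fingerprint(tag, attrs, text):
    """Capture stable traits of an element at first scrape (toy model)."""
    return {"tag": tag, "attrs": set(attrs.items()), "text": text.strip()}

def similarity(fp, tag, attrs, text):
    """Score a candidate against a saved fingerprint (0.0 .. 3.0)."""
    score = float(fp["tag"] == tag)
    cand = set(attrs.items())
    union = fp["attrs"] | cand
    score += len(fp["attrs"] & cand) / len(union) if union else 1.0
    score += float(fp["text"] == text.strip())
    return score

def relocate(fp, candidates):
    """Pick the candidate element most similar to the saved fingerprint."""
    return max(candidates, key=lambda c: similarity(fp, *c))

# Saved on first scrape:
fp = fingerprint("span", {"class": "price"}, "$9.99")

# After a redesign the class name changed, but tag and text survived:
page = [
    ("div", {"class": "nav"}, "Home"),
    ("span", {"class": "product-price"}, "$9.99"),
]
print(relocate(fp, page))  # ('span', {'class': 'product-price'}, '$9.99')
```

The key property is that the match degrades gracefully: a renamed class costs some score, but matching tag and text still win out over unrelated elements.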
// Spider Framework
The spider API follows the Scrapy pattern: define start_urls, write async parse callbacks, yield items or follow-up requests. But it adds several features that Scrapy doesn't offer out of the box: multi-session support (mix HTTP and headless browsers in one spider), streaming mode via async for item in spider.stream(), checkpoint-based pause/resume with crawldir, and automatic blocked request detection with retry logic.
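The callback pattern looks roughly like this. `MiniSpider` and its methods are a stripped-down imitation of the Scrapy-style flow, not Scrapling's real spider classes, and the fetcher is faked so the example runs offline:

```python
import asyncio

class MiniSpider:
    """Stripped-down imitation of the start_urls + parse-callback pattern."""
    start_urls = ["https://example.com/page/1", "https://example.com/page/2"]

    async def fetch(self, url):
        # Stand-in for a real fetcher; returns fake "pages" offline.
        await asyncio.sleep(0)
        return {"url": url, "items": [f"item from {url}"]}

    async def parse(self, response):
        # Callbacks are async generators: yield items (or, in a real
        # spider, follow-up requests) as they are extracted.
        for item in response["items"]:
            yield item

    async def crawl(self):
        results = []
        for url in self.start_urls:
            response = await self.fetch(url)
            async for item in self.parse(response):
                results.append(item)
        return results

print(asyncio.run(MiniSpider().crawl()))
# ['item from https://example.com/page/1', 'item from https://example.com/page/2']
```

Streaming mode fits naturally into this shape: instead of collecting results into a list, items are yielded to the caller as the crawl progresses.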
Pause/resume works by passing a crawldir path. Press Ctrl+C for graceful shutdown, and progress is checkpointed automatically. Restart with the same directory to resume from where it stopped. This is particularly useful for long-running crawls against rate-limited targets.
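A checkpoint directory like crawldir can be mimicked with a done-set persisted to disk. This is a toy model of the resume idea, not Scrapling's on-disk format:

```python
import json
import tempfile
from pathlib import Path

def crawl_with_checkpoint(urls, crawldir):
    """Skip URLs already recorded in crawldir; checkpoint after each page."""
    crawldir = Path(crawldir)
    crawldir.mkdir(parents=True, exist_ok=True)
    state_file = crawldir / "done.json"
    done = set(json.loads(state_file.read_text())) if state_file.exists() else set()
    for url in urls:
        if url in done:
            continue  # finished before the last interruption
        # ... fetch and parse `url` here ...
        done.add(url)
        state_file.write_text(json.dumps(sorted(done)))  # checkpoint
    return done

with tempfile.TemporaryDirectory() as d:
    crawl_with_checkpoint(["https://a", "https://b"], d)
    # A restart with the same directory resumes, skipping finished URLs:
    resumed = crawl_with_checkpoint(["https://a", "https://b", "https://c"], d)
    print(sorted(resumed))  # ['https://a', 'https://b', 'https://c']
```

Checkpointing after every page (rather than on shutdown) is what makes an abrupt Ctrl+C safe: at worst, one in-flight page is re-fetched on resume.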
// Performance Benchmarks
The project includes parser benchmarks against popular Python scraping libraries. Scrapling's parser matches Parsel/Scrapy speed and significantly outperforms BeautifulSoup, PyQuery, and Selectolax on text extraction across 5,000 nested elements:
| Library | Time (ms) | vs Scrapling |
|---|---|---|
| Scrapling | 2.02 | 1.0x |
| Parsel / Scrapy | 2.04 | 1.01x |
| Raw lxml | 2.54 | 1.26x |
| PyQuery | 24.17 | ~12x |
| Selectolax | 82.63 | ~41x |
| BS4 with lxml | 1,584 | ~784x |
| BS4 with html5lib | 3,392 | ~1,679x |
For adaptive element similarity searching, Scrapling clocks 2.39ms vs AutoScraper's 12.45ms — about 5x faster. All benchmarks represent averages of 100+ runs.
// CLI and Developer Tools
Beyond the library API, Scrapling includes a CLI for scraping without writing code. The scrapling extract command fetches a URL and outputs content as Markdown, plain text, or HTML — useful for quick data extraction or piping into other tools. There's also an interactive IPython-based scraping shell (scrapling shell) with built-in shortcuts for converting curl commands to Scrapling requests and previewing results in the browser.
// Considerations
Heavy dependency footprint. Splitting installs into optional extras (pip install scrapling[fetchers], [ai], [shell]) helps, but the full install pulls in Playwright, browser binaries, and significant infrastructure.

Browser install required. Using any of the browser-based fetchers requires running scrapling install after pip install, which downloads Chromium and system dependencies. This adds significant disk space and isn't always feasible in constrained environments, though a ready-made Docker image is available.
5 contributors. Despite 10.6k stars and 1,124 commits, the project has a very small contributor base. The vast majority of development appears to come from the creator. This is common for solo-driven projects but raises bus-factor questions for production dependencies.
Adaptive tracking limits. The adaptive element relocating is powerful for incremental changes, but has limits. Major website redesigns, complete structural overhauls, or fundamentally different page layouts can still break the similarity matching. It reduces maintenance; it doesn't eliminate it.
Anti-bot bypass legality. The StealthyFetcher's ability to bypass Cloudflare Turnstile and other anti-bot systems is technically impressive but operates in a legal gray area depending on jurisdiction and the target website's terms of service. The project includes appropriate disclaimers.
// Bottom Line
Scrapling's ambition is to be the one library you need for web scraping in Python. The breadth is impressive: from a two-line HTTP request to a full concurrent spider with pause/resume, stealth browser sessions, adaptive element tracking, and AI-assisted extraction via MCP. The parser performance matches the fastest Python options, and the adaptive element tracking is a genuine innovation that addresses one of scraping's biggest pain points — website changes breaking selectors.
At 10.6k stars, 38 releases, and active development (latest release February 2026), it's well past the experimental stage. For teams already invested in Scrapy who just need parsing, it might be overkill. But for new projects that want a single framework covering the full scraping pipeline — from anti-bot bypass to concurrent crawling to AI integration — Scrapling is a strong option worth evaluating.