Scrapling: An Adaptive Web Scraping Framework That Handles Everything from Single Requests to Full-Scale Crawls
A Python framework that combines anti-bot bypass, adaptive element tracking, a Scrapy-like spider API, and an MCP server for AI-assisted scraping — all in one library. Built by web scrapers for web scrapers, with 10.6k stars and 38 releases.
The Python web scraping ecosystem is fragmented. You use BeautifulSoup or Parsel for parsing, httpx or requests for HTTP, Playwright or Selenium for dynamic pages, and then you wire it all together yourself. Anti-bot bypass? That's another library. Adaptive element tracking when websites change? Write it yourself. Full crawling with concurrency, pause/resume, and proxy rotation? Time for Scrapy.
Scrapling tries to be the single library that covers the entire scraping pipeline. Created by Karim Shoair (D4Vinci), it bundles an adaptive parser, multiple fetcher backends (HTTP, headless browser, stealth browser), a full spider framework, CLI tools, and an MCP server for AI-assisted scraping — all under one pip install. With 10.6k stars and 1,124 commits across 38 releases, it's one of the more mature entries in the space.
// The Four Layers
// Fetcher Architecture
Scrapling's fetcher system is one of its strongest differentiators. Instead of picking a single approach, it offers three fetcher classes that share the same response interface but use different backends:
| Fetcher | Backend | Use Case |
|---|---|---|
| Fetcher | HTTP (httpx-based) | Fast requests with TLS fingerprint impersonation, HTTP/3 support |
| DynamicFetcher | Playwright (Chromium/Chrome) | JavaScript-rendered pages, full browser automation |
| StealthyFetcher | Stealth browser | Anti-bot bypass, Cloudflare Turnstile/Interstitial, fingerprint spoofing |
All three support persistent sessions (FetcherSession, DynamicSession, StealthySession), async variants, proxy rotation via the built-in ProxyRotator, and domain blocking for browser-based fetchers. The spider framework can mix multiple session types in a single crawl — route protected pages through the stealth session while fast-tracking everything else through HTTP.
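The shared-interface and proxy-rotation ideas can be sketched in plain Python. The names below (`Response`, `RoundRobinProxies`) are illustrative stand-ins for the concept, not Scrapling's actual classes:

```python
from itertools import cycle
from typing import Protocol


class Response(Protocol):
    """Minimal shared response surface all fetchers would return
    (illustrative protocol, not Scrapling's real response class)."""
    status: int
    def css(self, selector: str) -> list: ...


class RoundRobinProxies:
    """Toy stand-in for a proxy rotator: hand out proxies in a loop."""
    def __init__(self, proxies):
        self._pool = cycle(proxies)

    def next(self) -> str:
        return next(self._pool)


rotator = RoundRobinProxies(["http://p1:8080", "http://p2:8080"])
print(rotator.next())  # http://p1:8080
print(rotator.next())  # http://p2:8080
print(rotator.next())  # back to http://p1:8080
```

Because every fetcher hands back the same response shape, swapping the HTTP backend for the stealth browser on a protected page doesn't change any parsing code downstream.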
// Adaptive Element Tracking
This is probably Scrapling's most distinctive feature. Traditional scrapers break when a website changes its class names, restructures its DOM, or moves elements around. Scrapling's adaptive mode takes a different approach: pass auto_save=True on the first scrape to fingerprint the selected elements. Later, if the website changes its HTML structure, pass adaptive=True and Scrapling relocates those elements using similarity algorithms, with no manual selector updates needed. It's not foolproof (major redesigns will still break things), but for the common case of incremental website changes, it means significantly less scraper maintenance.
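The fingerprint-and-relocate idea can be approximated in a few lines. This toy scorer illustrates the general technique only; it is an assumption about how such matching could work, not Scrapling's actual similarity algorithm:

```python
def fingerprint(tag, attrs, text):
    """Capture stable traits of an element at first scrape (toy model)."""
    return {"tag": tag, "attrs": set(attrs.items()), "text": text.strip()}

def similarity(fp, tag, attrs, text):
    """Score a candidate against a saved fingerprint (0.0 .. 3.0)."""
    score = float(fp["tag"] == tag)
    cand = set(attrs.items())
    union = fp["attrs"] | cand
    score += len(fp["attrs"] & cand) / len(union) if union else 1.0
    score += float(fp["text"] == text.strip())
    return score

def relocate(fp, candidates):
    """Pick the candidate element most similar to the saved fingerprint."""
    return max(candidates, key=lambda c: similarity(fp, *c))

# Saved on first scrape:
fp = fingerprint("span", {"class": "price"}, "$9.99")

# After a redesign the class name changed, but tag and text survived:
page = [
    ("div", {"class": "nav"}, "Home"),
    ("span", {"class": "product-price"}, "$9.99"),
]
print(relocate(fp, page))  # ('span', {'class': 'product-price'}, '$9.99')
```

The key property is that the match degrades gracefully: a renamed class costs some score, but matching tag and text still win out over unrelated elements.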
// Spider Framework
The spider API follows the Scrapy pattern: define start_urls, write async parse callbacks, yield items or follow-up requests. But it adds several features that Scrapy doesn't offer out of the box: multi-session support (mix HTTP and headless browsers in one spider), streaming mode via async for item in spider.stream(), checkpoint-based pause/resume with crawldir, and automatic blocked request detection with retry logic.
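The callback pattern looks roughly like this. `MiniSpider` and its methods are a stripped-down imitation of the Scrapy-style flow, not Scrapling's real spider classes, and the fetcher is faked so the example runs offline:

```python
import asyncio

class MiniSpider:
    """Stripped-down imitation of the start_urls + parse-callback pattern."""
    start_urls = ["https://example.com/page/1", "https://example.com/page/2"]

    async def fetch(self, url):
        # Stand-in for a real fetcher; returns fake "pages" offline.
        await asyncio.sleep(0)
        return {"url": url, "items": [f"item from {url}"]}

    async def parse(self, response):
        # Callbacks are async generators: yield items (or, in a real
        # spider, follow-up requests) as they are extracted.
        for item in response["items"]:
            yield item

    async def crawl(self):
        results = []
        for url in self.start_urls:
            response = await self.fetch(url)
            async for item in self.parse(response):
                results.append(item)
        return results

print(asyncio.run(MiniSpider().crawl()))
# ['item from https://example.com/page/1', 'item from https://example.com/page/2']
```

Streaming mode fits naturally into this shape: instead of collecting results into a list, items are yielded to the caller as the crawl progresses.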
Pause/resume works by passing a crawldir path. Press Ctrl+C for graceful shutdown, and progress is checkpointed automatically. Restart with the same directory to resume from where it stopped. This is particularly useful for long-running crawls against rate-limited targets.
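A checkpoint directory like crawldir can be mimicked with a done-set persisted to disk. This is a toy model of the resume idea, not Scrapling's on-disk format:

```python
import json
import tempfile
from pathlib import Path

def crawl_with_checkpoint(urls, crawldir):
    """Skip URLs already recorded in crawldir; checkpoint after each page."""
    crawldir = Path(crawldir)
    crawldir.mkdir(parents=True, exist_ok=True)
    state_file = crawldir / "done.json"
    done = set(json.loads(state_file.read_text())) if state_file.exists() else set()
    for url in urls:
        if url in done:
            continue  # finished before the last interruption
        # ... fetch and parse `url` here ...
        done.add(url)
        state_file.write_text(json.dumps(sorted(done)))  # checkpoint
    return done

with tempfile.TemporaryDirectory() as d:
    crawl_with_checkpoint(["https://a", "https://b"], d)
    # A restart with the same directory resumes, skipping finished URLs:
    resumed = crawl_with_checkpoint(["https://a", "https://b", "https://c"], d)
    print(sorted(resumed))  # ['https://a', 'https://b', 'https://c']
```

Checkpointing after every page (rather than on shutdown) is what makes an abrupt Ctrl+C safe: at worst, one in-flight page is re-fetched on resume.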
// Performance Benchmarks
The project includes parser benchmarks against popular Python scraping libraries. Scrapling's parser matches Parsel/Scrapy speed and significantly outperforms BeautifulSoup, PyQuery, and Selectolax on text extraction across 5,000 nested elements:
| Library | Time (ms) | vs Scrapling |
|---|---|---|
| Scrapling | 2.02 | 1.0x |
| Parsel / Scrapy | 2.04 | 1.01x |
| Raw lxml | 2.54 | 1.26x |
| PyQuery | 24.17 | ~12x |
| Selectolax | 82.63 | ~41x |
| BS4 with lxml | 1,584 | ~784x |
| BS4 with html5lib | 3,392 | ~1,679x |
For adaptive element similarity searching, Scrapling clocks 2.39ms vs AutoScraper's 12.45ms — about 5x faster. All benchmarks represent averages of 100+ runs.
// CLI and Developer Tools
Beyond the library API, Scrapling includes a CLI for scraping without writing code. The scrapling extract command fetches a URL and outputs content as Markdown, plain text, or HTML — useful for quick data extraction or piping into other tools. There's also an interactive IPython-based scraping shell (scrapling shell) with built-in shortcuts for converting curl commands to Scrapling requests and previewing results in the browser.
// Considerations
Heavy dependency footprint. Splitting installs into optional extras (pip install scrapling[fetchers], [ai], [shell]) helps, but the full install pulls in Playwright, browser binaries, and significant infrastructure.

Browser install required. Using any of the browser-based fetchers requires running scrapling install after pip install, which downloads Chromium and system dependencies. This adds significant disk space and isn't always feasible in constrained environments, though a ready-made Docker image is available.
5 contributors. Despite 10.6k stars and 1,124 commits, the project has a very small contributor base. The vast majority of development appears to come from the creator. This is common for solo-driven projects but raises bus-factor questions for production dependencies.
Adaptive tracking limits. The adaptive element relocating is powerful for incremental changes, but has limits. Major website redesigns, complete structural overhauls, or fundamentally different page layouts can still break the similarity matching. It reduces maintenance; it doesn't eliminate it.
Anti-bot bypass legality. The StealthyFetcher's ability to bypass Cloudflare Turnstile and other anti-bot systems is technically impressive but operates in a legal gray area depending on jurisdiction and the target website's terms of service. The project includes appropriate disclaimers.
// Bottom Line
Scrapling's ambition is to be the one library you need for web scraping in Python. The breadth is impressive: from a two-line HTTP request to a full concurrent spider with pause/resume, stealth browser sessions, adaptive element tracking, and AI-assisted extraction via MCP. The parser performance matches the fastest Python options, and the adaptive element tracking is a genuine innovation that addresses one of scraping's biggest pain points — website changes breaking selectors.
At 10.6k stars, 38 releases, and active development (latest release February 2026), it's well past the experimental stage. For teams already invested in Scrapy who just need parsing, it might be overkill. But for new projects that want a single framework covering the full scraping pipeline — from anti-bot bypass to concurrent crawling to AI integration — Scrapling is a strong option worth evaluating.