Architecture overview
An overview of the core components of the Crawlee library and its architecture.
Avoid getting blocked
How to avoid getting blocked when scraping.
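As a quick taste of the levers that guide covers, here is a minimal sketch, assuming recent Crawlee for Python import paths (`crawlee.crawlers`) and placeholder proxy URLs, that combines proxy rotation with the session pool on top of PlaywrightCrawler's default browser fingerprints:

```python
import asyncio

from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext
from crawlee.proxy_configuration import ProxyConfiguration

async def main() -> None:
    crawler = PlaywrightCrawler(
        # Placeholder proxy URLs; rotating IPs is one of the main anti-blocking levers.
        proxy_configuration=ProxyConfiguration(
            proxy_urls=[
                'http://proxy-1.example.com:8000',
                'http://proxy-2.example.com:8000',
            ],
        ),
        use_session_pool=True,  # rotate sessions (cookies, etc.) across requests
    )

    @crawler.router.default_handler
    async def handler(context: PlaywrightCrawlingContext) -> None:
        context.log.info(f'Fetched {context.request.url}')

    await crawler.run(['https://crawlee.dev'])

if __name__ == '__main__':
    asyncio.run(main())
```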
Logging in with a crawler
How to log in to websites with Crawlee.
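A login flow often boils down to filling a form before scraping; a minimal sketch with PlaywrightCrawler, where the selectors and credentials are placeholders for the target site:

```python
from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext

crawler = PlaywrightCrawler()

@crawler.router.default_handler
async def login_handler(context: PlaywrightCrawlingContext) -> None:
    # Placeholder selectors and credentials; adjust for the real login form.
    await context.page.fill('#username', 'user@example.com')
    await context.page.fill('#password', 'hunter2')
    await context.page.click('button[type="submit"]')
    # Wait for the post-login page before scraping anything.
    await context.page.wait_for_load_state('networkidle')
```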
Creating a web archive
How to create a Web ARChive (WARC) with Crawlee.
Error handling
How to handle errors that occur during web crawling.
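The two main hooks are an error handler, which runs before each retry, and a failed-request handler, which runs once retries are exhausted; a minimal sketch, assuming HttpCrawler:

```python
from crawlee.crawlers import HttpCrawler, HttpCrawlingContext

crawler = HttpCrawler(max_request_retries=2)

@crawler.error_handler
async def on_error(context: HttpCrawlingContext, error: Exception) -> None:
    # Runs before a failed request is retried.
    context.log.warning(f'Retrying {context.request.url}: {error}')

@crawler.failed_request_handler
async def on_failed(context: HttpCrawlingContext, error: Exception) -> None:
    # Runs once all retries are exhausted.
    context.log.error(f'Giving up on {context.request.url}: {error}')
```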
HTTP clients
Learn about Crawlee's HTTP client architecture, how to switch between different implementations, and create custom HTTP clients for specialized web scraping needs.
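Switching implementations is a constructor argument; a minimal sketch swapping in the bundled HTTPX-based client (other implementations plug in the same way):

```python
from crawlee.crawlers import HttpCrawler
from crawlee.http_clients import HttpxHttpClient

# Any HTTP client implementation can be passed here; HTTPX is one of
# the bundled options.
crawler = HttpCrawler(http_client=HttpxHttpClient())
```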
HTTP crawlers
Learn about Crawlee's HTTP crawlers including BeautifulSoup, Parsel, and raw HTTP crawlers for efficient server-rendered content extraction without JavaScript execution.
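As a preview, a minimal BeautifulSoup-based crawler, assuming recent import paths (`crawlee.crawlers`) and crawlee.dev as a stand-in start URL:

```python
import asyncio

from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext

async def main() -> None:
    crawler = BeautifulSoupCrawler(max_requests_per_crawl=10)

    @crawler.router.default_handler
    async def handler(context: BeautifulSoupCrawlingContext) -> None:
        # `context.soup` is the parsed document; no browser involved.
        title = context.soup.title.string if context.soup.title else None
        await context.push_data({'url': context.request.url, 'title': title})
        await context.enqueue_links()

    await crawler.run(['https://crawlee.dev'])

if __name__ == '__main__':
    asyncio.run(main())
```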
Playwright crawler
Learn how to use PlaywrightCrawler for browser-based web scraping.
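The browser-based counterpart of the HTTP crawlers above; a minimal sketch:

```python
import asyncio

from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext

async def main() -> None:
    crawler = PlaywrightCrawler(headless=True, browser_type='chromium')

    @crawler.router.default_handler
    async def handler(context: PlaywrightCrawlingContext) -> None:
        # `context.page` is a live Playwright page, so JavaScript-rendered
        # content is available here.
        await context.push_data({
            'url': context.request.url,
            'title': await context.page.title(),
        })

    await crawler.run(['https://crawlee.dev'])

if __name__ == '__main__':
    asyncio.run(main())
```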
Adaptive Playwright crawler
Learn how to use the Adaptive Playwright crawler to automatically switch between browser-based and HTTP-only crawling.
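The adaptive crawler pairs a browser crawler with a static parser; a minimal sketch using the BeautifulSoup-backed factory:

```python
from crawlee.crawlers import (
    AdaptivePlaywrightCrawler,
    AdaptivePlaywrightCrawlingContext,
)

# Pairs a Playwright browser with a BeautifulSoup static parser; the crawler
# decides per request whether a real browser is actually needed.
crawler = AdaptivePlaywrightCrawler.with_beautifulsoup_static_parser()

@crawler.router.default_handler
async def handler(context: AdaptivePlaywrightCrawlingContext) -> None:
    # The same handler runs whether the page came from plain HTTP or a browser.
    context.log.info(f'Processed {context.request.url}')
```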
Playwright with Stagehand
How to integrate Stagehand AI-powered automation with PlaywrightCrawler.
Proxy management
Using proxies to get around those annoying IP blocks.
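In short, you hand the crawler a ProxyConfiguration and it rotates through the URLs for you; a minimal sketch with placeholder proxies:

```python
from crawlee.crawlers import HttpCrawler
from crawlee.proxy_configuration import ProxyConfiguration

# Placeholder proxy URLs; Crawlee rotates through them automatically.
proxy_configuration = ProxyConfiguration(
    proxy_urls=[
        'http://proxy-1.example.com:8000',
        'http://proxy-2.example.com:8000',
    ],
)

crawler = HttpCrawler(proxy_configuration=proxy_configuration)
```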
Request loaders
How to manage the requests your crawler will process.
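For instance, a static RequestList can be combined with the default request queue (a "tandem") so handlers can still enqueue new links; a minimal sketch with placeholder URLs:

```python
import asyncio

from crawlee.crawlers import HttpCrawler
from crawlee.request_loaders import RequestList

async def main() -> None:
    # A read-only list of start URLs, wrapped in a tandem with the default
    # request queue so `enqueue_links` keeps working in handlers.
    request_list = RequestList(['https://crawlee.dev', 'https://apify.com'])
    request_manager = await request_list.to_tandem()

    crawler = HttpCrawler(request_manager=request_manager)
    await crawler.run()

if __name__ == '__main__':
    asyncio.run(main())
```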
Request router
Learn how to use the Router class to organize request handlers, error handlers, and pre-navigation hooks in Crawlee.
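A minimal sketch of label-based routing, assuming a hypothetical `a.category` selector on the crawled site:

```python
from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
from crawlee.router import Router

router = Router[BeautifulSoupCrawlingContext]()

@router.default_handler
async def default_handler(context: BeautifulSoupCrawlingContext) -> None:
    # Enqueue links under a label; the label picks the handler below.
    await context.enqueue_links(selector='a.category', label='CATEGORY')

@router.handler('CATEGORY')
async def category_handler(context: BeautifulSoupCrawlingContext) -> None:
    await context.push_data({'url': context.request.url})

crawler = BeautifulSoupCrawler(request_handler=router)
```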
Running in a web server
How to run a crawler inside a web server and feed it requests on demand.
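A rough sketch of the idea: keep the crawler alive alongside an HTTP server and feed it URLs from incoming requests. aiohttp and the `keep_alive` flag are assumptions here (any async framework would do, and option names may differ by version):

```python
import asyncio

from aiohttp import web  # assumption: any async web framework would do

from crawlee.crawlers import HttpCrawler, HttpCrawlingContext

# Assumption: `keep_alive` keeps the crawler idling for new requests
# instead of finishing once the queue drains.
crawler = HttpCrawler(keep_alive=True)
results: dict[str, int] = {}

@crawler.router.default_handler
async def handler(context: HttpCrawlingContext) -> None:
    body = await context.http_response.read()
    results[context.request.url] = len(body)

async def scrape(request: web.Request) -> web.Response:
    url = request.query['url']
    await crawler.add_requests([url])
    return web.json_response({'queued': url})

async def main() -> None:
    app = web.Application()
    app.add_routes([web.get('/scrape', scrape)])
    runner = web.AppRunner(app)
    await runner.setup()
    await web.TCPSite(runner, port=8080).start()
    await crawler.run()  # keeps serving until the process is stopped

if __name__ == '__main__':
    asyncio.run(main())
```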
Scaling crawlers
Learn how to scale your crawlers by controlling concurrency and limiting requests per minute.
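Both knobs live on ConcurrencySettings; a minimal sketch:

```python
from crawlee import ConcurrencySettings
from crawlee.crawlers import HttpCrawler

crawler = HttpCrawler(
    concurrency_settings=ConcurrencySettings(
        min_concurrency=2,         # keep at least two requests in flight
        max_concurrency=16,        # hard ceiling on parallel requests
        max_tasks_per_minute=120,  # overall rate limit
    ),
)
```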
Service locator
Crawlee's service locator is a central registry for global services, managing and providing access to them throughout the whole framework.
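A minimal sketch of reading the registered services; anything registered here before crawlers start is used framework-wide:

```python
from crawlee import service_locator

# Services are resolved lazily; these getters return the globally
# registered instances.
configuration = service_locator.get_configuration()
event_manager = service_locator.get_event_manager()
storage_client = service_locator.get_storage_client()
```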
Session management
How to manage cookies, proxy IP rotation, and more.
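A minimal sketch: size the session pool up front and retire a session when a response looks like a soft block (the 403 check is just an example heuristic):

```python
from crawlee.crawlers import HttpCrawler, HttpCrawlingContext
from crawlee.sessions import SessionPool

crawler = HttpCrawler(
    use_session_pool=True,
    session_pool=SessionPool(max_pool_size=25),
)

@crawler.router.default_handler
async def handler(context: HttpCrawlingContext) -> None:
    # Retire the session on a suspected block so its cookie/proxy pairing
    # is rotated out early.
    if context.session and context.http_response.status_code == 403:
        context.session.retire()
```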
Storage clients
How to work with storage clients in Crawlee, including the built-in clients and how to create your own.
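Switching the backing store is a one-liner; a sketch that swaps the default file-system persistence for the in-memory client (constructor details may vary between versions):

```python
from crawlee import service_locator
from crawlee.storage_clients import MemoryStorageClient

# Keep all storages in memory instead of persisting them to ./storage.
service_locator.set_storage_client(MemoryStorageClient())
```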
Storages
How to work with storages in Crawlee, how to manage requests and how to store and retrieve scraping results.
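A minimal sketch of the two most common storages: the default dataset for results and the default key-value store for arbitrary state:

```python
import asyncio

from crawlee.storages import Dataset, KeyValueStore

async def main() -> None:
    # Append a result row to the default dataset.
    dataset = await Dataset.open()
    await dataset.push_data({'url': 'https://crawlee.dev', 'title': 'Crawlee'})

    # Persist and read back arbitrary state.
    kvs = await KeyValueStore.open()
    await kvs.set_value('state', {'last_page': 1})
    print(await kvs.get_value('state'))

if __name__ == '__main__':
    asyncio.run(main())
```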
Trace and monitor crawlers
Learn how to instrument your crawlers with OpenTelemetry to trace request handling, identify bottlenecks, monitor performance, and visualize the telemetry in Jaeger.
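The guide covers Crawlee-specific instrumentation; as a plain-OpenTelemetry sketch of the surrounding wiring, here is a tracer provider exporting to a local Jaeger instance over OTLP (the service name and endpoint are placeholders):

```python
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Export spans to a Jaeger instance accepting OTLP on localhost:4317.
provider = TracerProvider(resource=Resource.create({'service.name': 'my-crawler'}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint='http://localhost:4317', insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

# In a request handler, wrap the interesting work in a span, e.g.:
#
#     with tracer.start_as_current_span('handle_request') as span:
#         span.set_attribute('url', context.request.url)
#         ...
```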