Crawlee 🕸 The web scraping and browser automation library

The world's most extensive open-source library for building fast and reliable web crawlers in Node.js

Jan Čurn

2 years ago

#ai#b2b#api#data_science#developer_tools

Hello world 👋

We're Jan and Jakub, founders of Apify, the web scraping and automation platform. Drawing on our team's years of experience, today we're launching Crawlee, the web scraping and browser automation library for Node.js that's designed for the fastest development and maximum reliability in production.

For details, see the short video below or read the announcement blog post.

Main features

🖼 Supports headless browsers with Playwright or Puppeteer

⚡️ Supports raw HTTP crawling with Cheerio or JSDOM

🎛 Automated parallelization and scaling of crawlers for best performance

🐾 Avoids blocking using smart sessions, proxies, and browser fingerprints

🗃 Simple management and persistence of queues of URLs to crawl

🗜 Written completely in TypeScript for type safety and code autocompletion

📚 Comprehensive documentation, code examples, and tutorials

💪🏼 Actively maintained and developed by Apify—we use it ourselves!

🤝 Lively community on Discord

Getting started

Visit crawlee.dev or run the following command:

npx crawlee create my-crawler

Show me the code

uploaded image

Liked Crawlee?

💛 You can support the project on GitHub and Product Hunt, or discuss it on Hacker News.