Web Crawlers · Web Front-End Engineering

[TOC]

# osmosis

Web scraper for NodeJS.

https://github.com/rchipka/node-osmosis

# scrape-it

🔮 A Node.js scraper for humans.

https://github.com/IonicaBizau/scrape-it

# node-crawler

https://github.com/bda-research/node-crawler

# supercrawler

A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.

https://github.com/brendonboshell/supercrawler

# x-ray

The next web scraper. See through the noise.

https://github.com/matthewmueller/x-ray

# headless-chrome-crawler

Distributed crawler powered by Headless Chrome.

https://github.com/yujiosaka/headless-chrome-crawler

# simplecrawler

Flexible event-driven crawler for Node.

https://github.com/simplecrawler/simplecrawler

# goose-parser

Universal scraping tool, which allows you to extract data using multiple environments.

https://github.com/redco/goose-parser

# apify

The scalable web crawling and scraping library for JavaScript.

https://github.com/apify/apify-js

# webster

A reliable high-level web crawling & scraping framework for Node.js.

https://github.com/zhuyingda/webster

# schabbi-webscraper

Small and easy-to-use NodeJS web crawler project. Returns basic information about the crawled sites.

https://github.com/PatrickSchababerle/schabbi-webscraper

# website-scraper

Download a website to a local directory (including all CSS, images, JS, etc.).

https://github.com/website-scraper/node-website-scraper

# cheerio-httpcli

https://www.npmjs.com/package/cheerio-httpcli

# ScrapingBee

If you want to learn how to avoid getting blocked, read our [complete guide](https://www.scrapingbee.com/blog/web-scraping-without-getting-blocked/). If you would rather not deal with this yourself, you can always use our [web scraping API](https://www.scrapingbee.com/). Happy Scraping!

## Resources

Would you like to read more? Check these links out:

* [NodeJS Website](https://nodejs.org/en/about/) - Contains documentation and a lot of information on how to get started.
* [Puppeteer's Docs](https://developers.google.com/web/tools/puppeteer) - Contains the API reference and guides for getting started.
* [Playwright](https://www.scrapingbee.com/blog/playwright-web-scraping/) - An alternative to Puppeteer, backed by Microsoft.
* [ScrapingBee's Blog](https://www.scrapingbee.com/blog/) - Contains a lot of information about web scraping on multiple platforms.
* [Handling infinite scroll with Puppeteer](https://www.scrapingbee.com/blog/infinite-scroll-puppeteer/)
* [Node-unblocker](https://www.scrapingbee.com/blog/node-unblocker/) - A Node.js package to facilitate web scraping through proxies.

# GitHub scraper and crawler search

[Search · scrapper (github.com)](https://github.com/search?l=JavaScript&o=desc&p=97&q=scrapper&s=&type=Repositories)