[TOC]
# osmosis
Web scraper for Node.js
https://github.com/rchipka/node-osmosis
# scrape-it
🔮 A Node.js scraper for humans. https://github.com/IonicaBizau/scrape-it
# node-crawler
https://github.com/bda-research/node-crawler
# supercrawler
https://github.com/brendonboshell/supercrawler
A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
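To make "obeys robots.txt" concrete, here is a minimal sketch of the kind of check a polite crawler performs before fetching a URL. It is not Supercrawler's implementation. The helper names `parseRobots` and `isAllowed` are hypothetical, and the sketch assumes plain prefix matching only: wildcards inside paths, `Crawl-delay`, and `Sitemap` lines are ignored.

```javascript
// Illustrative robots.txt check; NOT taken from Supercrawler's source.
// Assumption: only User-agent, Allow and Disallow are handled, with prefix matching.
function parseRobots(text) {
  const groups = []; // [{ agents: [...], rules: [{ allow, path }] }]
  let current = null;
  let lastWasAgent = false;

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // drop comments
    const sep = line.indexOf(':');
    if (sep === -1) continue; // blank or malformed line
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === 'user-agent') {
      // Consecutive User-agent lines share one group of rules.
      if (!lastWasAgent) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else if ((field === 'allow' || field === 'disallow') && current) {
      // An empty value carries no restriction, so it is skipped.
      if (value !== '') {
        current.rules.push({ allow: field === 'allow', path: value });
      }
      lastWasAgent = false;
    } else {
      lastWasAgent = false;
    }
  }
  return groups;
}

function isAllowed(robotsText, userAgent, path) {
  const groups = parseRobots(robotsText);
  const ua = userAgent.toLowerCase();
  // Prefer a group that names this agent; otherwise fall back to the "*" group.
  const group =
    groups.find((g) => g.agents.some((a) => a && a !== '*' && ua.includes(a))) ||
    groups.find((g) => g.agents.includes('*'));
  if (!group) return true; // no applicable rules: everything is allowed

  // The longest matching rule wins; on a tie, Allow wins.
  let best = null;
  for (const rule of group.rules) {
    if (!path.startsWith(rule.path)) continue;
    if (
      !best ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.allow)
    ) {
      best = rule;
    }
  }
  return best ? best.allow : true;
}
```

A crawler would typically fetch `/robots.txt` once per host, cache the text, and call something like `isAllowed(text, 'mybot/1.0', '/some/path')` before queuing each URL. For real projects, prefer a maintained parser or the crawler's built-in support over a hand-rolled check like this one.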
# x-ray
https://github.com/matthewmueller/x-ray
The next web scraper. See through the noise.
# headless-chrome-crawler
Distributed crawler powered by Headless Chrome https://github.com/yujiosaka/headless-chrome-crawler
# simplecrawler
Flexible event-driven crawler for Node.js.
https://github.com/simplecrawler/simplecrawler
# goose-parser
Universal scraping tool that lets you extract data using multiple environments.
https://github.com/redco/goose-parser
# apify
https://github.com/apify/apify-js
The scalable web crawling and scraping library for JavaScript.
# webster
https://github.com/zhuyingda/webster
A reliable high-level web crawling & scraping framework for Node.js.
# schabbi-webscraper
Small and easy to use NodeJS webcrawler project. Returns basic information about the crawled sites. https://github.com/PatrickSchababerle/schabbi-webscraper
# website-scraper
Download a website to a local directory (including all CSS, images, JS, etc.).
https://github.com/website-scraper/node-website-scraper
# cheerio-httpcli
https://www.npmjs.com/package/cheerio-httpcli
# ScrapingBee
If you want to learn how to avoid getting blocked, read our [complete guide](https://www.scrapingbee.com/blog/web-scraping-without-getting-blocked/), and if you don't want to deal with this, you can always use our [web scraping API](https://www.scrapingbee.com/).
## Resources
Would you like to read more? Check these links out:
* [Node.js Website](https://nodejs.org/en/about/) - Contains documentation and a lot of information on how to get started.
* [Puppeteer's Docs](https://developers.google.com/web/tools/puppeteer) - Contains the API reference and guides for getting started.
* [Playwright](https://www.scrapingbee.com/blog/playwright-web-scraping/) - An alternative to Puppeteer, backed by Microsoft.
* [ScrapingBee's Blog](https://www.scrapingbee.com/blog/) - Contains a lot of information about web scraping on multiple platforms.
* [Handling infinite scroll with Puppeteer](https://www.scrapingbee.com/blog/infinite-scroll-puppeteer/)
* [Node-unblocker](https://www.scrapingbee.com/blog/node-unblocker/) - A Node.js package to facilitate web scraping through proxies.
# GitHub search: scrapper, crawler
[Search · scrapper (github.com)](https://github.com/search?l=JavaScript&o=desc&p=97&q=scrapper&s=&type=Repositories)