- Find the class name MeituanPipeline in pipelines.py
- Activate the MeituanPipeline item pipeline in settings.py
```
ITEM_PIPELINES = {
    'meituan.pipelines.MeituanPipeline': 300,
}
```
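The integer next to each entry sets the order in which pipelines run: Scrapy passes every item through the enabled pipelines from the lowest value to the highest, and values are conventionally kept in the 0-1000 range. A minimal sketch of a two-stage setup follows; `CleanAndStorePipeline` is a hypothetical second class used only for illustration and is not part of the original project.

```python
# settings.py (sketch)
ITEM_PIPELINES = {
    # Runs first because 300 < 400.
    'meituan.pipelines.MeituanPipeline': 300,
    # Hypothetical second stage, assumed to live in the same pipelines.py.
    'meituan.pipelines.CleanAndStorePipeline': 400,
}
```

A pipeline that is not listed here is never called, even if its class exists in pipelines.py.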
- cake.py
```
# -*- coding: utf-8 -*-
import scrapy

from ..items import MeituanItem  # import the Item class; scraped data is passed along through it


class CakeSpider(scrapy.Spider):
    name = 'cake'
    allowed_domains = ['meituan.com']
    start_urls = ['http://i.meituan.com/s/changsha-蛋糕/']

    def parse(self, response):
        title_list = response.xpath('//*[@id="deals"]/dl/dd/dl/dd[1]/a/span[1]/text()').extract()
        money_list = response.xpath('//*[@id="deals"]/dl/dd[1]/dl/dd[2]/dl/dd[1]/a/div/div[2]/div[2]/span[1]/text()').extract()
        for i, j in zip(title_list, money_list):
            # print(i + "-------------" + j)
            mt = MeituanItem()  # create a fresh item for each result
            mt['title'] = i  # hand the data to the item; mt['title'] corresponds to title = scrapy.Field() in items.py
            mt['money'] = j
            yield mt
```
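When a generator yields items from inside a loop, it is usually safer to build a new item on every iteration. If one mutable item is created once and reused, anything downstream that keeps a reference to it will see only the last values written. The sketch below illustrates this with plain dicts standing in for MeituanItem; the function names and sample rows are made up for the demonstration.

```python
def reuse_one_item(rows):
    item = {}  # created once, then overwritten on every iteration
    for title, money in rows:
        item['title'] = title
        item['money'] = money
        yield item


def fresh_item_each_time(rows):
    for title, money in rows:
        # A new dict per row, so earlier results are never overwritten.
        yield {'title': title, 'money': money}


rows = [('Cake A', '10'), ('Cake B', '20')]  # illustrative data only
reused = list(reuse_one_item(rows))
fresh = list(fresh_item_each_time(rows))

print(reused[0] is reused[1])  # True: both entries are the same object
print(fresh[0]['title'], fresh[1]['title'])
```

Whether this matters in a given crawl depends on how the pipelines and exporters handle the item, but creating the item inside the loop removes the question.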
- items.py
```
# -*- coding: utf-8 -*-
# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html
import scrapy


class MeituanItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = scrapy.Field()
    money = scrapy.Field()
    # pass
```
- Print from pipelines.py as a test
```
# -*- coding: utf-8 -*-
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
class MeituanPipeline(object):
    def process_item(self, item, spider):
        print(spider.name)
        return item
```
>scrapy crawl cake
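
Once the print test confirms that items reach the pipeline, the same hook can clean and store them. The following is a minimal sketch, not the original project's code: the class name `CleanAndStorePipeline` and the output file `cakes.jl` are assumptions, while `open_spider`, `close_spider`, and `process_item` are the standard item-pipeline methods that Scrapy calls.

```python
import json


class CleanAndStorePipeline:
    """Hypothetical pipeline: strips whitespace, then appends each item to a JSON Lines file."""

    def __init__(self, path='cakes.jl'):
        self.path = path  # assumed output file name
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts.
        self.file = open(self.path, 'w', encoding='utf-8')

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.file.close()

    def process_item(self, item, spider):
        # Clean the two fields defined in items.py.
        for key in ('title', 'money'):
            item[key] = item[key].strip()
        # ensure_ascii=False keeps Chinese text readable in the output file.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + '\n')
        return item  # pass the item on to any later pipeline
```

To try it, the class would need its own entry in ITEM_PIPELINES, for example with a value above 300 so that it runs after MeituanPipeline.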