## Saving output directly to a file

Scrapy's feed exports can write scraped items straight to a file; the format is inferred from the file extension, and the target can even be a remote URI such as FTP:

```
scrapy crawl cake -o cake.csv
scrapy crawl cake -o cake.xml
scrapy crawl cake -o cake.json
scrapy crawl cake -o cake.pickle
scrapy crawl cake -o cake.marshal
scrapy crawl cake -o ftp://user:pass@ftp.example.com/path/to/cake.csv
```

- Displaying Chinese correctly in Scrapy's JSON output

  When exporting with `-o filename.json`, Scrapy escapes non-ASCII characters by default, so Chinese content ends up as unreadable `\uXXXX` sequences in the JSON file. You can change the output encoding in `settings.py` by adding the following line (it is not set by default, so you need to add it; if it is already present, just change its value):

  > FEED_EXPORT_ENCODING = 'utf-8'

- pipelines.py

```python
# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
import sqlite3


class MeituanPipeline(object):

    def open_spider(self, spider):
        # Connect to the database when the spider starts
        self.con = sqlite3.connect("meituan.sqlite")
        # self.cu is the cursor used to execute SQL statements
        self.cu = self.con.cursor()
        # Make sure the target table exists before inserting
        self.cu.execute("create table if not exists cake (title text, money text)")
        self.con.commit()

    def process_item(self, item, spider):
        # Use a parameterized query instead of str.format():
        # it avoids SQL injection and quoting problems
        insert_sql = "insert into cake (title, money) values (?, ?)"
        self.cu.execute(insert_sql, (item['title'], item['money']))
        # Every data modification must be committed
        self.con.commit()
        return item

    def close_spider(self, spider):
        # Close the connection when the spider finishes.
        # Note: the hook must be named close_spider (the original
        # spider_close would never be called by Scrapy).
        self.con.close()
```
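As the header comment notes, the pipeline only runs once it is registered in `ITEM_PIPELINES`. A minimal sketch of the corresponding `settings.py` entry, assuming the project package is named `meituan` (matching the pipeline class above):

```python
# settings.py -- enable the pipeline; the integer (0-1000) sets the
# order in which multiple pipelines process each item, lower runs first.
ITEM_PIPELINES = {
    # 'meituan' is assumed to be the project's package name
    'meituan.pipelines.MeituanPipeline': 300,
}
```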
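For completeness, the pipeline reads `item['title']` and `item['money']`, which implies an item definition along these lines (a sketch; the class name `CakeItem` is an assumption, and the project's actual `items.py` may define more fields):

```python
# -*- coding: utf-8 -*-
import scrapy


class CakeItem(scrapy.Item):
    # Fields consumed by MeituanPipeline above
    title = scrapy.Field()
    money = scrapy.Field()
```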