Data cleaning and database storage with pipelines.py · Python Cookbook

## Saving output directly to a file

```
scrapy crawl cake -o cake.csv
scrapy crawl cake -o cake.xml
scrapy crawl cake -o cake.json
scrapy crawl cake -o cake.pickle
scrapy crawl cake -o cake.marshal
scrapy crawl cake -o ftp://user:pass@ftp.example.com/path/to/cake.csv
```

- Showing Chinese text in Scrapy's JSON output

  When you export with `-o filename.json`, Scrapy by default writes non-ASCII characters as `\uXXXX` escape sequences, so a JSON file with Chinese content is hard to read. You can change the default export encoding in settings.py by adding the line below (it does not appear to be set by default, so add it; if it is already there, just change its value).

  >FEED_EXPORT_ENCODING = 'utf-8'

- pipelines.py

```
# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
import sqlite3


class MeituanPipeline(object):
    def open_spider(self, spider):
        # Connect to the database when the spider starts
        self.con = sqlite3.connect("meituan.sqlite")
        # self.cu is the cursor used to execute SQL statements
        self.cu = self.con.cursor()

    def process_item(self, item, spider):
        # print(spider.name)
        # Insert into the database; the ? placeholders let sqlite3 bind the
        # values, so quotes in the scraped text cannot break the statement
        insert_sql = "insert into cake (title, money) values (?, ?)"
        values = (item['title'], item['money'])
        print(insert_sql, values)
        self.cu.execute(insert_sql, values)
        # Every data modification must be committed
        self.con.commit()
        return item

    # Scrapy calls close_spider when the spider finishes; close the connection
    def close_spider(self, spider):
        self.con.close()
```
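
The pipeline writes to a `cake` table, which has to exist before the first insert. The sketch below is one way to create it and to insert a row with `?` placeholders, so that a value containing a quote character does not break the SQL. It makes several assumptions for illustration: both columns are stored as text, the database is in-memory rather than `meituan.sqlite`, and the helper names `ensure_table` and `insert_cake` and the sample item are hypothetical.

```python
import sqlite3


def ensure_table(con):
    # Assumed schema: the original only names the columns, not their types
    con.execute("CREATE TABLE IF NOT EXISTS cake (title TEXT, money TEXT)")
    con.commit()


def insert_cake(con, item):
    # sqlite3 binds the tuple values to the ? placeholders itself,
    # so no manual quoting or string formatting is needed
    con.execute(
        "INSERT INTO cake (title, money) VALUES (?, ?)",
        (item["title"], item["money"]),
    )
    con.commit()


# In-memory database so the example leaves no file behind
con = sqlite3.connect(":memory:")
ensure_table(con)

# Hypothetical item whose title contains a single quote
insert_cake(con, {"title": "Grandma's cheesecake", "money": "128"})

rows = con.execute("SELECT title, money FROM cake").fetchall()
print(rows)
con.close()
```

In a real pipeline, the `CREATE TABLE IF NOT EXISTS` statement could run once in `open_spider`, right after connecting.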