When you create a Scrapy project, an `items.py` file is generated. Scraped data can be stored in Item objects defined there.

**1. Define the fields you need in `items.py`**

```python
# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html
"""
@Date 2021/4/26
"""
import scrapy


class MyspiderItem(scrapy.Item):
    # Declare one Field for each piece of data to be extracted
    title = scrapy.Field()
    url = scrapy.Field()
```

<br/>

**2. Use MyspiderItem in the spider, at the point where data is produced**

```python
"""
books.py
@Date 2021/4/26
"""
import scrapy
from mySpider.items import MyspiderItem


class BooksSpider(scrapy.Spider):
    name = 'books'
    allowed_domains = ['book.jd.com']
    start_urls = ['http://book.jd.com/']

    def parse(self, response):
        item = MyspiderItem()
        # 'title' and 'url' must be fields declared on MyspiderItem;
        # assigning to an undeclared key raises KeyError
        item['title'] = response.xpath('//title/text()').get()
        item['url'] = response.url
        yield item
```

<br/>

**3. Receive the data in `pipelines.py`**

```python
class MyspiderPipeline:
    def process_item(self, item, spider):
        # Example output:
        # {'title': '京东图书_图书_畅销书_电子书_文娱_教育培训_低价书-京东', 'url': 'https://book.jd.com/'}
        print(item)
        # Return the item so that any later pipelines also receive it
        return item
```
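
Scrapy only calls a pipeline that is enabled in the project's `settings.py`, so the `print(item)` above produces nothing until the pipeline is registered. A minimal sketch of that setting follows. The module path `mySpider.pipelines` is an assumption inferred from the `from mySpider.items import MyspiderItem` import, and the priority `300` is just the value commonly seen in the generated template, not a requirement.

```python
# settings.py (fragment)
# Keys are dotted paths to pipeline classes; values are priorities in the
# 0-1000 range. Pipelines with lower values run first.
# Assumption: the project package is named "mySpider".
ITEM_PIPELINES = {
    "mySpider.pipelines.MyspiderPipeline": 300,
}
```

With this in place, running `scrapy crawl books` should pass every item yielded by `parse` to `MyspiderPipeline.process_item`.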