![](https://box.kancloud.cn/4eec068d817dd460cc83000ccf882fe4_709x395.png) ![](https://box.kancloud.cn/de4f23298125a5629537dfbf7dab413f_838x262.png) ![](https://box.kancloud.cn/9b2611cd2adea3efbf8fb64cc851a637_826x294.png)

1. `|` means "or" (a union of two node sets): `//tr[@class="odd"]|//tr[@class="even"]`

![](https://box.kancloud.cn/b941179398cb9d6451f08993e0b45aaf_1344x429.png)

2. In XPath, positional indexes start at 1, not 0. In the example below, `//` matches at any depth in the document. Because `|` binds more loosely than `/`, the union is wrapped in parentheses before `/td[n]` is applied; without them, `/td[n]` would apply only to the `even` rows.

~~~
<tr class="even">
    <td class="l square">    # (//tr[@class="odd"]|//tr[@class="even"])/td[1]
        <a target="_blank" href="position_detail.php?id=33102&keywords=&tid=0&lid=0">SNG04-后台开发工程师(上海)</a>
        <span class="hot"/>
    </td>
    <td>技术类</td>          # (//tr[@class="odd"]|//tr[@class="even"])/td[2]
    <td>1</td>              # (//tr[@class="odd"]|//tr[@class="even"])/td[3]
    <td>上海</td>            # (//tr[@class="odd"]|//tr[@class="even"])/td[4]
    <td>2017-09-27</td>     # (//tr[@class="odd"]|//tr[@class="even"])/td[5]
</tr>
<tr class="odd">
    <td class="l square">
        <a target="_blank" href="position_detail.php?id=33104&keywords=&tid=0&lid=0">21228-MMORPG运营分析(深圳)</a>
    </td>
    <td>产品/项目类</td>
    <td>1</td>
    <td>深圳</td>
    <td>2017-09-27</td>
</tr>
~~~

~~~
# Method of the spider class; the module needs `import scrapy` at the top.
def parse(self, response):
    teacher_list = response.xpath('//tr[@class="odd"]|//tr[@class="even"]')
    for each in teacher_list:
        item = tecentItem()
        # Without extract(), xpath() returns selector objects rather than strings.
        try:
            # XPath positions start at 1.
            position_name = each.xpath('./td[1]/a/text()').extract()[0]
            position_type = each.xpath('./td[2]/text()').extract()[0]
            location = each.xpath('./td[4]/text()').extract()[0]
            publish_time = each.xpath('./td[5]/text()').extract()[0]
            detail = each.xpath('./td[1]/a/@href').extract()[0]
        except IndexError:
            # Skip rows that lack one of the expected cells.
            continue

        item['position_type'] = position_type
        item['position_name'] = position_name
        item['location'] = location
        item['publish_time'] = publish_time
        item['detail'] = "http://hr.tencent.com/" + detail
        yield item  # parse() is a generator

    if self.offset < 2250:
        self.offset += 10
        print("Next page: " + self.url + str(self.offset))
    else:
        # Stop paginating; returning lets Scrapy shut down cleanly,
        # whereas os._exit(0) would kill the process immediately.
        return

    # Hand the request to the engine. The engine passes it to the scheduler,
    # which queues it and later dequeues it for the downloader.
    # Starting from the start URL, the remaining pages are requested one by one;
    # the results are yielded from this generator.
    # callback: when the response arrives, it is passed to parse().
    yield scrapy.Request(url=self.url + str(self.offset), callback=self.parse)
~~~
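
The 1-based indexing rule can be checked without running a spider. The following is a minimal sketch that uses Python's standard `xml.etree.ElementTree` instead of Scrapy's selectors. ElementTree supports only a subset of XPath and has no `|` operator, so the sketch filters on the `class` attribute in Python rather than using a union. The `SAMPLE` markup is a trimmed copy of the rows above (query strings shortened so it parses as XML), and the `parse_rows` helper is a hypothetical name introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the two sample rows; hrefs are shortened so that the
# markup is well-formed XML (a bare "&" would not parse).
SAMPLE = """
<table>
  <tr class="even">
    <td class="l square"><a href="position_detail.php?id=33102">SNG04-后台开发工程师(上海)</a></td>
    <td>技术类</td>
    <td>1</td>
    <td>上海</td>
    <td>2017-09-27</td>
  </tr>
  <tr class="odd">
    <td class="l square"><a href="position_detail.php?id=33104">21228-MMORPG运营分析(深圳)</a></td>
    <td>产品/项目类</td>
    <td>1</td>
    <td>深圳</td>
    <td>2017-09-27</td>
  </tr>
</table>
"""


def parse_rows(markup):
    """Return one dict per odd/even row, reading cells by 1-based position."""
    root = ET.fromstring(markup)
    # ElementTree cannot express odd|even as a union, so filter in Python.
    rows = [tr for tr in root.iter("tr") if tr.get("class") in ("odd", "even")]
    jobs = []
    for tr in rows:
        link = tr.find("./td[1]/a")  # td[1] is the FIRST cell, not the second
        jobs.append({
            "position_name": link.text,
            "position_type": tr.findtext("./td[2]"),
            "location": tr.findtext("./td[4]"),
            "publish_time": tr.findtext("./td[5]"),
            "detail": "http://hr.tencent.com/" + link.get("href"),
        })
    return jobs


if __name__ == "__main__":
    for job in parse_rows(SAMPLE):
        print(job)
```

Reading `td[1]` returns the cell containing the link, which confirms that positions start at 1. In a real spider the same relative paths work with `response.xpath(...)`, where the `|` union is available.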