Scraping Nationwide PM2.5 Air Quality Data with a Python Crawler (2015.12.21 edition) · Scrapy Crawler Tutorial

Character encodings are a real headache. I clearly need to study them properly, otherwise I will keep running into all sorts of encoding errors. The behavior also depends on the editor. I first wrote the code in Sublime Text, where it ran quickly and smoothly, but then I discovered that it does not support raw_input and input, so I temporarily switched to IDLE, the editor that ships with Python. After that, all kinds of bizarre encoding errors appeared.

The program works roughly like this: you type a city's name in pinyin and it prints that city's air pollution data. Some cities may be missing; that depends entirely on which cities the website lists. To exit, type quit. The code also contains a multithreaded variant, so you can see for yourself that single-threaded and multithreaded fetching differ considerably in speed.

~~~
# -*- coding: utf-8 -*-
import re
import threading
import urllib2
from time import ctime

import BeautifulSoup  # BeautifulSoup version 3


def getPM25(cityname):
    site = 'http://www.pm25.com/city/' + cityname + '.html'
    html = urllib2.urlopen(site)
    soup = BeautifulSoup.BeautifulSoup(html)

    city = soup.find("span", {"class": "city_name"})  # city name
    aqi = soup.find("a", {"class": "cbol_aqi_num"})  # AQI value
    pm25 = soup.find("span", {"class": "cbol_nongdu_num_1"})  # PM2.5 concentration
    pm25danwei = soup.find("span", {"class": "cbol_nongdu_num_2"})  # PM2.5 concentration unit
    quality = soup.find("span", {"class": re.compile(r'cbor_gauge_level\d$')})  # air quality level
    result = soup.find("div", {"class": 'cbor_tips'})  # air quality description

    replacechar = re.compile("<.*?>")  # matches every <...> tag so it can be removed
    space = re.compile(" ")  # matches spaces so they can be removed
    print city.string + u'\nAQI指数：' + aqi.string + u'\nPM2.5浓度：' + pm25.string + pm25danwei.string + u'\n空气质量：' + quality.string + space.sub("", replacechar.sub('', str(result))).decode('utf-8')
    print '*' * 20 + ctime() + '*' * 20


def one_thread(cityname1):  # single-threaded
    print 'One_thread Start: ' + ctime() + '\n'
    getPM25(cityname1)


def two_thread():  # multithreaded
    print 'Two_thread Start: ' + ctime() + '\n'
    threads = []
    t1 = threading.Thread(target=getPM25, args=('beijing',))
    threads.append(t1)
    t2 = threading.Thread(target=getPM25, args=('shenyang',))
    threads.append(t2)
    for t in threads:
        # t.setDaemon(True)
        t.start()
    for t in threads:
        t.join()  # wait for both threads so their output finishes before the next prompt


if __name__ == '__main__':
    print "*" * 20 + "welcome to 京东放养的爬虫" + "*" * 20
    while True:
        # Prompt text: "enter the city name to look up (e.g. beijing)"
        cityname1 = raw_input("请输入想要查看的城市名称:(例如:beijing)")
        if cityname1 == 'quit':
            break
        one_thread(cityname1)
        # print '\n' * 2
        # two_thread()
~~~
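The single-threaded versus multithreaded speed difference mentioned above can be tried without touching the network. The following is only a rough sketch, written for Python 3 rather than the Python 2 used in the listing. `fake_fetch`, `run_sequential`, and `run_threaded` are hypothetical names introduced here, and `fake_fetch` simply sleeps to imitate the wait for a page download; it is not the real `getPM25`.

```python
import threading
import time


def fake_fetch(city, delay=0.2):
    """Hypothetical stand-in for getPM25: sleeps instead of downloading a page."""
    time.sleep(delay)
    return city.upper()


def run_sequential(cities, delay=0.2):
    """Fetch the cities one after another; return (results, elapsed seconds)."""
    start = time.time()
    results = {city: fake_fetch(city, delay) for city in cities}
    return results, time.time() - start


def run_threaded(cities, delay=0.2):
    """Fetch each city in its own thread; return (results, elapsed seconds)."""
    results = {}
    lock = threading.Lock()

    def worker(city):
        value = fake_fetch(city, delay)
        with lock:  # guard the shared dict while writing
            results[city] = value

    start = time.time()
    threads = [threading.Thread(target=worker, args=(city,)) for city in cities]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread before stopping the clock
    return results, time.time() - start


if __name__ == '__main__':
    cities = ['beijing', 'shenyang', 'shanghai']
    _, seq = run_sequential(cities)
    _, thr = run_threaded(cities)
    print('sequential: %.2fs, threaded: %.2fs' % (seq, thr))
```

Because the simulated work is pure waiting, the threaded run should take about as long as one fetch, while the sequential run takes about as long as all of them combined. Real requests tend to show the same pattern, although the actual gap depends on the network and the server.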