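## **Crawling Dangdang's Top 500: How to Design a Complete Request Program**
Hands-on: crawling the Top 500 five-star books on Dangdang
URL pattern (only the trailing page number changes, from 1 to 25):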
[http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-1](http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-1)
......
[http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-3](http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-3)
......
[http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-25](http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-25)
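Since the URLs above differ only in the trailing page number, the full list of page URLs can be generated from a template. This is a minimal sketch, assuming the pages run from 1 to 25 as the sample URLs suggest; the names `BASE_URL` and `build_page_urls` are illustrative and not part of the original code.

```python
# Template for the ranking pages; "{}" is the slot for the page number.
BASE_URL = "http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-{}"


def build_page_urls(first_page=1, last_page=25):
    """Return the page URLs from first_page to last_page, inclusive."""
    return [BASE_URL.format(page) for page in range(first_page, last_page + 1)]


urls = build_page_urls()
print(len(urls))   # number of pages
print(urls[0])     # first page
print(urls[-1])    # last page
```

Building the list up front keeps URL construction separate from the request logic, so each part can be checked on its own.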
<span style="color:red;">1. First attempt</span>
~~~
import random

import requests

url = "http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-1"

# Pool of User-Agent strings; one is picked at random for the request
user_agent_list = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 9.50",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko",
    "Mozilla/4.0 (compatible; MSIE 6.0; ) Opera/UCWEB7.0.2.37/28/999",
]
header = {
    "User-Agent": random.choice(user_agent_list)
}
print(header)

response = requests.get(url, headers=header)
print(response.status_code)  # 200 means the request succeeded
print(response.text)         # HTML source of the page
~~~
<span style="color:red;">2. Second version: loop over all pages</span>
~~~
import random

import requests

user_agent_list = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 9.50",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko",
    "Mozilla/4.0 (compatible; MSIE 6.0; ) Opera/UCWEB7.0.2.37/28/999",
]
# "{}" is the placeholder for the page number
base_url = "http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-{}"

# Pages are numbered 1 to 25, so start at 1 (range(26) would also request page 0)
for x in range(1, 26):
    # Pick a new random User-Agent for every request
    header = {
        "User-Agent": random.choice(user_agent_list)
    }
    print(header)
    url = base_url.format(x)
    print(url)
    response = requests.get(url, headers=header)
    print(response.status_code)
~~~
<span style="color:red;">3. Third version: catch exceptions and wait between requests</span>
~~~
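import random
import time

import requests

user_agent_list = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 9.50",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko",
    "Mozilla/4.0 (compatible; MSIE 6.0; ) Opera/UCWEB7.0.2.37/28/999",
]
# "{}" is the placeholder for the page number
base_url = "http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-{}"

# Pages are numbered 1 to 25, so start at 1 (range(26) would also request page 0)
for x in range(1, 26):
    header = {
        "User-Agent": random.choice(user_agent_list)
    }
    print(header)
    url = base_url.format(x)
    print(url)
    # Guard the request so that one failure does not terminate the program
    try:
        response = requests.get(url, headers=header, timeout=10)
    except Exception as e:
        print(e)       # print the error
        time.sleep(1)  # still wait before moving on to the next page
        continue
    print(response.status_code)
    time.sleep(1)  # wait one second between requests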
~~~
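The third version moves on to the next page as soon as a request fails. One possible refinement, shown only as a sketch, is to retry a failed page a few times before giving up. The helper `fetch_with_retry` and its parameters are hypothetical and not part of the original program. `fetch` stands for any callable that performs the request, for example `lambda u: requests.get(u, headers=header)`; the demonstration uses a stand-in function so that it runs without network access.

```python
import time


def fetch_with_retry(fetch, url, retries=3, delay=1.0):
    """Call fetch(url) up to `retries` times; return its result, or None if all attempts fail."""
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as e:
            print("attempt {} failed: {}".format(attempt, e))
            if attempt < retries:
                time.sleep(delay)  # wait before the next attempt
    return None


# Stand-in for a real request function: fails twice, then succeeds.
calls = []


def flaky_fetch(url):
    calls.append(url)
    if len(calls) < 3:
        raise ConnectionError("simulated network error")
    return "ok: " + url


result = fetch_with_retry(flaky_fetch, "http://example.com/page-1", delay=0)
print(result)
```

Passing the request function in as an argument keeps the retry logic independent of `requests`, which also makes it easy to exercise with a fake function as above.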
<span style="color:red;">4. Key points</span>
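1. Python 2.6 introduced the string formatting method `str.format()`, which extends what string formatting can do.
Its basic syntax uses `{}` and `:` in place of the older `%` operator.
For example: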
~~~
res = "{} {}".format("hello", "world")  # no positions given: arguments fill in order
print(res)
# hello world
~~~
~~~
res = "{0} {1}".format("hello", "world")  # explicit positions
print(res)
# hello world
~~~
~~~
res = "{1} {0} {1}".format("hello", "world")  # positions can be reordered and reused
print(res)
# world hello world
~~~
~~~
res = "My name is {name}, and I live in {add}".format(add="Shanxi", name="sunyuwei")  # keyword arguments
print(res)
# My name is sunyuwei, and I live in Shanxi
~~~
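The `:` in the syntax introduces a format specification inside the braces, which none of the examples so far use. A short sketch follows; the price and rank values are made up for illustration.

```python
price = 45.5
rank = 7

# ":.2f" formats a float with two decimal places.
print("{:.2f}".format(price))   # 45.50

# ":03d" pads an integer with leading zeros to width 3.
print("{:03d}".format(rank))    # 007

# Field names and format specifications can be combined.
line = "No.{rank:>3} costs {price:.1f} yuan".format(rank=rank, price=price)
print(line)                     # No.  7 costs 45.5 yuan
```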
2. try-except exception handling
Reference:
[https://www.runoob.com/python/python-exceptions.html](https://www.runoob.com/python/python-exceptions.html)
~~~
try:
    ...  # normal operations
except Exception:
    ...  # runs when the try block raises an exception
else:
    ...  # runs when the try block raises no exception
~~~
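As a concrete instance of the template above, the sketch below converts text to a page number. The function `parse_page_number` is a hypothetical example and not part of the crawler.

```python
def parse_page_number(text):
    """Convert text to an int page number; return None if it is not a valid integer."""
    try:
        number = int(text)
    except ValueError as e:
        # Runs only when int() raises ValueError.
        print("invalid page number:", e)
        return None
    else:
        # Runs only when the try block raised nothing.
        print("parsed page number:", number)
        return number


good = parse_page_number("12")
bad = parse_page_number("abc")
```

Catching the specific `ValueError` rather than a bare `except:` keeps unrelated errors visible instead of silently swallowing them.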
3. try-finally statement
The code in the `finally` block runs whether or not an exception is raised.
~~~
try:
    ...  # normal operations
finally:
    ...  # always runs, whether or not an exception was raised
~~~
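The sketch below records which blocks run, to show that `finally` executes both on a normal return and when an exception propagates. The `events` list and the `divide` function are illustrative only.

```python
events = []


def divide(a, b):
    """Divide a by b and record which blocks ran."""
    try:
        events.append("try")
        return a / b
    finally:
        # Runs on a normal return and when an exception is raised.
        events.append("finally")


print(divide(6, 3))

try:
    divide(1, 0)  # raises ZeroDivisionError after "finally" has run
except ZeroDivisionError:
    events.append("caught")

print(events)
```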