## How to store crawl results in MySQL
<http://www.jishubu.net/yunwei/python/424.html>
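pyspider is a powerful and easy-to-use crawler framework, but by default it packs all scraped fields together and saves them in its own default database, which makes the results hard to integrate with other software. The requirement here is to save each scraped field as a separate column in a custom MySQL database. My technical skills are limited and I do not think this implementation is optimal: if you can improve it, please do; otherwise it should be good enough to get by. You can also download the py script directly (stores pyspider results in a custom MySQL database): [mysqldb.zip](http://www.jishubu.net/wp-content/plugins/wp-ueditor/ueditor/php/upload/8521423797887.zip)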
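To make the goal concrete, here is a minimal sketch of "one column per field" storage. It uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs without a server; the `info` table with `url` and `title` columns is an assumption that mirrors the handler example, and `save_result` is a hypothetical helper. Note that `sqlite3` uses `?` placeholders where MySQL Connector/Python uses `%s`.

```python
import sqlite3

# Assumed schema: one column per scraped field, keyed on the page URL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE info (url TEXT PRIMARY KEY, title TEXT)")


def save_result(conn, result):
    """Store one result dict as a row, overwriting any row with the same url."""
    conn.execute(
        "REPLACE INTO info (url, title) VALUES (?, ?)",
        (result["url"], result["title"]),
    )
    conn.commit()


# Saving the same URL twice leaves a single, updated row.
save_result(conn, {"url": "http://www.jishubu.net/", "title": "first title"})
save_result(conn, {"url": "http://www.jishubu.net/", "title": "updated title"})

rows = conn.execute("SELECT url, title FROM info").fetchall()
print(rows)
```

In MySQL, `REPLACE` only overwrites an existing row when the table has a `PRIMARY KEY` or `UNIQUE` index on the matching column; without one it behaves like a plain `INSERT`, so re-crawled pages would pile up as duplicates.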
~~~
'''
Simple example of saving pyspider results to a MySQL database.

Usage:
1. Put this file in pyspider/pyspider/database/mysql/ and name it mysqldb.py.
2. Edit the database settings in this file, and create the matching database and table.
3. In your project script, import this code with
   from pyspider.database.mysql.mysqldb import SQL
4. Override on_result, instantiate SQL and call replace(). The first argument of
   replace() is the table name, followed by the result unpacked as keyword arguments.
   A simple example:

#!/usr/bin/env python
# -*- encoding: utf-8 -*-
# Created on 2015-01-26 13:12:04
# Project: jishubu.net

from pyspider.libs.base_handler import *
from pyspider.database.mysql.mysqldb import SQL


class Handler(BaseHandler):
    crawl_config = {
    }

    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://www.jishubu.net/', callback=self.index_page)

    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        for each in response.doc('p.pic a[href^="http"]').items():
            # Follow each link so that detail_page produces a result.
            self.crawl(each.attr.href, callback=self.detail_page)

    @config(priority=2)
    def detail_page(self, response):
        return {
            "url": response.url,
            "title": response.doc('HTML>BODY#presc>DIV.main>DIV.prices_box.wid980.clearfix>DIV.detail_box>DL.assort.tongyong>DD>A').text(),
        }

    def on_result(self, result):
        # Skip empty results and results without a title.
        if not result or not result['title']:
            return
        sql = SQL()
        sql.replace('info', **result)
'''
from six import itervalues
import mysql.connector
from mysql.connector import errorcode


class SQL:
    username = 'pyspider'   # database user name
    password = 'pyspider'   # database password
    database = 'result'     # database name
    host = 'localhost'      # database host
    connection = None       # shared connection, set by connect()
    auto_connect = True     # connect as soon as an instance is created
    placeholder = '%s'

    def __init__(self):
        if self.auto_connect:
            self.connect()

    def escape(self, string):
        # Quote a table or column name with backticks.
        return '`%s`' % string

    def connect(self):
        config = {
            'user': SQL.username,
            'password': SQL.password,
            'host': SQL.host
        }
        if SQL.database is not None:
            config['database'] = SQL.database

        try:
            SQL.connection = mysql.connector.connect(**config)
            return True
        except mysql.connector.Error as err:
            if err.errno == errorcode.ER_ACCESS_DENIED_ERROR:
                print("The credentials you provided are not correct.")
            elif err.errno == errorcode.ER_BAD_DB_ERROR:
                print("The database you provided does not exist.")
            else:
                print("Something went wrong: {}".format(err))
            return False

    def replace(self, tablename=None, **values):
        if SQL.connection is None:
            print("Please connect first")
            return False

        tablename = self.escape(tablename)
        if values:
            # One backtick-quoted column and one placeholder per result field.
            _keys = ", ".join(self.escape(k) for k in values)
            _values = ", ".join([self.placeholder, ] * len(values))
            sql_query = "REPLACE INTO %s (%s) VALUES (%s)" % (tablename, _keys, _values)
        else:
            # MySQL has no DEFAULT VALUES clause; an empty column list uses all defaults.
            sql_query = "REPLACE INTO %s () VALUES ()" % tablename

        cur = SQL.connection.cursor()
        try:
            if values:
                cur.execute(sql_query, list(itervalues(values)))
            else:
                cur.execute(sql_query)
            SQL.connection.commit()
            return True
        except mysql.connector.Error as err:
            print("An error occurred: {}".format(err))
            return False
        finally:
            cur.close()
~~~
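The query-building step of `replace()` can be tried on its own, without a database connection. The sketch below mirrors that logic as a pure function; `escape_identifier` and `build_replace_query` are hypothetical names introduced for illustration, and doubling embedded backticks is an addition that the original `escape()` does not perform.

```python
def escape_identifier(name):
    """Wrap a table or column name in backticks, doubling any embedded backtick."""
    return "`%s`" % str(name).replace("`", "``")


def build_replace_query(tablename, values, placeholder="%s"):
    """Return (sql, params) for a REPLACE statement built from a result dict."""
    keys = list(values)  # fix the column order once, then reuse it for params
    columns = ", ".join(escape_identifier(k) for k in keys)
    marks = ", ".join([placeholder] * len(keys))
    sql = "REPLACE INTO %s (%s) VALUES (%s)" % (
        escape_identifier(tablename), columns, marks)
    params = [values[k] for k in keys]
    return sql, params


sql, params = build_replace_query(
    "info", {"url": "http://www.jishubu.net/", "title": "example"})
print(sql)
print(params)
```

Only the table and column names are interpolated into the SQL text; the field values travel separately as parameters, so the driver handles their quoting.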
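## module: No module named mysqldb
MySQL Connector/Python (which provides the `mysql.connector` package imported by `mysqldb.py`) can be downloaded from:
`http://ftp.ntu.edu.tw/MySQL/Downloads/Connector-Python/`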
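When tracking down an import error like this one, it can help to check which dependency is actually missing. A small sketch, assuming the two third-party modules that `mysqldb.py` imports (`six` and `mysql.connector`); `module_available` is a hypothetical helper name.

```python
import importlib


def module_available(name):
    """Return True if the module `name` can be imported, False otherwise."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# Report the status of each module that mysqldb.py depends on.
for name in ("six", "mysql.connector"):
    status = "ok" if module_available(name) else "missing"
    print("{}: {}".format(name, status))
```

If both modules are reported as available and the error persists, the remaining thing to check is whether `mysqldb.py` really sits in `pyspider/pyspider/database/mysql/`, as described in step 1 of the usage notes.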