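## Windows environment

- Install lxml
>pip install lxml-3.7.3-cp36-cp36m-win_amd64.whl
>https://pypi.org/project/lxml/3.7.3/
- Install zope.interface
>pip install zope.interface-4.4.3-cp36-cp36m-win_amd64.whl
>https://pypi.org/project/zope.interface/
- Install pyOpenSSL
>pip install pyOpenSSL-19.0.0-py2.py3-none-any.whl
>https://pypi.org/project/pyOpenSSL/
- Install Twisted
>pip install Twisted-17.1.0-cp36-cp36m-win_amd64.whl
>https://pypi.org/project/Twisted/17.1.0/
- Install pywin32
>https://sourceforge.net/projects/pywin32/files/pywin32/Build%20220/
>pywin32-220.win-amd64-py3.6.exe
- Or install pypiwin32 instead
>pip install pypiwin32-220-cp36-none-win_amd64.whl
>https://pypi.org/project/pypiwin32/220/
- Install scrapy
>pip install scrapy

## CentOS environment

- Install the dependencies (if `python36-devel` cannot be found, install `epel-release` first and rerun the command)
>yum groupinstall "Development Tools"
>yum install python36-devel epel-release libxslt-devel libxml2-devel openssl-devel
- Install scrapy
>pip install scrapy

## scrapy shell

- Start the shell against a page. Quote the URL, otherwise the command line treats each `&` as a command separator:
>scrapy shell "https://m.zhaopin.com/changsha-749/?keyword=php&order=0&maprange=3&ishome=0"
- `response` shows the response object, including its status code
- `view(response)` opens the response in the default browser
- Calling `xpath()` without an argument raises an error, and so does an expression string that ends with `/`
>response.xpath("//a/div/div/div[2]/div").extract()

## Project workflow

- Create a new project
>scrapy startproject tencent
- Create a new Spider
>cd tencent
>scrapy genspider zhaopin tencent.com
- List the spiders in the project
>scrapy list
- Analyze the site and define the fields you want to scrape in items.py
- Start the crawl, using a spider name reported by `scrapy list`
>scrapy crawl zhaopin

*****

## Errors when installing pywin32

Download the 64-bit or 32-bit build that matches your Python. Note that the pywin32 build follows the Python build, not Windows: if Windows is 64-bit but Python is 32-bit, install the 32-bit pywin32.

Double-click the installer to run it. It may fail with the error below, which is a registry problem.

Fixing the error `Python version 3.6 required, which was not found in the registry` when installing third-party libraries:

![](https://box.kancloud.cn/f5e6af37c7154bcc58502a8dd93b1a52_688x470.jpg)

Create a file named register.py with the following content, then run the script.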
```
import sys
import winreg

# Tweak as necessary. Built from version_info so that two-digit minor
# versions (e.g. 3.10) are not truncated the way sys.version[:3] would be.
version = "%d.%d" % sys.version_info[:2]
installpath = sys.prefix

regpath = "SOFTWARE\\Python\\Pythoncore\\%s\\" % (version)
installkey = "InstallPath"
pythonkey = "PythonPath"
pythonpath = "%s;%s\\Lib\\;%s\\DLLs\\" % (
    installpath, installpath, installpath
)


def RegisterPy():
    try:
        reg = winreg.OpenKey(winreg.HKEY_CURRENT_USER, regpath)
    except OSError:
        # The key does not exist yet: create it and write both values.
        try:
            reg = winreg.CreateKey(winreg.HKEY_CURRENT_USER, regpath)
            winreg.SetValue(reg, installkey, winreg.REG_SZ, installpath)
            winreg.SetValue(reg, pythonkey, winreg.REG_SZ, pythonpath)
            winreg.CloseKey(reg)
        except OSError:
            print("*** Unable to register!")
            return
        print("--- Python", version, "is now registered!")
        return
    # The key already exists: check whether it points at this installation.
    if (winreg.QueryValue(reg, installkey) == installpath and
            winreg.QueryValue(reg, pythonkey) == pythonpath):
        winreg.CloseKey(reg)
        print("=== Python", version, "is already registered!")
        return
    winreg.CloseKey(reg)
    print("*** Unable to register!")
    print("*** You probably have another Python installation!")


if __name__ == "__main__":
    RegisterPy()
```

Open the Registry Editor: press Win+R, then enter regedit. Locate the key
>HKEY_CURRENT_USER\Software\Python\Pythoncore\3.6

Rename `3.6` to `3.6-32`; after that the installation can proceed.
>HKEY_CURRENT_USER\Software\Python\Pythoncore\3.6-32
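
Before renaming the key, it may help to check which key name matches the interpreter you are running. The sketch below is only an illustration: the helper names `registry_tag` and `registry_path` are hypothetical, and it assumes that 32-bit Python builds are looked up under a `-32` suffix (as in the `3.6-32` rename above) while 64-bit builds use the bare version number.

```python
import struct
import sys


def registry_tag(version_info=sys.version_info, pointer_bits=None):
    """Return a version tag such as '3.6' or '3.6-32' (hypothetical helper)."""
    if pointer_bits is None:
        # A C pointer is 4 bytes in a 32-bit interpreter and 8 bytes in a
        # 64-bit one, regardless of whether Windows itself is 64-bit.
        pointer_bits = struct.calcsize("P") * 8
    tag = "%d.%d" % (version_info[0], version_info[1])
    # Assumption: 32-bit builds are registered with a "-32" suffix.
    if pointer_bits == 32:
        tag += "-32"
    return tag


def registry_path(tag):
    """Build the key path under HKEY_CURRENT_USER for a given tag."""
    return "Software\\Python\\Pythoncore\\" + tag


if __name__ == "__main__":
    print("Interpreter bitness:", struct.calcsize("P") * 8)
    print("Key to look for: HKEY_CURRENT_USER\\" + registry_path(registry_tag()))
```

Run it with the same Python that the pywin32 installer should target. The bitness it reports is that of Python itself, which is also the bitness to choose when downloading pywin32.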