# QueryList scraping library

Official site: <https://querylist.cc>

Collectable attributes include `text`, `html`, `href`, `src`, `name`, and `data-` attributes.

Install with Composer:

`composer require jaeger/querylist`

```php
<?php
// Composer autoloader; adjust the path to match your project layout.
require 'vendor/autoload.php';

use QL\QueryList;

// Collect the src of every image on a page
$data = QueryList::get('http://cms.querylist.cc/bizhi/453.html')->find('img')->attrs('src');
// Print the result
print_r($data->all());

// Collect every hyperlink and its link text on a page
// You can fetch the page source manually first
$html = file_get_contents('http://cms.querylist.cc/google/list_1.html');
// Then pass the page source or an HTML fragment to QueryList
$data = QueryList::html($html)->rules([
    // Define the collection rules
    // href attribute of every <a> tag
    'link' => ['a', 'href'],
    // Text content of every <a> tag
    'text' => ['a', 'text']
])->query()->getData();

// Collect the link and link text of every article in the page's article list
$data = QueryList::get('http://cms.querylist.cc/google/list_1.html')->rules([
    'link' => ['h2>a', 'href', '', function ($content) {
        // Use a callback to complete relative links
        $baseUrl = 'http://cms.querylist.cc';
        return $baseUrl . $content;
    }],
    'text' => ['h2>a', 'text']
])->range('.cate_list li')->query()->getData();
// Print the result
print_r($data->all());
```
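
The callback in the last example prepends the base URL to every `href`, which works only if every link on the page is root-relative. If a page mixes absolute and relative links, plain concatenation would produce malformed URLs. The following is a minimal sketch of a more defensive helper, written in plain PHP; `absolutizeLink` is a hypothetical name for illustration and is not part of QueryList.

```php
<?php
// Hypothetical helper (not part of QueryList): turn an href into an absolute URL.
function absolutizeLink(string $href, string $baseUrl = 'http://cms.querylist.cc'): string
{
    // Already absolute (http:// or https://): return it unchanged.
    if (preg_match('#^https?://#i', $href) === 1) {
        return $href;
    }
    // Protocol-relative link (//host/path): borrow the scheme from the base URL.
    if (strncmp($href, '//', 2) === 0) {
        return parse_url($baseUrl, PHP_URL_SCHEME) . ':' . $href;
    }
    // Root-relative or bare path: join with exactly one slash.
    // Simplification: a bare path is treated as relative to the site root.
    return rtrim($baseUrl, '/') . '/' . ltrim($href, '/');
}

echo absolutizeLink('/google/1.html'), PHP_EOL;            // http://cms.querylist.cc/google/1.html
echo absolutizeLink('http://example.com/a.html'), PHP_EOL; // http://example.com/a.html
```

Inside a rule, the helper could be called from the callback, for example `function ($content) { return absolutizeLink($content); }`, so that only the `href` value is passed to it.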