# Introduction to HtmlParser <div><p style="color: rgb(54, 46, 43); font-family: Arial;"><span style="font-size:24px;">1. References</span></p><p style="color: rgb(54, 46, 43); font-family: Arial;"> Official documentation: http://htmlparser.sourceforge.net/samples.html</p><p style="color: rgb(54, 46, 43); font-family: Arial;"> API: http://htmlparser.sourceforge.net/javadoc/index.html</p><p style="color: rgb(54, 46, 43); font-family: Arial;"> Other HTML parsers: jsoup, among others. Because HtmlParser has not been updated since 2006, many people now recommend using jsoup instead.</p><p style="color: rgb(54, 46, 43); font-family: Arial;"><br></p><p style="color: rgb(54, 46, 43); font-family: Arial;"><span style="font-size:24px;">2. Key steps for using HtmlParser</span></p><p style="color: rgb(54, 46, 43); font-family: Arial;"> (1) Create a parser with the Parser class</p><p style="color: rgb(54, 46, 43); font-family: Arial;"> (2) Create a Filter or a Visitor</p><p style="color: rgb(54, 46, 43); font-family: Arial;"> (3) Use the parser with the filter or visitor to retrieve all matching nodes</p><p style="color: rgb(54, 46, 43); font-family: Arial;"> (4) Process the content of the nodes</p><p style="color: rgb(54, 46, 43); font-family: Arial;"><br></p><p style="color: rgb(54, 46, 43); font-family: Arial;"><span style="font-size:24px;">3. Creating a parser with the Parser constructors</span></p><p style="color: rgb(54, 46, 43); font-family: Arial;"></p><table border="1" cellpadding="2" cellspacing="0" width="100%" style="color: rgb(0, 0, 0); font-family: Simsun; font-size: 14px;"><tbody><tr style="background-color:rgb(238,238,238);"><td style="height: 41px;"><code><strong><a href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/Parser.html#Parser%28%29" target="_blank" style="color:rgb(106,57,6);">Parser</a></strong>()</code>&nbsp;<br> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Zero argument constructor.</td></tr><tr style="background-color:rgb(238,238,238);"><td><code><strong><a href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/Parser.html#Parser%28org.htmlparser.lexer.Lexer%29" target="_blank" style="color:rgb(106,57,6);">Parser</a></strong>(<a title="class in 
org.htmlparser.lexer" href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/lexer/Lexer.html" target="_blank" style="color:rgb(106,57,6);">Lexer</a>&nbsp;lexer)</code>&nbsp;<br> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Construct a parser using the provided lexer.</td></tr><tr style="background-color:rgb(238,238,238);"><td><code><strong><a href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/Parser.html#Parser%28org.htmlparser.lexer.Lexer,%20org.htmlparser.util.ParserFeedback%29" target="_blank" style="color:rgb(106,57,6);">Parser</a></strong>(<a title="class in org.htmlparser.lexer" href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/lexer/Lexer.html" target="_blank" style="color:rgb(106,57,6);">Lexer</a>&nbsp;lexer,&nbsp;<a title="interface in org.htmlparser.util" href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/util/ParserFeedback.html" target="_blank" style="color:rgb(106,57,6);">ParserFeedback</a>&nbsp;fb)</code>&nbsp;<br> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Construct a parser using the provided lexer and feedback object.</td></tr><tr style="background-color:rgb(238,238,238);"><td><code><strong><a href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/Parser.html#Parser%28java.lang.String%29" target="_blank" style="color:rgb(106,57,6);">Parser</a></strong>(<a title="class or interface in java.lang" href="http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html" target="_blank" style="color:rgb(106,57,6);">String</a>&nbsp;resource)</code>&nbsp;<br> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Creates a Parser object with the location of the resource (URL or file).</td></tr><tr style="background-color:rgb(238,238,238);"><td><code><strong><a href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/Parser.html#Parser%28java.lang.String,%20org.htmlparser.util.ParserFeedback%29" target="_blank" style="color:rgb(106,57,6);">Parser</a></strong>(<a 
title="class or interface in java.lang" href="http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html" target="_blank" style="color:rgb(106,57,6);">String</a>&nbsp;resource,&nbsp;<a title="interface in org.htmlparser.util" href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/util/ParserFeedback.html" target="_blank" style="color:rgb(106,57,6);">ParserFeedback</a>&nbsp;feedback)</code>&nbsp;<br> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Creates a Parser object with the location of the resource (URL or file) You would typically create a DefaultHTMLParserFeedback object and pass it in.</td></tr><tr style="background-color:rgb(238,238,238);"><td><code><strong><a href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/Parser.html#Parser%28java.net.URLConnection%29" target="_blank" style="color:rgb(106,57,6);">Parser</a></strong>(<a title="class or interface in java.net" href="http://java.sun.com/j2se/1.4.2/docs/api/java/net/URLConnection.html" target="_blank" style="color:rgb(106,57,6);">URLConnection</a>&nbsp;connection)</code>&nbsp;<br> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Construct a parser using the provided URLConnection.</td></tr><tr style="background-color:rgb(238,238,238);"><td><code><strong><a href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/Parser.html#Parser%28java.net.URLConnection,%20org.htmlparser.util.ParserFeedback%29" target="_blank" style="color:rgb(106,57,6);">Parser</a></strong>(<a title="class or interface in java.net" href="http://java.sun.com/j2se/1.4.2/docs/api/java/net/URLConnection.html" target="_blank" style="color:rgb(106,57,6);">URLConnection</a>&nbsp;connection,&nbsp;<a title="interface in org.htmlparser.util" href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/util/ParserFeedback.html" target="_blank" style="color:rgb(106,57,6);">ParserFeedback</a>&nbsp;fb)</code>&nbsp;<br> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Constructor for 
custom HTTP access.</td></tr></tbody></table><span style="color: rgb(54, 46, 43); font-family: Arial;">&nbsp; &nbsp; &nbsp; &nbsp; For most users, the most common approach is to initialize a Parser from a </span><span style="color: blue; font-family: Arial;">URLConnection</span><span style="color: rgb(54, 46, 43); font-family: Arial;"> or from a string holding the page content, or to create a Parser object with a static method. The code of </span><span style="color: blue; font-family: Arial;">ParserFeedback</span><span style="color: rgb(54, 46, 43); font-family: Arial;"> is simple; it is meant for debugging and tracing the parsing process and generally does not need to be changed. Using a </span><span style="color: green; font-family: Arial;">Lexer</span><span style="color: rgb(54, 46, 43); font-family: Arial;"> is a relatively advanced topic and is left for a later discussion.</span><br style="color: rgb(54, 46, 43); font-family: Arial;"><span style="color: rgb(54, 46, 43); font-family: Arial;">&nbsp; &nbsp; &nbsp; &nbsp; One interesting point: none of these constructors takes an encoding, so if the page encoding has to be set when the parser is created, the only option other than a Lexer is the static method. For most Chinese pages, this is probably the approach that gets used most often.</span><br style="color: rgb(54, 46, 43); font-family: Arial;"><p style="color: rgb(54, 46, 43); font-family: Arial;"></p><p style="color: rgb(54, 46, 43); font-family: Arial;"><br></p><p style="color: rgb(54, 46, 43); font-family: Arial;"><span style="font-size:24px;">4. HtmlParser stores the information of each node in Node objects</span></p><p style="color: rgb(54, 46, 43); font-family: Arial;"><img src="http://note.youdao.com/yws/res/10738/977917BD60E34D578F9EB0747420F7BB" data-media-type="image" /><br></p><p style="color: rgb(54, 46, 43); font-family: Arial;"> (1) Methods for navigating between nodes<br> Node&nbsp;<span style="color:blue;">getParent</span>&nbsp;(): returns the parent node<br> NodeList&nbsp;<span style="color:blue;">getChildren</span>&nbsp;(): returns the list of child nodes<br> Node&nbsp;<span style="color:blue;">getFirstChild</span>&nbsp;(): returns the first child node<br> Node&nbsp;<span style="color:blue;">getLastChild</span>&nbsp;(): returns the last child node<br> Node&nbsp;<span style="color:blue;">getPreviousSibling</span>&nbsp;(): returns the previous sibling node<br> Node&nbsp;<span style="color:blue;">getNextSibling</span>&nbsp;(): returns the next sibling node<br> (2) Methods for retrieving the content of a <span style="color:fuchsia;">Node</span><br> String&nbsp;<span 
style="color:blue;">getText</span>&nbsp;(): returns the text<br> String&nbsp;<span style="color:blue;">toPlainTextString</span>(): returns the plain-text content<br> String&nbsp;<span style="color:blue;">toHtml</span>&nbsp;()&nbsp;: returns the <span style="color:green;">HTML</span> content (raw <span style="color:green;">HTML</span>)<br> String&nbsp;<span style="color:blue;">toHtml</span>&nbsp;(boolean verbatim): returns the <span style="color:green;">HTML</span> content (raw <span style="color:green;">HTML</span>)<br> String&nbsp;<span style="color:blue;">toString</span>&nbsp;(): returns a string representation (raw <span style="color:green;">HTML</span>)<br> Page&nbsp;<span style="color:blue;">getPage</span>&nbsp;(): returns the <span style="color:green;">Page</span> object that this <span style="color:green;">Node</span> belongs to<br> int&nbsp;<span style="color:blue;">getStartPosition</span>&nbsp;(): returns the start position of this <span style="color:green;">Node</span> in the <span style="color:green;">HTML</span> page<br> int&nbsp;<span style="color:blue;">getEndPosition</span>&nbsp;(): returns the end position of this <span style="color:green;">Node</span> in the <span style="color:green;">HTML</span> page</p><p style="color: rgb(54, 46, 43); font-family: Arial;"><br></p><p style="color: rgb(54, 46, 43); font-family: Arial;"><span style="font-size:24px;">5. Accessing Nodes and their content with Filters</span></p><p style="color: rgb(54, 46, 43); font-family: Arial;"><span style="font-size:18px;">(1) Types of Filter</span></p><p style="color: rgb(54, 46, 43); font-family: Arial;"> As the name suggests, a Filter filters the results to obtain the content you need.</p><p style="color: rgb(54, 46, 43); font-family: Arial;"> Every Filter implements the NodeFilter interface, which has a single method, boolean accept(Node node), that determines whether a given node falls within the scope of the Filter.</p><p style="color: rgb(54, 46, 43); font-family: Arial;"> HTMLParser defines 16 different Filters in the org.htmlparser.filters package, which fall into several categories.<br><span style="color:green;"><a href="http://www.baizeju.com/html/HTMLParser/200807/07-121.html#%E5%88%A4%E6%96%AD%E7%B1%BBFilter" target="_blank" style="color:rgb(16,138,198);"><strong>Predicate <span style="color:green;">Filter</span>s:</strong></a></span><br><span style="color:blue;">TagNameFilter</span><span 
style="color:blue;"><br> HasAttributeFilter</span><br> HasChildFilter<br> HasParentFilter<br> HasSiblingFilter<br> IsEqualFilter<br><span style="color:green;"><a href="http://www.baizeju.com/html/HTMLParser/200807/07-121.html#%E9%80%BB%E8%BE%91%E8%BF%90%E7%AE%97Filter" target="_blank" style="color:rgb(16,138,198);"><strong>Logical <span style="color:green;">Filter</span>s:</strong></a></span><br><span style="color:blue;">AndFilter</span><span style="color:blue;"><br> NotFilter</span><br> OrFilter<br> XorFilter<br><span style="color:green;"><a href="http://www.baizeju.com/html/HTMLParser/200807/07-121.html#%E5%85%B6%E4%BB%96Filter" target="_blank" style="color:rgb(16,138,198);"><strong>Other <span style="color:green;">Filter</span>s:</strong></a></span><br><span style="color:blue;">NodeClassFilter</span><span style="color:blue;"><br> StringFilter</span><br> LinkStringFilter<br> LinkRegexFilter<br> RegexFilter<br> CssSelectorNodeFilter</p><p style="color: rgb(54, 46, 43); font-family: Arial;"> In addition, you can define custom Filters for special filtering needs.<br><span style="font-size:18px;">(2) Filter usage example</span></p><p style="color: rgb(54, 46, 43); font-family: Arial;"> The following example extracts the links from an HTML document.</p><p style="color: rgb(54, 46, 43); font-family: Arial;"></p><div style="background-color:rgb(231,229,220);color:rgb(54,46,43);font-family:Consolas,'Courier New',Courier,mono,serif;"><div><div style="background-color:rgb(248,248,248);color:silver;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:9px;"><strong>[java]</strong>&nbsp;<a title="View code snippet on CODE" href="https://code.csdn.net/snippets/356130" target="_blank" style="color: rgb(160, 160, 160);"><img 
src="http://note.youdao.com/yws/res/10737/F9100224A02B471E9B4A148E168E4281" alt="View code snippet on CODE" width="12" height="12" data-media-type="image" /></a><div></div></div></div><ol start="1" style="background-color:rgb(255,255,255);color:rgb(92,92,92);"><li style="color:inherit;"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">package</span>&nbsp;org.ljh.search.html;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">import</span>&nbsp;java.util.HashSet;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">import</span>&nbsp;java.util.Set;&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">import</span>&nbsp;org.htmlparser.Node;&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">import</span>&nbsp;org.htmlparser.NodeFilter;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">import</span>&nbsp;org.htmlparser.Parser;&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">import</span>&nbsp;org.htmlparser.filters.NodeClassFilter;&nbsp;&nbsp;</span></li><li 
style="background-color:rgb(248,248,248);"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">import</span>&nbsp;org.htmlparser.filters.OrFilter;&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">import</span>&nbsp;org.htmlparser.tags.LinkTag;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">import</span>&nbsp;org.htmlparser.util.NodeList;&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">import</span>&nbsp;org.htmlparser.util.ParserException;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;"><span style="color: rgb(0, 130, 0);">//&nbsp;Utility&nbsp;class&nbsp;for&nbsp;parsing&nbsp;HTML&nbsp;documents</span>&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">public</span>&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">class</span>&nbsp;HtmlParserTool&nbsp;{&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 130, 0);">//&nbsp;Extracts&nbsp;the&nbsp;links&nbsp;embedded&nbsp;in&nbsp;an&nbsp;HTML&nbsp;document</span>&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">public</span>&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">static</span>&nbsp;Set&lt;String&gt;&nbsp;extractLinks(String&nbsp;url,&nbsp;LinkFilter&nbsp;filter)&nbsp;{&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: 
black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Set&lt;String&gt;&nbsp;links&nbsp;=&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">new</span>&nbsp;HashSet&lt;String&gt;();&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">try</span>&nbsp;{&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 130, 0);">//&nbsp;1.&nbsp;Construct&nbsp;a&nbsp;Parser&nbsp;and&nbsp;set&nbsp;its&nbsp;properties</span>&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Parser&nbsp;parser&nbsp;=&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">new</span>&nbsp;Parser(url);&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;parser.setEncoding(<span style="color: blue;">"gb2312"</span>);&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 130, 0);">//&nbsp;2.1&nbsp;Define&nbsp;a&nbsp;custom&nbsp;Filter&nbsp;that&nbsp;matches&nbsp;&lt;frame&gt;&nbsp;tags,&nbsp;whose&nbsp;src&nbsp;attribute&nbsp;is&nbsp;read&nbsp;later</span>&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;NodeFilter&nbsp;frameNodeFilter&nbsp;=&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">new</span>&nbsp;NodeFilter()&nbsp;{&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span 
style="color: rgb(100, 100, 100);">@Override</span>&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">public</span>&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">boolean</span>&nbsp;accept(Node&nbsp;node)&nbsp;{&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">if</span>&nbsp;(node.getText().startsWith(<span style="color: blue;">"frame&nbsp;src="</span>))&nbsp;{&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">return</span>&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">true</span>;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">else</span>&nbsp;{&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">return</span>&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">false</span>;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: 
black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;};&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 130, 0);">//&nbsp;2.2&nbsp;Create&nbsp;a&nbsp;second&nbsp;Filter&nbsp;that&nbsp;matches&nbsp;&lt;a&gt;&nbsp;tags</span>&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;NodeFilter&nbsp;aNodeFilter&nbsp;=&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">new</span>&nbsp;NodeClassFilter(LinkTag.<span style="color: rgb(0, 102, 153); font-weight: bold;">class</span>);&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 130, 0);">//&nbsp;2.3&nbsp;Combine&nbsp;the&nbsp;two&nbsp;Filters&nbsp;above&nbsp;into&nbsp;one&nbsp;logical&nbsp;OR&nbsp;Filter</span>&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;OrFilter&nbsp;linkFilter&nbsp;=&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">new</span>&nbsp;OrFilter(frameNodeFilter,&nbsp;aNodeFilter);&nbsp;&nbsp;</span></li><li 
style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 130, 0);">//&nbsp;3.&nbsp;Use&nbsp;the&nbsp;parser&nbsp;with&nbsp;the&nbsp;filter&nbsp;to&nbsp;retrieve&nbsp;all&nbsp;matching&nbsp;nodes</span>&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;NodeList&nbsp;nodeList&nbsp;=&nbsp;parser.extractAllNodesThatMatch(linkFilter);&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 130, 0);">//&nbsp;4.&nbsp;Process&nbsp;the&nbsp;retrieved&nbsp;Nodes</span>&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">for</span>(<span style="color: rgb(0, 102, 153); font-weight: bold;">int</span>&nbsp;i&nbsp;=&nbsp;<span style="color: rgb(192, 0, 0);">0</span>;&nbsp;i&lt;nodeList.size();i++){&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Node&nbsp;node&nbsp;=&nbsp;nodeList.elementAt(i);&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;String&nbsp;linkURL&nbsp;=&nbsp;<span style="color: blue;">""</span>;&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: 
black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 130, 0);">//&nbsp;The&nbsp;link&nbsp;is&nbsp;an&nbsp;&lt;a&nbsp;/&gt;&nbsp;tag</span>&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">if</span>(node&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">instanceof</span>&nbsp;LinkTag){&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;LinkTag&nbsp;link&nbsp;=&nbsp;(LinkTag)node;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;linkURL=&nbsp;link.getLink();&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<span style="color: rgb(0, 102, 153); font-weight: bold;">else</span>{&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 130, 0);">//&nbsp;The&nbsp;link&nbsp;is&nbsp;a&nbsp;&lt;frame&nbsp;/&gt;&nbsp;tag</span>&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;String&nbsp;nodeText&nbsp;=&nbsp;node.getText();&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: 
black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">int</span>&nbsp;beginPosition&nbsp;=&nbsp;nodeText.indexOf(<span style="color: blue;">"src="</span>);&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;nodeText&nbsp;=&nbsp;nodeText.substring(beginPosition);&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">int</span>&nbsp;endPosition&nbsp;=&nbsp;nodeText.indexOf(<span style="color: blue;">"&nbsp;"</span>);&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">if</span>(endPosition&nbsp;==&nbsp;-<span style="color: rgb(192, 0, 0);">1</span>){&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;endPosition&nbsp;=&nbsp;nodeText.length();&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: 
black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;linkURL&nbsp;=&nbsp;nodeText.substring(<span style="color: rgb(192, 0, 0);">5</span>,&nbsp;endPosition&nbsp;-&nbsp;<span style="color: rgb(192, 0, 0);">1</span>);&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 130, 0);">//&nbsp;Check&nbsp;whether&nbsp;the&nbsp;URL&nbsp;is&nbsp;within&nbsp;the&nbsp;scope&nbsp;of&nbsp;this&nbsp;search</span>&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">if</span>(filter.accept(linkURL)){&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;links.add(linkURL);&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;<span style="color: rgb(0, 102, 153); font-weight: 
bold;">catch</span>&nbsp;(ParserException&nbsp;e)&nbsp;{&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;e.printStackTrace();&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">return</span>&nbsp;links;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">}&nbsp;&nbsp;</span></li></ol></div><p style="color: rgb(54, 46, 43); font-family: Arial;"> Some notes on the program:</p><p style="color: rgb(54, 46, 43); font-family: Arial;"> (1) Node#getText() returns the node's text as a String.</p><p style="color: rgb(54, 46, 43); font-family: Arial;"> (2) node instanceof LinkTag identifies an &lt;a/&gt; node. There are many similar node classes, such as TableTag; essentially every common HTML tag has a corresponding tag class. The official documentation describes the packages as follows:</p><p style="color: rgb(54, 46, 43); font-family: Arial;"></p><table border="1" cellpadding="2" cellspacing="0" width="100%" style="color: rgb(0, 0, 0); font-family: Simsun; font-size: 14px;"><tbody><tr><td><strong><a href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/nodes/package-summary.html" target="_blank" style="color:rgb(106,57,6);">org.htmlparser.nodes</a></strong></td><td>The nodes package has the concrete node implementations.</td></tr><tr><td><strong><a href="http://htmlparser.sourceforge.net/javadoc/org/htmlparser/tags/package-summary.html" target="_blank" style="color:rgb(106,57,6);">org.htmlparser.tags</a></strong></td><td>The tags package contains specific tags.</td></tr></tbody></table><span style="color: rgb(54, 46, 43); font-family: Arial;">An instanceof check can therefore be used to determine directly whether a node is a particular kind of tag.</span><p style="color: rgb(54, 46, 43); font-family: 
Arial;"></p><p style="color: rgb(54, 46, 43); font-family: Arial;"><br></p><p style="color: rgb(54, 46, 43); font-family: Arial;"> The LinkFilter interface used above is defined as follows:</p><p style="color: rgb(54, 46, 43); font-family: Arial;"></p><div style="background-color:rgb(231,229,220);color:rgb(54,46,43);font-family:Consolas,'Courier New',Courier,mono,serif;"><div><div style="background-color:rgb(248,248,248);color:silver;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:9px;"><strong>[java]</strong><div></div></div></div><ol start="1" style="background-color:rgb(255,255,255);color:rgb(92,92,92);"><li style="color:inherit;"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">package</span>&nbsp;org.ljh.search.html;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;"><span style="color: rgb(0, 130, 0);">//&nbsp;Filter&nbsp;that&nbsp;decides&nbsp;whether&nbsp;a&nbsp;URL&nbsp;is&nbsp;within&nbsp;the&nbsp;scope&nbsp;of&nbsp;this&nbsp;search</span>&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">public</span>&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">interface</span>&nbsp;LinkFilter&nbsp;{&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">public</span>&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">boolean</span>&nbsp;accept(String&nbsp;url);&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">}&nbsp;&nbsp;</span></li></ol></div><p style="color: rgb(54, 46, 43); font-family: Arial;"><br></p><p style="color: rgb(54, 46, 43); font-family: Arial;"> The test program is as follows:</p><p style="color: rgb(54, 46, 43); font-family: Arial;"></p><div style="background-color:rgb(231,229,220);color:rgb(54,46,43);font-family:Consolas,'Courier 
New',Courier,mono,serif;"><div><div style="background-color:rgb(248,248,248);color:silver;font-family:Verdana,Geneva,Arial,Helvetica,sans-serif;font-size:9px;"><strong>[java]</strong>&nbsp;<div></div></div></div><ol start="1" style="background-color:rgb(255,255,255);color:rgb(92,92,92);"><li style="color:inherit;"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">package</span>&nbsp;org.ljh.search.html;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">import</span>&nbsp;java.util.Iterator;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">import</span>&nbsp;java.util.Set;&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">import</span>&nbsp;org.junit.Test;&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;"><span style="color: rgb(0, 102, 153); font-weight: bold;">public</span>&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">class</span>&nbsp;HtmlParserToolTest&nbsp;{&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(100, 100, 100);">@Test</span>&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">public</span>&nbsp;<span style="color: rgb(0, 102, 
153); font-weight: bold;">void</span>&nbsp;testExtractLinks()&nbsp;{&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;String&nbsp;url&nbsp;=&nbsp;<span style="color: blue;">"http://www.baidu.com"</span>;&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;LinkFilter&nbsp;linkFilter&nbsp;=&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">new</span>&nbsp;LinkFilter(){&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(100, 100, 100);">@Override</span>&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">public</span>&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">boolean</span>&nbsp;accept(String&nbsp;url)&nbsp;{&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">if</span>(url.contains(<span style="color: blue;">"baidu"</span>)){&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">return</span>&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">true</span>;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<span style="color: 
rgb(0, 102, 153); font-weight: bold;">else</span>{&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">return</span>&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">false</span>;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;};&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Set&lt;String&gt;&nbsp;urlSet&nbsp;=&nbsp;HtmlParserTool.extractLinks(url,&nbsp;linkFilter);&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Iterator&lt;String&gt;&nbsp;it&nbsp;=&nbsp;urlSet.iterator();&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color: rgb(0, 102, 153); font-weight: bold;">while</span>(it.hasNext()){&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: 
black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;System.out.println(it.next());&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;</span></li><li style="color:inherit;"><span style="color: black;">&nbsp;&nbsp;</span></li><li style="background-color:rgb(248,248,248);"><span style="color: black;">}&nbsp;&nbsp;</span></li></ol></div><p style="color: rgb(54, 46, 43); font-family: Arial;"><br></p><span style="color: rgb(54, 46, 43); font-family: Arial;">The output is as follows:</span><p style="color: rgb(54, 46, 43); font-family: Arial;"></p><p style="color: rgb(54, 46, 43); font-family: Arial;"> http://www.hao123.com<br> http://www.baidu.com/<br> http://www.baidu.com/duty/<br> http://v.baidu.com/v?ct=301989888&amp;rn=20&amp;pn=0&amp;db=0&amp;s=25&amp;word=<br> http://music.baidu.com<br> http://ir.baidu.com<br> http://www.baidu.com/gaoji/preferences.html<br> http://news.baidu.com<br> http://map.baidu.com<br> http://music.baidu.com/search?fr=ps&amp;key=<br> http://image.baidu.com<br> http://zhidao.baidu.com<br> http://image.baidu.com/i?tn=baiduimage&amp;ct=201326592&amp;lm=-1&amp;cl=2&amp;nc=1&amp;word=<br> http://www.baidu.com/more/<br> http://shouji.baidu.com/baidusearch/mobisearch.html?ref=pcjg&amp;from=1000139w<br> http://wenku.baidu.com<br> http://news.baidu.com/ns?cl=2&amp;rn=20&amp;tn=news&amp;word=<br> https://passport.baidu.com/v2/?login&amp;tpl=mn&amp;u=http%3A%2F%2Fwww.baidu.com%2F<br> http://www.baidu.com/cache/sethelp/index.html<br> http://zhidao.baidu.com/q?ct=17&amp;pn=0&amp;tn=ikaslist&amp;rn=10&amp;word=&amp;fr=wwwt<br> http://tieba.baidu.com/f?kw=&amp;fr=wwwt<br> http://home.baidu.com<br> https://passport.baidu.com/v2/?reg&amp;regType=1&amp;tpl=mn&amp;u=http%3A%2F%2Fwww.baidu.com%2F<br> http://v.baidu.com<br> 
http://e.baidu.com/?refer=888<br> ;<br> http://tieba.baidu.com<br> http://baike.baidu.com<br> http://wenku.baidu.com/search?word=&amp;lm=0&amp;od=0<br> http://top.baidu.com<br> http://map.baidu.com/m?word=&amp;fr=ps01000</p></div>
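
The filter in the test accepts any URL whose text contains "baidu", so a match anywhere in the path or query string also counts. If the search scope is really meant to be a single site, one possible refinement is to compare the URL's host instead. The sketch below is only an illustration under that assumption: `DomainLinkFilter` and `DomainLinkFilterDemo` are hypothetical names that do not appear in the original program or in HtmlParser, and `LinkFilter` is redeclared here (without its package) so the example compiles on its own. It uses only `java.net.URI` from the standard library.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Same shape as the LinkFilter interface shown above, repeated so this sketch is self-contained.
interface LinkFilter {
    boolean accept(String url);
}

// Hypothetical helper (an assumption, not part of the original program):
// accepts a URL only if its host is the given domain or one of its subdomains.
final class DomainLinkFilter implements LinkFilter {
    private final String domain;

    DomainLinkFilter(String domain) {
        this.domain = domain.toLowerCase(Locale.ROOT);
    }

    @Override
    public boolean accept(String url) {
        if (url == null) {
            return false;
        }
        try {
            String host = new URI(url.trim()).getHost();
            if (host == null) {
                // Relative links, "javascript:" links, or stray text have no host.
                return false;
            }
            host = host.toLowerCase(Locale.ROOT);
            return host.equals(domain) || host.endsWith("." + domain);
        } catch (URISyntaxException e) {
            // Malformed URLs are treated as out of scope.
            return false;
        }
    }
}

class DomainLinkFilterDemo {
    public static void main(String[] args) {
        LinkFilter filter = new DomainLinkFilter("baidu.com");
        String[] samples = {
            "http://news.baidu.com",
            "http://www.hao123.com",
            "http://www.hao123.com/?from=baidu",
            ";"
        };
        // Print each sample together with the filter's decision.
        for (String sample : samples) {
            System.out.println(filter.accept(sample) + "  " + sample);
        }
    }
}
```

Because it implements the same `accept(String)` method, an instance of this class could be passed to `HtmlParserTool.extractLinks(url, linkFilter)` in place of the anonymous filter used in the test. Whether host matching is the right scope rule depends on the crawl; the substring check in the original is simpler and may be sufficient.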