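### Level 1: Cleaning meaningless data out of an HTML document

```
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Whitelist; // renamed to Safelist in newer jsoup releases

/**
 * Parses a saved HTML file into a jsoup Document.
 * Example path: ./backups/hotels.ctrip.com_domestic-city-hotel.txt
 */
public Document getDoc(String filePath) throws IOException {
    File input = new File(filePath);
    // The base URI is used to resolve relative links in the page.
    return Jsoup.parse(input, "UTF-8", "http://www.ctrip.com/");
}

/**
 * Gets the cleaned content.
 * @param doc the parsed HTML document
 * @return two strings: the document cleaned with Whitelist.basic(),
 *         then the document cleaned with Whitelist.simpleText()
 */
public List<String> cleanHTML(Document doc) {
    List<String> cleaned = new ArrayList<>();
    String html = doc.toString();
    cleaned.add(Jsoup.clean(html, Whitelist.basic()));
    cleaned.add(Jsoup.clean(html, Whitelist.simpleText()));
    return cleaned;
}
```

### Level 2: Fetching all Beijing hotel information from Ctrip

```
import java.util.ArrayList;
import java.util.List;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;

public List<Hotel> getHotle(String hotelResult) {
    JSONObject result = JSONObject.parseObject(hotelResult);
    // The hotel records are stored as a JSON array string under "hotelPositionJSON".
    List<Hotel> hotels = JSON.parseArray(result.getString("hotelPositionJSON"), Hotel.class);
    if (hotels == null) {
        return new ArrayList<>(); // key missing: nothing to return
    }

    // Add price data; entry i of "htllist" is expected to describe hotel i.
    JSONArray hotelsPrice = result.getJSONArray("htllist");
    if (hotelsPrice != null && !hotelsPrice.isEmpty()) {
        // Stop at the shorter list so a short price array cannot throw IndexOutOfBoundsException.
        int count = Math.min(hotels.size(), hotelsPrice.size());
        for (int i = 0; i < count; i++) {
            JSONObject priceObj = hotelsPrice.getJSONObject(i);
            if (priceObj != null && !priceObj.isEmpty()) {
                Hotel hotel = hotels.get(i);
                String hotelId = priceObj.getString("hotelid");
                double price = priceObj.getDoubleValue("amount");
                // Only set the price when both entries refer to the same hotel.
                if (hotelId != null && hotelId.equals(hotel.getId())) {
                    hotel.setPrice(price);
                }
            }
        }
    }
    return new ArrayList<>(hotels);
}
```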
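
The Level 2 loop pairs each price entry with the hotel at the same index and sets the price only when the two ids agree, so a price list in a different order would leave some hotels without a price. A minimal sketch of an id-keyed alternative follows, using plain Java collections only. `HotelRecord` and `PriceJoin.attachPrices` are hypothetical names for illustration, not part of the exercise's `Hotel` class or the fastjson API, and the ids and prices are made-up sample values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the exercise's Hotel class (only id and price are modeled).
class HotelRecord {
    final String id;
    double price;

    HotelRecord(String id) {
        this.id = id;
    }
}

class PriceJoin {
    // Sets the price on every hotel whose id appears in priceById.
    // Hotels without a matching entry keep their default price of 0.0.
    static void attachPrices(List<HotelRecord> hotels, Map<String, Double> priceById) {
        for (HotelRecord hotel : hotels) {
            Double price = priceById.get(hotel.id);
            if (price != null) {
                hotel.price = price;
            }
        }
    }

    public static void main(String[] args) {
        List<HotelRecord> hotels = new ArrayList<>();
        hotels.add(new HotelRecord("1001"));
        hotels.add(new HotelRecord("1002"));

        // Sample prices, inserted in a different order than the hotels.
        Map<String, Double> priceById = new HashMap<>();
        priceById.put("1002", 420.0);
        priceById.put("1001", 358.0);

        attachPrices(hotels, priceById);
        for (HotelRecord hotel : hotels) {
            System.out.println(hotel.id + " -> " + hotel.price);
        }
    }
}
```

In the exercise itself, the map could be filled in one pass over `htllist` (key `hotelid`, value `amount`) before looping over the parsed hotels. Whether the grader expects the index-based pairing instead is not something the source states.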