Creating an Index · Lucene Case Development

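Please credit the source when reposting: [http://blog.csdn.net/xiaojimanman/article/details/42872711](http://blog.csdn.net/xiaojimanman/article/details/42872711)

Starting with this post, both the API introductions and the later case studies are based on Lucene 4.3.1. Lucene 4.3.1 can be downloaded [here](http://archive.apache.org/dist/lucene/java/4.3.1/), other Lucene versions [here](http://archive.apache.org/dist/lucene/java/), and the official Lucene 4.3.1 API documentation is [here](http://lucene.apache.org/core/4_3_1/core/).

**Index creation demo**

Before going into details, here is a simple program that creates an index:

~~~
/**
 * @Description: index creation demo
 */
package com.lulei.lucene.study;

import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class IndexCreate {

    public static void main(String[] args) {
        // Analyzer used to tokenize text; here the standard analyzer
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_43);
        // IndexWriter configuration
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_43, analyzer);
        // Open mode: create the index if none exists, otherwise open the existing one
        indexWriterConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);
        Directory directory = null;
        IndexWriter indexWrite = null;
        try {
            // Location of the index on disk
            directory = FSDirectory.open(new File("D://study/index/testindex"));
            // If the index is locked, release the lock
            if (IndexWriter.isLocked(directory)) {
                IndexWriter.unlock(directory);
            }
            // Create the IndexWriter that performs the index operations
            indexWrite = new IndexWriter(directory, indexWriterConfig);
        } catch (Exception e) {
            e.printStackTrace();
            // Without an IndexWriter none of the steps below can run
            return;
        }

        // Create document 1
        Document doc1 = new Document();
        // Set the name field to "测试标题" and store the field value
        doc1.add(new TextField("name", "测试标题", Store.YES));
        // Set the content field to "测试内容" and store the field value
        doc1.add(new TextField("content", "测试内容", Store.YES));
        try {
            // Write the document to the index
            indexWrite.addDocument(doc1);
        } catch (Exception e) {
            e.printStackTrace();
        }

        // Create document 2
        Document doc2 = new Document();
        doc2.add(new TextField("name", "基于lucene的案例开发：索引数学模型", Store.YES));
        doc2.add(new TextField("content", "lucene将一篇文档分成若干个域，每个域又分成若干个词元，通过词元在文档中的重要程度，将文档转化为N维的空间向量，通过计算两个向量之间的夹角余弦值来计算两个文档的相似程度", Store.YES));
        try {
            // Write the document to the index
            indexWrite.addDocument(doc2);
        } catch (Exception e) {
            e.printStackTrace();
        }

        // Commit the IndexWriter operations; without a commit,
        // the preceding changes are not saved to disk
        try {
            // Committing is expensive, so real code needs a commit strategy
            indexWrite.commit();
            // Release resources
            indexWrite.close();
            directory.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
~~~

The program above is commented in detail, so the individual statements are not explained again here. Running this main method creates the index files shown below:

![](https://box.kancloud.cn/2016-02-22_56ca7bed79c25.jpg)

The index inspection tool luke gives a quick look at the contents of the index:

![](https://box.kancloud.cn/2016-02-22_56ca7bed90234.jpg)

![](https://box.kancloud.cn/2016-02-22_56ca7beda979e.jpg)

The two screenshots show that the index holds two documents, that the content field has 50 terms and the name field has 18 terms, and that the field values of each document are stored in the index.

**Core classes for creating an index**

The index creation process above uses several core classes: **IndexWriter**, **Directory**, **Analyzer**, **Document**, and **Field**.

**IndexWriter**

IndexWriter is the central component of the indexing process. It creates a new index or opens an existing one, and it adds, deletes, and updates the indexed documents. IndexWriter needs somewhere to store the index, and that storage is provided by a Directory.

**Directory**

The Directory class describes where a Lucene index is stored. It is an abstract class whose subclasses determine the actual storage. In the example above, FSDirectory.open returns a Directory backed by the given path in the file system, and that Directory is then passed to the IndexWriter constructor.

**Analyzer**

Before text is indexed it has to be processed by an Analyzer. The example above uses the standard analyzer; later posts will cover the various analyzers and when to use each of them.

**Document**

A Document has a simple structure: it is a container for a number of Field objects. Each document in the example above contains two fields, name and content.

**Field**

Every document in the index contains one or more named fields. Each field has a name, a value, and a set of options that control precisely how Lucene indexes that value. When a document contains several fields with the same name, their text is treated at search time as if it were concatenated into a single text field.

The core classes above are important and used constantly when working with Lucene. For more detail, please refer to the [official API documentation](http://lucene.apache.org/core/4_3_1/core/).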
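
To make the roles of the core classes more concrete, here is a minimal sketch in plain Java that mimics them with toy classes. It is not Lucene code and does not reflect how Lucene is implemented: `ToyField`, `ToyDocument`, `ToyIndexWriter`, and the crude `analyze` method are hypothetical names invented for this illustration. The sketch only shows the general idea that a document is a container of fields, that each field value is split into terms, and that the writer records which documents contain which term in which field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical stand-in for a field: a name, a value, and a "stored" option. */
final class ToyField {
    final String name;
    final String value;
    final boolean stored;

    ToyField(String name, String value, boolean stored) {
        this.name = name;
        this.value = value;
        this.stored = stored;
    }
}

/** Hypothetical stand-in for a document: just a container of fields. */
final class ToyDocument {
    private final List<ToyField> fields = new ArrayList<>();

    ToyDocument add(ToyField field) {
        fields.add(field);
        return this;
    }

    List<ToyField> fields() {
        return fields;
    }
}

/** Hypothetical stand-in for an index writer, kept entirely in memory. */
final class ToyIndexWriter {
    // field name -> term -> sorted ids of the documents containing that term
    private final Map<String, Map<String, TreeSet<Integer>>> postings = new HashMap<>();
    // per document: field name -> original value, only for fields marked as stored
    private final List<Map<String, String>> storedValues = new ArrayList<>();

    /** Crude "analyzer": lowercases and splits on anything that is not a letter or digit. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Indexes every field of the document and returns the id assigned to it. */
    int addDocument(ToyDocument doc) {
        int docId = storedValues.size();
        Map<String, String> stored = new HashMap<>();
        for (ToyField field : doc.fields()) {
            Map<String, TreeSet<Integer>> terms =
                    postings.computeIfAbsent(field.name, k -> new HashMap<>());
            for (String token : analyze(field.value)) {
                terms.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
            if (field.stored) {
                stored.put(field.name, field.value);
            }
        }
        storedValues.add(stored);
        return docId;
    }

    /** Returns the ids of the documents whose given field contains the given term. */
    List<Integer> docsContaining(String field, String term) {
        Map<String, TreeSet<Integer>> terms = postings.get(field);
        if (terms == null || !terms.containsKey(term)) {
            return new ArrayList<>();
        }
        return new ArrayList<>(terms.get(term));
    }

    /** Returns the stored value of a field, or null if it was not stored. */
    String storedValue(int docId, String field) {
        return storedValues.get(docId).get(field);
    }
}

final class ToyIndexDemo {
    public static void main(String[] args) {
        ToyIndexWriter writer = new ToyIndexWriter();
        writer.addDocument(new ToyDocument()
                .add(new ToyField("name", "Test title", true))
                .add(new ToyField("content", "Test content", true)));
        writer.addDocument(new ToyDocument()
                .add(new ToyField("name", "Lucene index model", true)));
        System.out.println(writer.docsContaining("name", "lucene"));
        System.out.println(writer.storedValue(0, "name"));
    }
}
```

The "stored" flag in this toy plays a role similar in spirit to `Store.YES` in the demo: the terms are always indexed, while the original value is kept for retrieval only when the flag is set.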