# Chinese Word Segmentation for MySQL

Full-text indexing consists of two broad phases:

* Index creation (indexer): extract information from structured and unstructured data and build an index from it.
* Index search (search): take the user's query, look it up in the index that was built, and return the results.

## Compiling and installing sphinx + mmseg

### Install the build dependencies

```
yum install make gcc gcc-c++ libtool autoconf automake imake mysql-devel libxml2-devel expat-devel
```

### Download and unpack the stable source package

```
[root@localhost.localdomain /usr/local/src] # wget http://www.coreseek.cn/uploads/csft/3.2/coreseek-3.2.14.tar.gz
[root@localhost.localdomain /usr/local/src] # tar xf coreseek-3.2.14.tar.gz
[root@localhost.localdomain /usr/local/src] # cd coreseek-3.2.14
[root@localhost.localdomain /usr/local/src/coreseek-3.2.14] # ls
csft-3.2.14  mmseg-3.2.14  README.txt  testpack
```

Where:

* `csft-3.2.14` is sphinx, modified to work in a Chinese-language environment
* `mmseg-3.2.14` is the Chinese word segmentation plugin
* `testpack` is a package used for testing

### [Install mmseg](http://www.coreseek.cn/products/products-install/install_on_bsd_linux/)

#### cd mmseg

```
[root@localhost.localdomain /usr/local/src/coreseek-3.2.14] # cd mmseg-3.2.14/
```

#### Run the bootstrap script

```
[root@localhost.localdomain /usr/local/src/coreseek-3.2.14/mmseg-3.2.14] # ./bootstrap
```

#### ./configure --prefix=/usr/local/mmseg

```
[root@localhost.localdomain /usr/local/src/coreseek-3.2.14/mmseg-3.2.14] # ./configure --prefix=/usr/local/mmseg
```

#### make && make install

```
[root@localhost.localdomain /usr/local/src/coreseek-3.2.14/mmseg-3.2.14] # make && make install
```

### Install coreseek

```
[root@localhost.localdomain /usr/local/src/coreseek-3.2.14/csft-3.2.14] # ./buildconf.sh
[root@localhost.localdomain /usr/local/src/coreseek-3.2.14/csft-3.2.14] # ./configure --prefix=/usr/local/coreseek --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg/lib/ --with-mysql
[root@localhost.localdomain /usr/local/src/coreseek-3.2.14/csft-3.2.14] # make && make install
```

## Using Sphinx

> 1. Data source: tells sphinx which data to query, that is, which data to build the index from (multiple sources can be defined).
> 2. Index configuration: which source to index, which directory the index files go in, and so on.
> 3. Search server: sphinx can listen on a port (9312 by default) and talk to external programs using its own protocol.
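
The three parts listed above correspond to three kinds of block in `sphinx.conf`. As a rough skeleton only: the `source` and `index` names and the index path follow this article, while the `searchd` log and pid paths are assumptions based on the `/usr/local/coreseek` install prefix and may differ on your system.

```
# Skeleton only; the connection settings and sql_query are omitted here.
source src1
{
    type = mysql
}

index test1
{
    source = src1
    path   = /usr/local/sphinx/var/data/test1
}

# Assumed paths under the install prefix; adjust as needed.
searchd
{
    listen    = 9312
    log       = /usr/local/coreseek/var/log/searchd.log
    query_log = /usr/local/coreseek/var/log/query.log
    pid_file  = /usr/local/coreseek/var/log/searchd.pid
}
```
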
**Configure the data source**

```
[root@localhost.localdomain /usr/local/coreseek/etc] # cp sphinx.conf.dist sphinx.conf
[root@localhost.localdomain /usr/local/coreseek/etc] # vim sphinx.conf
```

Configure it as follows:

    source src1 {
        type = mysql
        sql_host = localhost
        sql_user = root
        sql_pass = aaaaaa
        sql_db = test
        sql_query_pre = set names utf8
        sql_query_pre = set session query_cache_type=off
        sql_query = select a_id as id,cat_id,title,simtitle,seotitle,tags,source,description,content,dateline,editdateline from article
        sql_attr_uint = a_id
        sql_attr_uint = cat_id
        sql_attr_timestamp = dateline
        sql_attr_timestamp = editdateline
        sql_query_info = SELECT * FROM article WHERE a_id=$id
    }

**Typical index configuration**

>     index test1 {
>         source = src1
>         path = /usr/local/sphinx/var/data/test1 # where the generated index is stored
>         # stopwords = G:\data\stopwords.txt
>         # wordforms = G:\data\wordforms.txt
>         # exceptions = /data/exceptions.txt
>         charset_dictpath = /usr/local/mmseg/etc/
>         charset_type = zh_cn.utf-8
>     }

**Generate the index files**

The last argument, `test1`, is the name of the index to build.

```
[root@localhost.localdomain /usr/local/coreseek/etc] # /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/sphinx.conf test1
Coreseek Fulltext 3.2 [ Sphinx 0.9.9-release (r2117)]
Copyright (c) 2007-2011,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)

using config file '/usr/local/coreseek/etc/sphinx.conf'...
indexing index 'test1'...
collected 8122 docs, 47.6 MB
sorted 8.7 Mhits, 100.0% done
total 8122 docs, 47596333 bytes
total 17.782 sec, 2676636 bytes/sec, 456.75 docs/sec
total 5 reads, 0.011 sec, 4559.8 kb/call avg, 2.3 msec/call avg
total 58 writes, 0.429 sec, 903.8 kb/call avg, 7.3 msec/call avg
```

> **Error note:**
>
> /usr/local/coreseek/bin/indexer: error while loading shared libraries: **libmysqlclient.so.18**: cannot open shared object file: No such file or directory
>
> This means the shared library `libmysqlclient.so.18`, which the **sphinx** `indexer` depends on, cannot be found. To fix the error, edit `/etc/ld.so.conf`:
>
> `vi /etc/ld.so.conf`
>
> Append the following line to the end of the file and save it:
>
> `/usr/local/mysql/lib`
>
> Then run the following command:
>
> `ldconfig`

Test a query from the command line (the query term 留学 means "studying abroad"):

```
[root@localhost.localdomain /usr/local/coreseek] # ./bin/search -c etc/sphinx.conf 留学
```
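
If `indexer` or `search` fails with the shared-library error described above, `ldd` can show which libraries the dynamic linker cannot resolve, both before and after the `ldconfig` fix. The following is a minimal sketch, assuming a Linux system with `ldd` and `awk` available; the helper names `parse_missing` and `missing_libs` are made up for this example.

```shell
#!/bin/sh
# Print the names of shared libraries that ldd reports as "not found".
# Reads ldd-style output on stdin, e.g. "libmysqlclient.so.18 => not found".
parse_missing() {
    awk '/not found/ { print $1 }'
}

# List unresolved libraries for a binary; prints nothing when all resolve
# (or when the binary or ldd itself is unavailable).
missing_libs() {
    ldd "$1" 2>/dev/null | parse_missing
}

# The indexer path follows the install prefix used in this article.
missing_libs /usr/local/coreseek/bin/indexer
```

If the script prints `libmysqlclient.so.18`, the library directory is still not on the linker's search path; after a successful `ldconfig` it should print nothing.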