[Xunsearch PHP-SDK](http://www.xunsearch.com) v1.4.8 API Reference
# XSTokenizerScws
| Package | [XS.tokenizer](#) |
|-----|-----|
| Inheritance | class XSTokenizerScws |
| Implements | [XSTokenizer](#) |
| Since | 1.3.1 |
| Version | 1.0.0 |
| Source | [sdk/php/lib/XSTokenizer.class.php](https://github.com/hightman/xunsearch/blob/master/sdk/php/lib/XSTokenizer.class.php) |
SCWS tokenizer (communicates with the search server).
### Public Methods
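| Name | Description | Defined in |
|-----|-----|-----|
| [__construct()](#) | Constructor | XSTokenizerScws |
| [addDict()](#) | Adds a segmentation dictionary; TXT and XDB formats are supported | XSTokenizerScws |
| [getResult()](#) | Returns the segmentation result | XSTokenizerScws |
| [getTokens()](#) | Implements the XSTokenizer interface | XSTokenizerScws |
| [getTops()](#) | Returns statistics for the most important words | XSTokenizerScws |
| [getVersion()](#) | Returns the scws version number | XSTokenizerScws |
| [hasWord()](#) | Checks whether the text contains a word with the given part of speech | XSTokenizerScws |
| [setCharset()](#) | Sets the character set | XSTokenizerScws |
| [setDict()](#) | Sets the segmentation dictionary; TXT and XDB formats are supported | XSTokenizerScws |
| [setDuality()](#) | Sets two-character (duality) combination of stray single characters | XSTokenizerScws |
| [setIgnore()](#) | Sets whether punctuation is ignored | XSTokenizerScws |
| [setMulti()](#) | Sets the multi-segmentation options | XSTokenizerScws |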
### Method Details
__construct() method
<table class="summaryTable"><tr><td colspan="3"><div class="signature2">public void <b>__construct</b>(string $arg=NULL)</div></td></tr><tr><td class="paramNameCol">$arg</td> <td class="paramTypeCol">string</td> <td class="paramDescCol">Multi-segmentation level, passed to setMulti(); not set by default</td></tr></table>
**Source:** [sdk/php/lib/XSTokenizer.class.php#L188](https://github.com/hightman/xunsearch/blob/master/sdk/php/lib/XSTokenizer.class.php#L188)
`public function __construct($arg = null)
{
    if (self::$_server === null) {
        $xs = XS::getLastXS();
        if ($xs === null) {
            throw new XSException('An XS instance should be created before using ' . __CLASS__);
        }
        self::$_server = $xs->getScwsServer();
        self::$_server->setTimeout(0);
        self::$_charset = $xs->getDefaultCharset();
        // constants
        if (!defined('SCWS_MULTI_NONE')) {
            define('SCWS_MULTI_NONE', 0);
            define('SCWS_MULTI_SHORT', 1);
            define('SCWS_MULTI_DUALITY', 2);
            define('SCWS_MULTI_ZMAIN', 4);
            define('SCWS_MULTI_ZALL', 8);
        }
        if (!defined('SCWS_XDICT_XDB')) {
            define('SCWS_XDICT_XDB', 1);
            define('SCWS_XDICT_MEM', 2);
            define('SCWS_XDICT_TXT', 4);
        }
    }
    if ($arg !== null && $arg !== '') {
        $this->setMulti($arg);
    }
}`
Constructor. Initializes the search server connection used for segmentation. An XS instance must already exist; otherwise an XSException is thrown.
addDict() method
<table class="summaryTable"><tr><td colspan="3"><div class="signature2">public XSTokenizerScws <b>addDict</b>(string $fpath, int $mode=NULL)</div></td></tr><tr><td class="paramNameCol">$fpath</td> <td class="paramTypeCol">string</td> <td class="paramDescCol">Dictionary path on the server</td></tr><tr><td class="paramNameCol">$mode</td> <td class="paramTypeCol">int</td> <td class="paramDescCol">Dictionary type, constants: SCWS_XDICT_XDB|SCWS_XDICT_TXT|SCWS_XDICT_MEM; if not an integer, TXT is chosen when the path contains .txt, otherwise XDB</td></tr><tr><td class="paramNameCol">{return}</td> <td class="paramTypeCol">XSTokenizerScws</td> <td class="paramDescCol">The object itself, to allow method chaining</td></tr></table>
**Source:** [sdk/php/lib/XSTokenizer.class.php#L299](https://github.com/hightman/xunsearch/blob/master/sdk/php/lib/XSTokenizer.class.php#L299)
`public function addDict($fpath, $mode = null)
{
    if (!is_int($mode)) {
        $mode = stripos($fpath, '.txt') !== false ? SCWS_XDICT_TXT : SCWS_XDICT_XDB;
    }
    if (!isset($this->_setting['add_dict'])) {
        $this->_setting['add_dict'] = array();
    }
    $this->_setting['add_dict'][] = new XSCommand(CMD_SEARCH_SCWS_SET, CMD_SCWS_ADD_DICT, $mode, $fpath);
    return $this;
}`
Adds a segmentation dictionary; TXT and XDB formats are supported.
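
The mode selection at the top of `addDict()` can be tried in isolation. The sketch below is a minimal standalone copy of that logic, not SDK code: `guessDictMode()` and the file paths are hypothetical, and the constants are redefined locally with the values the constructor uses.

```php
<?php
// Stand-in values mirroring the constants defined by the constructor.
if (!defined('SCWS_XDICT_XDB')) {
    define('SCWS_XDICT_XDB', 1);
    define('SCWS_XDICT_MEM', 2);
    define('SCWS_XDICT_TXT', 4);
}

/**
 * Hypothetical helper (not part of the SDK) that reproduces the
 * dictionary-type selection used by addDict() and setDict().
 */
function guessDictMode($fpath, $mode = null)
{
    if (!is_int($mode)) {
        // Case-insensitive substring test: ".txt" anywhere in the path selects TXT.
        $mode = stripos($fpath, '.txt') !== false ? SCWS_XDICT_TXT : SCWS_XDICT_XDB;
    }
    return $mode;
}

var_dump(guessDictMode('/data/dict_user.txt'));            // int(4): TXT
var_dump(guessDictMode('/data/dict.utf8.xdb'));            // int(1): XDB
var_dump(guessDictMode('/data/words.TXT.bak'));            // int(4): substring match
var_dump(guessDictMode('/data/dict.xdb', SCWS_XDICT_MEM)); // int(2): explicit mode kept
var_dump(guessDictMode('/data/dict.xdb', '2'));            // int(1): a string mode is ignored
```

Two consequences follow from the code as written: the `.txt` test matches anywhere in the path, not only at the end, and a mode passed as a string fails the `is_int()` check and is replaced by the guessed value.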
getResult() method
<table class="summaryTable"><tr><td colspan="3"><div class="signature2">public array <b>getResult</b>(string $text)</div></td></tr><tr><td class="paramNameCol">$text</td> <td class="paramTypeCol">string</td> <td class="paramDescCol">Text to segment</td></tr><tr><td class="paramNameCol">{return}</td> <td class="paramTypeCol">array</td> <td class="paramDescCol">Array of words; each word is an array containing [off: position of the word in the text, attr: part of speech, word: the word]</td></tr></table>
**Source:** [sdk/php/lib/XSTokenizer.class.php#L339](https://github.com/hightman/xunsearch/blob/master/sdk/php/lib/XSTokenizer.class.php#L339)
`public function getResult($text)
{
    $words = array();
    $text = $this->applySetting($text);
    $cmd = new XSCommand(CMD_SEARCH_SCWS_GET, CMD_SCWS_GET_RESULT, 0, $text);
    $res = self::$_server->execCommand($cmd, CMD_OK_SCWS_RESULT);
    while ($res->buf !== '') {
        $tmp = unpack('Ioff/a4attr/a*word', $res->buf);
        $tmp['word'] = XS::convert($tmp['word'], self::$_charset, 'UTF-8');
        $words[] = $tmp;
        $res = self::$_server->getRespond();
    }
    return $words;
}`
Returns the segmentation result.
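
Each response buffer is decoded with the `unpack()` format `Ioff/a4attr/a*word`. The sketch below builds a buffer in that layout by hand and decodes it, to show what one element of the returned array looks like. The sample values are made up for illustration and are not real server output.

```php
<?php
// Build a buffer in the layout that getResult() decodes: an unsigned int,
// a 4-byte attribute field, then the remaining bytes as the word.
// Values are invented for this example.
$buf = pack('Ia4a*', 6, 'n', '分词');

$tmp = unpack('Ioff/a4attr/a*word', $buf);

// On PHP 5.5 and later the "a" code keeps NUL padding, so trim it for display.
$attr = rtrim($tmp['attr'], "\0");

printf("off=%d attr=%s word=%s\n", $tmp['off'], $attr, $tmp['word']);
// prints: off=6 attr=n word=分词
```

Because the attribute field is fixed at 4 bytes, short part-of-speech codes may carry trailing NUL bytes; trimming before comparing `attr` against a literal such as `'n'` is a cautious choice.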
getTokens() method
<table class="summaryTable"><tr><td colspan="3"><div class="signature2">public array <b>getTokens</b>(string $value, XSDocument $doc=NULL)</div></td></tr><tr><td class="paramNameCol">$value</td> <td class="paramTypeCol">string</td> <td class="paramDescCol">Text to segment (processed as UTF-8)</td></tr><tr><td class="paramNameCol">$doc</td> <td class="paramTypeCol">XSDocument</td> <td class="paramDescCol">Related document; not used by this implementation</td></tr><tr><td class="paramNameCol">{return}</td> <td class="paramTypeCol">array</td> <td class="paramDescCol">Array of word strings</td></tr></table>
**Source:** [sdk/php/lib/XSTokenizer.class.php#L220](https://github.com/hightman/xunsearch/blob/master/sdk/php/lib/XSTokenizer.class.php#L220)
`public function getTokens($value, XSDocument $doc = null)
{
    $tokens = array();
    $this->setIgnore(true);
    // save charset, force to use UTF-8
    $_charset = self::$_charset;
    self::$_charset = 'UTF-8';
    $words = $this->getResult($value);
    foreach ($words as $word) {
        $tokens[] = $word['word'];
    }
    // restore charset
    self::$_charset = $_charset;
    return $tokens;
}`
Implements the XSTokenizer interface. Punctuation is ignored, the text is processed as UTF-8, and only the word strings from getResult() are returned.
getTops() method
<table class="summaryTable"><tr><td colspan="3"><div class="signature2">public array <b>getTops</b>(string $text, int $limit=10, string $xattr='')</div></td></tr><tr><td class="paramNameCol">$text</td> <td class="paramTypeCol">string</td> <td class="paramDescCol">Text to segment</td></tr><tr><td class="paramNameCol">$limit</td> <td class="paramTypeCol">int</td> <td class="paramDescCol">Maximum number of words to return, default 10</td></tr><tr><td class="paramNameCol">$xattr</td> <td class="paramTypeCol">string</td> <td class="paramDescCol">Part-of-speech filter applied to the result; separate multiple parts of speech with commas, and prefix with ~ to negate. For example, n,v returns only nouns and verbs; ~n,v returns every word except nouns and verbs</td></tr><tr><td class="paramNameCol">{return}</td> <td class="paramTypeCol">array</td> <td class="paramDescCol">Array of words; each word is an array containing [times: number of occurrences, attr: part of speech, word: the word]</td></tr></table>
**Source:** [sdk/php/lib/XSTokenizer.class.php#L361](https://github.com/hightman/xunsearch/blob/master/sdk/php/lib/XSTokenizer.class.php#L361)
`public function getTops($text, $limit = 10, $xattr = '')
{
    $words = array();
    $text = $this->applySetting($text);
    $cmd = new XSCommand(CMD_SEARCH_SCWS_GET, CMD_SCWS_GET_TOPS, $limit, $text, $xattr);
    $res = self::$_server->execCommand($cmd, CMD_OK_SCWS_TOPS);
    while ($res->buf !== '') {
        $tmp = unpack('Itimes/a4attr/a*word', $res->buf);
        $tmp['word'] = XS::convert($tmp['word'], self::$_charset, 'UTF-8');
        $words[] = $tmp;
        $res = self::$_server->getRespond();
    }
    return $words;
}`
Returns statistics for the most important words.
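
The part-of-speech filter syntax (a comma-separated list, negated by a leading `~`) can be illustrated with a small client-side sketch. This is only an approximation for reading the syntax: the real filtering happens on the server, and the exact-match comparison used here is an assumption. `matchesXattr()`, `filterWords()` and the sample words are hypothetical.

```php
<?php
/**
 * Hypothetical illustration of the filter syntax; assumes exact
 * matching of part-of-speech codes, which the server may not use.
 */
function matchesXattr($attr, $xattr)
{
    if ($xattr === '') {
        return true; // empty filter keeps everything
    }
    $negate = strncmp($xattr, '~', 1) === 0;
    $list = explode(',', ltrim($xattr, '~'));
    $found = in_array($attr, $list, true);
    return $negate ? !$found : $found;
}

/** Returns the word strings whose attr passes the filter. */
function filterWords(array $words, $xattr)
{
    $kept = array();
    foreach ($words as $w) {
        if (matchesXattr($w['attr'], $xattr)) {
            $kept[] = $w['word'];
        }
    }
    return $kept;
}

// Invented sample data in the shape returned by getTops().
$words = array(
    array('times' => 3, 'attr' => 'v', 'word' => '搜索'),
    array('times' => 2, 'attr' => 'n', 'word' => '引擎'),
    array('times' => 5, 'attr' => 'uj', 'word' => '的'),
);

echo implode(' ', filterWords($words, 'n,v')), "\n";  // 搜索 引擎
echo implode(' ', filterWords($words, '~n,v')), "\n"; // 的
```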
getVersion() method
<table class="summaryTable"><tr><td colspan="3"><div class="signature2">public string <b>getVersion</b>()</div></td></tr><tr><td class="paramNameCol">{return}</td> <td class="paramTypeCol">string</td> <td class="paramDescCol">Version number</td></tr></table>
**Source:** [sdk/php/lib/XSTokenizer.class.php#L327](https://github.com/hightman/xunsearch/blob/master/sdk/php/lib/XSTokenizer.class.php#L327)
`public function getVersion()
{
    $cmd = new XSCommand(CMD_SEARCH_SCWS_GET, CMD_SCWS_GET_VERSION);
    $res = self::$_server->execCommand($cmd, CMD_OK_INFO);
    return $res->buf;
}`
Returns the scws version number.
hasWord() method
<table class="summaryTable"><tr><td colspan="3"><div class="signature2">public bool <b>hasWord</b>(string $text, string $xattr)</div></td></tr><tr><td class="paramNameCol">$text</td> <td class="paramTypeCol">string</td> <td class="paramDescCol">Text to check</td></tr><tr><td class="paramNameCol">$xattr</td> <td class="paramTypeCol">string</td> <td class="paramDescCol">Part of speech to look for; see the description of <a href="XSTokenizerScws.html#getTops">getTops</a></td></tr><tr><td class="paramNameCol">{return}</td> <td class="paramTypeCol">bool</td> <td class="paramDescCol">Whether the text contains a word with the given part of speech</td></tr></table>
**Source:** [sdk/php/lib/XSTokenizer.class.php#L382](https://github.com/hightman/xunsearch/blob/master/sdk/php/lib/XSTokenizer.class.php#L382)
`public function hasWord($text, $xattr)
{
    $text = $this->applySetting($text);
    $cmd = new XSCommand(CMD_SEARCH_SCWS_GET, CMD_SCWS_HAS_WORD, 0, $text, $xattr);
    $res = self::$_server->execCommand($cmd, CMD_OK_INFO);
    return $res->buf === 'OK';
}`
Checks whether the text contains a word with the given part of speech.
setCharset() method
<table class="summaryTable"><tr><td colspan="3"><div class="signature2">public XSTokenizerScws <b>setCharset</b>(string $charset)</div></td></tr><tr><td class="paramNameCol">$charset</td> <td class="paramTypeCol">string</td> <td class="paramDescCol">Character set name; case-insensitive, and UTF8 is normalized to UTF-8</td></tr><tr><td class="paramNameCol">{return}</td> <td class="paramTypeCol">XSTokenizerScws</td> <td class="paramDescCol">The object itself, to allow method chaining</td></tr></table>
**Source:** [sdk/php/lib/XSTokenizer.class.php#L242](https://github.com/hightman/xunsearch/blob/master/sdk/php/lib/XSTokenizer.class.php#L242)
`public function setCharset($charset)
{
    self::$_charset = strtoupper($charset);
    if (self::$_charset == 'UTF8') {
        self::$_charset = 'UTF-8';
    }
    return $this;
}`
Sets the character set. The default is UTF-8. This is the character set of the $text argument passed to the [getResult](#) family of functions.
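
The name normalization performed by `setCharset()` is small enough to try on its own. The helper below is a hypothetical standalone copy of that logic, not an SDK function.

```php
<?php
/** Hypothetical standalone copy of the normalization done in setCharset(). */
function normalizeCharset($charset)
{
    $charset = strtoupper($charset);
    // Only the spelling "UTF8" is rewritten; other names pass through upper-cased.
    return $charset == 'UTF8' ? 'UTF-8' : $charset;
}

echo normalizeCharset('utf8'), "\n";  // UTF-8
echo normalizeCharset('utf-8'), "\n"; // UTF-8
echo normalizeCharset('gbk'), "\n";   // GBK
```

Note that the method stores the value in the static property `self::$_charset`, so the setting is shared by every XSTokenizerScws instance in the process rather than being per-object.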
setDict() method
<table class="summaryTable"><tr><td colspan="3"><div class="signature2">public XSTokenizerScws <b>setDict</b>(string $fpath, int $mode=NULL)</div></td></tr><tr><td class="paramNameCol">$fpath</td> <td class="paramTypeCol">string</td> <td class="paramDescCol">Dictionary path on the server</td></tr><tr><td class="paramNameCol">$mode</td> <td class="paramTypeCol">int</td> <td class="paramDescCol">Dictionary type, constants: SCWS_XDICT_XDB|SCWS_XDICT_TXT|SCWS_XDICT_MEM; if not an integer, TXT is chosen when the path contains .txt, otherwise XDB</td></tr><tr><td class="paramNameCol">{return}</td> <td class="paramTypeCol">XSTokenizerScws</td> <td class="paramDescCol">The object itself, to allow method chaining</td></tr></table>
**Source:** [sdk/php/lib/XSTokenizer.class.php#L283](https://github.com/hightman/xunsearch/blob/master/sdk/php/lib/XSTokenizer.class.php#L283)
`public function setDict($fpath, $mode = null)
{
    if (!is_int($mode)) {
        $mode = stripos($fpath, '.txt') !== false ? SCWS_XDICT_TXT : SCWS_XDICT_XDB;
    }
    $this->_setting['set_dict'] = new XSCommand(CMD_SEARCH_SCWS_SET, CMD_SCWS_SET_DICT, $mode, $fpath);
    unset($this->_setting['add_dict']);
    return $this;
}`
Sets the segmentation dictionary; TXT and XDB formats are supported. Any dictionaries previously queued with addDict() are discarded.
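
Because `setDict()` unsets the `add_dict` entry, the order of calls matters when it is combined with `addDict()`. The sketch below models only that bookkeeping with a stand-in class; `DictSettingsSketch` and the paths are hypothetical, and plain strings take the place of the XSCommand objects the SDK stores.

```php
<?php
/** Stand-in for the tokenizer's settings bookkeeping; not the SDK class. */
class DictSettingsSketch
{
    public $setting = array();

    public function setDict($fpath)
    {
        $this->setting['set_dict'] = $fpath;
        unset($this->setting['add_dict']); // queued additions are discarded
        return $this;
    }

    public function addDict($fpath)
    {
        if (!isset($this->setting['add_dict'])) {
            $this->setting['add_dict'] = array();
        }
        $this->setting['add_dict'][] = $fpath;
        return $this;
    }
}

$s = new DictSettingsSketch();
// a.txt is queued first, then dropped when setDict() runs.
$s->addDict('/data/a.txt')->setDict('/data/main.xdb')->addDict('/data/b.txt');
print_r($s->setting);
```

After this chain only `/data/main.xdb` and `/data/b.txt` remain, which suggests calling `setDict()` before any `addDict()` calls.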
setDuality() method
<table class="summaryTable"><tr><td colspan="3"><div class="signature2">public XSTokenizerScws <b>setDuality</b>(bool $yes=true)</div></td></tr><tr><td class="paramNameCol">$yes</td> <td class="paramTypeCol">bool</td> <td class="paramDescCol">Whether to enable automatic two-character combination of stray single characters</td></tr><tr><td class="paramNameCol">{return}</td> <td class="paramTypeCol">XSTokenizerScws</td> <td class="paramDescCol">The object itself, to allow method chaining</td></tr></table>
**Source:** [sdk/php/lib/XSTokenizer.class.php#L316](https://github.com/hightman/xunsearch/blob/master/sdk/php/lib/XSTokenizer.class.php#L316)
`public function setDuality($yes = true)
{
    $this->_setting['duality'] = new XSCommand(CMD_SEARCH_SCWS_SET, CMD_SCWS_SET_DUALITY, $yes === false ? 0 : 1);
    return $this;
}`
Sets two-character (duality) combination of stray single characters.
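
The flag is computed with a strict comparison, `$yes === false ? 0 : 1`, and `setIgnore()` uses the same expression. The sketch below isolates that expression in a hypothetical helper to show which arguments actually turn the option off.

```php
<?php
/** Hypothetical helper isolating the flag expression used by setDuality(). */
function dualityFlag($yes = true)
{
    // Strict comparison: only the boolean false yields 0.
    return $yes === false ? 0 : 1;
}

var_dump(dualityFlag(false)); // int(0)
var_dump(dualityFlag(true));  // int(1)
var_dump(dualityFlag(0));     // int(1)
var_dump(dualityFlag(null));  // int(1)
var_dump(dualityFlag(''));    // int(1)
```

Falsy values such as `0`, `null`, or an empty string still enable the option, so callers that want to disable it should pass the boolean `false`.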
setIgnore() method
<table class="summaryTable"><tr><td colspan="3"><div class="signature2">public XSTokenizerScws <b>setIgnore</b>(bool $yes=true)</div></td></tr><tr><td class="paramNameCol">$yes</td> <td class="paramTypeCol">bool</td> <td class="paramDescCol">Whether to ignore punctuation</td></tr><tr><td class="paramNameCol">{return}</td> <td class="paramTypeCol">XSTokenizerScws</td> <td class="paramDescCol">The object itself, to allow method chaining</td></tr></table>
**Source:** [sdk/php/lib/XSTokenizer.class.php#L256](https://github.com/hightman/xunsearch/blob/master/sdk/php/lib/XSTokenizer.class.php#L256)
`public function setIgnore($yes = true)
{
    $this->_setting['ignore'] = new XSCommand(CMD_SEARCH_SCWS_SET, CMD_SCWS_SET_IGNORE, $yes === false ? 0 : 1);
    return $this;
}`
Sets whether punctuation is ignored.
setMulti() method
<table class="summaryTable"><tr><td colspan="3"><div class="signature2">public XSTokenizerScws <b>setMulti</b>(int $mode=3)</div></td></tr><tr><td class="paramNameCol">$mode</td> <td class="paramTypeCol">int</td> <td class="paramDescCol">Multi-segmentation option, range 0 to 15, default 3; the constants can be combined: SCWS_MULTI_SHORT|SCWS_MULTI_DUALITY|SCWS_MULTI_ZMAIN|SCWS_MULTI_ZALL</td></tr><tr><td class="paramNameCol">{return}</td> <td class="paramTypeCol">XSTokenizerScws</td> <td class="paramDescCol">The object itself, to allow method chaining</td></tr></table>
**Source:** [sdk/php/lib/XSTokenizer.class.php#L270](https://github.com/hightman/xunsearch/blob/master/sdk/php/lib/XSTokenizer.class.php#L270)
`public function setMulti($mode = 3)
{
    $mode = intval($mode) & self::MULTI_MASK;
    $this->_setting['multi'] = new XSCommand(CMD_SEARCH_SCWS_SET, CMD_SCWS_SET_MULTI, $mode);
    return $this;
}`
Sets the multi-segmentation options.
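
The mode is a bit mask built from the `SCWS_MULTI_*` constants. The sketch below shows how values combine and how out-of-range input is reduced. It assumes the class constant `MULTI_MASK` equals 15, which matches the documented 0 to 15 range but is not shown on this page; `normalizeMulti()` and `MULTI_MASK_ASSUMED` are hypothetical names.

```php
<?php
// Stand-in values mirroring the constants defined by the constructor.
if (!defined('SCWS_MULTI_NONE')) {
    define('SCWS_MULTI_NONE', 0);
    define('SCWS_MULTI_SHORT', 1);
    define('SCWS_MULTI_DUALITY', 2);
    define('SCWS_MULTI_ZMAIN', 4);
    define('SCWS_MULTI_ZALL', 8);
}
// Assumption: MULTI_MASK is 15, inferred from the documented 0 to 15 range.
define('MULTI_MASK_ASSUMED', 15);

/** Hypothetical copy of the masking step in setMulti(). */
function normalizeMulti($mode = 3)
{
    return intval($mode) & MULTI_MASK_ASSUMED;
}

var_dump(normalizeMulti(SCWS_MULTI_SHORT | SCWS_MULTI_DUALITY)); // int(3), the default
var_dump(normalizeMulti(SCWS_MULTI_ZMAIN | SCWS_MULTI_ZALL));    // int(12)
var_dump(normalizeMulti('5'));                                    // int(5), strings are converted
var_dump(normalizeMulti(17));                                     // int(1), high bits are dropped
```

The string case is what allows the constructor to forward its `$arg` parameter to `setMulti()` unchanged.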
Copyright © 2008-2011 by [杭州云圣网络科技有限公司](http://www.xunsearch.com)
All Rights Reserved.