How to Update an Index with Zend_Search_Lucene

Jun 2007

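When using the Search_Lucene module of Zend Framework, suppose a document has already been added to the index and is later deleted or modified. The index must be updated promptly to keep the data current. The naive approach is to rebuild the entire index, which is expensive and does not suit large applications. A typical case is forum posts: when a post is deleted or edited, the index needs to be updated promptly.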

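The official Zend_Search_Lucene documentation says very little about deleting and updating index entries, so I worked out a simple approach of my own. Give it a try; there may well be a better way, and if you know one, please tell me.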

Here is what the official documentation shows:
[coolcode lang="php"]
<?php
$hits = $index->find('path:' . $removePath);
foreach ($hits as $hit) {
    $index->delete($hit->id);
}
?>
[/coolcode]
What puzzles me here is $removePath; I never figured out what it refers to. Below is the approach I use instead.

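First, assume every document has a unique tid field, which we use as the basis for each delete and update. In my own testing, a numeric value could not be indexed as a keyword field when the index was built, so it has to be converted to a string. Here I turn tid into a unique string with md5 and use that string to find the index entries to delete and update.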
[coolcode lang="php"]
// Creating the index (excerpt):
$index = Zend_Search_Lucene::create($this->lucne_index); // class member holding the index path
$doc = new Zend_Search_Lucene_Document();
// Choose the analyzer that matches your character set
Zend_Search_Lucene_Analysis_Analyzer::setDefault(
    new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8());

$doc->addField(Zend_Search_Lucene_Field::UnStored('key', md5($tid)));
$doc->addField(Zend_Search_Lucene_Field::Text('title', $title));
$doc->addField(Zend_Search_Lucene_Field::UnStored('content', $content));

$index->addDocument($doc);
$index->commit();

// Deleting and updating an index entry (excerpt):
// First delete the old entry
$key = md5($tid);
$index = Zend_Search_Lucene::open($this->lucne_index);
$query = Zend_Search_Lucene_Search_QueryParser::parse("key:$key", 'utf-8');
$hits = $index->find($query);
foreach ($hits as $hit) {
    $index->delete($hit->id);
}
// Then re-index the updated data, same as when creating;
// start from a fresh document so no old fields are carried over
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::UnStored('key', md5($tid)));
$doc->addField(Zend_Search_Lucene_Field::Text('title', $title));
$doc->addField(Zend_Search_Lucene_Field::UnStored('content', $content));

$index->addDocument($doc);
$index->commit();
[/coolcode]

The idea is simply to find the entry that needs updating, delete it, and then add the new data to the index again.
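As a hedged variation on the same delete-then-re-add idea, the sketch below stores the identifier in a Keyword field and looks it up with a term query, which is matched verbatim instead of being run through the analyzer. That might make the md5 step unnecessary, but this has not been verified against the numeric-field problem described above. The helper name `replaceDocument` and the field name `tid` are hypothetical, and the code assumes Zend Framework 1 is already loaded.

```php
<?php
// Assumption: Zend Framework 1 is on the include path and loaded, e.g.
// require_once 'Zend/Search/Lucene.php';

/**
 * Replace the indexed document for one tid (hypothetical helper).
 *
 * @param Zend_Search_Lucene_Interface $index   an opened index
 * @param int|string                   $tid     unique document identifier
 * @param string                       $title   new title text
 * @param string                       $content new body text
 */
function replaceDocument(Zend_Search_Lucene_Interface $index, $tid, $title, $content)
{
    // Build an exact-match query on the 'tid' field; no analyzer is involved.
    $term  = new Zend_Search_Lucene_Index_Term((string) $tid, 'tid');
    $query = new Zend_Search_Lucene_Search_Query_Term($term);

    // Delete every existing document carrying this tid.
    foreach ($index->find($query) as $hit) {
        $index->delete($hit->id);
    }

    // Add the fresh version of the document.
    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::Keyword('tid', (string) $tid));
    $doc->addField(Zend_Search_Lucene_Field::Text('title', $title));
    $doc->addField(Zend_Search_Lucene_Field::UnStored('content', $content));
    $index->addDocument($doc);
    $index->commit();
}
```

A call might look like `replaceDocument(Zend_Search_Lucene::open($path), $tid, $title, $content);`, where `$path` is the index directory.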

This is only a starting point; comments and better ideas are welcome.

Category: PHP / Zend


11 Responses

Joel says:

October 12, 2007 at 3:01 pm

Hi, Michael

I have a question. After searching with Zend_Search_Lucene, how do you paginate the results? Could you share a few pointers? 🙂

Best Regards,
Joel
Michael says:

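October 12, 2007 at 4:14 pm

At the moment Zend_Search_Lucene offers no convenient pagination method. It only provides Zend_Search_Lucene::getResultSetLimit() and Zend_Search_Lucene::setResultSetLimit() to cap the number of results a query returns, and there is no support for a starting offset (cursor). Pagination therefore has to be done yourself after fetching the result set, which probably costs some performance. You can keep the results in a short-lived in-memory cache to reduce the overhead.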
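The slice-after-fetch approach described in this reply could look roughly like the minimal sketch below. The helper name `paginateHits` is hypothetical, the hit list is stand-in data (in real use it would be the array returned by `$index->find($query)`), and `$perPage` is assumed to be a positive integer.

```php
<?php
/**
 * Return one page of an already-fetched result list (hypothetical helper).
 *
 * @param array $hits    full result list
 * @param int   $page    1-based page number
 * @param int   $perPage results per page, assumed to be positive
 * @return array page items plus paging metadata
 */
function paginateHits(array $hits, $page, $perPage = 10)
{
    $total     = count($hits);
    $pageCount = max(1, (int) ceil($total / $perPage));
    // Clamp the requested page into the valid range.
    $page   = min(max(1, (int) $page), $pageCount);
    $offset = ($page - 1) * $perPage;

    return array(
        'items'     => array_slice($hits, $offset, $perPage),
        'page'      => $page,
        'pageCount' => $pageCount,
        'total'     => $total,
    );
}

// Stand-in data instead of real search hits.
$hits   = range(1, 25);
$result = paginateHits($hits, 3, 10);
```

Because the whole result set is still fetched on every request, this only changes what is displayed; the caching suggestion above remains the way to cut the search cost itself.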
Joel says:

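October 13, 2007 at 12:50 am

Hi Michael,

Thanks for the quick reply 🙂
Judging from the official documentation, cursors are indeed not supported. Hopefully they will be in the future.

Also, according to the official documentation, Zend_Search_Lucene has a 2 GB limit on index files. Is there any way around this limit in practice? Are there any best practices for large data volumes?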

Thanks again ! Best Regards,
Joel
Michael says:

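October 13, 2007 at 8:43 pm

That limit is not a Lucene problem; it comes from the difference between 32-bit and 64-bit platforms. As for real-world Lucene use cases, I once came across this article, which I hope helps: http://www.phpriot.com/d/articles/php/search/zend-search-lucene/index.html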
3wdotec says:

June 27, 2008 at 2:47 pm

Nice…
无纺布袋 says:

July 10, 2008 at 9:24 pm

Learned something new.
Ryan says:

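January 4, 2009 at 4:45 pm

Questions:
1. Can the index directory that stores the index be a remote directory?
2. Given a list of data, can Lucene be used to list all of it directly without running a query, and if so, how?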
Michael says:

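January 5, 2009 at 12:36 am

1. A remote directory can be used, but performance will certainly be poor, and sharing it will run into locking problems.
2. If you know the id it should be possible; for the details I would need to check the documentation and the source code myself.

So far my personal impression is that the use cases for this component are still limited and that it is not well suited to large-scale deployments; adding a cache layer should help somewhat.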
Ryan says:

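January 5, 2009 at 4:45 pm

Ha, here I am with another question. zend_search_lucene provides a highlightMatches function for highlighting the search keywords. I don't know whether you have used it, Michael, but when I tried it, it highlighted the entire passage. Also, the search returns a list of records, and clicking one opens a detail page; how can the keywords be highlighted on that detail page as well?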
dfddfd says:

October 13, 2009 at 4:44 pm

Does the newer version of ZF support pagination yet?

When updating, does the whole index have to be rebuilt? That would be frightening with a lot of data.
Michael says:

October 13, 2009 at 6:08 pm

Thanks for the reminder. I haven't looked at the changelog on the ZF site for a long time; I'll go check right away.
