Updating an Index with Zend_Search_Lucene

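When you use the Search_Lucene module of Zend Framework, a document that is already in the index may later be deleted or modified, and the index has to be updated promptly to keep search results current. The naive approach is to rebuild the entire index, which is expensive and unsuitable for large applications. Forum posts are the typical case: when a post is deleted or edited, the index needs to be updated right away.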

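The official Zend_Search_Lucene documentation says very little about deleting and updating indexed documents, so I worked out a simple approach of my own. Give it a try; there may well be a better way, and if you know one, please tell me.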

Here is what the official documentation shows:
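[coolcode lang="php"]
<?php
// $removePath: the value of the 'path' field to match (the docs leave it unspecified)
$hits = $index->find('path:' . $removePath);
foreach ($hits as $hit) {
    $index->delete($hit->id);
}
?>
[/coolcode]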
What puzzled me here is $removePath; I never figured out what it is supposed to be. Here is the approach I use instead.

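First, assume every document has a unique tid field, and use that tid as the basis for each delete and update. In my own testing, a numeric value could not be used as a keyword in an indexed field when the index was built, so the tid has to be converted to a string. I use md5 to turn the tid into a unique string, and then use that string to find the index entry that needs to be deleted or updated.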
[coolcode lang="php"]
// Creating the index (partial code):
// $this->lucne_index is the class property that holds the index path
$index = Zend_Search_Lucene::create($this->lucne_index);
$doc = new Zend_Search_Lucene_Document();
// Choose the analyzer that matches your character set
Zend_Search_Lucene_Analysis_Analyzer::setDefault(
    new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8());

$doc->addField(Zend_Search_Lucene_Field::UnStored('key', md5($tid)));
$doc->addField(Zend_Search_Lucene_Field::Text('title', $title));
$doc->addField(Zend_Search_Lucene_Field::UnStored('content', $content));

$index->addDocument($doc);
$index->commit();

// Deleting and updating an index entry (partial code):
// Delete the old entry first
$key = md5($tid);
$index = Zend_Search_Lucene::open($this->lucne_index);
$query = Zend_Search_Lucene_Search_QueryParser::parse("key:$key", 'utf-8');
$hits = $index->find($query);
foreach ($hits as $hit) {
    $index->delete($hit->id);
}
// Re-index the updated data; the code is the same as when creating
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::UnStored('key', md5($tid)));
$doc->addField(Zend_Search_Lucene_Field::Text('title', $title));
$doc->addField(Zend_Search_Lucene_Field::UnStored('content', $content));

$index->addDocument($doc);
$index->commit();
[/coolcode]

The idea is simply to find the entry to update, delete it, and then add the new data back to the index.
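The find, delete, and re-add steps can also be wrapped in one function. The following is only a sketch, not the method from the official documentation: the function name replaceIndexedDocument is hypothetical, and it assumes Zend Framework 1 is on the include path. It also differs from the code above in two ways: it assumes the key was indexed as a Keyword field (stored untokenized) rather than UnStored, and it builds a term query directly instead of going through the query parser, so the key never passes through the analyzer.

```php
<?php
// Assumption: Zend Framework 1 is available on the include path.
require_once 'Zend/Search/Lucene.php';
require_once 'Zend/Search/Lucene/Index/Term.php';

/**
 * Hypothetical helper (not part of Zend_Search_Lucene): replace the indexed
 * document identified by $tid with a fresh title and content.
 */
function replaceIndexedDocument(Zend_Search_Lucene_Interface $index, $tid, $title, $content)
{
    $key = md5($tid);

    // Build the term query by hand so the key is matched exactly,
    // without being tokenized by the default analyzer.
    $term  = new Zend_Search_Lucene_Index_Term($key, 'key');
    $query = new Zend_Search_Lucene_Search_Query_Term($term);

    // Delete every document that currently carries this key.
    foreach ($index->find($query) as $hit) {
        $index->delete($hit->id);
    }

    // Add the updated document back. Keyword fields are indexed as a
    // single untokenized term, which is what the term query above expects.
    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::Keyword('key', $key));
    $doc->addField(Zend_Search_Lucene_Field::Text('title', $title));
    $doc->addField(Zend_Search_Lucene_Field::UnStored('content', $content));

    $index->addDocument($doc);
    $index->commit();
}
```

To use it, open the index with Zend_Search_Lucene::open() and pass it in together with the tid and the new title and content. An index built with the UnStored key shown earlier would need to be rebuilt with a Keyword key first, otherwise the term query is unlikely to find the old entry.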

This is only a starting point; comments and discussion are welcome.

11 thoughts on “Updating an Index with Zend_Search_Lucene”

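  1. At the moment zend_search_lucene has no convenient pagination method. It only offers Zend_Search_Lucene::getResultSetLimit() and Zend_Search_Lucene::setResultSetLimit() to cap the number of results a query returns, with no support for a starting offset, so pagination has to be done by hand after the result set is fetched. That probably costs some performance; a short-lived in-memory cache of the result data can reduce the overhead.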

  2. Hi Michael,

    Thanks for the quick reply 🙂
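    Judging from the official documentation, cursors are indeed not supported. Hopefully a future version will add them.

    Also, according to the official documentation, Zend_Search_Lucene has a 2 GB limit on index files. Is there a way to get around this limit in practice? Are there any best practices for large data volumes?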

    Thanks again ! Best Regards,
    Joel

  3. Questions:
    1. Can the index directory be a remote directory?
    2. Given a list of data, can Lucene list all of it directly without running a query, and if so, how?

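  4. In reply to the questions above:

    1. A remote directory can be used, but performance will certainly be poor, and sharing it brings locking problems.
    2. If you know the id it should be possible; I would need to check the documentation and source code for the details.

    My impression so far is that the use cases for this component are limited and it is not well suited to large-scale applications. Adding a cache layer in front of it should help.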

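  5. Back with another question. zend_search_lucene provides a highlightMatches function for highlighting the search keywords. I don't know whether Michael has used it, but when I tried it, it highlighted the whole paragraph. Also, a search returns a list of results, and clicking one opens a detail page; how can the earlier keywords be highlighted on that detail page?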
