How to Update an Index with Zend_Search_Lucene

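  When using Zend Framework's Search_Lucene module, suppose a document has already been added to the index and is later deleted or modified: the index must be updated promptly to keep search results current. My earlier, rather clumsy approach was to rebuild the entire index from scratch, which is expensive and unsuitable for large applications. A typical scenario is forum posts: when a post is deleted or edited, the index needs to be updated right away.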

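  The official Zend_Search_Lucene documentation says very little about deleting and updating indexed documents, so I worked out a simple approach of my own. Feel free to try it. There may well be a better way; if you know one, please tell me.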

  Here is what the official documentation says:

<?php
$removePath = ...;
// Find every document whose "path" field matches, then delete each one by its internal id
$hits = $index->find('path:' . $removePath);
foreach ($hits as $hit) {
    $index->delete($hit->id);
}
?>

  What confused me here is $removePath; I never figured out what it was supposed to be. Below is the approach I use instead.

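  First, assume every document has a unique tid field; we use this tid as the basis for every delete and update. When Lucene builds the index, I found in my own tests that a numeric value could not be used as a keyword for an indexed field, so it has to be converted to a string. I run the tid through md5 to turn it into a unique string and use that string to locate the index entries that need to be deleted or updated.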
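Why a numeric tid fails to match is not explained above. One plausible cause (an assumption, not something the post verified): as far as I know, Zend_Search_Lucene's Common_Text and Common_Utf8 analyzers treat every non-letter character as a separator, and only the TextNum/Utf8Num variants keep digits. This pure-PHP sketch imitates that letters-only splitting with a hypothetical lettersOnlyTokens() function; it approximates the effect and is not the library's code.

```php
<?php
// Hypothetical imitation of a letters-only analyzer (assumption: it mirrors how
// Common_Text/Common_Utf8 split on any non-letter). NOT Zend's implementation.
function lettersOnlyTokens(string $value): array
{
    // \p{L} matches any Unicode letter; digits and punctuation act as separators
    return preg_split('/[^\p{L}]+/u', $value, -1, PREG_SPLIT_NO_EMPTY);
}

$tid = 12345;
$numericTokens = lettersOnlyTokens((string) $tid);      // every character is a digit
$md5Tokens     = lettersOnlyTokens(md5((string) $tid)); // only the a-f runs remain

echo 'tid tokens: ', json_encode($numericTokens), PHP_EOL;
echo 'md5 tokens: ', json_encode($md5Tokens), PHP_EOL;
```

If the assumption holds, an all-digit tid produces no tokens at all, and even the md5 string is indexed only as its letter fragments. A Keyword field, which Zend_Search_Lucene stores as a single untokenized term, looked up with a term query (Zend_Search_Lucene_Search_Query_Term) instead of the query parser, may be a closer fit for an exact unique key.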

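// Building the index (partial code):
require_once 'Zend/Search/Lucene.php';

// Configure the analyzer for your character set before adding documents
Zend_Search_Lucene_Analysis_Analyzer::setDefault(
    new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8());

// create() starts a new index at this path; lucne_index is the property in my class holding the index path
$index = Zend_Search_Lucene::create($this->lucne_index);
$doc = new Zend_Search_Lucene_Document();
// UnStored: indexed but not stored; Text: indexed and stored
$doc->addField(Zend_Search_Lucene_Field::UnStored('key', md5($tid)));
$doc->addField(Zend_Search_Lucene_Field::Text('title', $title));
$doc->addField(Zend_Search_Lucene_Field::UnStored('content', $content));

$index->addDocument($doc);
$index->commit();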
 
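// Deleting and updating (partial code):
// Use the same analyzer as at indexing time so the key query is tokenized the same way
Zend_Search_Lucene_Analysis_Analyzer::setDefault(
    new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8());

// First, delete the old entry
$key = md5($tid);
$index = Zend_Search_Lucene::open($this->lucne_index);
$query = Zend_Search_Lucene_Search_QueryParser::parse("key:$key", 'utf-8');
$hits = $index->find($query);
foreach ($hits as $hit) {
    $index->delete($hit->id); // marks the document as deleted
}

// Then re-index the updated data: same as when creating, but with a fresh document object
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::UnStored('key', $key));
$doc->addField(Zend_Search_Lucene_Field::Text('title', $title));
$doc->addField(Zend_Search_Lucene_Field::UnStored('content', $content));

$index->addDocument($doc);
$index->commit();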

  The idea is simply: find the entry to be updated, delete it, and then add the new data to the index again.
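The delete-then-re-add idea can be tried out without Zend Framework. This is a hypothetical in-memory stand-in (InMemoryIndex, docKey() and updateDocument() are illustrative names, not Zend APIs) that mimics two Lucene traits the approach relies on: documents are found by an exact key value, and delete() only flags a document by its internal id. It assumes PHP 7.4+ for the typed properties.

```php
<?php
// Hypothetical stand-in for a Lucene index, only to illustrate the
// "find by unique key, delete, re-add" update. As in Lucene, deleted
// documents are merely flagged and internal ids are never reused.
final class InMemoryIndex
{
    /** @var array<int, array<string, string>> internal id => fields */
    private array $docs = [];
    /** @var array<int, bool> internal id => deleted flag */
    private array $deleted = [];

    public function addDocument(array $fields): int
    {
        $id = count($this->docs);        // ids are sequential and never removed
        $this->docs[$id] = $fields;
        $this->deleted[$id] = false;
        return $id;
    }

    /** Internal ids of live documents whose $field equals $value exactly. */
    public function findIds(string $field, string $value): array
    {
        $ids = [];
        foreach ($this->docs as $id => $fields) {
            if (!$this->deleted[$id] && ($fields[$field] ?? null) === $value) {
                $ids[] = $id;
            }
        }
        return $ids;
    }

    public function delete(int $id): void
    {
        $this->deleted[$id] = true;      // flag only; the data stays in place
    }

    public function getDocument(int $id): array
    {
        return $this->docs[$id];
    }
}

// Hypothetical helper: derive the key identically at index time and update time
function docKey(int $tid): string
{
    return md5((string) $tid);
}

// Delete every live entry with this key, then add the new version
function updateDocument(InMemoryIndex $index, int $tid, string $title, string $content): int
{
    $key = docKey($tid);
    foreach ($index->findIds('key', $key) as $id) {
        $index->delete($id);
    }
    return $index->addDocument(['key' => $key, 'title' => $title, 'content' => $content]);
}

$index = new InMemoryIndex();
$index->addDocument(['key' => docKey(1), 'title' => 'Old title', 'content' => 'old body']);
$index->addDocument(['key' => docKey(2), 'title' => 'Other post', 'content' => 'other body']);
updateDocument($index, 1, 'New title', 'edited body');

$live = $index->findIds('key', docKey(1));
echo count($live), ' live doc(s); title: ', $index->getDocument($live[0])['title'], PHP_EOL;
// prints "1 live doc(s); title: New title"
```

Because deletions are only flagged, a real Lucene index also grows with every update until it is optimized ($index->optimize() in ZF1).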

  Consider this a starting point; discussion is welcome.

Category: PHP / Zend
11 Responses
  1. Joel says:

    Hi, Michael

    I have a question. How do you paginate the results of a Zend_Search_Lucene search? Could you share some ideas to get me started? 🙂

    Best Regards,
    Joel

  2. Michael says:

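    For now, zend_search_lucene has no convenient way to paginate. It only offers the Zend_Search_Lucene::getResultSetLimit() and Zend_Search_Lucene::setResultSetLimit() methods, which cap the number of results a query returns but do not support a starting offset, so you have to paginate yourself after fetching the result set. This probably costs some performance; you can keep the result data in a short-lived in-memory cache to reduce the overhead.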

  3. Joel says:

    Hi Michael,

    Thanks for the quick reply 🙂
    Judging from the official documentation, cursors really are not supported. Hopefully they will be in the future.

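    Also, according to the official documentation, Zend_Search_Lucene has a 2 GB limit on index files. Is there a way around this limit in practice? Are there any best practices for large data volumes?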

    Thanks again! Best Regards,
    Joel

  4. Michael says:

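    This limit is not a problem with Lucene itself; it comes from the limitations of 32-bit versus 64-bit systems. Also, here is an article on a real-world Lucene application that I came across on a foreign site; I hope it helps: http://www.phpriot.com/d/articles/php/search/zend-search-lucene/index.html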

  5. 3wdotec says:

    Nice…

  6. 无纺布袋 says:

    Learned something....

  7. Ryan says:

    Questions:
    1. Can the index directory be a remote directory?
    2. Given a data list, can Lucene simply list all of its data without running a query, and if so, how?

  8. Michael says:

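    1. A remote directory will work, but performance is bound to be poor, and sharing it raises locking issues.
    2. If you know the ids it should be possible; I would need to check the documentation and source code for the details.

    For now my personal feeling is that this component suits only a limited range of scenarios and is not really fit for large-scale use; adding a caching layer in front of it should help somewhat.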

  9. Ryan says:

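    Haha, back with another question. zend_search_lucene provides a highlightMatches function for highlighting the query keywords; have you used it, Michael? When I tried it, it highlighted the whole passage. Also, a search produces a list of results, and clicking an item opens a detail page; how can that detail page highlight the keywords from the original search?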

  10. dfddfd says:

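    Does the new ZF version support pagination yet?

    When updating, does the original index have to be rebuilt from scratch? With a lot of data that would be terrible.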

  11. Michael says:

    Thanks for the reminder; I haven't checked the changelog on the ZF site in ages. Heading over now.
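Michael's answer to Joel on pagination comes down to fetching the whole result set and slicing out the requested page in PHP. A minimal sketch of that slicing step, assuming the hits arrive as a plain PHP array (which, as far as I know, is what Zend_Search_Lucene::find() returns); paginate() and its return keys are hypothetical names.

```php
<?php
// Hypothetical helper: cut one page out of an already-fetched result array.
function paginate(array $hits, int $page, int $perPage): array
{
    if ($perPage < 1) {
        throw new InvalidArgumentException('perPage must be at least 1');
    }
    $total = count($hits);
    $pages = max(1, (int) ceil($total / $perPage));
    $page  = min(max(1, $page), $pages);     // clamp to a valid page number

    return [
        'items' => array_slice($hits, ($page - 1) * $perPage, $perPage),
        'page'  => $page,
        'pages' => $pages,
        'total' => $total,
    ];
}

// Stand-in hits 1..23, shown 10 per page
$result = paginate(range(1, 23), 3, 10);
echo "page {$result['page']} of {$result['pages']}: ", implode(',', $result['items']), PHP_EOL;
// prints "page 3 of 3: 21,22,23"
```

The integers stand in for query hits. To get the saving Michael describes across requests, the result array would have to live in a cache that outlives a single request (for example APCu or memcached); the sketch does not cover that part.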
