
A few weeks ago I announced my intention to introduce Lucene.net into Subtext, and said that I would write about the journey. In this post I’m going to give some hints on how to get started with Lucene.net.

Download the bits

Unfortunately, Lucene.net has not had an official release since March 2007, when Lucene.net 2.0 came out. Since then a lot of new features and bug fixes have been introduced, so the best way to get the latest bits is to download the latest tag from the SVN repository. At the time of writing, the latest tag is version 2.3.2. This is also what the developers recommend, since “bureaucracy” is preventing an official release from happening.

Once you have the latest bits from the Subversion repository, you have to build them yourself. The solution file in the repository is still in VS2005 format, so either use that version or run the file through the conversion wizard and use VS2008.

The structure of the repository is quite deep, so it’s worth spending a few words on it before going on:

  • src: contains 3 subfolders, with the source for the core library, the unit tests and a few demo projects that show the main usage scenarios
  • contrib: contains a few external libraries that extend the features of Lucene. There is a small utility class that helps highlight the matching terms, a library that helps build similarity queries, a spellchecker, stemmers (components that reduce words to their root form before the text is indexed) and also a distributed search engine

I hope the developers of Lucene.net find a way to publish an official updated release, because seeing a two-year-old release might frighten people and make them think that the project is not maintained any more.

Alternatively, you can skip to the end of this post and download the assembly I built to make life easier for you.

Now that you’ve got a binary version of Lucene.net, the next step is learning how to use it. I’m going to write some more posts on this in the coming weeks, but if you want to start reading the original documentation right away, the next section tells you where to go.

Find the documentation

Documentation is probably the thing this project lacks the most (and this is one of the reasons why I decided to write this series of posts). But since Lucene.net is a class-by-class port of the Java version, you can find a bit more information on the Lucene for Java website (but just a bit).

I would suggest starting from the overview available in the JavaDoc, which explains the main packages and a basic usage scenario. Then move on to the MSDN-style documentation for Lucene.net (but be careful, since it refers to Lucene.net 2.1). Some more unstructured documents are available in the wiki.
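To give an idea of what that basic usage scenario looks like once translated to C#, here is a minimal sketch. It assumes the Lucene.net 2.3.x API, which should mirror the Java 2.3 classes one-to-one; the field name, the sample text and the class name GettingStarted are made up for illustration, and the whole thing is only a starting point, not a recommended design.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical console program; assumes a reference to the Lucene.Net 2.3.2 assembly.
class GettingStarted
{
    static void Main()
    {
        // In-memory index, so the sketch leaves nothing on disk.
        RAMDirectory directory = new RAMDirectory();
        StandardAnalyzer analyzer = new StandardAnalyzer();

        // Indexing: 'true' asks the writer to create a new index.
        IndexWriter writer = new IndexWriter(directory, analyzer, true);
        Document doc = new Document();
        // Store the original text and tokenize it so it can be searched.
        doc.Add(new Field("title", "Getting started with Lucene",
                          Field.Store.YES, Field.Index.TOKENIZED));
        writer.AddDocument(doc);
        writer.Close();

        // Searching: parse a query against the "title" field with the same analyzer.
        IndexSearcher searcher = new IndexSearcher(directory);
        QueryParser parser = new QueryParser("title", analyzer);
        Query query = parser.Parse("lucene");
        Hits hits = searcher.Search(query);

        // Print the stored title and the relevance score of each hit.
        for (int i = 0; i < hits.Length(); i++)
        {
            Console.WriteLine("{0} (score {1})",
                              hits.Doc(i).Get("title"), hits.Score(i));
        }
        searcher.Close();
    }
}
```

The same analyzer is used for indexing and for parsing the query so that both sides tokenize the text in the same way. Later releases changed parts of this API, so treat the calls above as specific to the 2.3 line.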

Finally, if you want to get deeper into Lucene, a book from Manning is available: Lucene in Action. It’s a bit old and covers Lucene 1.4, but most of the key concepts are still the same. There is also a second edition of the book, still in MEAP: Lucene in Action, Second Edition. It covers the forthcoming version 3.0, though, so it goes a bit too far ahead if you are interested in Lucene.net, which is still at version 2.3.

That reminds me of how important good documentation is for open source projects: even if you build the best OSS library in the world, if it’s not well documented all your efforts are useless. But that’s probably a topic for another post.

Next

Now you know how to get Lucene.net and where to look for more information. In a future post I’ll write about the main concepts of Lucene.net, and later about how I’m integrating Lucene.net into Subtext.

If you trust me and want to avoid all the hassle of getting the source from SVN, migrating the solution to VS2008 and building it, I did it for you: you can download the main Lucene.net 2.3.2 library.

 
posted on Thursday, August 27, 2009 2:41 PM

Comments on this entry:

# re: How to get started with Lucene.net

Left by Martin at 8/28/2009 10:50 AM

You're right. I thought Lucene.Net is dead when I saw the year of the last release some time ago ;-)

# re: How to get started with Lucene.net

Left by Simone at 8/28/2009 12:38 PM

Yeah, it gives a wrong impression about the goodness of the project.

# re: How to get started with Lucene.net

Left by herbrandson at 8/28/2009 5:02 PM

I've just started using Lucene.Net for a project and, like you, have been a bit disappointed with the documentation. I can't wait to "compare notes". I'm especially interested in how you handle any threading issues around updating the index. Also, how/when/if you rebuild the entire index.

# re: How to get started with Lucene.net

Left by Simone at 8/28/2009 5:26 PM

@Eric: I'm still learning the library... let's say this series is a kind of public learning process... I've not faced these problems yet.
But I'll try to keep these issues in mind when integrating Lucene.net into Subtext.

# re: How to get started with Lucene.net

Left by Sabine at 9/21/2009 3:19 PM

I found version 2.4 in Lucene .Net incubator trunk.
What do you think about this version ? Is it stable enough ?
Thanks.

# re: How to get started with Lucene.net

Left by kelly at 10/14/2009 9:45 PM

"That reminds me of how important a good documentation is for opensource projects: even if you built the best OSS library in the World, if it’s not well documented all your efforts are useless. But that’s probably a topic for another post".

Oh so true. I have a list of OSS / libs / projects as long as my arm where the lack of documentation is the main thing that makes them a productivity barrier as opposed to an aid.

Seems the standard thing is to code like a demon, expend all your energy on a great piece of software and then expect everybody else to browse the source to garner basic usage. Possibly there's a demo project covering some contrived usage scenarios if you're lucky.

I understand, who wants to document while you could be hacking? But it's not ideal for prospective users...

I'm not sure if the "out of date" scenario is even worse. Prime example StructureMap. Semi well documented but sussing out which documentation is relevant to which version is painful!
