Lucene.net tutorial
A few weeks ago I announced my intention to introduce Lucene.net into Subtext, and promised to write about the journey. In this post I’m going to give some hints on how to get started with Lucene.net.
Download the bits
Unfortunately there has been no official Lucene.net release since March 2007, when Lucene.net 2.0 came out. Since then a lot of new features and bug fixes have been introduced, so the best way to get the latest bits is to download the latest tag from the SVN repository. At the time of writing the latest tag is version 2.3.2. This is also the recommendation of the developers, since “bureaucracy” is preventing an official release from happening.
Once you have the latest bits from the Subversion repository, you have to build them. The solution file available in the repository is still in VS2005 format, so either use that version or run the file through the conversion wizard and use VS2008.
The structure of the repository is quite deep, so it’s worth spending a few words on it before going on:
- src: contains 3 subfolders with the source for the core library, the unit tests and a few demo projects that show the main usage scenarios
- contrib: contains a few external libraries that extend the features of Lucene: a small utility class that helps highlight the matching terms, a library that helps build similarity queries, a spellchecker, stemmers (components that normalize the text to be indexed by reducing words to their root form) and a distributed search engine
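To give an idea of what a stemmer does, here is a minimal sketch in Python. It is only a toy illustration: the `toy_stem` and `normalize` functions and the suffix list are made up for this example and are not the algorithm used by the stemmers in the contrib folder, which are considerably more sophisticated.

```python
# Toy suffix-stripping stemmer: an illustration only, NOT the algorithm
# used by the stemmers shipped in the Lucene.net contrib folder.
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical list, checked in this order


def toy_stem(word):
    """Lower-case a word and strip at most one common English suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Split the text on whitespace and stem every token."""
    return [toy_stem(token) for token in text.split()]


print(normalize("Indexed indexing indexes index"))
# prints ['index', 'index', 'index', 'index']
```

The point is that different forms of the same word collapse to a single term, so a search for “index” can also match documents that only contain “indexing” or “indexed”.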
I hope the developers of Lucene.net find a way to publish an official updated release, because seeing a two-year-old release might frighten people and make them think that the project is not maintained anymore.
Alternatively, you can skip to the end of this post and download the assembly I built to make life easier for you.
Now that you’ve got a binary version of Lucene.net, the next step is learning how to use it. I’m going to write some more posts on this in the coming weeks, but if you want to start reading directly from the original documents, the next section tells you where to go.
Find the documentation
Documentation is probably the thing that is lacking the most in this project (and this is one of the reasons why I decided to write this series of posts). But since Lucene.net is a class-by-class port of the Java version, you can find a bit more information on the Lucene for Java website (but just a bit).
I would suggest starting from the overview available in the JavaDoc, which explains the main packages and a basic usage scenario. Then move on to the MSDN-style documentation for Lucene.net (but be careful, since it refers to Lucene.net 2.1). Some more unstructured documents are available in the wiki.
Finally, if you want to dig deeper into Lucene, a book from Manning is available: Lucene In Action. It’s a bit old and covers Lucene 1.4, but most of the key concepts are still the same. There is also a second edition of the book, still in MEAP: Lucene in Action, Second Edition. It covers the forthcoming version 3.0, though, so it goes a bit too far ahead if you are interested in Lucene.net, which is still at version 2.3.
That reminds me of how important good documentation is for open source projects: even if you build the best OSS library in the world, if it’s not well documented all your efforts are useless. But that’s probably a topic for another post.
Next
Now you know how to get Lucene.net and where to look for more information. In a future post I’ll write about the main concepts of Lucene.net, and later about how I’m integrating Lucene.net into Subtext.
If you trust me and want to avoid all the hassle of getting the source from SVN, migrating the solution to VS2008 and building it, I did it for you: you can download the main Lucene.net 2.3.2 library.