Answering questions at JavaRanch + free books

2008-12-10  |   |  book   hibernate search   jboss  

I am doing a session on Hibernate Search all this week at JavaRanch. Manning will give away free books of Hibernate Search in Action for the occasion.

If you have questions on Hibernate Search, express yourself :)


JBoss AS 5 is out, Hibernate Search 3.1, Devoxx is warming up and

2008-12-05  |   |  book   conference   hibernate search   jboss  

Great news this week:

  • JBoss AS 5 is out. Congratulations to Dimitris and the many people in and out of JBoss who contributed to it.

  • Hibernate Search 3.1 is out. A lot of good stuffs like performance improvements at indexing and querying time and some cool new features like the analyzer declaration framework (allowing declarative phonetic, synonym, n-gram indexing and searches)

  • Devoxx is very close. Come see the JBoss folks and topics and come to the Seam meetup after the BOF of course :)

Also, on the Bean Validation (JSR 303) side, I am finalizing the last changes in the spec for the public draft. We made a lot of progress in the last two weeks on various subjects including type-safe groups and JPA / JSF / EE integration (with the finalized draft, I will officially contact the EE expert group). Stay tuned, hopefully the draft should be out in a week or two.

Finally, Hibernate Search in Action is supposed to be released in final PDF monday (still not believing in it till I see that one ;) ).

Great week on my side. See you at Devoxx for a drink or two.


Hibernate Search 3.1 RefCard

2008-12-01  |   |  book   hibernate search   jboss  

DZone has a nice Hibernate Search 3.1 6-pages ref card. It is packed with:

  • The list of annotations and their descriptions
  • Hibernate Search's main APIs
  • Lucene's most useful query types
  • Quick examples involving mappings and API usage (including the new analyzer declaration framework)

It's free but you need to register.

Speaking of the devil. John and I have given back our last edits for Hibernate Search in Action. So we are still on target for releasing the book in december. I personally still can't believe I am done, so I will play the St Thomas and will wait till I can touch the paper :)

You can get the paper book at Amazon or on the Manning website. Manning also offers the PDF version.


Book review and NHibernate Search

2008-10-07  |   |  book   hibernate search   jboss  

Ayende, one of the active bees behind the NHibernate portfolio, has a nice review of Hibernate Search in Action on his blog.

By the way, Ayende has ported Hibernate Search to .net : NHibernate.Search. I don't think there is documentation specific to the project but the Hibernate Search documentation is just as useful.

I don't know Ayende personally, but I can only admire someone that blogs more that I can tweet and still have a full time job :)


Hibernate Search book preview final review: hitting where it hurts for the better

2008-09-11  |   |  book   hibernate search   jboss  

We just had our third review of Hibernate Search in Action. Receiving this feedback has been a humble experience. Lot's of good reviews (good) and some critical ones (even better). Every imperfection we left aside came back in the spot lights of our reviewers.

Based on this feedback, we have been working hard the last two weeks to improve a lot the manuscript:

  • clearer code transcripts with more inline annotations
  • better separation between different parts of the same example (Hibernate API versus Java Persistence API)
  • the code has been updated to the latest Hibernate Search version and cleaned up a lot (no more warning, same comments as in the book)
  • the code now contains README files for easier¬†navigation, ant scripts, Eclipse and IntelliJ descriptors
  • a nice appendix summarizing all annotations, Hibernate Search APIs and Lucene Query classes
  • added an index: I wish I could plug Hibernate Search on the book, that one was painful
  • added a section on testing (mocking, in-memory integration testing, performance testing)
  • better explanation on how query and analyzer are interwoven
  • add the Explanation API description
  • clearer introduction for each chapter
  • much more references than before making book navigation easier
  • all references in the book are up to date. No more Chapter XX ;)
  • improvements on the clustering chapter

The code is almost ready for prime time, we will publish it as soon as we find the right vehicle for it.

Thanks to all our reviewers. While I am not sure I appreciate the recent sleep depravation, this definitely improved the book a lot.

As usual, you can get the preview version electronically at Manning, it has all the chapters and I hope to get the latest changes uploaded soon.


Hibernate Search in Action: all chapters written

2008-08-11  |   |  book   hibernate search  

That's now official, I handed over the last chapter to the publisher yesterday. All chapters will be available to the early access program in the next few days. The journey is not finished yet. A lot of reviewing and correction are at sight. If you have feedback, now is the time :)

The last last few chapters out cover:

  • Hibernate Search filters
  • Performance
  • Cluster and scalability

Filters are a neat feature allowing to apply cross cutting restrictions on Lucene queries: you might want to filter to the latest month item creations or filter according to the security level the user is granted. Filters do just that in a declarative fashion: enable one or more of them by their names.

The chapter of performance is a mix of existing content and new content focused around performance at various stages: indexing and searching. It also includes an explanation about index sharding.

The last chapter describes problems that arise when you try to cluster Lucene and how Hibernate Search addresses them. We primarily describe the asynchronous clustering approach implemented out of the box in Hibernate Search (using JMS). The chapter also describes how to customize Hibernate Search to your own clustering strategy to meet your architectural needs.

Now off to the correction marathon :)


Two third of Hibernate Search in Action out!

2008-05-29  |   |  book   hibernate search  

We have just pushed another set of chapters for Hibernate Search in Action and reached the symbolic limit of 2/3. Yoohoo! We have also enhanced some of the existing chapters based on the feedbacks we received and the perseverance of our editor. They have just shipped as part of the early access program available in ebook format. I am very happy with the new chapters especially the description of analyzers in chapter 5.

I describe the use of Hibernate Search and the new Solr integration (coming up in Hibernate Search 3.1) to enable synonym, phonetic, n-gram and snowball search. Snowball is an algorithm that deduces the root of a word enabling searches for words of the same family: work, working, worker will all be reduced to the same root (called stemmer). The chapter both aims at demystifying analyzers and providing a concrete approach on how to use them. I really enjoyed writing it, I hope you will like reading it.

Beyond analyzers, here is a small list of the subjects I am personally happy with in the book:

  • the introduction gives you a nice picture of what search is, why it matters nowadays and how it can be implemented in todays applications
  • custom bridges: chapter 4 gives some nice examples of custom bridges including how to write one for composite identifiers
  • mapping associations has always been a confusing subject for beginners. Chapter 4 comes with some nice diagrams that should clarify this topic
  • chapter 5 explores when and how to use synchronous indexing, asynchronous indexing and clustered indexing (through JMS) and describes what is going on under the hood

We (and by we, I mean John) also have finished the Lucene dedicated content. While Hibernate Search hides the gory details of indexing and searching, its let you use all the flexibility of Lucene queries. Understanding Lucene is key and we have added the necessary knowledge you need for your daily work with Hibernate Search.

Let us know what you think and, of course, go buy the e-book :)


Hibernate Search - cool, but is it the right approach? Year baby!

2007-06-15  |   |  hibernate search  

Sanjiv Jivan wrote a blog entry questioning the "point" of Hibernate Search. He missed some critical steps in his argumentation, that I am willing to correct. I started to answer on his blog, but the answer being fairly long, I opted for a blog entry.

I think Sanjiv failed to understand which population Hibernate Search is targeting.
Hibernate Search is about ORM. If you don't use Hibernate, if you don't use JPA, forget about Hibernate Search, it's not for you.

His main point is, why use Hibernate Search instead of a straight Lucene + Database (I'm assuming JDBC) solution? Five years before he could have asked, why use an ORM rather than a straight JDBC access? Because it does for you and optimize 90% of the job and let you focus on the 10% that is hard.
I won't explain why an ORM is usually (but not always) a good approach (everybody got that nowadays), so let's focus on a different question: considering that Hibernate is used in a given application, should we go for plain Lucene and JDBC layer as Sanjiv suggests or should we go for Hibernate Search? Should we go for 2 different set of APIs / programmatic model and model representation, or should we go for one unified model?

Let's see each of Sanjiv's concerns one at a time.

Why Hibernate Search rather than plain Lucene and JDBC?
Out of the box, setting up a plain Lucene and JDBC solution requires to write the bridge. Lucene has it's own world, the DB an other one. Your code has to bind them together (write the optimized JDBC routine + optimized Lucene index routine). It can be long, painful and buggy.
I doubt Sanjiv had to do it before, he would not talk like that :) Hibernate Search does the binding for you in your Hibernate backed application.
People are attracted by Hibernate Search because it lowers the barrier of entry to Lucene in a project by a great deal. This opens the Search capabilities to a lot of applications that would not have considered it with only plain Lucene in their hands.

Hibernate (Search) does not play well with massive indexing
Sanjiv claims that the initial indexing (or reindexing) is slow (he hasn't tried actually) and memory consuming.
Have a second look at the Hibernate Search reference documentation, the massive indexing procedure explicitly helps you to control the amount of memory spent.
In Lucene, one good rule of thumb is use as much memory as possible to minimize IO access. So yes, the more memory you'll spend the more efficient your hibernate Search massive indexing will be. You have to think about the global system, not only a subpart.

Event based indexing should not be used
Next Sanjiv tries to show that the event based indexing is wrong and that one should always use batch indexing. The honest answer is it depends.
Hibernate Search does not constraint to index things per transaction (it's a pluggable strategy), and I never said that indexing at commit time was important. Not indexing before commit time is critical (think about rollbacks).
As a matter of fact, the clustered mode (JMS mode) explicitly does not index at commit time, it delegates the work for later (and to someone else). The overhead of sending a message for later indexing (I'm not speaking of actual Lucene operations here) is minimal.
What do we gain? The usual on the fly vs batch mode benefits: no batch window, more homogeneous CPU consumption on systems, not having to take care of a batch job. I don't know about you, but the less batch jobs I have in my systems, the better I sleep.
By the way, is batch mode supported with Hibernate Search? Absolutely. Who likes to avoid batch jobs when possible, most of the developers and ops guys I have met. When you need to use them, do it ; when you don't stop the masochism.

To justify that batch mode should rules, Sanjiv used the data mining and star / snow schema as an example. These are a very specific kind of applications where ORM are almost never used. They could be, with some adjustments tot he ORM, but that's another story, maybe my next project :) Anyway, this is out of the scope of Hibernate Search, see the very first point.

I agree that JMS is highly over engineered and should be simplified in Java EE6, but come on, setting up a Queue is only a few clicks in a graphical console... it's not too bad. Don't tell me JMS is too hard (Hibernate Search does the JMS calls by the way, not you).

Hibernate Search does not support third party modifications in the database
It's actually a fairly known problem to people who use 2nd level cache in ORMs, has 2nd level cache been banned from our toolbox? clearly no. But once again Hibernate Search works fine in a batch mode. So this should solve Sanjiv's concerns.

Annotation based indexing definition is not flexible
Is that an inflexible approach? How practical would it be to change them on the fly? Changing which elements are indexed, or how would require to reindex the whole set of data. Quite possible, but definitely something that is not so useful on the fly. As for boosting, I do set my field boosting at query time, I find it more flexible than index time boosting, so I never had the issue Sanjiv is describing.

Why using Hibernate Search query API?
Why not using straight Lucene queries an APIs, it's all about text in the end?
The nice thing about the Hibernate Search is that it's really easy to replace a HQL query by a Lucene query: just replace the Query object and you're done, the rest of the code remains unchanged. Because is that simple, people tend to use Hibernate Search and Lucene queries in a more widespread number of usecases, and not simply for a Yahoo-like search screen (we always talk about Google, let's switch for a while ;) ):
- save some DB CPU cycles and distribute it to cheaper machines
- efficient multi word queries
- wildcards
- etc
Here is a use case that is clearly not about plain text:
"increase visibility of all books where 'Paris Hilton' is mentioned and double the increase if 'prison' is also present"

Hibernate Search queries can return either managed objects or projected properties (retrieving only a subset of the data). When to use what?
Sometimes, you use property projections rather than object retrieval in HQL queries either for ease of use or performance reasons, It's more convinient to play with the objects, but you pick up the best tool for the job. I would say the same kind of rules can be applied with Hibernate Search between a regular query and a field projection.

Hibernate Search not suitable for high volume websites
I love this one. I did design high volume websites backed by Lucene. I know what you gain, I know what you lose. Hibernate Search is full of best practices. The Hibernate Search clustering support is a good example of architecture that an architect could mimic to scale with Lucene (up and out). But it's not the only one, it depends on the use case, that's why Hibernate Search does not impose an architecture, that's why I prefer libraries over off-the-shelves products.

I would recommend this off-the-shelves solution?
DBSight or Solr (which I know better) are interesting solutions indeed, but not for the same kind of projects, or at least not for the same integration strategy. We are comparing a library versus a black box. BTW DBSight has a 3-minutes install demo. I could not beat them, it took me 15 mins on stage at JavaOne ( but I walk and talk a lot :) )
I have never been a big fan of black boxes nicely integrated in my IT system, but if I had to choose such a solution I would also give the Google Search Appliance a try, the Google Mini is fairly cheap.


Anyway, Hibernate Search has been developed with practical solutions for practical problems, not theoretical considerations. Giving it a shot is the only way to judge.
Damn long post, sorry about that :(


Hibernate Search freshly baked features

2007-06-06  |   |  hibernate search  

I had to release Hibernate Search Beta3 early after we discovered a fairly severe bug in Beta2. But I had time to inject some new features. After those introduced in Beta2, that a fairly good week :)

batch size limit on object indexing
If you don't pay attention when initially indexing (or reindexing) your data, you may face out of memory exceptions. The old solution was to execute indexing in several smaller transactions, but the code ended up being fairly complex. Here is the new solution:

hibernate.search.worker.batch_size=5000

int batchSize=5000;
//scroll will load objects as needed
ScrollableResults results = fullTextSession.createCriteria( Email.class )
.scroll( ScrollMode.FORWARD_ONLY );
int index = 0;
while( results.next() ) {
index++;
fullTextSession.index( results.get(0) ); //index each element
if (index % batchSize == 0) s.clear(); //clear every batchSize
}
wrap that into one transaction and you are good to go.

Native Lucene
The APIs were never officially published (until beta3), but Hibernate Search lets you fall back to native Lucene when needed. All the needed APIs are held by SearchFactory.

DirectoryProvider provider = searchFactory.getDirectoryProvider(Order.class);
org.apache.lucene.store.Directory directory = provider.getDirectory();
This one is the brute force and gives you access to the Lucene Directory containing Orders. A smarter way, if you intend to execute a search query, is to use the ReaderProvider
DirectoryProvider clientProvider = searchFactory.getDirectoryProvider(Client.class);
IndexReader reader = searchFactory.getReaderProvider().openReader(clientProvider);

try {
//do read-only operations on the reader
}
finally {
readerProvider.closeReader(reader);
}
Smarter because you share the same IndexReaders as Hibernate Search, hence avoid the unnecessary IndexReader opening and warm up.

Finally you can optimize a Lucene Index (roughly a defragmentation)

SearchFactory searchFactory = fullTextSession.getSearchFactory();
searchFactory.optimize(Order.class);
//or searchFactory.optimize();


Demo of JBoss Seam DVD Store powered by Hibernate Search

2007-05-16  |   |  hibernate search  

Many asked me if the DVD Store demo powered by Hibernate Search that I ran at JavaOne was available online.
The answer is not yet, but it will. My plan is to package it nicely and release it when the beta 2 of Hibernate Search is out.
This will hopefully happen fairly soon. JavaOne being behind us, I can focus back on the code base.

I had some very interesting discussions with some of you about Lucene and the features you need, it's good to see the community growing around the project.


Name: Emmanuel Bernard
Bio tags: French, Open Source actor, Hibernate, (No)SQL, JCP, JBoss, Snowboard, Economy
Employer: JBoss by Red Hat
Resume: LinkedIn
Team blog: in.relation.to
Personal blog: No relation to
Microblog: Twitter, Google+
Geoloc: Paris, France

Tags