Hibernate at Jazoon'07

2007-06-22  |   |  conference  

I will be at Jazoon (Zurich) to talk about Hibernate Search on Tuesday. I'll hang around Monday and Tuesday, so feel free to pass by the JBoss booth for a chat.


Hibernate Search - cool, but is it the right approach? Yeah baby!

2007-06-15  |   |  hibernate search  

Sanjiv Jivan wrote a blog entry questioning the "point" of Hibernate Search. He missed some critical steps in his argumentation, which I would like to correct. I started to answer on his blog but, the answer being fairly long, I opted for a blog entry.

I think Sanjiv failed to understand which population Hibernate Search is targeting.
Hibernate Search is about ORM. If you don't use Hibernate, if you don't use JPA, forget about Hibernate Search, it's not for you.

His main point is: why use Hibernate Search instead of a straight Lucene + database (I'm assuming JDBC) solution? Five years ago he could have asked: why use an ORM rather than straight JDBC access? Because it does and optimizes 90% of the job for you and lets you focus on the 10% that is hard.
I won't explain why an ORM is usually (but not always) a good approach (everybody gets that nowadays), so let's focus on a different question: considering that Hibernate is used in a given application, should we go for a plain Lucene and JDBC layer as Sanjiv suggests, or should we go for Hibernate Search? Should we go for two different sets of APIs, programmatic models and model representations, or should we go for one unified model?

Let's see each of Sanjiv's concerns one at a time.

Why Hibernate Search rather than plain Lucene and JDBC?
Out of the box, setting up a plain Lucene and JDBC solution requires you to write the bridge. Lucene has its own world, the DB another one. Your code has to bind them together (write the optimized JDBC routine + the optimized Lucene index routine). It can be long, painful and buggy.
I doubt Sanjiv has had to do it before, or he would not talk like that :) Hibernate Search does the binding for you in your Hibernate-backed application.
People are attracted by Hibernate Search because it lowers the barrier of entry to Lucene in a project by a great deal. This opens the Search capabilities to a lot of applications that would not have considered it with only plain Lucene in their hands.

Hibernate (Search) does not play well with massive indexing
Sanjiv claims that the initial indexing (or reindexing) is slow (he hasn't tried actually) and memory consuming.
Have a second look at the Hibernate Search reference documentation: the massive indexing procedure explicitly helps you control the amount of memory spent.
In Lucene, one good rule of thumb is to use as much memory as possible to minimize IO access. So yes, the more memory you spend, the more efficient your Hibernate Search massive indexing will be. You have to think about the global system, not only a subpart.

Event based indexing should not be used
Next Sanjiv tries to show that the event based indexing is wrong and that one should always use batch indexing. The honest answer is it depends.
Hibernate Search does not constrain you to index things per transaction (it's a pluggable strategy), and I never said that indexing at commit time was important. What is critical is not indexing before commit time (think about rollbacks).
As a matter of fact, the clustered mode (JMS mode) explicitly does not index at commit time, it delegates the work for later (and to someone else). The overhead of sending a message for later indexing (I'm not speaking of actual Lucene operations here) is minimal.
What do we gain? The usual on-the-fly vs batch mode benefits: no batch window, more homogeneous CPU consumption on systems, not having to take care of a batch job. I don't know about you, but the fewer batch jobs I have in my systems, the better I sleep.
By the way, is batch mode supported with Hibernate Search? Absolutely. Who likes to avoid batch jobs when possible? Most of the developers and ops guys I have met. When you need to use them, do it; when you don't, stop the masochism.
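The "never index before commit" rule can be illustrated with a minimal sketch. This is not Hibernate Search's actual implementation: the class DeferredIndexQueue and its methods are hypothetical names, and the backend is just a callback standing in for either a local Lucene writer or a JMS sender.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: collects index work during a transaction, releases it only on commit. */
class DeferredIndexQueue {
    private final List<String> pending = new ArrayList<>();
    private final Consumer<List<String>> backend;

    DeferredIndexQueue(Consumer<List<String>> backend) {
        this.backend = backend;
    }

    /** Called when an entity changes; nothing reaches the index yet. */
    void add(String entityId) {
        pending.add(entityId);
    }

    /** Transaction committed: hand a copy of the queued work to the backend. */
    void commit() {
        if (!pending.isEmpty()) {
            backend.accept(new ArrayList<>(pending));
        }
        pending.clear();
    }

    /** Transaction rolled back: drop the work so the index never sees it. */
    void rollback() {
        pending.clear();
    }

    public static void main(String[] args) {
        List<String> indexed = new ArrayList<>();
        DeferredIndexQueue queue = new DeferredIndexQueue(indexed::addAll);
        queue.add("Email#1");
        queue.rollback();   // rolled back: nothing is indexed
        queue.add("Email#2");
        queue.commit();     // committed: work is handed to the backend
        System.out.println(indexed); // prints [Email#2]
    }
}
```

Whether the backend indexes right away or posts a message for someone else to index later is then a pluggable decision, which is the point made above.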

To justify that batch mode should rule, Sanjiv used data mining and star / snowflake schemas as an example. These are a very specific kind of application where ORMs are almost never used. They could be, with some adjustments to the ORM, but that's another story, maybe my next project :) Anyway, this is out of the scope of Hibernate Search, see the very first point.

I agree that JMS is highly over engineered and should be simplified in Java EE6, but come on, setting up a Queue is only a few clicks in a graphical console... it's not too bad. Don't tell me JMS is too hard (Hibernate Search does the JMS calls by the way, not you).

Hibernate Search does not support third party modifications in the database
It's actually a fairly well-known problem for people who use a 2nd level cache in ORMs. Has the 2nd level cache been banned from our toolbox? Clearly not. But once again, Hibernate Search works fine in batch mode, so this should solve Sanjiv's concerns.

Annotation based indexing definition is not flexible
Is that an inflexible approach? How practical would it be to change the annotations on the fly? Changing which elements are indexed, or how, would require reindexing the whole data set. Quite possible, but definitely not something that is useful on the fly. As for boosting, I set my field boosting at query time; I find it more flexible than index-time boosting, so I never had the issue Sanjiv is describing.
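To make the query-time boosting point concrete, here is a minimal sketch that builds a Lucene query string using the query parser's `^` boost syntax. The class name BoostedQueryBuilder, the field names and the boost values are all hypothetical, and the term is assumed to be a single word that needs no escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: applies per-field boosts when the query is built, not when data is indexed. */
class BoostedQueryBuilder {

    /** Builds one "field:term^boost" clause per entry, joined by spaces. */
    static String build(String term, Map<String, Float> fieldBoosts) {
        StringJoiner query = new StringJoiner(" ");
        for (Map.Entry<String, Float> entry : fieldBoosts.entrySet()) {
            query.add(entry.getKey() + ":" + term + "^" + entry.getValue());
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the clauses in insertion order.
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("title", 4f);
        boosts.put("description", 1f);
        System.out.println(build("hibernate", boosts));
        // prints title:hibernate^4.0 description:hibernate^1.0
    }
}
```

Changing a boost here only changes the next query; nothing has to be reindexed.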

Why use the Hibernate Search query API?
Why not use straight Lucene queries and APIs? It's all about text in the end.
The nice thing about Hibernate Search is that it's really easy to replace an HQL query with a Lucene query: just replace the Query object and you're done, the rest of the code remains unchanged. Because it is that simple, people tend to use Hibernate Search and Lucene queries in a wider range of use cases, and not simply for a Yahoo-like search screen (we always talk about Google, let's switch for a while ;) ):
- save some DB CPU cycles and distribute it to cheaper machines
- efficient multi word queries
- wildcards
- etc
Here is a use case that is clearly not about plain text:
"increase visibility of all books where 'Paris Hilton' is mentioned and double the increase if 'prison' is also present"

Hibernate Search queries can return either managed objects or projected properties (retrieving only a subset of the data). When to use what?
Sometimes you use property projections rather than object retrieval in HQL queries, either for ease of use or for performance reasons. It's more convenient to play with the objects, but you pick the best tool for the job. I would say the same kind of rules can be applied with Hibernate Search between a regular query and a field projection.

Hibernate Search not suitable for high volume websites
I love this one. I did design high volume websites backed by Lucene. I know what you gain, I know what you lose. Hibernate Search is full of best practices. The Hibernate Search clustering support is a good example of an architecture that an architect could mimic to scale with Lucene (up and out). But it's not the only one; it depends on the use case. That's why Hibernate Search does not impose an architecture, and that's why I prefer libraries over off-the-shelf products.

I would recommend this off-the-shelf solution?
DBSight or Solr (which I know better) are interesting solutions indeed, but not for the same kind of projects, or at least not for the same integration strategy. We are comparing a library versus a black box. BTW DBSight has a 3-minute install demo. I could not beat them, it took me 15 mins on stage at JavaOne (but I walk and talk a lot :) )
I have never been a big fan of black boxes nicely integrated in my IT system, but if I had to choose such a solution I would also give the Google Search Appliance a try, the Google Mini is fairly cheap.


Anyway, Hibernate Search has been developed with practical solutions for practical problems, not theoretical considerations. Giving it a shot is the only way to judge.
Damn long post, sorry about that :(


Got a MacBook (Pro), better consider this upgrade

2007-06-12  |   |  apple  

Last night my battery and my MacBook Pro decided not to talk to each other anymore. The battery was full but as soon as I unplugged the AC, the laptop shut down immediately, fairly useless ;-)

Have a look at Apple's support case MacBook Battery Update.

This patch apparently prevents the synchronization failure from happening, saving you a WTF moment and a few hours of downtime.

I have been fairly impressed by the Genius Bar support. It took me 1h to figure out the problem in the morning (the Apple support website roughly described my problem). I made an appointment right away, and another hour later I had my new battery operational (travel time included).

At least, they don't argue with you and don't ask if you know how to turn the power on. It's a big plus compared to other support organizations.


Hibernate Search freshly baked features

2007-06-06  |   |  hibernate search  

I had to release Hibernate Search Beta3 early after we discovered a fairly severe bug in Beta2. But I had time to inject some new features. After those introduced in Beta2, that's a fairly good week :)

batch size limit on object indexing
If you don't pay attention when initially indexing (or reindexing) your data, you may face out of memory exceptions. The old solution was to execute indexing in several smaller transactions, but the code ended up being fairly complex. Here is the new solution:

hibernate.search.worker.batch_size=5000

int batchSize = 5000;
//scroll will load objects as needed
ScrollableResults results = fullTextSession.createCriteria( Email.class )
    .scroll( ScrollMode.FORWARD_ONLY );
int index = 0;
while( results.next() ) {
    index++;
    fullTextSession.index( results.get(0) ); //index each element
    if (index % batchSize == 0) fullTextSession.clear(); //clear the session every batchSize elements
}
Wrap that into one transaction and you are good to go.

Native Lucene
The APIs were never officially published (until beta3), but Hibernate Search lets you fall back to native Lucene when needed. All the needed APIs are held by SearchFactory.

DirectoryProvider provider = searchFactory.getDirectoryProvider(Order.class);
org.apache.lucene.store.Directory directory = provider.getDirectory();
This one is the brute-force approach and gives you access to the Lucene Directory containing Orders. A smarter way, if you intend to execute a search query, is to use the ReaderProvider:
DirectoryProvider clientProvider = searchFactory.getDirectoryProvider(Client.class);
ReaderProvider readerProvider = searchFactory.getReaderProvider();
IndexReader reader = readerProvider.openReader(clientProvider);

try {
    //do read-only operations on the reader
}
finally {
    //always give the reader back to the provider that opened it
    readerProvider.closeReader(reader);
}
Smarter because you share the same IndexReaders as Hibernate Search, and hence avoid unnecessary IndexReader opening and warm-up.

Finally you can optimize a Lucene Index (roughly a defragmentation)

SearchFactory searchFactory = fullTextSession.getSearchFactory();
searchFactory.optimize(Order.class);
//or searchFactory.optimize();


From a Bug blooms a thousand Features

2007-06-06  |   |  oss  

When a severe bug hits a product, you have to fix and release quickly (at least I feel I have to). But, especially in the beta phase, it's fairly humiliating to release with one single ticket resolution.

Call it pride, peer pressure, ego, unwillingness to face reality, a teenage knee-jerk reaction: I just can't release a beta with one single lonely closed ticket.

This is what happened on Hibernate Search. Beta2 introduced a severe bug in object retrievals. So I ended up coding a few new features, fixing a few additional annoyances to hide the obvious.

That's one of the things I like in the Software as a Service model, transparent bug fixing, but that's another story.

Obviously, such aggressive release cycles can only work as long as a Product Manager doesn't look over your shoulder.

Who said bugs were a bad thing? ;-)


Lucene feedback from Atlassian JIRA and Confluence

2007-05-22  |   | 

Mike Cannon-Brookes from Atlassian has posted (some moons ago) two interesting presentations related to Lucene and Atlassian's feedback from Confluence and JIRA.

My favorite is the first one: Lucene: Generic Data Indexing. It's a nice introduction to the benefits of Full Text search engines, as well as the gotchas you will face.
I found their use of FilterQuery as a cross-cutting concern implementation for security fairly interesting.

The first presentation also quickly addresses some of the indexing strategies (synchronous / asynchronous) depending on the product requirements. Mike goes a bit deeper in the second one by describing some clustering solutions and the one they chose.

JIRA has gone very far in its use of Lucene. I am not sure I would have gone that far, but that's definitely a very interesting extreme use case, and a very successful one :)


Demo of JBoss Seam DVD Store powered by Hibernate Search

2007-05-16  |   |  hibernate search  

Many asked me if the DVD Store demo powered by Hibernate Search that I ran at JavaOne was available online.
The answer is not yet, but it will be. My plan is to package it nicely and release it when the beta 2 of Hibernate Search is out.
This will hopefully happen fairly soon. JavaOne being behind us, I can focus back on the code base.

I had some very interesting discussions with some of you about Lucene and the features you need, it's good to see the community growing around the project.


Hibernate Search and JSR-303 at JavaOne

2007-05-09  |   | 

I will be presenting Hibernate Search at JavaOne.

I will be demoing a live migration of the JBoss Seam DVD Store application from a classic SQL-based search engine onto Hibernate Search with Google-like search capabilities. If your users pressure you for a decent and useful search feature or your DBA asks you not to kill the database performance, you might want to take a look at it.

It's Friday at 10h50:
TS-4746 - Hibernate Search: Googling Your Java Technology-Based Persistent Domain Model


I will also give an update on JSR-303 Bean Validation (and Hibernate Validator) about the goals, the expectations and where it fits in the Java ecosystem, with a demo too ;-)

Friday again at 14h50 (2:50 PM in our local hosts' language ;-) )
TS-4112 - Declarative Programming: Tighten Enterprise JavaBeans (EJB) 3.0 and JSR 303 Beans Validation

See you there


Hibernate Search talk at JAX '07

2007-04-22  |   |  hibernate search  

One feature request for Hibernate Search has been surprisingly popular: support for indexed embedded collections and hence correlated queries involving collections.
This is no longer a request and is available in SVN :-)

Imagine a Movie having a list of Actors, the following query is now possible:

give me the movie talking about Central Intelligence Agency and having one of the Baldwins in the casting
or in Lucene language
description:"Central Intelligence Agency" actors.name:Baldwin
Of course, the drawback is a potentially drastic increase in the size of your index. So use it when the collection size is under control.
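To see where the extra index size comes from, here is a minimal sketch of how a Movie and its embedded collection could be flattened into index fields. This is only an illustration, not how Hibernate Search builds its Lucene documents internally: the classes, the field names and the sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: one index field is produced per element of the embedded collection. */
class EmbeddedIndexSketch {

    static class Actor {
        final String name;
        Actor(String name) { this.name = name; }
    }

    static class Movie {
        final String description;
        final List<Actor> actors;
        Movie(String description, List<Actor> actors) {
            this.description = description;
            this.actors = actors;
        }
    }

    /** Flattens a movie into "field=value" pairs, the actor names going under a prefixed field. */
    static List<String> toFields(Movie movie) {
        List<String> fields = new ArrayList<>();
        fields.add("description=" + movie.description);
        for (Actor actor : movie.actors) {
            fields.add("actors.name=" + actor.name);
        }
        return fields;
    }

    public static void main(String[] args) {
        Movie movie = new Movie("A story about the Central Intelligence Agency",
                Arrays.asList(new Actor("Alec Baldwin"), new Actor("Jane Doe")));
        for (String field : toFields(movie)) {
            System.out.println(field);
        }
    }
}
```

Every element of the collection adds a field to the owning document, so a movie with thousands of associated elements means thousands of extra fields for that single document.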

I am going to talk about Hibernate Search at the JAX 07 conference in Wiesbaden on Tuesday. Let's have a beer if you are around.

By the way, I think De Niro did a not so great job on this movie, too much is suggested (probably too many details too fast). I know that was the intent, but he went too far in my opinion.


Licensing and trademark

2007-03-26  |   |  oss  

There has been a lot of turmoil last week on two not so related subjects. Let's clarify them a bit.

LGPL rights and duties


A lot has been said about this license, and lots of people out there don't understand the rights and duties that come with it.

  • Goal
From the GNU LGPL Preamble:
The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public Licenses are intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users.
The goal is to guarantee freedom (as in freedom of speech) to the users of a given piece of software.

  • Can I use a verbatim copy of a LGPL library in my software? What about my code license? What if I distribute my software?
You can use a verbatim (unmodified) copy of an LGPL library in your code and distribute your application. Your application can use any license (commercial or open source), in other words your code does not fall into the LGPL license. The library remains LGPL of course.

  • Can I modify the library? What happens then?
You can modify the LGPL library, but any modification has to be LGPL. If you distribute those modifications, you have to comply with the LGPL and distribute the modified source code as well. In other words, a user of yours will be able to see the code changes and do whatever he pleases with them, provided that he follows the LGPL rules.
Your application (aside from those modifications) does not fall into the LGPL license.

It is usually accepted (while not required), as a courtesy, to provide (all) your modifications to everybody (not only the third party you distribute your application to). It usually doesn't matter in the end, because any of your application users will be able to freely redistribute the modifications you made to the LGPL library. There is nothing you can do about it.

  • Goal (once again)
The goal is to be sure that any change to an LGPL library will remain LGPL, be contributed back to the community, and never be hidden in a closed source program.

Check the LGPL license for more info.

Trademarks

A trademark includes any word, name, symbol, or device, or any combination, used, or intended to be used, in commerce to identify and distinguish the goods of one manufacturer or seller from goods manufactured or sold by others, and to indicate the source of the goods. In short, a trademark is a brand name.[1]
A trademark does not prevent you from providing a service based on a given product. It does, however, restrict and organize the way you can use a given word (or combination of words).
(Protection of) Trademarks is actually a fairly common practice, including in the Open Source world, to name a few
All of them, at one time or another, have made sure their trademark is enforced.

Why?
For all of them, to protect the brand, to protect the message the brand is pushing.

That is the reason why I changed the name Hibernate Lucene to Hibernate Search: it violated the ASF trademark, so I went ahead and fixed it.

To clarify the turmoil with Hibernate, please check the clarification by Mark Webbink. It's in the comments here but I will reproduce it for clarity.

I am writing to clarify the issues raised by the publication of Ms. Robertson's communication on behalf of Red Hat. First, the letter is not placed into the context of the situation it was addressing. That presents the opportunity for misinterpretation. At the same time, I would agree that the letter is less than precise in defining what has been done wrong and the corrective action that is required. Ultimately, that is my fault as the person in charge of trademark enforcement at Red Hat.

Contrary to Gavin's statements above, you cannot offer HIBERNATE Training or JBOSS Training. This is an improper use of Red Hat trademarks in that the marks are being used (a) either as nouns or (b) to promote a good or service that is directly branded with Red Hat owned marks. What is permissable, and I am sure this is what Gavin meant, is that you are permitted to offer HIBERNATE(R) Object Relational Mapping Software Training or, as another example, JBoss(R) Application Server Training. Here the marks are being applied to the goods in a proper manner and it is clear that the training is being provided for that branded technology, not by the brand owner. As a further common courtesy, it would also be appropriate for those properly using the marks in this manner to make clear that they are not in anyway associated with Red Hat or its JBoss Division.

With that clarification I hope I have resolved the confusion and/or discontent around this issue. More extensive information on the permitted uses of Red Hat marks can be found at http://www.redhat.com/about/companyprofile/trademark/

I would also ask, as a courtesy to Ms. Robertson, that the party who posted her letter please indicate that they were the party posting the letter, not Ms. Robertson.

My apologies for any confusion that has been caused.

Mark Webbink
Deputy General Counsel
Red Hat, Inc.

Sidenote

Contrary to some claims, you don't have to have a @jboss.com address to contribute to JBoss projects (I mean commit access). All you have to do is be accepted by the community and the project lead (as in any open source project), and sign a contributor agreement (in a similar manner to how an ASF contributor agreement is signed). Taking Hibernate as an example, I can count at least twice as many active contributors not having a @jboss.com address as having one :-)

By the way, I am not a lawyer, so take my words as is etc etc. My dog knows a dog who knows a lawyer, but I am not sure that qualifies me ;-)


Name: Emmanuel Bernard
Bio tags: French, Open Source actor, Hibernate, (No)SQL, JCP, JBoss, Snowboard, Economy
Employer: JBoss by Red Hat
Resume: LinkedIn
Team blog: in.relation.to
Personal blog: No relation to
Microblog: Twitter, Google+
Geoloc: Paris, France
