Release Candidate for Hibernate Search 3.0.0

2007-09-04  |   | 

Release Candidate for Hibernate Search 3.0.0

Hibernate Search 3.0.0.CR1 is now out. This release is mainly the last bits of new features and polishing before the final version. The next cycle will be dedicated to bug fixes (of any bug that pops up), as well as test suite and documentation improvements.

Thanks to Hardy for the new getting started section (this should ease the path for newcomers), and to John for hammering the last features we wanted in the GA version.

The next version should be the GA release unless some complex bugs are discovered.

Check the changelogs for a detailed change list.


Podcast about Hibernate Shards

2007-08-07  |   |  shards  

Max Ross and Maulik Shah from the Hibernate Shards team at Google got interviewed by the Google Developer Podcast.
It's a nice and easy access 30 mins introduction of Hibernate Shards: how does it work, where does it come from, what are Hibernate Shards do's and don'ts, what's the secret plan to take over the world... Insightful. One of the cool stuffs they added in the latest beta is support for Hibernate Annotations.
If you want to know what shards is all about or if your DB starts to feel the heat, go get your headset.


Innovation in the Log space, yawn...

2007-08-01  |   |  java  

Steve and I had a discussion yesterday about loggers. I know what you're thinking: hasn't log been a solved problem for years now? Plus it's boring ;)
That's why usually, when a discussion starts on the subject, I tend to carefully not listen. But because it's Steve, and because he has some specific requirement for the next Hibernate Core version, I decided to ignore my own rule.

It turned out to be much more interesting than what I expected. Here are some news for people, like me, that stopped listening to the log crowd when the rule was use log4j and when you can't use commons-logging.

I was pleasantly surprised by slf4j. I know, it's yet another facade and the name is awful. But this project has 2 features that really caught my attention.

Parameterized log
Isn't it very annoying to have to do

if ( log.isDebugEnabled() ) {
log.debug("My dummy " + object1 + " and expensive " + object2 + " concatenation");
}

because the object's toString() method are expensive?

slf4j solves that by using parameterized logs

log.debug("My dummy {} and expensive {} concatenation", object1, object2);

Very elegant and just as efficient as the previous form. Now because slf4j supports JDK 1.3, the API cannot use varargs, which means that for 3 or more parameters you will have to write

log.debug("My dummy {} and expensive {} concatenation {}", new Object[] { object1, object2, object 3 });


instead of the much more elegant

log.debug("My dummy {} and expensive {} concatenation {}", object1, object2, object3);


Damn slow movers! I guess most of the time you have 1 or 2 arguments so the pain should be minimal, or you could write you own facade, sigh

Static binding

Once I understood what it meant, I liked it. Basically to switch from one underlying logger to another, you will replace slf4j-mylogger1.jar by slf4j-mylogger2.jar: the slf4j engine is statically bound to an implementation.
OMG! This means I cannot change my logger implementation by hot deploying a config file! Oh, wait a minute, it's useless anyway.
The good side is that classloader hells are behind us.

The Ultimate Uber Cool solution

The ultimate solution is actually what Gavin came up with in Seam. So instead of doing

private static final Log log = LogFactory.getLog(CreateOrderAction.class);

public Order createOrder(User user, Product product, int quantity) {
if ( log.isDebugEnabled() ) {
log.debug("Creating new order for user: " + user.username() +
" product: " + product.name()
+ " quantity: " + quantity);
}
return new Order(user, product, quantity);
}

you end up with

@Logger private Log log;

public Order createOrder(User user, Product product, int quantity) {
log.debug("Creating new order for user: #{user.username} product: #{product.name} quantity: #0", quantity);
return new Order(user, product, quantity);
}


Notice the parameterized logs, the log injection (yes the framework is smart enough to guess the category, doh!), and the contextual parameters injection.

But such solution is not accessible to library developers until someone decides to push that into the JDK.
OK I'm done for logs for another 5 years time :)


Hibernate at Jazoon'07

2007-06-22  |   |  conference  

I will be at Jazoon (Zurich) to talk about Hibernate Search on Tuesday. I'll hang around Monday and Tuesday, so feel free to pass by the JBoss booth for a chat.


Hibernate Search - cool, but is it the right approach? Year baby!

2007-06-15  |   |  hibernate search  

Sanjiv Jivan wrote a blog entry questioning the "point" of Hibernate Search. He missed some critical steps in his argumentation, that I am willing to correct. I started to answer on his blog, but the answer being fairly long, I opted for a blog entry.

I think Sanjiv failed to understand which population Hibernate Search is targeting.
Hibernate Search is about ORM. If you don't use Hibernate, if you don't use JPA, forget about Hibernate Search, it's not for you.

His main point is, why use Hibernate Search instead of a straight Lucene + Database (I'm assuming JDBC) solution? Five years before he could have asked, why use an ORM rather than a straight JDBC access? Because it does for you and optimize 90% of the job and let you focus on the 10% that is hard.
I won't explain why an ORM is usually (but not always) a good approach (everybody got that nowadays), so let's focus on a different question: considering that Hibernate is used in a given application, should we go for plain Lucene and JDBC layer as Sanjiv suggests or should we go for Hibernate Search? Should we go for 2 different set of APIs / programmatic model and model representation, or should we go for one unified model?

Let's see each of Sanjiv's concerns one at a time.

Why Hibernate Search rather than plain Lucene and JDBC?
Out of the box, setting up a plain Lucene and JDBC solution requires to write the bridge. Lucene has it's own world, the DB an other one. Your code has to bind them together (write the optimized JDBC routine + optimized Lucene index routine). It can be long, painful and buggy.
I doubt Sanjiv had to do it before, he would not talk like that :) Hibernate Search does the binding for you in your Hibernate backed application.
People are attracted by Hibernate Search because it lowers the barrier of entry to Lucene in a project by a great deal. This opens the Search capabilities to a lot of applications that would not have considered it with only plain Lucene in their hands.

Hibernate (Search) does not play well with massive indexing
Sanjiv claims that the initial indexing (or reindexing) is slow (he hasn't tried actually) and memory consuming.
Have a second look at the Hibernate Search reference documentation, the massive indexing procedure explicitly helps you to control the amount of memory spent.
In Lucene, one good rule of thumb is use as much memory as possible to minimize IO access. So yes, the more memory you'll spend the more efficient your hibernate Search massive indexing will be. You have to think about the global system, not only a subpart.

Event based indexing should not be used
Next Sanjiv tries to show that the event based indexing is wrong and that one should always use batch indexing. The honest answer is it depends.
Hibernate Search does not constraint to index things per transaction (it's a pluggable strategy), and I never said that indexing at commit time was important. Not indexing before commit time is critical (think about rollbacks).
As a matter of fact, the clustered mode (JMS mode) explicitly does not index at commit time, it delegates the work for later (and to someone else). The overhead of sending a message for later indexing (I'm not speaking of actual Lucene operations here) is minimal.
What do we gain? The usual on the fly vs batch mode benefits: no batch window, more homogeneous CPU consumption on systems, not having to take care of a batch job. I don't know about you, but the less batch jobs I have in my systems, the better I sleep.
By the way, is batch mode supported with Hibernate Search? Absolutely. Who likes to avoid batch jobs when possible, most of the developers and ops guys I have met. When you need to use them, do it ; when you don't stop the masochism.

To justify that batch mode should rules, Sanjiv used the data mining and star / snow schema as an example. These are a very specific kind of applications where ORM are almost never used. They could be, with some adjustments tot he ORM, but that's another story, maybe my next project :) Anyway, this is out of the scope of Hibernate Search, see the very first point.

I agree that JMS is highly over engineered and should be simplified in Java EE6, but come on, setting up a Queue is only a few clicks in a graphical console... it's not too bad. Don't tell me JMS is too hard (Hibernate Search does the JMS calls by the way, not you).

Hibernate Search does not support third party modifications in the database
It's actually a fairly known problem to people who use 2nd level cache in ORMs, has 2nd level cache been banned from our toolbox? clearly no. But once again Hibernate Search works fine in a batch mode. So this should solve Sanjiv's concerns.

Annotation based indexing definition is not flexible
Is that an inflexible approach? How practical would it be to change them on the fly? Changing which elements are indexed, or how would require to reindex the whole set of data. Quite possible, but definitely something that is not so useful on the fly. As for boosting, I do set my field boosting at query time, I find it more flexible than index time boosting, so I never had the issue Sanjiv is describing.

Why using Hibernate Search query API?
Why not using straight Lucene queries an APIs, it's all about text in the end?
The nice thing about the Hibernate Search is that it's really easy to replace a HQL query by a Lucene query: just replace the Query object and you're done, the rest of the code remains unchanged. Because is that simple, people tend to use Hibernate Search and Lucene queries in a more widespread number of usecases, and not simply for a Yahoo-like search screen (we always talk about Google, let's switch for a while ;) ):
- save some DB CPU cycles and distribute it to cheaper machines
- efficient multi word queries
- wildcards
- etc
Here is a use case that is clearly not about plain text:
"increase visibility of all books where 'Paris Hilton' is mentioned and double the increase if 'prison' is also present"

Hibernate Search queries can return either managed objects or projected properties (retrieving only a subset of the data). When to use what?
Sometimes, you use property projections rather than object retrieval in HQL queries either for ease of use or performance reasons, It's more convinient to play with the objects, but you pick up the best tool for the job. I would say the same kind of rules can be applied with Hibernate Search between a regular query and a field projection.

Hibernate Search not suitable for high volume websites
I love this one. I did design high volume websites backed by Lucene. I know what you gain, I know what you lose. Hibernate Search is full of best practices. The Hibernate Search clustering support is a good example of architecture that an architect could mimic to scale with Lucene (up and out). But it's not the only one, it depends on the use case, that's why Hibernate Search does not impose an architecture, that's why I prefer libraries over off-the-shelves products.

I would recommend this off-the-shelves solution?
DBSight or Solr (which I know better) are interesting solutions indeed, but not for the same kind of projects, or at least not for the same integration strategy. We are comparing a library versus a black box. BTW DBSight has a 3-minutes install demo. I could not beat them, it took me 15 mins on stage at JavaOne ( but I walk and talk a lot :) )
I have never been a big fan of black boxes nicely integrated in my IT system, but if I had to choose such a solution I would also give the Google Search Appliance a try, the Google Mini is fairly cheap.


Anyway, Hibernate Search has been developed with practical solutions for practical problems, not theoretical considerations. Giving it a shot is the only way to judge.
Damn long post, sorry about that :(


Got a MacBook (Pro), better consider this upgrade

2007-06-12  |   |  apple  

Last night my battery and my MacBook Pro decided not to talk to each other anymore. The battery was full but as soon as I unplugged the AC, the laptop shut down immediately, fairly useless ;-)

Have a look at Apple's support case MacBook Battery Update.

This patch apparently prevent the synchronization failure to happen, saving you a WFT moment and few hours of downtime.

I have been fairly impressed by the Genius bar support. It took me 1h to figure it out the problem in the morning (The Apple support website roughly described my problem), I made an appointment right away, another hour later I had my new battery operational (travel time included).

At least, they don't argue with you and don't ask if you know how to turn the power on. It's a big plus compared to other support organizations.


Hibernate Search freshly baked features

2007-06-06  |   |  hibernate search  

I had to release Hibernate Search Beta3 early after we discovered a fairly severe bug in Beta2. But I had time to inject some new features. After those introduced in Beta2, that a fairly good week :)

batch size limit on object indexing
If you don't pay attention when initially indexing (or reindexing) your data, you may face out of memory exceptions. The old solution was to execute indexing in several smaller transactions, but the code ended up being fairly complex. Here is the new solution:

hibernate.search.worker.batch_size=5000

int batchSize=5000;
//scroll will load objects as needed
ScrollableResults results = fullTextSession.createCriteria( Email.class )
.scroll( ScrollMode.FORWARD_ONLY );
int index = 0;
while( results.next() ) {
index++;
fullTextSession.index( results.get(0) ); //index each element
if (index % batchSize == 0) s.clear(); //clear every batchSize
}
wrap that into one transaction and you are good to go.

Native Lucene
The APIs were never officially published (until beta3), but Hibernate Search lets you fall back to native Lucene when needed. All the needed APIs are held by SearchFactory.

DirectoryProvider provider = searchFactory.getDirectoryProvider(Order.class);
org.apache.lucene.store.Directory directory = provider.getDirectory();
This one is the brute force and gives you access to the Lucene Directory containing Orders. A smarter way, if you intend to execute a search query, is to use the ReaderProvider
DirectoryProvider clientProvider = searchFactory.getDirectoryProvider(Client.class);
IndexReader reader = searchFactory.getReaderProvider().openReader(clientProvider);

try {
//do read-only operations on the reader
}
finally {
readerProvider.closeReader(reader);
}
Smarter because you share the same IndexReaders as Hibernate Search, hence avoid the unnecessary IndexReader opening and warm up.

Finally you can optimize a Lucene Index (roughly a defragmentation)

SearchFactory searchFactory = fullTextSession.getSearchFactory();
searchFactory.optimize(Order.class);
//or searchFactory.optimize();


From a Bug blooms a thousand Features

2007-06-06  |   |  oss  

When a severe bug hits a product, you have to fix and release quickly (at least I feel I have to). But, especially in the beta phase, it's fairly humiliating to release with one single ticket resolution.

Call it pride, pair pressure, ego, unwillingness to face reality, teenager knee jerk, I just can't release a beta with one single lonely closed ticket.

This is what happened on Hibernate Search. Beta2 introduced a severe bug in object retrievals. So I ended up coding a few new features, fixing a few additional annoyances to hide the obvious.

That's one of the things I like in the Software as a Service model, transparent bug fixing, but that's another story.

Obviously, such aggressive release cycles can only work as long as a Product Manager don't look over your shoulder.

Who said bugs were a bad thing? ;-)


Lucene feedback from Atlassian JIRA and Confluence

2007-05-22  |   | 

Mike Cannon-Brookes from Atlassian has posted (some moons ago) two interesting presentations related to Lucene and Atlassian's feedbacks from Confluence and JIRA.

My favorite is the first one: Lucene: Generic Data Indexing. It's a nice introduction to the benefits of Full Text search engines, as well as the gotchas you will face.
I found their use of FilterQuery as a cross-cutting concern implementation for security fairly interesting.

The first presentation also quickly address some of the indexing strategies (synchronous / asynchronous) depending on the product requirements. Mike goes a bit deeper in the second one by describing some clustering solutions and the one they choose.

JIRA has gone very far in its use of Lucene, I am not sure I would have gone that far, but that's definitively a very interesting extreme use case, and very successful :)


Demo of JBoss Seam DVD Store powered by Hibernate Search

2007-05-16  |   |  hibernate search  

Many asked me if the DVD Store demo powered by Hibernate Search that I ran at JavaOne was available online.
The answer is not yet, but it will. My plan is to package it nicely and release it when the beta 2 of Hibernate Search is out.
This will hopefully happen fairly soon. JavaOne being behind us, I can focus back on the code base.

I had some very interesting discussions with some of you about Lucene and the features you need, it's good to see the community growing around the project.


Name: Emmanuel Bernard
Bio tags: French, Open Source actor, Hibernate, (No)SQL, JCP, JBoss, Snowboard, Economy
Employer: JBoss by Red Hat
Resume: LinkedIn
Team blog: in.relation.to
Personal blog: No relation to
Microblog: Twitter, Google+
Geoloc: Paris, France

Tags