Course on inverted index


I gave a three-hour course on inverted indexes to students from Telecom SudParis, an engineering school here in… Paris :) It was fun to refresh my knowledge of all the fundamental structures that make Lucene what it is.

I covered quite some ground for this three-hour course (a bit too much, to be honest). Amongst other things: B-trees, inverted indexes, how analyzers and filters do most of the magic (synonyms, n-grams, phonetic approximation, stemming, etc.), how fuzzy search works in Lucene (state machine based), scoring, log-structured merge, the actual physical representation of a Lucene index, and a few of the tricks the Lucene developers came up with. My list of reference links is pretty rich too.
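To give a taste of the core idea, here is a minimal sketch of an inverted index in Python. It is not taken from the slides and it is not how Lucene stores things on disk: the `analyze`, `build_index` and `search_and` names, the sample documents, and the toy analyzer (lowercase, split on non-alphanumeric characters) are all assumptions made up for illustration.

```python
from collections import defaultdict


def analyze(text):
    """Toy analyzer (illustrative only): lowercase, split on non-alphanumerics."""
    tokens = []
    current = []
    for ch in text.lower():
        if ch.isalnum():
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def build_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        # set() so a term repeated within one document is recorded only once
        for term in set(analyze(text)):
            index[term].append(doc_id)
    return index


def search_and(index, query):
    """Return ids of documents containing every term of the query."""
    terms = analyze(query)
    if not terms:
        return []
    postings = [set(index.get(term, [])) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical sample documents
docs = [
    "The quick brown fox",
    "Quick thinking, quick results",
    "A lazy brown dog",
]
index = build_index(docs)
```

With these sample documents, `search_and(index, "quick brown")` intersects the posting lists of `quick` and `brown` and keeps only the first document. Because the same analyzer runs at index time and at query time, a query for `Brown` matches the lowercase term too.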

Without further ado, here is the presentation. I tend to be sparse on my slides, so make sure to press s to see the speaker notes. The presentation is released under Creative Commons and the sources are on GitHub.

It is a first revision and can definitely benefit from a few improvements, but there is only so much time in a day :)