“What’s that supposed to be?” I have been asked a couple of times. Sphinx is the solution to a very ugly and rather common problem.
Say you want one entry from a database, selected by a unique ID number. In most cases you have an index, probably even a primary index, that will find your record in no time. That is the ideal case, though. Now you build lists with many entries: there is already a much larger set of data to wrangle, and again indexes can be your friend, but they perform less well than a lookup by ID. Worst of all is full-text searching. You want all entries that contain some text in any of several fields, maybe even with partial matches. Doing this with a normal query means using LIKE and, under MySQL, the % wildcard. With a leading %, the database cannot use an index to narrow things down and has to check every row. If you don’t have many entries, this might even work well enough. Once your database grows large, you are completely out of luck.
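To make the difference concrete, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (the `posts` table and its data are made up). SQLite's EXPLAIN QUERY PLAN shows that a primary-key lookup becomes an index search, while a LIKE with a leading % has to scan the whole table. MySQL behaves the same way for a leading wildcard, although its EXPLAIN output looks different.

```python
import sqlite3

# Made-up table; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO posts (title, body) VALUES (?, ?)",
    [(f"Post {i}", f"Some body text, number {i}") for i in range(1000)],
)


def plan(sql, params=()):
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Lookup by primary key: SQLite jumps straight to the row.
by_id = plan("SELECT * FROM posts WHERE id = ?", (42,))

# Substring search in two fields: the leading % rules out an index,
# so every row has to be read and compared.
by_like = plan(
    "SELECT * FROM posts WHERE title LIKE ? OR body LIKE ?",
    ("%number 42%", "%number 42%"),
)

print(by_id)    # an index SEARCH on the primary key
print(by_like)  # a full SCAN of the table
```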
Sphinx helps you with these scenarios.
In Sphinx you run an indexer program that goes into the database and builds its own index of words and even partial words. Depending on the configuration, you can tell it to skip partial matches, to match even in the middle of words, or to treat similar words as one and the same word (both when indexing and when searching).
Switching to Sphinx is very simple. You install Sphinx on your server (which is very easy), set up the configuration file to reflect your database and search preferences, and then just run the indexer periodically, as often as your content changes require. Inside your application, you change the code path that used to run those LIKE queries so that it asks Sphinx through the supplied API instead. Sphinx does not return the rows from the database; it gives you a list of IDs, which you then use to query the database. Fetching rows by ID is fast, so one quick query gives you your result set.
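The indexing idea and the search-then-fetch workflow can be sketched in plain Python. This is a toy model, not how Sphinx is implemented: `normalize`, `build_index`, `search` and `fetch_by_ids` are hypothetical helpers, `min_prefix_len` only loosely mirrors the Sphinx setting of the same name, and the crude plural-stripping in `normalize` merely stands in for real stemming. The search step returns only IDs; the rows themselves come from a second, cheap primary-key query.

```python
import re
import sqlite3
from collections import defaultdict


def normalize(word):
    """Hypothetical stand-in for stemming: lowercase and drop a plural 's'."""
    word = word.lower()
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def build_index(rows, min_prefix_len=3):
    """Map each normalized word, and each of its prefixes of at least
    min_prefix_len characters, to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, *fields in rows:
        for field in fields:
            for raw in re.findall(r"\w+", field):
                word = normalize(raw)
                index[word].add(doc_id)
                for n in range(min_prefix_len, len(word)):
                    index[word[:n]].add(doc_id)
    return index


def search(index, query):
    """Return the sorted IDs of documents that contain every query term."""
    terms = [normalize(t) for t in re.findall(r"\w+", query)]
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))


def fetch_by_ids(conn, ids):
    """Second step: a cheap primary-key lookup for the IDs the index returned."""
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    sql = f"SELECT id, title FROM posts WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(sql, ids).fetchall()


# Toy data standing in for the real database table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [
        (1, "Sphinx basics", "Full text search for large databases"),
        (2, "Indexes", "A primary index finds one record by ID"),
        (3, "Searching", "Partial matches on words and fields"),
    ],
)

# The "indexer run": read every row once and build the word index.
index = build_index(conn.execute("SELECT id, title, body FROM posts").fetchall())

ids = search(index, "search")   # prefix match: finds "search" and "Searching"
print(ids)                      # [1, 3]
print(fetch_by_ids(conn, ids))  # [(1, 'Sphinx basics'), (3, 'Searching')]
```

Sphinx typically hands back IDs sorted by relevance; the `ORDER BY id` in this sketch discards any such order, so a real application would re-sort the fetched rows to match the ID list it received.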
If you just need all entries from category X or by user Z, you are still better off with ordinary indexed queries. But if you need full-text searching, Sphinx is the perfect solution.
And now the latest version is out. It squashes a lot of bugs, though most of them were minor, which goes to show how well Sphinx has already been working for a long while. My oldest installation has been running for over a year without a glitch on the Sphinx end.
Currently, Sphinx has to rebuild the whole index every time you want it updated, unless you use something called delta indexing, which is more of a stopgap measure. A real-time updating index is on the menu for the next version, though, and will make this extremely powerful tool even better.
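The delta workaround can be modelled in a few lines of Python. This is a simplified, hypothetical model of the main+delta idea, not Sphinx code: `MainDeltaIndex` and its methods are made-up names, and a plain list of `(id, text)` rows stands in for the database table. A large main index is rebuilt rarely and remembers the highest ID it covers, a small delta index holding only newer IDs is rebuilt often, and every search queries both and merges the results.

```python
import re
from collections import defaultdict


def build_index(rows):
    """Word -> set of IDs (whole words only, to keep the sketch short)."""
    index = defaultdict(set)
    for doc_id, text in rows:
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


class MainDeltaIndex:
    """Hypothetical main+delta scheme over a list of (id, text) rows
    that stands in for a database table."""

    def __init__(self, table):
        self.table = table
        self.rebuild_main()

    def rebuild_main(self):
        # Expensive: re-reads every row and remembers the highest ID covered.
        self.max_main_id = max((doc_id for doc_id, _ in self.table), default=0)
        self.main = build_index(self.table)
        self.delta = build_index([])

    def rebuild_delta(self):
        # Cheap: only the rows added since the last main rebuild.
        new_rows = [row for row in self.table if row[0] > self.max_main_id]
        self.delta = build_index(new_rows)

    def search(self, word):
        # Query both indexes and merge the ID sets.
        word = word.lower()
        return sorted(self.main.get(word, set()) | self.delta.get(word, set()))


table = [(1, "Sphinx full text search"), (2, "MySQL LIKE query")]
idx = MainDeltaIndex(table)

table.append((3, "Sphinx delta index"))  # a new row arrives
print(idx.search("sphinx"))              # [1] (delta not rebuilt yet)
idx.rebuild_delta()
print(idx.search("sphinx"))              # [1, 3]
```

The weakness shows up in this sketch too: only newly inserted rows reach the delta, so a row that is edited after the main rebuild stays stale until the next full rebuild, which is why delta indexing remains a stopgap.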