FULL TEXT Searching In MySQL
Published: 08/17/2009
Programming, Code
I’ve been wanting to add a search feature to wp-click-track for a while now. After almost a year of writing 150 posts I have a lot of links; so many in fact that it’s getting a little tough finding the links I want when I want them. I’ve written a few search engines in my time (some actually pretty good and some that just plain sucked) and thought this would be a simple addition to the program.
Boy, was I right.
Real quick; I’m going to be referencing a MySQL table below. Here’s the SQL to create that table:
CREATE TABLE tracking_links ( `link_id` int(10) NOT NULL auto_increment, `link_title` varchar(255) NOT NULL default '', `link_desc` text NOT NULL, `link_destination` varchar(255) NOT NULL default '', `link_total_clicks` int(10) NOT NULL default '0', `link_unique_clicks` int(10) NOT NULL default '0', `creation_date` datetime NOT NULL default '0000-00-00 00:00:00', `last_modified` datetime NOT NULL default '0000-00-00 00:00:00', PRIMARY KEY (`link_id`) )
The traditional, no frills, way easy path, would be to build a basic “LIKE” query and attach it to a single text field. The queries would then look like this:
SELECT * FROM tracking_links WHERE link_title LIKE '%$_search%'
If you don’t already know; the above query would match all the links with a title that has the string $_search anywhere within the column.The percent character (%) is a wild card; sandwiching $_search like it is basically says “Grab me everything that has this string anywhere within the column”.
The above is the lowest of the low when it comes to elegance of design and sheer awesomeness though. It’s pretty much the amateur version of a search mechanism.
For one thing performance is going to be an issue when dealing with a lot of rows in the table. The speed on a query like the above, even with an index on the “link_title” column, is going to crumble onceĀ you get more than a couple hundred thousand rows because the MySQL has to do a full table scan.
Another issue is that relevancy is completely absent. There’s no way to remove elements or, really, do any boolean matching. You’re going to get results that just seem… off.
If I want to search against multiple columns the SQL just gets horrifying:
SELECT * FROM tracking_links WHERE link_title LIKE '%$_search1%' OR link_destination LIKE '%$_search2% OR…'
And then there’s the accompanying form for a query like the above. Traditionally, it would be x form fields for each column; in the case of wp-click-track that would be 3 individual fields. This would pretty much destroy the plugin interface or, worse, require a separate, dedicated, search page.
The answer to those problems is full text search. According to Wikipedia :
In a full text search, the search engine examines all of the words in every stored document as it tries to match search words supplied by the user.
This is a difficult concept to grasp, especially for someone coming from the LIKE camp; I know it took me a little bit to get it fully. Here are a couple notes that should help:
- String to search must be longer than 3 characters long.
- DB Type must be MyISAM
- Don’t try for individual search fields
ex: SELECT * FROM table WHERE col1 = ‘$col1’ AND col2 = ‘$col2’
instead
Have all fields setup as a single Fulltext field.
ex: SELECT table.* , MATCH (col1, col2, col3) AGAINST (’+( $text1) -($text2)’) AS Score FROM table WHERE MATCH (col1, col2, col3) AGAINST (’+( $text1) -($text2)’ IN BOOLEAN MODE) - IN BOOLEAN MODE is required for queries with the ‘-’ attribute.
- Algorithm for relevance is:
w = (log(dtf)+1)/sumdtf * U/(1+0.0115*U) * log((N-nf)/nf)
Using fulltext searching allows your site users to do Google style searching like so:
+php -mysql +sqlite +temp*
The above search query breaks down to “Grab everything with “php”, without “mysql” but with “sqlite” and any word that begins with “temp”. Pretty snazzy and useful if your users are savvy in how to search effectively. Hell, even without knowing how to perform boolean searches the above is still useful and makes your code way cleaner.
In order to implement fulltext searching the table has to be setup with the MyISAM engine and a fulltext index has to be defined. To change a table to use the MyISAM engine just run the below query:
ALTER TABLE tracking_links ENGINE = MYISAM
To add a fulltext index you create it like so:
ALTER TABLE tracking_links ADD FULLTEXT search_index (link_title,link_desc,link_destination)
The complete fulltext search query for the above query (+php -mysql +sqlite +temp*) example would look like:
SELECT * FROM tracking_links MATCH(link_title,link_desc,link_destination) AGAINST('+php -mysql +sqlite +temp*' IN BOOLEAN MODE);
There’s one other feature worth mentioning; the Score option. The Score
SELECT *, MATCH(link_title,link_desc,link_destination) AGAINST('+php -mysql +sqlite +temp*' IN BOOLEAN MODE) AS SCORE FROM tracking_links MATCH(link_title,link_desc,link_destination) AGAINST('+php -mysql +sqlite +temp*' IN BOOLEAN MODE);
The above query will return all the rows as well as the relevance of the column to the dataset. On initial inspection the number won’t make much sense; it’s usually something like 0.9843949302 and 2.9408509384 which is, admittedly, pretty useless by itself. But, using a simple formula you can turn it into something pretty relevant:
Using the above code against a SCORE of .9823475 results in a $score of 98.2% which is a simple way to let your users know the result is relevant.
As the above, hopefully, demonstrates FULL TEXT searching with MySQL is easy and adds an extra layer of awesome and fun to an application.