Validating fuzzy logic values

Run a process on the original table containing the column to be fuzzy searched.

This process will extract every individual word from the original column and write these words to the word table along with the original key.
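The extraction step described above can be sketched roughly as follows. This is a minimal illustration using Python's built-in sqlite3 module, not the original implementation; the table and column names (companies, word_index, source_key) are hypothetical, and the word-splitting rule (lowercase runs of letters and digits) is an assumption.

```python
import re
import sqlite3

def build_word_table(conn, source_table, key_col, text_col):
    """Split each value of text_col into words and store (word, key) rows.

    Table and column names are interpolated directly into the SQL, so they
    must come from trusted code, never from user input.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_index (word TEXT, source_key INTEGER)"
    )
    # Index the word column so equality and prefix lookups can be fast.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_word ON word_index (word)")
    rows = conn.execute(
        f"SELECT {key_col}, {text_col} FROM {source_table}"
    ).fetchall()
    for key, text in rows:
        # Assumed tokenisation: lowercase runs of ASCII letters and digits.
        for word in re.findall(r"[a-z0-9]+", (text or "").lower()):
            conn.execute(
                "INSERT INTO word_index (word, source_key) VALUES (?, ?)",
                (word, key),
            )
    conn.commit()

# Hypothetical source table used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO companies VALUES (?, ?)",
    [(1, "Acme Widgets Ltd"), (2, "Widget World")],
)
build_word_table(conn, "companies", "id", "name")
word_rows = conn.execute(
    "SELECT word, source_key FROM word_index ORDER BY source_key, word"
).fetchall()
```

Each word keeps the key of the row it came from, so a hit in the word table leads straight back to the original record.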

This will help confirm the probability of your data being accurately matched.


Once this is in place, we take any user input and search the word table using a normal word = input or word LIKE 'input%' comparison.

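We never do a LIKE '%input', because a leading wildcard typically prevents the database from using the index. We are always looking for a match starting at the beginning of the word, and the first 3 characters are all indexed.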
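A prefix lookup of this kind might look like the sketch below. It uses Python's sqlite3 module and a hypothetical word_index table with word and source_key columns. The escaping of LIKE wildcards in the user input is an addition for safety, not something the text above describes.

```python
import sqlite3

def search_words(conn, user_input):
    """Return source keys whose words equal, or start with, the input."""
    term = user_input.strip().lower()
    # Escape LIKE wildcards so the user's text is matched literally.
    escaped = (
        term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = conn.execute(
        "SELECT DISTINCT source_key FROM word_index "
        "WHERE word = ? OR word LIKE ? ESCAPE '\\' "
        "ORDER BY source_key",
        (term, escaped + "%"),  # wildcard only at the end, never at the start
    ).fetchall()
    return [row[0] for row in rows]

# Hypothetical word table used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE word_index (word TEXT, source_key INTEGER)")
conn.execute("CREATE INDEX idx_word ON word_index (word)")
conn.executemany(
    "INSERT INTO word_index VALUES (?, ?)",
    [("acme", 1), ("widgets", 1), ("widget", 2), ("world", 2)],
)

prefix_hits = search_words(conn, "wid")   # matches "widgets" and "widget"
single_hit = search_words(conn, "wor")    # matches "world" only
no_hits = search_words(conn, "get")       # "get" is not a prefix of any word
```

Because the pattern only ever has a trailing wildcard, "get" does not match "widget". That is the trade-off of prefix-only search.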

Soundex has limitations: sometimes it can generate the same code for two really different words.

Another algorithm, called Metaphone, was created to help take care of that problem, and it was later revised into the Double Metaphone algorithm.
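Collisions of this kind are easy to reproduce with Soundex, the classic phonetic code. The following is a minimal sketch of American Soundex in plain Python, written for illustration only. It is not the MySQL or PHP implementation, and those may differ in edge cases.

```python
# Map each consonant to its Soundex digit; vowels, h, w and y have no digit.
CODES = {}
for digit, letters in {
    "1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r",
}.items():
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    """Return the 4-character American Soundex code for an ASCII word."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
        if len(result) == 4:
            break
    return (result + "000")[:4]  # pad short codes with zeros

print(soundex("Robert"), soundex("Rupert"))       # R163 R163
print(soundex("Catherine"), soundex("Cotroneo"))  # C365 C365
```

Both pairs collapse to a single code, so a Soundex-only search would treat "Catherine" and "Cotroneo" as the same name.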

I have personally used the Java implementation of Double Metaphone in Apache Commons Codec, and it is customizable and accurate.

Before we built Match2Lists.com, we used to spend an unhealthy amount of time validating fuzzy matches.

In Match2Lists we incorporated a powerful Visualisation tool that lets us review non-exact matches. This proved to be a real game changer for match validation, reducing our costs and enabling us to deliver results much more quickly.

Here's a link to the PHP discussion of the soundex functions in MySQL and PHP.

I'd start from there, then expand into your other not-so-well-defined requirements. Levenshtein distance is more appropriate for measuring the difference between two known words, not for searching. It is a solution designed more to detect things like proofing errors (using "Levenshtien" for "Levenshtein") rather than spelling errors (where the user doesn't know how to spell, say, "Levenshtein" and types in "Levinstein").
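The two misspellings above can be compared against the correct word with a plain edit-distance function. The following is a minimal sketch of the standard dynamic-programming Levenshtein distance in Python (insertions, deletions and substitutions each cost 1), not any particular library's implementation.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]

print(levenshtein("Levenshtien", "Levenshtein"))  # 2 (swapped "ie")
print(levenshtein("Levinstein", "Levenshtein"))   # 2 (i for e, missing h)
```

The function compares one pair of known words at a time, so using it for search would mean computing a distance against every candidate. That is why it suits comparison better than lookup.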
