The Mohawk Software Full Text Search System

Many data storage systems need to locate particular records quickly. Oracle uses bitmap indexes on fields, and many other vendors have implemented similar proprietary systems. Proprietary features like Oracle's bitmap indexes are not standard SQL, so if you are not using Oracle you cannot use them. If you do use them, you are tied to Oracle for the life of your codebase. Furthermore, each bitmap index covers a single field, so searching on multiple fields requires a more complex query.

While proprietary systems often try to tie you to their database schema, the Mohawk Software Full Text Search System (FTSS) is designed around generic data store concepts and will work with any SQL database or even a flat-file data store. It is simple to use: you specify the data you want indexed, and the FTSS indexer creates an index file that provides relational access to database records.
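The general idea of a full text index can be illustrated with a toy inverted index, which maps each word to the keys of the records that contain it. This is only a sketch of the common technique, not FTSS's index format or algorithms, which are not described here; the function names and sample records are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Map each word to the set of record keys whose text contains it.

    `records` is any iterable of (key, text) pairs, so the source can be
    a SQL result set or lines read from a flat file.
    """
    index = defaultdict(set)
    for key, text in records:
        for word in tokenize(text):
            index[word].add(key)
    return index


def search(index, query):
    """Return the sorted keys of records containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


# Hypothetical sample data: (record key, text to index).
records = [
    (1, "Miles Davis - Kind of Blue"),
    (2, "Joni Mitchell - Blue"),
    (3, "Miles Davis - Bitches Brew"),
]
index = build_index(records)
print(search(index, "miles blue"))  # keys of records containing both words
```

Because the index stores only keys, the caller is free to look the matching records up in whatever data store they came from.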

What makes FTSS fast are the algorithms used to maintain the index. The current example, the music lookup database, runs on a standard desktop machine running Linux with the PostgreSQL database. This is a far cry from what is required by any system currently on the market that can compete on performance. Moreover, the SQL is unoptimized: each music lookup performs a two-table join to produce the output.

FTSS can be used to improve data search times, to reduce the total cost of ownership (TCO) of a data search system, or both.

Using FTSS is simple: you specify two SQL queries. The first returns the number of records to be indexed; the second collects the records for indexing. After running the FTSS indexer to create the index, you run the FTSS server to answer full text search queries.
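As a rough illustration of the two-query pattern, the sketch below runs a count query and a collection query against an in-memory SQLite database. The table, columns, and data are hypothetical, and how FTSS itself is given the queries is not shown, since that configuration is not described here.

```python
import sqlite3

# Hypothetical schema and data, held in memory so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE album (id INTEGER PRIMARY KEY, artist TEXT, title TEXT);
    INSERT INTO album (artist, title) VALUES
        ('Miles Davis', 'Kind of Blue'),
        ('Joni Mitchell', 'Blue'),
        ('Miles Davis', 'Bitches Brew');
""")

# Query 1: how many records will be indexed.
COUNT_QUERY = "SELECT COUNT(*) FROM album"
# Query 2: the key and the text to index for each record.
COLLECT_QUERY = "SELECT id, artist || ' ' || title FROM album ORDER BY id"

expected = conn.execute(COUNT_QUERY).fetchone()[0]

collected = []
for key, text in conn.execute(COLLECT_QUERY):
    collected.append((key, text))

# Assumption: a count known up front can be used to sanity-check that
# the collection query returned every record.
assert len(collected) == expected
print(expected, collected[0])
```

The same pair of queries could be written against PostgreSQL or any other SQL database, since both use only standard constructs.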

FTSS has many options. You can create an index of many tables or just one. You can create a "combined" index of multiple tables while still having individual tables. You can even run the FTSS search daemon on a separate machine from the database for improved scalability. Text manipulation is also available through two metaphone algorithms: one that provides very fuzzy matches and one that provides stricter matching. Because FTSS is based on a library, the options are almost unlimited, and it can be extended easily.
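The difference between strict and fuzzy matching can be sketched with two simplified phonetic keys. These are deliberately crude stand-ins chosen for illustration; they are not the Metaphone algorithm and not what FTSS implements. Two words match at a given level when their keys at that level are equal.

```python
import re

# Consonants folded together by the loose key (illustrative choice only).
LOOSE_FOLD = str.maketrans({
    "b": "p", "d": "t", "g": "k", "q": "k", "c": "k",
    "v": "f", "z": "s", "m": "n", "x": "ks",
})


def strict_key(word):
    """Lowercase, keep letters only, and collapse repeated letters."""
    letters = re.sub(r"[^a-z]", "", word.lower())
    return re.sub(r"(.)\1+", r"\1", letters)


def loose_key(word):
    """Start from the strict key, drop non-initial vowels, fold consonants."""
    key = strict_key(word)
    if not key:
        return ""
    head, tail = key[0], re.sub(r"[aeiouyhw]", "", key[1:])
    folded = (head + tail).translate(LOOSE_FOLD)
    return re.sub(r"(.)\1+", r"\1", folded)


# Hypothetical search terms compared at both levels.
for a, b in [("Mitchell", "Mitchel"), ("Davis", "Davies")]:
    print(a, b, strict_key(a) == strict_key(b), loose_key(a) == loose_key(b))
```

The strict key tolerates only doubled letters, while the loose key also ignores most vowels and treats similar-sounding consonants as the same, so it matches more spellings at the cost of more false positives.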

You don't even need to have FTSS present SQL data; it can simply return filenames or keys for use in whatever "front-end" system you wish. Any program that can make TCP/IP socket calls can interface with FTSS. The FTSS server program is an easy-to-use client/server system: a TCP/IP socket is opened, a string of search terms is sent, and a stream of information is returned.
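The open, send, and read exchange described above might look like the following client. The wire format is an assumption made for this sketch: the query is sent as one newline-terminated line, and the server is assumed to close the connection when it has sent all results. The function name, host, and port are hypothetical.

```python
import socket


def ftss_search(host, port, terms, timeout=5.0):
    """Send search terms over TCP and return the reply as a list of lines.

    Assumed wire format (not taken from FTSS documentation): the query is
    one newline-terminated UTF-8 line, and the server closes the
    connection once it has sent all results.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(terms.encode("utf-8") + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8").splitlines()
```

A front end would call something like `ftss_search("localhost", 9000, "miles davis")` and then use the returned keys or filenames however it wishes.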

The FreeDB Music Search bar is an example that uses FTSS, PostgreSQL, PHP, and Apache on Linux. The PHP code for the search results page can be viewed in example. (You may need to download this file and open it in a text editor, because its line separators do not always display correctly in Internet Explorer.)

FTSS is built using Mohawk Software's Project Phoenix library, so it is easy to customize its behavior to suit your needs.

FTSS will be available in the second quarter of 2001 and will support Linux, FreeBSD, Solaris, and Windows NT.

For more information, contact Mohawk Software.