I just read an excellent article by Avi Rappoport of www.SearchTools.com, and it prompted some thoughts I want to share here. Avi explains that BI (Business Intelligence) has grown up alongside database technology, relying on fairly traditional data access and storage technologies (SQL, OLAP, etc.) that are costly in terms of hardware and limited in performance, speed and agility. The inverted indices used by Google and all other web and enterprise search engines for web pages and documents are a great foundation for BI applications, because they are more efficient and don’t require the resources that traditional database access technologies require.
What’s interesting here is that the inverted index was really made for full-text documents (web pages, Word, PDF, etc.), not for the highly structured tabular formats of relational databases. An inverted index is a way of structuring full text so that it is easier to sift through, especially in enormous document repositories like the web. It is also designed for a specific purpose: finding documents by the words they contain (keywords). A simple inverted index is essentially a long table with keywords in column 1 and the URLs of web pages or the network locations of documents in column 2. Because the search engine’s logic doesn’t need to take into account how these words relate to each other in the original text (semantic relationships beyond the words themselves), those relationships don’t need to be represented in the index. The inverted index is single-purpose. The search engine software then just needs to look up the query’s keywords in these tables to start ranking relevant documents (the search results). Of course, modern search engines do much more (phrases rather than just keywords, word proximity, document text cached for display, pre-processing and normalization of queries, etc.). But the essence of their architecture is still that a simpler, more structured, single-purpose inverted version of the content lets the search engine sift through millions or billions of documents in milliseconds.
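To make the “keyword in column 1, location in column 2” idea concrete, here is a minimal sketch in Python. It is only an illustration under simple assumptions: the function names (`build_inverted_index`, `search`) and the example URLs are hypothetical, tokenization is a naive lowercase split, and “ranking” is reduced to returning every document that contains all query keywords. Note that the index stores no word positions or relationships, which is exactly the single-purpose trade-off described above.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words (a naive normalization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each keyword (column 1) to the set of document locations that contain it (column 2)."""
    index = defaultdict(set)
    for location, text in documents.items():
        for word in tokenize(text):
            index[word].add(location)
    return index


def search(index, query):
    """Return the locations containing every query keyword, sorted for determinism.

    Only the keyword-to-location table is consulted; the original text is never read.
    """
    words = tokenize(query)
    if not words:
        return []
    postings = [index.get(word, set()) for word in words]
    return sorted(set.intersection(*postings))


# Hypothetical example documents keyed by made-up URLs.
docs = {
    "http://example.com/a": "Gas pedal problem reported by drivers",
    "http://example.com/b": "Brake problem recall",
    "http://example.com/c": "The gas pedal sticks",
}
index = build_inverted_index(docs)
print(search(index, "gas pedal"))  # ['http://example.com/a', 'http://example.com/c']
```

A real engine would add a relevance score (term frequency, proximity, link signals, and so on) on top of this lookup, but the lookup itself stays this cheap because it touches only the posting sets for the query’s keywords.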
BI software can indeed use a search engine to look for relevant data points on the web, such as “gas pedal problem” if you had been on Toyota’s crisis management team a few months ago. But in many cases, BI will be sifting through data that is already structured (CRM, accounting transactions, etc.) and stored in a database. In these cases, the indices used are still inverted in their own way, but they are not much different from the original data in the database. They are just more compact and easier to access, and they contain only the fields you need to search by plus the fields the search engine needs to display to the end user before the full record is retrieved from the database.
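The structured-data case can be sketched the same way. In the illustration below, an in-memory SQLite table stands in for the database, and the table, its columns and the CRM records are all hypothetical. The index copies only the field you search by (`city`) and the fields shown in a result list (`name`), together with each record’s primary key; the full record is fetched from the database only once a user picks a result. This is an assumption-laden sketch of the general idea, not how any particular BI or search product builds its indices.

```python
import sqlite3
from collections import defaultdict

# Stand-in "database": an in-memory SQLite table of hypothetical CRM records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT, notes TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name, city, notes) VALUES (?, ?, ?, ?)",
    [
        (1, "Acme Corp", "Berlin", "Long-term customer, quarterly orders"),
        (2, "Globex", "Munich", "Pending contract renewal"),
        (3, "Initech", "Berlin", "New account"),
    ],
)


def build_field_index(conn, search_field, display_fields):
    """Build a compact index: search value -> list of (primary key, display values).

    Only the search field and the display fields are copied; all other columns
    stay in the database. Column names are trusted constants here, which is why
    they are interpolated directly into the SQL string.
    """
    index = defaultdict(list)
    columns = ", ".join(["id", search_field, *display_fields])
    for record_id, key, *display in conn.execute(f"SELECT {columns} FROM customers ORDER BY id"):
        index[key.lower()].append((record_id, tuple(display)))
    return index


def fetch_full_record(conn, record_id):
    """Go back to the database for the complete row, using the primary key from the index."""
    return conn.execute("SELECT * FROM customers WHERE id = ?", (record_id,)).fetchone()


city_index = build_field_index(conn, "city", ["name"])
hits = city_index["berlin"]  # [(1, ('Acme Corp',)), (3, ('Initech',))]
first = fetch_full_record(conn, hits[0][0])
print(first)  # (1, 'Acme Corp', 'Berlin', 'Long-term customer, quarterly orders')
```

The index entries are barely different from the source rows, just trimmed to what the search and the result list need, which matches the point above about indices over already-structured data.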
The NoSQL movement and the Hasso Plattner Institute are out to prove that new databases will be faster, or that database indexers will become the new databases because they are faster. In the video below, Hasso Plattner explains why an in-memory database makes sense and may be the form that large databases take in the future.
Software like Exorbyte MatchMaker also creates an in-memory index, which allows our applications and our customers’ applications to run queries that would never be possible with a regular database (too slow, or too complex for the database software). Levenshtein (edit-distance) matching and advanced multi-stage queries with phonetic, algorithmic, geometric or semantic logic, like the ones we run for the new SaaS-based Exorbyte Commerce, require this sort of technology. This is a small revolution for the software industry and a large one for the data access technology industries (Business Intelligence, Search Engines, Databases, Master Data Management, data storage, etc.), and it affects all database users who need fast, flexible and frequent access to billions of data records. As Avi says in her article: “Because there will never be less data.”
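For readers unfamiliar with the Levenshtein distance mentioned above: it is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another, and it is a classic basis for typo-tolerant (fuzzy) matching. The sketch below is the textbook dynamic-programming algorithm plus a brute-force scan over index keys; the function names and example keys are hypothetical, and this is in no way a description of how Exorbyte MatchMaker implements fuzzy search internally.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (char_a != char_b)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


def fuzzy_lookup(index_keys, query, max_distance=1):
    """Return the index keys within max_distance edits of the query (a brute-force scan)."""
    return sorted(key for key in index_keys if levenshtein(query, key) <= max_distance)


print(levenshtein("kitten", "sitting"))                           # 3
print(fuzzy_lookup(["toyota", "toyoda", "honda"], "toyota"))       # ['toyoda', 'toyota']
```

Running this pairwise comparison against every key is exactly the kind of work that is slow inside a regular database query; dedicated in-memory indices use cleverer structures to avoid comparing the query with every stored value.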
In conclusion, I’ll just say that the facts above are probably why Hasso Plattner (co-founder of SAP) said recently: “Now we can start building business intelligence” (see the short video).