Archive for the ‘databases’ Category

Open eGovernment Data

Monday, November 28th, 2011

The Open Source movement is moving into government data.  Governments are finding a new source of untapped economic stimulus in the mountains of data they collect.  The data is collected for the ultimate good of the public but was rarely shared, because until recently providing access to it was too labor-intensive and expensive.  Things have changed.


ETALAB (France), data.gov.uk (UK), data.dc.gov (Washington, DC, US), whitehouse.gov/open (US), and countless other local and national governments have opened their data coffers.  In the case of DC, for instance, the cost of publishing the data was $50K for the city. The DC government expected it to spur the creation of a few new ventures and a bit of private investment.  Instead, 50 startups were born and $3M was invested.  There is a world of open data coming to the private software industry.

Open Government data is also going to be Big Data.  The amount of data collected is by definition larger than traditional “enterprise data”, for instance (especially at the national level).  The tools being developed for big data will solve some of the issues with access and real-time analytics that exist with government data.  Exorbyte MatchMaker is one of these tools.  That’s why government agencies have already chosen MatchMaker for their search and data access challenges (2 national European census agencies, German Finance Ministry, and more).

Are you ready for open government data?  Any ideas what would make sense to build with this data?

Big Data Search

Monday, November 28th, 2011

Every economic cycle comes with its host of enterprise software trends.  Big Data became a recognized phenomenon in 2011.  In May 2011 McKinsey released the “Big data: The next frontier for innovation, competition, and productivity” report. It started with:  “The amount of data in our world has been exploding and analyzing large data sets—so-called big data—will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus”.

IBM, Oracle, SAP, Microsoft, SalesForce.com, and others are all aiming their development efforts at Big Data (see vldb.org).  The amount of data produced, collected and stored by online activities in which companies, their customers, their partners, and their sales channels participate has grown enormously.  Tools are being developed that allow affordable long-term storage.  New columnar in-memory database formats have emerged that enable near-real-time analytics.  Fast-growing startups and open source solutions have also converged with their own new NoSQL formats (InfiniDB, LucidDB, InfoBright, Hadoop, NoSQL, etc.).  MatchMaker, Exorbyte’s Universal Search platform, is the perfect answer to search within Big Data.

The challenges of search within big data are:

  • Searching Big Data through SQL queries is simply too slow and inflexible: fuzzy or advanced search requires a search indexer layer or something other than traditional on-disk relational DB formats.
  • Indexing large databases can be slow and disruptive to normal database operation, and can require complex hardware infrastructure.
  • Running complex queries and fuzzy logic requires so many calculations and lookups that new search strategies are required.

Exorbyte MatchMaker is made to address these challenges, and our professional services team has proven repeatedly that they can be addressed:  Allianz (the world’s 12th-largest financial services group), the German Finance Ministry, and more blue chip and government organizations turn to us each year for that very expertise.

What do you think of Big Data?

Marc Andreessen Wants Oracle Dead

Thursday, September 29th, 2011

Exorbyte doesn’t care about the feud between Oracle and the “cloud bunch” (Andreessen, Benioff of Salesforce.com, and more), but we feel it has something else to teach us besides the low blows.

“Andreessen kicked off BoxWorks, the first-ever customer conference for cloud collaboration provider Box.net, this morning in San Francisco.

His firm invested in Box, he says, partly because he found that a lot of the other startups they were funding already used Box’s product.

As he put it, “Ten years ago, it was a joke: you’d raise $20 million in venture capital and write a $4 or $5 million check to Oracle, Sun, BEA, and EMC….When it started, Salesforce looked like a toy compared with Siebel. Look ahead five years later, it’s obviously better. Not a single one of our startups uses Oracle.”

Read more: Marc Andreessen: The “Clock Is Ticking” On Oracle - http://www.businessinsider.com/boxnet-2011-9#comment-4e84d76aecad04784000002d#ixzz1ZNOjydsW

Oracle is indeed under siege big time. But this cloud computing debacle is just one more front for them to fight on. SAP, their main rival in the DB business, has been taking effective swings at them over their lack of belief in in-memory databases (which are, by the way, the way forward for cloud computing also). Hasso Plattner from SAP has a field day here with this:

Our company sells SaaS site search for ecommerce and we also did away with Oracle from the start. Better yet, the vast majority of our customers are the same and this is a growing market!

Good luck Larry!

Autonomy’s Secrets Revealed

Thursday, September 29th, 2011

Wow, after the Microsoft / Fast $1 Billion debacle, we now have the Autonomy / HP / Oracle $10 Billion version of the same type of debate over company valuation, following HP’s acquisition of Autonomy last month.  Sitting happily on the sidelines, we get splattered with great intelligence on the reality of Autonomy’s claims of shareholder value.  Enjoy and send your comments!

BUSTED: Oracle Publishes Slide Deck To Prove Autonomy CEO Is Lying
Read more: http://www.businessinsider.com/oracle-hp-autonomy-2011-9#ixzz1ZNOVjCVb

9/29/2011 – 2:16PM  - I will just add that no matter what Autonomy and HP say to justify that acquisition, I just can’t see why they ended up above $10B, especially after looking at these slides.  Good luck.

“Now we can start building business intelligence.”

Thursday, October 28th, 2010

I just read an excellent article from Avi Rappoport of www.SearchTools.com and it prompted some thoughts I want to share here.   Avi basically explains that BI (Business Intelligence) has grown alongside database technology, using fairly traditional data access and storage technologies (SQL, OLAP, etc.) that are costly in terms of hardware and limited in performance, speed and agility.  Inverted indices, used by web and enterprise search engines like Google and all other web page or document search engines, are a great system for powering BI applications because they are more efficient and don’t require the resources that traditional database access technologies require.

What’s interesting here is that the inverted index is really made for full-text documents (web pages, Word, PDF, etc.), not for highly structured relational database tabular formats.  An inverted index is really a way of structuring full text so that it is easier to sift through (especially for enormous document repositories like the web).  It’s also designed with a specific purpose: to look for documents by the words they contain (keywords).  The simple inverted index is essentially a long table with keywords in column 1 and the URLs of web pages or the network locations of documents in column 2.  Because the search logic of the search engine software doesn’t require taking into account the relationships of these words in the original text (semantic relationships beyond the words themselves), these relationships don’t need to be represented in the index.  The inverted index is single-purpose.  The search engine software then just needs to scan the table of keywords to start ranking relevant documents (search results).  Of course there are many more things in modern search engines (phrases not just keywords, word proximity, document text cached for display, pre-processing and normalization of queries, etc.).  But the essence of their architecture is still that a simpler, more structured, single-purpose inverted version of the content to search will allow the search engine to scan millions or billions of documents in milliseconds.
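To make the two-column table concrete, here is a minimal sketch of a simple inverted index in Python. The document locations, the sample texts and the function names are made up for illustration, and it leaves out everything a real search engine adds on top (ranking, phrases, proximity, caching).

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each keyword (column 1) to the set of locations containing it (column 2)."""
    index = defaultdict(set)
    for location, text in documents.items():
        for word in tokenize(text):
            index[word].add(location)
    return index


def search(index, query):
    """Return the locations that contain every keyword of the query, sorted."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        # Keep only documents that also contain this keyword.
        matches &= index.get(word, set())
    return sorted(matches)


# Hypothetical documents, keyed by made-up locations.
documents = {
    "doc/a": "Gas pedal problem reported by drivers",
    "doc/b": "Pedal bikes are popular in the city",
    "doc/c": "Gas prices and the pedal recall",
}
index = build_inverted_index(documents)
print(search(index, "gas pedal"))
```

Note that the query never touches the original texts: it only looks up keywords and intersects the sets of locations, which is why the relationships between words don't need to be stored.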

BI software can indeed use a search engine to look for relevant data points on the web, such as “gas pedal problem” if you were with Toyota’s crisis management team a few months ago.   But in many cases, BI will be sifting through data that is actually already structured (CRM, Accounting Transactions, etc.) and in a database.  In these cases, the indices used are still inverted in their own way, but they are not much different from the original data in the database.  They are just more compact, easier to access, and contain only the data fields you need to search by and those the search engine needs to display to the end user before the final data is accessed in the database.
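As a rough illustration of such an index over structured data, the sketch below keeps only the fields to search by and the fields to display, plus the row id needed to fetch the full record from the database afterwards. The CRM rows, field names and helper functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical CRM rows as they might sit in a database table.
rows = [
    {"id": 1, "name": "Anna Schmidt", "city": "Berlin", "notes": "Prefers email contact."},
    {"id": 2, "name": "Paul Martin", "city": "Paris", "notes": "Renewal due in March."},
    {"id": 3, "name": "Eva Braun", "city": "Berlin", "notes": "Asked for a demo."},
]


def build_field_index(rows, search_fields, display_fields):
    """Index each row under (field, token), keeping only id and display fields."""
    index = defaultdict(list)
    for row in rows:
        # Compact entry: bulky columns such as "notes" stay in the database.
        entry = {"id": row["id"]}
        entry.update({field: row[field] for field in display_fields})
        for field in search_fields:
            for token in str(row[field]).lower().split():
                index[(field, token)].append(entry)
    return index


def lookup(index, field, value):
    """Return the compact entries whose field contains the given token."""
    return index.get((field, value.lower()), [])


index = build_field_index(rows, search_fields=["name", "city"], display_fields=["name"])
for hit in lookup(index, "city", "Berlin"):
    print(hit["id"], hit["name"])
```

The `id` in each entry is what the application would use to load the complete record once the user picks a result.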

The NoSQL movement and the Hasso Plattner Institute are out to prove that new databases will be faster, or that database indexers will become the new databases because they are faster.  In the video below, Hasso Plattner explains why an in-memory database makes sense and is maybe the future form that large databases will take.

Software like Exorbyte MatchMaker also creates an in-memory index, which allows our applications and our customers’ applications to run queries that would never be possible with a regular database (too slow, too complex for the database software).  Levenshtein matching and advanced multi-stage queries with phonetic, algorithmic, geometric or semantic logic, like the ones we run for the new SaaS-based Exorbyte Commerce, require this sort of technology.  This is a small revolution for the software industry and a large one for the data access technology industries (Business Intelligence, Search Engines, Databases, Master Data Management, data storage, etc.), and it affects all users of databases who need fast, flexible and frequent access to billions of data records.  As Avi says in her article:  “Because there will never be less data.”
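For readers who haven't met Levenshtein distance, here is a minimal sketch of the textbook algorithm and a naive fuzzy lookup built on it. It is not MatchMaker's implementation; the function names, the sample terms and the `max_distance` parameter are assumptions for illustration only.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def fuzzy_search(terms, query, max_distance=1):
    """Return (distance, term) pairs within max_distance, closest first."""
    query = query.lower()
    scored = [(levenshtein(query, term.lower()), term) for term in terms]
    return sorted(pair for pair in scored if pair[0] <= max_distance)


# Hypothetical index terms.
terms = ["Schmidt", "Schmitt", "Schneider"]
print(fuzzy_search(terms, "schmidt"))
```

This lookup compares the query against every term, so its cost grows with both the number of terms and their length. That is the kind of calculation load that makes fuzzy queries impractical as plain database queries at large scale.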

In conclusion, I’ll just say that the facts above are probably why Hasso Plattner (Founder of SAP) said recently “Now we can start building business intelligence” (see the short video).