The column-store pioneer. Search results CWI Repository. Star Schema Benchmark: InfoBright, InfiniDB and LucidDB. www.cs.umb.edu/~poneil/StarSchemaB.PDF. TPC-H - Homepage. Summary: The TPC Benchmark™H (TPC-H) is a decision support benchmark.
It consists of a suite of business oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions. The performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@Size), and reflects multiple aspects of the capability of the system to process queries. www.tpc.org/tpch/spec/tpch2.7.0.pdf. Analyzing air traffic performance with InfoBright and MonetDB.
Coincidentally, Baron and I both played with InfoBright this week.
Following Baron’s example, I also ran the same load against MonetDB. Reading the comments on Baron’s post, I tried to load the same data into LucidDB, but I was not successful. I wanted to analyze a bigger dataset, so I took publicly available data about USA domestic flights, with information about flight length and delays.
The data is available from 1988 to 2009 in monthly chunks, so I downloaded 252 files (for the years 1988-2008), ranging in size from 170MB to 300MB each. In total, the raw data is about 55GB. The query: select avg(c1) from (select year,month,count(*) as c1 from ontime group by YEAR,month) t; for InfoBright, and with t as (select yeard,monthd,count(*) as c1 from ontime group by YEARD,monthd) select AVG(c1) FROM t for MonetDB. A few words about the environment: the server is a Dell SC1425 with 4GB of RAM and Dual Intel(R) Xeon(TM) CPU 3.40GHz. The table I loaded the data into is: The last fields, starting with “Div*”, are not really used.
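The two query forms quoted above (a derived table for InfoBright, a WITH clause for MonetDB) compute the same thing: the average number of flights per month. A minimal sketch of that equivalence, using Python's built-in sqlite3 module as a stand-in engine. The three-month toy table and its row counts are made up for illustration, and the column names are simplified to year and month in both forms.

```python
import sqlite3

# Toy stand-in for the "ontime" table: one row per flight,
# keeping only the two columns the query groups by (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ontime (year INTEGER, month INTEGER)")
rows = [(1988, 1)] * 3 + [(1988, 2)] * 5 + [(1989, 1)] * 4
conn.executemany("INSERT INTO ontime VALUES (?, ?)", rows)

# Derived-table form, as quoted for InfoBright.
derived = conn.execute(
    "SELECT avg(c1) FROM "
    "(SELECT year, month, count(*) AS c1 FROM ontime GROUP BY year, month) t"
).fetchone()[0]

# WITH-clause form, as quoted for MonetDB.
cte = conn.execute(
    "WITH t AS (SELECT year, month, count(*) AS c1 "
    "FROM ontime GROUP BY year, month) "
    "SELECT avg(c1) FROM t"
).fetchone()[0]

# Both forms average the per-month flight counts (3, 5 and 4 here).
print(derived, cte)
```

Only the syntax differs between the two; which form an engine accepts depends on its SQL dialect.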
Load procedure: Quick comparison of MyISAM, Infobright, and MonetDB. Recently I was doing a little work for a client who has MyISAM tables with many columns (the same one Peter wrote about recently).
The client’s performance is suffering in part because of the number of columns, which is over 200. The queries are generally pretty simple (sums of columns), but they’re ad-hoc (can access any columns) and it seems tailor-made for a column-oriented database. I decided it was time to actually give Infobright a try. They have an open-source community edition, which is crippled but not enough to matter for this test. The “Knowledge Grid” architecture seems ideal for the types of queries the client runs. What follows is not a realistic benchmark, it’s not scientific, it’s just some quick and dirty tinkering. The first thing I tried doing was loading the data with SQL statements. The tests: I loaded 1 million rows into the table. MyISAM took 88 seconds, MonetDB took 200, and Infobright took 486. MyISAM is 787MB, MonetDB is 791MB, and Infobright is 317MB.
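To put the load figures above on a common scale, here is a small sketch that converts them into rows per second and size relative to MyISAM. It uses only the numbers quoted in the text; the variable names are illustrative.

```python
# Figures quoted in the text: 1 million rows loaded per engine.
ROWS = 1_000_000
load_seconds = {"MyISAM": 88, "MonetDB": 200, "Infobright": 486}
size_mb = {"MyISAM": 787, "MonetDB": 791, "Infobright": 317}

# Load throughput in rows per second.
rows_per_second = {engine: ROWS / secs for engine, secs in load_seconds.items()}

# Size of each engine's table relative to the MyISAM table.
size_vs_myisam = {engine: mb / size_mb["MyISAM"] for engine, mb in size_mb.items()}

for engine in load_seconds:
    print(
        f"{engine}: {rows_per_second[engine]:.0f} rows/s, "
        f"{size_vs_myisam[engine]:.2f}x MyISAM size"
    )
```

On these numbers Infobright loads the slowest but stores the data in roughly 40% of the space MyISAM needs, while MonetDB ends up about the same size as MyISAM.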