
Business intelligence industry trends. February 21, 2012 This is one of a series of posts on business intelligence and related analytic technology subjects, keying off the 2011/2012 version of the Gartner Magic Quadrant for Business Intelligence Platforms.

The four posts in the series cover:

Besides company-specific comments, the 2011/2012 Gartner Magic Quadrant for Business Intelligence (BI) Platforms offered observations on overall BI trends in a “Market Overview” section. I have mixed feelings about Gartner’s list.

Sumo Logic and UIs for text-oriented data. February 6, 2012 I talked with the Sumo Logic folks for an hour Thursday.

Highlights included:

Sumo Logic does SaaS (Software as a Service) log management.
Sumo Logic is text indexing/Lucene-based. Thus, it is reasonable to think of Sumo Logic as “Splunk-like”. (However, Sumo Logic seems to have a stricter security/troubleshooting orientation than Splunk, which is trying to branch out.)
Sumo Logic has hacked Lucene for faster indexing, and says 10-30 second latencies are typical.
Sumo Logic’s main differentiation is automated classification of events.

What interests me about Sumo Logic is that automated classification story.

It’s largely unsupervised machine learning.
It’s specific to a particular user/data set.
It can be up and running and classifying things effectively almost instantly (i.e., on seconds’ or minutes’ worth of data).
It’s informed by what different users tag as false positives.

The payoff is that machine learning directly informs the Sumo Logic user interface.
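Sumo Logic has not published its classification algorithm, so here is only a minimal sketch of the general idea of unsupervised log-event classification, assuming a simple token-masking heuristic (the function names and sample log lines are all hypothetical):

```python
import re
from collections import defaultdict

def template(line):
    """Mask variable fields (hex IDs, then decimal numbers) so that
    structurally similar log lines collapse to the same signature."""
    masked = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
    masked = re.sub(r"\d+", "<NUM>", masked)
    return masked

def classify(lines):
    """Group log lines by shared template -- no training data needed,
    which is what lets such a classifier work almost immediately."""
    clusters = defaultdict(list)
    for line in lines:
        clusters[template(line)].append(line)
    return clusters

logs = [
    "disk 3 failed on host 10.0.0.7",
    "disk 12 failed on host 10.0.0.9",
    "user alice logged in",
    "user alice logged in",
]
clusters = classify(logs)
# Two event types emerge: a disk-failure template and a login template.
```

A real system would refine clusters incrementally and demote templates that users tag as false positives, per the list above.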

Comments on the 2012 Forrester Wave: Enterprise Hadoop Solutions.

MarkLogic’s Hadoop connector. November 3, 2011.

Teradata Unity and the idea of active-active data warehouse replication. October 3, 2011 Teradata is having its annual conference, Teradata Partners, at the same time as Oracle OpenWorld this week.

That made it an easy decision for Teradata to preannounce its big news, Teradata Columnar and the rest of Teradata 14. But of course it held some stuff back, notably Teradata Unity, which is the name chosen for replication technology based on Teradata’s Xkoto acquisition. The core mission of Teradata Unity is asynchronous, near-real-time replication across Teradata systems. The point of “asynchronous” is performance.

Investigative analytics and derived data: Enzee Universe 2011 talk.

The Vertica story (with soundbites!) June 20, 2011 I’ve blogged separately that: And of course you know: Vertica (the product) is columnar, MPP, and fast.
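To illustrate why “asynchronous” buys performance, here is a toy Python model of asynchronous replication; it is not how Teradata Unity is implemented, just a sketch of the pattern, with all class and method names invented for the example:

```python
import queue
import threading

class AsyncReplica:
    """Toy asynchronous replication: the primary acknowledges writes
    immediately, and a background thread applies them to the replica,
    so write latency does not depend on the replica at all."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._log = queue.Queue()
        self._worker = threading.Thread(target=self._apply, daemon=True)
        self._worker.start()

    def write(self, key, value):
        self.primary[key] = value      # acknowledged right away
        self._log.put((key, value))    # shipped to the replica later

    def _apply(self):
        while True:
            key, value = self._log.get()
            self.replica[key] = value
            self._log.task_done()

    def sync(self):
        """Block until the replica has caught up (demo convenience)."""
        self._log.join()

db = AsyncReplica()
db.write("account:1", 100)
db.write("account:2", 250)
db.sync()  # after catch-up, primary and replica agree
```

The trade-off, of course, is a replication lag window: until the log drains, the replica may be behind the primary.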

*Vertica (the company) was recently acquired by HP.
*Similar things seem true of ParAccel, but most of the other serious columnar analytic DBMS aren’t actually MPP (Massively Parallel Processing) yet.
**Vertica says it has a “staggering” pipeline now that it’s been with HP for a few months.

Temporal data, time series, and imprecise predicates.

Dirty data, stored dirt cheap. June 4, 2011 A major driver of Hadoop adoption is the “big bit bucket” use case.

Users take a whole lot of data, often machine-generated data in logs of different kinds, and dump it into one place, managed by Hadoop, at open-source pricing. Hadoop hardware doesn’t need to be that costly either. And once you get that data into Hadoop, there are a whole lot of things you can do with it. Of course, there are various outfits who’d like to sell you not-so-cheap bit buckets. So the question arises — why would you want to spend serious money to look after your low-value data? For example, I was told of one big bank that was pulling 5 GB of logs every half hour into Splunk (selected for performance), or at least planning to.
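To put that bank’s figure in perspective, a quick back-of-the-envelope calculation (the 5 GB per half hour rate comes from the anecdote above; uncompressed retention for a full year is my own assumption):

```python
# Rough ingest-volume arithmetic for the bank log example above.
gb_per_half_hour = 5
per_day = gb_per_half_hour * 48   # 48 half-hour intervals per day
per_year = per_day * 365          # ignoring compression and retention policy

print(per_day)    # 240 GB/day
print(per_year)   # 87,600 GB/year, i.e. roughly 85 TB/year
```

At that scale, per-terabyte pricing differences between a “cheap bit bucket” and a premium system add up quickly.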

Hardware for Hadoop. June 4, 2011 After suggesting that there’s little point to Hadoop appliances, it occurred to me to look into what kinds of hardware actually are used with Hadoop.

So far as I can tell:

Hadoop nodes today tend to run on fairly standard boxes.
Hadoop nodes in the past have tended to run on boxes that were light with respect to RAM.
The number of spindles per core on Hadoop node boxes is going up even as disks get bigger.

A key input comes from Cloudera, who to my joy delegated the questions to Omer Trajman, who wrote:

Why you would want an appliance — and when you wouldn’t. June 2, 2011 Data warehouse appliances are booming.

But Hadoop appliances are a non-starter. Data warehouse and other data management appliances are on the upswing. Oracle is pushing Exadata.

Object-oriented database management systems (OODBMS). May 21, 2011.

Starcounter: a high-speed, memory-centric, object-oriented DBMS, coming soon.

Transparent sharding. February 24, 2011 When databases are too big to manage via a single server, responsibility for them is spread among multiple servers.

Revolution Analytics update. April 8, 2011 I wasn’t too impressed when I spoke with Revolution Analytics at the time of its relaunch last year. But a conversation Thursday evening was much clearer. And I even learned some cool stuff about general predictive modeling trends (see the bottom of this post). Revolution Analytics business and business model highlights include:

Revolution Analytics is an open-core vendor built around the R language.
Revolution Analytics’ top market sector by far appears to be financial services, both in trading/investment banks/hedge funds and in credit cards/risk analysis.

PostgreSQL 8.4: Creating a Database. The first test to see whether you can access the database server is to try to create a database. A running PostgreSQL server can manage many databases. Typically, a separate database is used for each project or for each user. Possibly, your site administrator has already created a database for your use and told you its name; in that case you can omit this step and skip ahead to the next section.

VoltDB. eXtremeDB.

History: Later editions targeted the high-performance, non-embedded software market, including capital markets applications (algorithmic trading, order-matching engines) and real-time caching for Web-based applications, including social networks and e-commerce.
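The PostgreSQL “create a database” step above can be tried directly from the shell, assuming the server is running, the client tools are on your PATH, and your account has CREATEDB rights (`mydb` is a placeholder name):

```shell
# Create a database named "mydb" (placeholder), then connect to it interactively.
createdb mydb
psql mydb
```

If `createdb` reports a connection error, the server is most likely not running or not reachable at the default socket or port, which is exactly the access test the tutorial has in mind.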

Features added to support this focus include SQL ODBC and JDBC interfaces, 64-bit support, and multiversion concurrency control (MVCC) transaction management.[4]

Product features. Core eXtremeDB engine: eXtremeDB supports the following features across its product family.[5] In-process architecture: eXtremeDB runs in-process with an application, rather than as a database server that is separate from client processes.

PostgreSQL: The world's most advanced open source database. NoSQL Databases.

Architectural options for analytic database management systems. January 18, 2011 Mike Stonebraker recently kicked off some discussion about desirable architectural features of a columnar analytic DBMS. Let’s expand the conversation to cover desirable architectural characteristics of analytic DBMS in general. But first, a few housekeeping notes: This is a very long post. Even so, to keep it somewhat manageable, I’ve cut corners on completeness.

Analytic platforms defined. February 24, 2011 A few weeks ago, I described the elements of an “analytic computing system” or “analytic platform,” while reserving judgment as to which of the two terms would or should win out.

DataStax introduces a Cassandra-based Hadoop distribution called Brisk.

Hadapt is launching. March 23, 2011 The HadoopDB company Hadapt is finally launching, based on the HadoopDB project, albeit with code rewritten from scratch. As you may recall, the core idea of HadoopDB is to put a DBMS on every node, and use MapReduce to talk to the whole database. The idea is to get the same SQL/MapReduce integration as you get if you use Hive, but with much better performance* and perhaps somewhat better SQL functionality.** Advantages vs. a DBMS-based analytic platform that includes MapReduce — e.g.