


Business intelligence industry trends

February 21, 2012. This is one of a series of posts on business intelligence and related analytic technology subjects, keying off the 2011/2012 version of the Gartner Magic Quadrant for Business Intelligence Platforms.

The four posts in the series cover:

Besides company-specific comments, the 2011/2012 Gartner Magic Quadrant for Business Intelligence (BI) Platforms offered observations on overall BI trends in a “Market Overview” section.

Sumo Logic and UIs for text-oriented data

February 6, 2012. I talked with the Sumo Logic folks for an hour Thursday.

Highlights included:

- Sumo Logic does SaaS (Software as a Service) log management.
- Sumo Logic is text indexing/Lucene-based.

Comments on the 2012 Forrester Wave: Enterprise Hadoop Solutions

MarkLogic’s Hadoop connector

November 3, 2011. It’s time to circle back to a subject I skipped when I otherwise wrote about MarkLogic 5: MarkLogic’s new Hadoop connector.

Most of what’s confusing about the MarkLogic Hadoop Connector lies in two pairs of options it presents you: Hadoop can talk XQuery to MarkLogic.

Teradata Unity and the idea of active-active data warehouse replication

October 3, 2011. Teradata is having its annual conference, Teradata Partners, at the same time as Oracle OpenWorld this week.

That made it an easy decision for Teradata to preannounce its big news: Teradata Columnar and the rest of Teradata 14.

Investigative analytics and derived data: Enzee Universe 2011 talk

June 19, 2011. I’ll be speaking Monday, June 20, at IBM Netezza’s Enzee Universe conference.

Thus, as is my custom:

- I’m posting draft slides.
- I’m encouraging comment (especially in the short time window before I have to actually give the talk).
- I’m offering links below to more detail on various subjects covered in the talk.

The talk concept started out as “advanced analytics” (as opposed to fast query, a subject amply covered in the rest of any Netezza event), as a lunch break in what is otherwise a detailed “best practices” session.

So I suggested we constrain the subject by focusing on a specific application area: customer acquisition and retention, which matters to almost any enterprise and exploits most areas of analytic technology.

The Vertica story (with soundbites!)

June 20, 2011. I’ve blogged separately that:

And of course you know: Vertica (the product) is columnar, MPP, and fast.

*Vertica (the company) was recently acquired by HP.

*Similar things seem true of ParAccel, but most of the other serious columnar analytic DBMS aren’t actually MPP (Massively Parallel Processing) yet.

Temporal data, time series, and imprecise predicates

June 20, 2011. I’ve been confused about temporal data management for a while, because there are several different things going on.

- Date arithmetic. This of course has been around for a very long, er, for a very long time.
- Time-series-aware compression. This has been around for quite a while too.
- “Time travel”/snapshotting: preserving the state of the database at previous points in time.

In essence, the point of time series/event series SQL functionality is to do SQL against incomplete, imprecise, or derived data.* For example, suppose that in one time series events happen at times 3.00, 3.01, 3.03, and 3.05; in another, events happen at times 3.00, 3.02, 3.03, 3.04, and 3.05; and you want to join the two time series together.
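One common way to join two series whose timestamps don't line up is to emit a row for every time seen in either series and carry each series' most recent value forward ("last observation carried forward"). The post doesn't say this is the approach it has in mind, so treat the following as a minimal Python sketch under that assumption; the function name `locf_join` and the sample values are hypothetical, and only the event times come from the example above.

```python
# Hypothetical illustration of a "last observation carried forward" join
# of two irregular event series. Not taken from any specific DBMS.

def locf_join(a, b):
    """Join two time series whose event times do not line up.

    a and b are lists of (time, value) tuples, each sorted by time.
    Returns one row per distinct time in either series, carrying the most
    recent value from each series forward (None until a series has started).
    """
    times = sorted({t for t, _ in a} | {t for t, _ in b})
    out = []
    ia = ib = 0
    last_a = last_b = None
    for t in times:
        # Advance each series to its last event at or before time t.
        while ia < len(a) and a[ia][0] <= t:
            last_a = a[ia][1]
            ia += 1
        while ib < len(b) and b[ib][0] <= t:
            last_b = b[ib][1]
            ib += 1
        out.append((t, last_a, last_b))
    return out


# Event times from the example above; the values are made up.
series_a = [(3.00, "a0"), (3.01, "a1"), (3.03, "a3"), (3.05, "a5")]
series_b = [(3.00, "b0"), (3.02, "b2"), (3.03, "b3"), (3.04, "b4"), (3.05, "b5")]

for row in locf_join(series_a, series_b):
    print(row)
```

At time 3.02, for instance, the first series has no event, so its 3.01 value is carried forward. Carrying values forward is only one choice; interpolating between neighboring events is another, and which fits depends on what the values mean.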

*This is a limited counterexample to my dictum that you should explicitly store derived data because it’s too much trouble to keep re-deriving it on the fly.

Dirty data, stored dirt cheap

June 4, 2011. A major driver of Hadoop adoption is the “big bit bucket” use case.

Users take a whole lot of data, often machine-generated data in logs of different kinds, and dump it into one place, managed by Hadoop, at open-source pricing. Hadoop hardware doesn’t need to be that costly either. And once you get that data into Hadoop, there are a whole lot of things you can do with it. Of course, there are various outfits who’d like to sell you not-so-cheap bit buckets.

Hardware for Hadoop

June 4, 2011. After suggesting that there’s little point to Hadoop appliances, it occurred to me to look into what kinds of hardware are actually used with Hadoop.

So far as I can tell:

- Hadoop nodes today tend to run on fairly standard boxes.
- Hadoop nodes in the past have tended to run on boxes that were light with respect to RAM.
- The number of spindles per core on Hadoop node boxes is going up even as disks get bigger.

A key input comes from Cloudera, who to my joy delegated the questions to Omer Trajman, who wrote:

Most Hadoop deployments today use systems with dual socket and quad or hex cores (8 or 12 cores total, 16 or 24 hyper-threaded).

Bullet points from that year-ago link include:

Why you would want an appliance — and when you wouldn’t

June 2, 2011. Data warehouse appliances are booming.

But Hadoop appliances are a non-starter.

Object-oriented database management systems (OODBMS)

May 21, 2011. There seems to be a fair amount of confusion about object-oriented database management systems (OODBMS).

Starcounter high-speed memory-centric object-oriented DBMS, coming soon

May 18, 2011. Since posting recently about Starcounter, I’ve had the chance to actually talk with the company (twice). Hence I know more than before. Starcounter:

- Has been around as a company since 2006.
- Has developed memory-centric object-oriented DBMS technology that has been OEMed by a few application software companies (especially in bricks-and-mortar retailing and in online advertising).
- Is planning to actually launch an OODBMS product sometime this summer.
- Has 14 employees (most or all of whom are in Sweden, which is also where I think Starcounter’s current customers are centered).
- Is planning to shift emphasis soon to the US market.

Starcounter’s value propositions are programming ease (no object/relational impedance mismatch) and performance. The key technical aspect of Starcounter is integration between the DBMS and the virtual machine, so that the same copy of the data is accessed by both the DBMS and the application program, without any movement or transformation being needed.

Transparent sharding

February 24, 2011. When databases are too big to manage via a single server, responsibility for them is spread among multiple servers. There are numerous names for this strategy, or versions of it, all of them at least somewhat problematic.

Revolution Analytics update

April 8, 2011. I wasn’t too impressed when I spoke with Revolution Analytics at the time of its relaunch last year. But a conversation Thursday evening was much clearer.

And I even learned some cool stuff about general predictive modeling trends (see the bottom of this post). Revolution Analytics business and business model highlights include:

- Revolution Analytics is an open-core vendor built around the R language.
- Revolution Analytics’ top market sector by far appears to be financial services, both in trading/investment banks/hedge funds and in credit cards/risk analysis.

PostgreSQL 8.4: Creating a Database

The first test to see whether you can access the database server is to try to create a database. A running PostgreSQL server can manage many databases.

VoltDB

eXtremeDB

PostgreSQL: The world's most advanced open source database

NOSQL Databases

Architectural options for analytic database management systems

January 18, 2011. Mike Stonebraker recently kicked off some discussion about desirable architectural features of a columnar analytic DBMS.
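To make "columnar" concrete, here is a minimal Python sketch assuming a toy in-memory table: the same rows are pivoted into one list per column, an aggregate then reads a single column, and a sorted column is run-length encoded. The names `to_columns` and `run_length_encode` and the sample data are hypothetical; real columnar analytic DBMS add far more (on-disk formats, several encodings, parallel execution), none of which is modeled here.

```python
from itertools import groupby

# Hypothetical toy table, stored row by row (illustrative data only).
COLUMN_NAMES = ("day", "region", "amount")
rows = [
    ("2011-01-03", "east", 120),
    ("2011-01-03", "east", 80),
    ("2011-01-04", "east", 50),
    ("2011-01-03", "west", 200),
    ("2011-01-04", "west", 30),
]


def to_columns(rows, names):
    """Pivot row tuples into one list per column (a column-store layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Compress a column into (value, run_length) pairs.

    Works best on sorted or low-cardinality columns, where long runs of
    identical values are common.
    """
    return [(value, len(list(run))) for value, run in groupby(values)]


columns = to_columns(rows, COLUMN_NAMES)

# An aggregate over one column reads only that column's values,
# rather than every field of every row.
total_amount = sum(columns["amount"])

# The region column is sorted, so it collapses into two runs.
encoded_region = run_length_encode(columns["region"])

print(total_amount, encoded_region)
```

Summing `amount` never touches `day` or `region`, which is the I/O advantage columnar systems exploit for analytic scans; the run-length encoding hints at why sorted columns compress well.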

Analytic platforms defined

February 24, 2011. A few weeks ago, I described the elements of an “analytic computing system” or “analytic platform,” while reserving judgment as to which of the two terms would or should win out.

DataStax introduces a Cassandra-based Hadoop distribution called Brisk

March 23, 2011.

Hadapt is launching

March 23, 2011.