Business intelligence industry trends February 21, 2012 This is one of a series of posts on business intelligence and related analytic technology subjects, keying off the 2011/2012 version of the Gartner Magic Quadrant for Business Intelligence Platforms. The four posts in the series cover: Besides company-specific comments, the 2011/2012 Gartner Magic Quadrant for Business Intelligence (BI) Platforms offered observations on overall BI trends in a “Market Overview” section. I have mixed feelings about Gartner’s list.
Sumo Logic and UIs for text-oriented data February 6, 2012 I talked with the Sumo Logic folks for an hour Thursday. Highlights included: Sumo Logic does SaaS (Software as a Service) log management. Sumo Logic is text indexing/Lucene-based. Thus, it is reasonable to think of Sumo Logic as “Splunk-like”. (However, Sumo Logic seems to have a stricter security/trouble-shooting orientation than Splunk, which is trying to branch out.) Sumo Logic has hacked Lucene for faster indexing, and says 10-30 second latencies are typical. Sumo Logic’s main differentiation is automated classification of events.
Comments on the 2012 Forrester Wave: Enterprise Hadoop Solutions
MarkLogic’s Hadoop connector November 3, 2011 It’s time to circle back to a subject I skipped when I otherwise wrote about MarkLogic 5: MarkLogic’s new Hadoop connector. Most of what’s confusing about the MarkLogic Hadoop Connector lies in two pairs of options it presents you:
Teradata Unity and the idea of active-active data warehouse replication October 3, 2011 Teradata is having its annual conference, Teradata Partners, at the same time as Oracle OpenWorld this week. That made it an easy decision for Teradata to preannounce its big news, Teradata Columnar and the rest of Teradata 14. But of course it held some stuff back, notably Teradata Unity, which is the name chosen for replication technology based on Teradata’s Xkoto acquisition. The core mission of Teradata Unity is asynchronous, near-real-time replication across Teradata systems. The point of “asynchronous” is performance.
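To make the "asynchronous is about performance" point concrete, here is a minimal sketch of asynchronous replication in general. It is not a description of how Teradata Unity works; the `Primary` and `Replica` classes, the in-memory change queue, and the `ship_pending` method are all hypothetical names invented for illustration. The idea shown is only that a write returns as soon as the local system has applied it, while the copy to the second system happens later.

```python
from collections import deque


class Replica:
    """Hypothetical second system that receives changes after the fact."""

    def __init__(self):
        self.rows = {}

    def apply(self, change):
        key, value = change
        self.rows[key] = value


class Primary:
    """Hypothetical first system: applies writes locally, queues them for shipping."""

    def __init__(self, replica):
        self.rows = {}
        self.replica = replica
        self.pending = deque()  # changes not yet applied on the replica

    def write(self, key, value):
        # The write completes here, without waiting for the replica.
        self.rows[key] = value
        self.pending.append((key, value))

    def ship_pending(self):
        # Run later (in a real system, continuously in the background).
        shipped = 0
        while self.pending:
            self.replica.apply(self.pending.popleft())
            shipped += 1
        return shipped


replica = Replica()
primary = Primary(replica)
primary.write("order:1", "placed")
primary.write("order:1", "shipped")

lag_before = len(primary.pending)        # changes the replica has not seen yet
replica_before = dict(replica.rows)      # replica is still empty at this point
shipped = primary.ship_pending()
in_sync_after = replica.rows == primary.rows
```

The trade-off is visible in `lag_before`: until the queue is drained, the two systems disagree, which is the price paid for not making every write wait on a second system.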
Investigative analytics and derived data: Enzee Universe 2011 talk June 19, 2011 I’ll be speaking Monday, June 20 at IBM Netezza’s Enzee Universe conference. Thus, as is my custom: I’m posting draft slides. I’m encouraging comment (especially in the short time window before I have to actually give the talk). I’m offering links below to more detail on various subjects covered in the talk. The talk concept started out as “advanced analytics” (as opposed to fast query, a subject amply covered in the rest of any Netezza event), as a lunch break in what is otherwise a detailed “best practices” session. So I suggested we constrain the subject by focusing on a specific application area: customer acquisition and retention, something of importance to almost any enterprise, and which exploits most areas of analytic technology.
The Vertica story (with soundbites!) June 20, 2011 I’ve blogged separately that: And of course you know: Vertica (the product) is columnar, MPP, and fast.* Vertica (the company) was recently acquired by HP.
Temporal data, time series, and imprecise predicates June 20, 2011 I’ve been confused about temporal data management for a while, because there are several different things going on. Date arithmetic. This of course has been around for a very long ... er, for a very long time. Time-series-aware compression.
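The first two items in that list can be illustrated with a small sketch. The function names `days_between`, `delta_encode`, and `delta_decode` are hypothetical, and delta encoding is used here only as one simple example of a compression scheme that benefits from regularly spaced timestamps; the post does not say which schemes it has in mind.

```python
from datetime import date


def days_between(start, end):
    """Date arithmetic: subtracting two dates gives a timedelta."""
    return (end - start).days


def delta_encode(timestamps):
    """Store the first timestamp, then only the gap to each following one."""
    if not timestamps:
        return []
    encoded = [timestamps[0]]
    for previous, current in zip(timestamps, timestamps[1:]):
        encoded.append(current - previous)
    return encoded


def delta_decode(encoded):
    """Rebuild the original timestamps by accumulating the gaps."""
    decoded = []
    running = 0
    for index, value in enumerate(encoded):
        running = value if index == 0 else running + value
        decoded.append(running)
    return decoded


gap = days_between(date(2011, 6, 4), date(2011, 6, 20))

# Readings taken every 60 seconds: the gaps are small and repetitive,
# which is what makes them cheaper to store than the raw values.
readings = [1308556800, 1308556860, 1308556920, 1308556980]
encoded = delta_encode(readings)
restored = delta_decode(encoded)
```

Neither function addresses the harder questions about temporal data management; they only show why date arithmetic and time-series storage are separate concerns.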
Dirty data, stored dirt cheap June 4, 2011 A major driver of Hadoop adoption is the “big bit bucket” use case. Users take a whole lot of data, often machine-generated data in logs of different kinds, and dump it into one place, managed by Hadoop, at open-source pricing. Hadoop hardware doesn’t need to be that costly either. And once you get that data into Hadoop, there are a whole lot of things you can do with it. Of course, there are various outfits who’d like to sell you not-so-cheap bit buckets.
Hardware for Hadoop June 4, 2011 After suggesting that there’s little point to Hadoop appliances, it occurred to me to look into what kinds of hardware actually are used with Hadoop. So far as I can tell: Hadoop nodes today tend to run on fairly standard boxes. Hadoop nodes in the past have tended to run on boxes that were light with respect to RAM. The number of spindles per core on Hadoop node boxes is going up even as disks get bigger. A key input comes from Cloudera, who to my joy delegated the questions to Omer Trajman, who wrote: Most Hadoop deployments today use systems with dual socket and quad or hex cores (8 or 12 cores total, 16 or 24 hyper-threaded).
Why you would want an appliance — and when you wouldn’t June 2, 2011 Data warehouse appliances are booming. But Hadoop appliances are a non-starter. Data warehouse and other data management appliances are on the upswing. Oracle is pushing Exadata. Teradata* is going strong, and also recently bought Aster Data.
Object-oriented database management systems (OODBMS) May 21, 2011 There seems to be a fair amount of confusion about object-oriented database management systems (OODBMS). Let’s start with a working definition: An object-oriented database management system (OODBMS, but sometimes just called “object database”) is a DBMS that stores data in a logical model that is closely aligned with an application program’s object model.
Starcounter high-speed memory-centric object-oriented DBMS, coming soon May 18, 2011 Since posting recently about Starcounter, I’ve had the chance to actually talk with the company (twice). Hence I know more than before. Starcounter: Has been around as a company since 2006. Has developed memory-centric object-oriented DBMS technology that has been OEMed by a few application software companies (especially in bricks-and-mortar retailing and in online advertising). Is planning to actually launch an OODBMS product sometime this summer. Has 14 employees (most or all of whom are in Sweden, which is also where I think Starcounter’s current customers are centered). Is planning to shift emphasis soon to the US market.
Transparent sharding February 24, 2011 When databases are too big to manage via a single server, responsibility for them is spread among multiple servers. There are numerous names for this strategy, or versions of it — all of them at least somewhat problematic.
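A minimal sketch of the idea, assuming hash-based placement (only one of several possible strategies, and not one the post commits to). The `shard_for` function and `ShardedStore` class are hypothetical; plain dictionaries stand in for servers. The "transparent" part is that callers use `put` and `get` without ever naming a shard.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard number using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Hypothetical store whose callers never see which shard holds a key."""

    def __init__(self, shard_count):
        # Each dict stands in for one server.
        self.shards = [dict() for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(shard_count=4)
for customer_id in ("alice", "bob", "carol", "dave"):
    store.put(customer_id, {"name": customer_id})

found = store.get("carol")
missing = store.get("erin")
total_rows = sum(len(shard) for shard in store.shards)
```

A simple modulo scheme like this reshuffles most keys when `shard_count` changes, which is one reason real systems tend to use more elaborate placement.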
Revolution Analytics update April 8, 2011 I wasn’t too impressed when I spoke with Revolution Analytics at the time of its relaunch last year. But a conversation Thursday evening was much clearer. And I even learned some cool stuff about general predictive modeling trends (see the bottom of this post). Revolution Analytics business and business model highlights include:
PostgreSQL 8.4: Creating a Database The first test to see whether you can access the database server is to try to create a database. A running PostgreSQL server can manage many databases. Typically, a separate database is used for each project or for each user.
Architectural options for analytic database management systems
Analytic platforms defined
DataStax introduces a Cassandra-based Hadoop distribution called Brisk
Hadapt is launching