
Data


Insight Data Engineering Ecosystem: An Interactive Map

Enterprise service bus

All customer services communicate in the same way with the ESB: the ESB translates a message to the correct message type and sends the message to the correct consumer service.

An enterprise service bus (ESB) implements a communication system between mutually interacting software applications in a service-oriented architecture (SOA). It represents a software architecture for distributed computing and is a special variant of the more general client-server model, in which any application may behave as a server or a client.
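To make the translate-and-route idea concrete, here is a minimal in-process sketch, assuming two applications that describe the same order differently. The class `ServiceBus`, its methods, and the message types `LegacyOrder` and `ShippingRequest` are all hypothetical names for illustration, not the API of any ESB product. Each application talks only to the bus; the bus looks up the route for the incoming message type, converts the message to the consumer's type, and delivers it.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


# Hypothetical message types: the producing application and the consumer
# service describe the same order in different formats.
@dataclass
class LegacyOrder:
    order_no: str
    amount_cents: int


@dataclass
class ShippingRequest:
    order_id: str
    amount: float


class ServiceBus:
    """Toy in-process bus: applications talk only to the bus, never to each other."""

    def __init__(self) -> None:
        # consumer name -> handler that accepts the consumer's own message type
        self._consumers: Dict[str, Callable[[Any], Any]] = {}
        # incoming message type -> (target consumer name, translator function)
        self._routes: Dict[type, Tuple[str, Callable[[Any], Any]]] = {}

    def register_consumer(self, name: str, handler: Callable[[Any], Any]) -> None:
        self._consumers[name] = handler

    def add_route(self, source_type: type, consumer: str,
                  translate: Callable[[Any], Any]) -> None:
        self._routes[source_type] = (consumer, translate)

    def send(self, message: Any) -> Any:
        """Translate the message to the consumer's type and deliver it."""
        try:
            consumer, translate = self._routes[type(message)]
        except KeyError:
            raise LookupError(f"no route for {type(message).__name__}") from None
        return self._consumers[consumer](translate(message))


# Usage: wire one producer format to one consumer service.
received: List[ShippingRequest] = []


def shipping_service(request: ShippingRequest) -> str:
    received.append(request)
    return f"shipping {request.order_id}"


bus = ServiceBus()
bus.register_consumer("shipping", shipping_service)
bus.add_route(
    LegacyOrder,
    "shipping",
    lambda o: ShippingRequest(order_id=o.order_no, amount=o.amount_cents / 100),
)
result = bus.send(LegacyOrder(order_no="A-17", amount_cents=1999))
```

Because producers and consumers only know the bus, adding a new consumer format means registering one more translator rather than changing every application. A real ESB also handles transport, delivery guarantees and protocol conversion, which this sketch omits.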

ESB promotes agility and flexibility with regard to high-level protocol communication between applications. Its primary use is in enterprise application integration (EAI) of heterogeneous and complex service landscapes.

Architecture

No global standards exist for enterprise service bus concepts or implementations.[1] Most providers of message-oriented middleware have adopted the enterprise service bus concept as a de facto standard for a service-oriented architecture.

Extract, transform, load

In computing, extract, transform, load (ETL) is the general procedure of copying data from one or more sources into a destination system that represents the data differently from the sources or in a different context than the sources.
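The three ETL steps can be sketched with the Python standard library, assuming two hypothetical sources (a CSV export with day-first dates and prices in cents, and a list of records with ISO dates and decimal prices) and an in-memory SQLite table as the destination. The source data, the `sales` table, and the `extract`/`transform`/`load` functions are illustrative inventions. The point is that the destination uses one representation (ISO-8601 dates, integer cents) that differs from each source's.

```python
import csv
import io
import sqlite3
from datetime import date, datetime
from decimal import Decimal
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical source 1: a CSV export with day-first dates and prices in cents.
CSV_SOURCE = """sku,sold_on,price_cents
A1,03/02/2024,1250
B7,15/02/2024,899
"""

# Hypothetical source 2: records from another system with ISO dates and decimal prices.
API_SOURCE = [{"sku": "C3", "date": "2024-02-20", "price": "4.50"}]


def extract() -> Iterator[Tuple[str, Dict[str, str]]]:
    """Extract: pull raw records from every source, tagged with their origin."""
    for row in csv.DictReader(io.StringIO(CSV_SOURCE)):
        yield "csv", row
    for row in API_SOURCE:
        yield "api", row


def transform(source: str, row: Dict[str, str]) -> Tuple[str, str, int]:
    """Transform: map each source's representation onto the destination schema
    (ISO-8601 date string, integer price in cents)."""
    if source == "csv":
        sold_on = datetime.strptime(row["sold_on"], "%d/%m/%Y").date()
        return row["sku"], sold_on.isoformat(), int(row["price_cents"])
    sold_on = date.fromisoformat(row["date"])
    # Decimal avoids binary floating-point error when converting to cents.
    return row["sku"], sold_on.isoformat(), int(Decimal(row["price"]) * 100)


def load(rows: Iterable[Tuple[str, str, int]], conn: sqlite3.Connection) -> None:
    """Load: write the transformed rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(sku TEXT PRIMARY KEY, sold_on TEXT, price_cents INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load((transform(source, row) for source, row in extract()), conn)
loaded = conn.execute(
    "SELECT sku, sold_on, price_cents FROM sales ORDER BY sku"
).fetchall()
```

Passing a generator into `load` streams records through the pipeline one at a time instead of materializing every source in memory first.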

Data lake

A data lake is a large storage repository and processing engine. Data lakes provide "massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs".[1] The term was coined by James Dixon, chief technology officer of Pentaho.[2] Dixon initially used the term to contrast with "data mart", a smaller repository of interesting attributes extracted from the raw data.

He wrote: "If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples."[3] Dixon argued that data marts have several inherent problems and that data lakes are the optimal solution.

Operational Intelligence, Log Management, Application Management, Enterprise Security and Compliance

Data warehouse

Data Warehouse Overview

Data mart

What is a Data Management Platform, or DMP?

This is the latest in a series of articles that explain, in plain English, new technology tools and platforms that are changing the face of digital media.

Our first entry covered DSPs. To suggest new entries, please email me at the address below. Data now informs almost all aspects of digital media, and data management platforms have emerged to help marketers, publishers and other businesses make sense of it all.

Spark for Data Padawans Episode 1: a look at distributed data storage

If you've been anywhere near data in the past year or so, you must have heard about the war going on between Spark and Hadoop for total control over the management of large amounts of data.

We have a big announcement about Spark coming at Dataiku, so ever since I started working here, that word has been popping up every day, and I kept wondering what it could mean. I’ve already written about my limited technical background before arriving at Dataiku. Luckily, in the past month, I’ve had the opportunity to speak to all of our brilliant data scientists and developers, as well as a couple of data experts.

Spark for Data Padawans Episode 2: Spark vs Hadoop?

The cat is out of the bag: Data Science Studio now integrates with Spark!

It's the perfect moment (I know, crazy good timing, right!) for me to continue my presentation of Spark for super beginners with episode 2: the birth of Spark and how it compares to Hadoop. As a reminder, this is episode 2 of my investigation into what the heck Spark is. These are the other episodes, including the upcoming episode 3:

Spark for Data Padawans Episode 3: Spark vs MapReduce

After learning about Hadoop and distributed data storage, and what exactly Spark is, in the previous episodes, it's time to dig a little deeper to understand why, even if Spark is great, it isn't necessarily a miracle solution to all your data processing issues.

It's time for Spark for super beginners episode 3! As always, I try to keep these articles as easy to understand as possible, but if you really are a super data padawan, you probably need to have a quick look at episode 1 and episode 2 to understand what I'm talking about. You can always go back to a previous episode later: After reading episode 1 and episode 2, Spark seems pretty great. You’re probably thinking that it can replace MapReduce and any other system out there, since it can: