
Data Vault


DV2 Sequences, Hash Keys, Business Keys – Candid Look. This entry is a candid look (a technical, unbiased view) at the three alternative primary key options in a Data Vault 2.0 model.

DV2 Sequences, Hash Keys, Business Keys – Candid Look

There are pros and cons to each selection. I hope you enjoy this factual entry.

(C) Copyright 2018 Dan Linstedt, all rights reserved. No reprints allowed without express written permission from Dan Linstedt.

There are three main alternatives for selecting primary key values in a Data Vault 2.0 model:

- Sequence Numbers
- Hash Keys
- Business Keys

Sequence Numbers

Sequence numbers have been around since the beginning of machines.

Sequence numbers come with several drawbacks:

- They have an upper limit (the size of the numeric field for non-decimal values).
- They introduce a process issue during loads, because any child entity must look up its corresponding parent record to inherit the parent's value.
- They hold no business meaning.

Lookups also cause “pre-caching” problems under volume loads. It doesn’t matter whether the technology is an ETL engine, a real-time process engine, or a SQL data management enabled engine.

Is Data Vault becoming obsolete? – Roelant Vos. What value do we get from having an intermediate hyper-normalised layer?
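Hash keys address the parent-lookup problem described in the Linstedt excerpt above: because the key is a deterministic function of the business key, a child row can compute its parent's key without querying the parent table at load time. A minimal sketch (the `hash_key` helper, its delimiter, and its normalization rules are illustrative assumptions, though MD5 over a delimited, normalized business key is a common DV2 choice):

```python
import hashlib

def hash_key(*business_key_parts: str, delimiter: str = "||") -> str:
    """Derive a deterministic DV2-style hash key from a business key.

    Each part is normalized (trimmed and upper-cased) and joined with a
    delimiter before hashing, so the same business key always yields the
    same key, regardless of which process computes it.
    """
    normalized = delimiter.join(p.strip().upper() for p in business_key_parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

# A child record (e.g. a Satellite or Link row) derives its parent Hub
# key directly from the business key -- no lookup against the Hub table.
customer_hk = hash_key("CUST-001")
order_link_hk = hash_key("CUST-001", "ORD-42")
```

Because the computation is self-contained, parallel loaders can populate Hubs, Links, and Satellites independently, which is exactly what the sequence-number lookup prevents.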

Is Data Vault becoming obsolete? – Roelant Vos

Let me start by stating that a Data Warehouse is a necessary evil at the best of times. In the ideal world, there would be no need for it, as optimal governance and near real-time multidirectional data harmonisation would have created an environment where it is easy to retrieve information without any ambiguity across systems (including its history of changes).

When a full history of changes is too much: implementing abstraction for Point-In-Time (PIT) and Dimension tables – Roelant Vos. When changes are just too many: when you construct a Point-In-Time (PIT) table or Dimension from your Data Vault model, do you sometimes find yourself in the situation where there are too many change records present?

When a full history of changes is too much: implementing abstraction for Point-In-Time (PIT) and Dimension tables – Roelant Vos

This is because, in the standard Data Vault design, tiny variations when loading data may result in the creation of very small time slices when the various historised data sets (e.g. Satellites) are combined. There is such a thing as too much information, and this post explains ways to remediate this by applying various ‘time condensing’ mechanisms. Applying these techniques can have a significant impact on the performance and ease of maintenance of PIT and Dimension tables, and is worth looking into. This post builds on some of the concepts that were covered in earlier posts.

Roland Bouman's blog: Do we still need to talk about Data Vault 2.0 Hash keys? A few days ago, I ran into the article "Hash Keys In The Data Vault", published recently (2017-04-28) on the Scalefree company blog.
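The ‘time condensing’ idea from the Roelant Vos excerpt above can be illustrated with a small sketch (a hypothetical row format, not the implementation from the post): consecutive time slices whose attribute values are identical are merged, so only genuine changes survive into the PIT table or Dimension.

```python
def condense(rows):
    """Merge consecutive time slices that carry identical attribute values.

    rows: list of (effective_date, attributes) tuples, sorted by date.
    A new slice is kept only when the attributes actually change.
    """
    condensed = []
    for date, attrs in rows:
        if condensed and condensed[-1][1] == attrs:
            continue  # same values as the previous slice: drop the row
        condensed.append((date, attrs))
    return condensed

slices = [
    ("2018-01-01", {"name": "Acme", "tier": "gold"}),
    ("2018-01-02", {"name": "Acme", "tier": "gold"}),  # no real change
    ("2018-01-03", {"name": "Acme", "tier": "silver"}),
]
# condense(slices) keeps only the 2018-01-01 and 2018-01-03 slices
```

In a real warehouse this comparison would typically exclude technical columns (load dates, record sources) so that only business-attribute changes open a new time slice.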

Roland Bouman's blog: Do we still need to talk about Data Vault 2.0 Hash keys?

Scalefree is a company founded by Dan Linstedt and Michael Olschminke. Data Vault Modeling Increase Data Warehouse Agility. What is Data Vault?

Data Vault Modeling Increase Data Warehouse Agility

When talking about data modeling for data warehousing, most organizations implement either a dimensional (Ralph Kimball [1]) or a normalized (Bill Inmon [2]) modeling technique. Both approaches have been around for more than 20 years and have proven their practical use over time. In the last several years, however, market forces have made it imperative for business intelligence and analytics processes to become more agile. This trend comes with many challenges. One major issue is that dimensional and normalized data models are not built for rapid change. These types of problems have spurred interest in the Data Vault approach. “… It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema.” The underlying idea of Data Vault is to separate the more stable part of entities (business keys and relations) from their descriptive attributes.
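That separation of the stable part (business keys and relations) from the descriptive attributes can be sketched as follows (entity and attribute names are illustrative assumptions, not a prescribed DV2 layout): the Hub carries only the business key, while a Satellite carries the changing descriptive attributes keyed back to it, one row per change.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class HubCustomer:
    """Stable part: the business key, loaded once per distinct key."""
    customer_bk: str
    load_date: datetime
    record_source: str

@dataclass(frozen=True)
class SatCustomerDetails:
    """Descriptive part: attributes that change over time."""
    customer_bk: str     # refers back to the Hub's business key
    load_date: datetime  # each change becomes a new Satellite row
    name: str
    address: str
```

Because new attributes only ever touch Satellites (or add new ones), the Hub and its relations stay untouched as sources evolve, which is the agility claim the excerpt makes.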

Hans Hultgren's presentations on SlideShare. Lean Data Warehouse via Data Vault. Thoughts on Data Vault vs. Star Schemas. I am back in Belgium to deliver some training and do a bit of consultancy.

Thoughts on Data Vault vs. Star Schemas

Since that leaves me in yet another hotel room, I might as well share some thoughts on something I have noticed over the past year or so, and which I also noticed during my last visit here. Thus, in continuation of my last post, I will share one more observation from the Data Warehouse automation event in Belgium: Data Vault is a hot topic in the Benelux region and was part of almost every presentation.

This is a distinct contrast to what I experience during my travels to the rest of Europe, the USA, Canada, South Africa, Iceland and India, where it is hard to find a customer, BI consultant, or even anyone at a BI conference, who has ever heard of Data Vault. Data Vault Modeling & Methodology - Data Warehouse Architecture. DVA - Online Data Vault Training. #NoSQL, #bigdata, #datavault and Transaction Speed. Like many of you, I’ve been working in this industry for years.

#NoSQL, #bigdata, #datavault and Transaction Speed

Like you, I’ve seen and been a part of massive implementations in the real world where #bigdata has been a big part of that system. Of course, they’ve done this in a traditional RDBMS and not a NoSQL environment. There has even been a case where Data Vault was used to process big data with huge success. In this entry I will discuss the problem of big data, and examine the RDBMS alongside the NoSQL approaches.

First, a little history, please… Why is everyone up in arms about #bigdata? Well, I’m here to tell you that it may not be everything the media hype has cracked it up to be; and if you aren’t careful, you may just lose sight of your massive investments in relational database systems.