The February CTP of SQL Server 2008 delivered a usable version of the Data Profiling Task, which is new in SQL Server Integration Services (SSIS) 2008. I am putting together a series of blog entries that highlight the functionality of this task, and this entry is an introduction to that series. Before I delve into the detail of each type of data profile request, I want to quickly list some high-level points you will need to know if you are going to use this task. I aim to produce a blog entry dedicated to each type of profile request available with the Data Profiling Task. Here is a list of those entries thus far: All of the examples make use of the AdventureWorks database, which can be downloaded from CodePlex.
A while back I was reading one of Jamie Thomson's most excellent posts on his Conchango blog, SSIS Junkie ("SSIS: Data Profiling Task: Part 7 - Functional Dependency"). It occurred to me that it would be interesting to provide a poor man's Functional Dependency profile using T-SQL. First, let's define Functional Dependency. The Functional Dependency profile determines the extent to which the values in one column (the dependent column) depend on the values in another column or set of columns (the determinant column). This profile also helps you identify problems in your data, such as values that are not valid. For example, you might profile the dependency between a column that contains country or region codes and a column that contains states or provinces. Here the state is the determinant and the country is the dependent. A given state should always belong to the same country, so a row that pairs a state with a different country is probably an error.
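The T-SQL from the original post is not reproduced here. Instead, here is a language-neutral sketch of the same idea in Python. It assumes one plausible definition of dependency strength: for each determinant value, count the rows that carry the most common dependent value for that determinant, then divide by the total row count. A result of 1.0 means an exact dependency. The function names `functional_dependency_strength` and `dependency_violations` are hypothetical. They are not part of SSIS or any library.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from typing import Hashable, Iterable, Sequence


def _group_dependents(
    rows: Iterable[Sequence[Hashable]], determinant: Sequence[int], dependent: int
) -> dict[tuple, Counter]:
    # Map each determinant value (a tuple, so multi-column determinants work)
    # to a Counter of the dependent values seen alongside it.
    groups: dict[tuple, Counter] = defaultdict(Counter)
    for row in rows:
        key = tuple(row[i] for i in determinant)
        groups[key][row[dependent]] += 1
    return groups


def functional_dependency_strength(
    rows: Iterable[Sequence[Hashable]], determinant: Sequence[int], dependent: int
) -> float:
    """Hypothetical helper: fraction of rows that agree with the majority
    dependent value for their determinant value (1.0 = exact dependency)."""
    groups = _group_dependents(rows, determinant, dependent)
    total = sum(sum(c.values()) for c in groups.values())
    if total == 0:
        return 1.0  # assumption: an empty input trivially satisfies the dependency
    # Every group has at least one entry, so most_common(1)[0] is always present.
    supporting = sum(c.most_common(1)[0][1] for c in groups.values())
    return supporting / total


def dependency_violations(
    rows: Iterable[Sequence[Hashable]], determinant: Sequence[int], dependent: int
) -> dict[tuple, Counter]:
    """Hypothetical helper: determinant values that map to more than one
    dependent value, i.e. candidates for invalid data."""
    groups = _group_dependents(rows, determinant, dependent)
    return {key: counts for key, counts in groups.items() if len(counts) > 1}


# Usage: state (column 0) should determine country (column 1).
rows = [
    ("WA", "US"), ("WA", "US"),
    ("BC", "CA"),
    ("ON", "CA"), ("ON", "CA"), ("ON", "US"),  # the last row looks like an error
]
print(functional_dependency_strength(rows, [0], 1))  # 5 of 6 rows agree
print(dependency_violations(rows, [0], 1))
```

Grouping by a tuple keeps the sketch usable when the determinant spans several columns. The same logic maps naturally onto a T-SQL `GROUP BY` over the determinant and dependent columns.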
Data profiling is the process of examining the data available in an existing data source (e.g. a database or a file) and collecting statistics and information about that data. The purpose of these statistics may be to:
- Find out whether existing data can easily be used for other purposes
- Improve the ability to search the data by tagging it with keywords or descriptions, or by assigning it to a category
- Give metrics on data quality, including whether the data conforms to particular standards or patterns
- Assess the risk involved in integrating data for new applications, including the challenges of joins
- Assess whether metadata accurately describes the actual values in the source database
- Understand data challenges early in any data-intensive project, so that late project surprises are avoided

Finding data problems late in a project can lead to delays and cost overruns.
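To make "collecting statistics about that data" concrete, here is a minimal single-column profile sketch in Python. It computes the row count, null count, distinct count, most frequent values and value lengths. These are loosely analogous to the SSIS Column Null Ratio, Column Value Distribution and Column Length Distribution profiles, but the sketch makes no attempt to reproduce their output. `ColumnProfile` and `profile_column` are hypothetical names. Measuring length on `str(value)` is an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Hashable, Iterable, List, Optional, Tuple


@dataclass
class ColumnProfile:
    row_count: int
    null_count: int
    distinct_count: int  # distinct non-null values
    top_values: List[Tuple[Hashable, int]]  # (value, count), most frequent first
    min_length: Optional[int]  # None when the column has no non-null values
    max_length: Optional[int]


def profile_column(values: Iterable[Optional[Hashable]], top_n: int = 3) -> ColumnProfile:
    """Hypothetical helper: basic statistics for one column; None means NULL."""
    counts: Counter = Counter()
    lengths: List[int] = []
    row_count = 0
    null_count = 0
    for value in values:
        row_count += 1
        if value is None:
            null_count += 1
            continue
        counts[value] += 1
        lengths.append(len(str(value)))  # assumption: length of the text form
    return ColumnProfile(
        row_count=row_count,
        null_count=null_count,
        distinct_count=len(counts),
        top_values=counts.most_common(top_n),
        min_length=min(lengths) if lengths else None,
        max_length=max(lengths) if lengths else None,
    )


# Usage: a state/province column with a NULL and an inconsistent long form.
profile = profile_column(["WA", "WA", None, "Ontario", "BC"])
print(profile)
```

A wide gap between `min_length` and `max_length` in a column that should hold two-letter codes is the kind of pattern problem the data quality item in the list above refers to.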