Introduction
Data Domain Introduction
Data domains are the heart of data discovery and cataloging. In traditional database parlance, a data domain is the collection of values that an attribute (database column) may contain. For example, in a Customer table, the timeZone attribute has a data domain of A, P, M, C, E, or null, where the letters represent the Alaska, Pacific, Mountain, Central, and Eastern time zones. In other words, the data for timeZone is limited to this data set, or data domain.
The timeZone data for customer records might appear as in the following example:
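As a minimal sketch of what "limited to this data domain" means in practice, the following Python snippet checks a handful of made-up customer records against the timeZone domain described above. The record values, the `customerId` field, and the `in_domain` helper are hypothetical and exist only for illustration.

```python
# The data domain for timeZone as described above; None stands in for null.
TIME_ZONE_DOMAIN = {"A", "P", "M", "C", "E", None}

# Hypothetical customer records (values are made up for illustration).
customers = [
    {"customerId": 1, "timeZone": "P"},
    {"customerId": 2, "timeZone": "E"},
    {"customerId": 3, "timeZone": None},
    {"customerId": 4, "timeZone": "X"},  # outside the data domain
]


def in_domain(value, domain=TIME_ZONE_DOMAIN):
    """Return True when the value belongs to the data domain."""
    return value in domain


# Collect the IDs of records whose timeZone falls outside the domain.
invalid = [c["customerId"] for c in customers if not in_domain(c["timeZone"])]
print(invalid)
```

Only the record whose timeZone is not one of the six allowed values is reported.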
With YOUnite, data domains refer to versions of a specific data type, such as employee, student, or course, and are defined by the parties responsible for data governance. The goal is to:
- Create data domains that will normalize data across an organization
- Manage access to their organization’s disjointed data sets (referred to as data governance or governance)
YOUnite allows data architects to:
- Create data domains that reflect their organization’s requirements or its unique organizational structure
- Version their data domains to accommodate new applications and application versions
Once a domain is created, and the data in the source systems is linked, it can be referenced by other data domains, data stewards, and by API consumers as a source of truth.
YOUnite domains:
- Are defined in JSON format.
- Have universally adopted domain versions agreed upon by stakeholders in the data fabric.
- Can include matching algorithms for identifying duplicate records in different systems where some data may or may not completely match.
Creating Domain Versions
Data domain versions can be created using the YOUnite UI or defined via JSON schemas.
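To make the idea of a versioned, JSON-defined domain more concrete, here is a rough Python sketch. The field names (`name`, `version`, `properties`, `enum`) and the `next_version` helper are assumptions chosen for illustration; they are not YOUnite's actual schema or API.

```python
import json

# Hypothetical domain-version definition; the field names are illustrative
# and do not reflect YOUnite's real schema.
customer_v1 = {
    "name": "customer",
    "version": 1,
    "properties": {
        "firstName": {"type": "string"},
        "lastName": {"type": "string"},
        "timeZone": {"type": "string", "enum": ["A", "P", "M", "C", "E", None]},
    },
}


def next_version(definition, extra_properties):
    """Return a new definition with the version bumped and properties added.

    The original definition is left unchanged so older applications can
    keep using it.
    """
    merged = {**definition["properties"], **extra_properties}
    return {**definition, "version": definition["version"] + 1, "properties": merged}


# A newer application needs an email property, so a second version is derived.
customer_v2 = next_version(customer_v1, {"email": {"type": "string"}})
print(json.dumps(customer_v2, indent=2))
```

Deriving a new version rather than editing the existing one mirrors the goal of accommodating new applications while existing consumers continue to rely on the earlier version.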
For more see:
Matching Algorithms
Matching algorithms contain a set of SQL-like rules that determine whether the DR Key Properties of two records indicate a match, i.e., whether the source (the new record being checked) and the destination (the existing record being compared) are the same Data Record.
Why Matching?
By default, if no matching algorithm is specified, records are determined to match only if all of the values of their DR Key Properties are identical. Custom rules and/or score-based matching algorithms are useful for identifying duplicate records in different systems where some data may or may not completely match.
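The difference between the default behavior and a score-based approach can be sketched in Python. This is not YOUnite's rule syntax or implementation: the key property names, the use of `difflib.SequenceMatcher` for string similarity, the averaging of scores, and the 0.85 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical key properties; a real domain would define its own.
KEY_PROPERTIES = ("firstName", "lastName", "email")


def exact_match(source, destination, keys=KEY_PROPERTIES):
    """Default behavior described above: every key property must be identical."""
    return all(source.get(k) == destination.get(k) for k in keys)


def similarity(a, b):
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()


def score_match(source, destination, keys=KEY_PROPERTIES, threshold=0.85):
    """Illustrative score-based rule: average similarity must reach the threshold."""
    total = sum(similarity(source.get(k, ""), destination.get(k, "")) for k in keys)
    return total / len(keys) >= threshold


# Source is the new record being checked; destination is the existing record.
source = {"firstName": "Jon", "lastName": "Smith", "email": "jon.smith@example.com"}
destination = {"firstName": "John", "lastName": "Smith", "email": "jon.smith@example.com"}

print(exact_match(source, destination))
print(score_match(source, destination))
```

Here the two records differ only in the spelling of the first name, so the exact comparison rejects them while the score-based comparison treats them as the same record.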