Data domains are the heart of data discovery and cataloging. In traditional database parlance, a data domain is the set of values permitted for an attribute (database column). For example, in a Customer table, the timeZone attribute has a data domain of A, P, M, C, E, or null, where the letters represent the Alaska, Pacific, Mountain, Central, and Eastern time zones. In other words, the data for timeZone is limited to this data set, or data domain.
The timeZone data for customer records might appear as in the following example:
With YOUnite, data domains refer to versions of a specific data type, such as employee, student, or course, and are defined by the parties responsible for data governance. The goal is to:
- Create data domains that will normalize data across an organization
- Manage access to their organization’s disjointed data sets (referred to as data governance or governance).
YOUnite allows data architects to:
- Create data domains that reflect their organization’s requirements or its unique organizational structure
- Version their data domains to accommodate new applications and application versions
Once a domain is created, and the data in the source systems is linked, it can be referenced by other data domains, data stewards, and by API consumers as a source of truth.
YOUnite domains:
- Are defined in JSON format.
- Have universally adopted domain versions agreed upon by stakeholders in the data fabric.
- Can define matching algorithms for identifying duplicate records in different systems where some data may or may not completely match.
Federated Data Domains
In YOUnite’s Data Fabric platform, "Federated" refers to a system where data management and governance are distributed among multiple entities or locations, rather than being centralized. This approach is particularly relevant in scenarios which focus on data integration and permissions.
In a federated data domain, each participating entity maintains control over its own data (i.e. source systems), including how it’s stored, processed, and shared. This model allows for collaboration and data sharing across different organizations and source systems while maintaining a degree of autonomy and privacy for each participant. Federated data domains are increasingly common in sectors that handle sensitive data, like healthcare, finance, and cross-organizational research projects.
With YOUnite, source systems "subscribe" to and receive data updates from other source systems that they consider to hold accurate data, or "truth" (for example, MIS, ERP, or CRM systems). YOUnite routes the data based on publication and subscription settings configured by each application and ensures data quality. Federated data domains require adaptors, metadata, and governance configurations and allow organizations to maintain a high level of control over their data. See Adaptors and Federated Data for more information on mapping federated domains in YOUnite.
Accessing federated data is covered on Accessing Data Records.
Domain Version Model Schemas
A domain version’s Model Schema refers to the attributes (properties), format, and other metadata that defines how a specific domain version should expect to store the data, for the purposes of standardizing how data is exchanged between systems. The Data Governance Steward is responsible for configuring and maintaining domain version model schemas. A domain version model schema is a JSON object describing/defining the properties for the domain’s schema. The format of the model schema implements a subset of the JSON Schema standard. See json-schema.org for more information.
See Valid Property Names and Model Schema Definition for more details about the model schema below.
Note: A data domain defines the name and data domain type, while the data domain’s versions define the model schemas.
Domain Creation Overview
Data domains and their versions can be created in the YOUnite User Interface. The following is an overview of the domain creation process via the API. It is followed by an example of creating a domain, posting records to it, and retrieving records from it.
POST the Domain
The first step in creating a domain is to define the domain name and the domain’s type and the zone it is attached to (its "owning zone"). If you do not define the type and zone, then YOUnite will use defaults as described below:
POST /domains
{
"name": "<domain name>",
"description": "<description of the data domain>",
"domainType": "<domain type>"
}
Domain Properties Descriptions
property | required | valid values | description |
---|---|---|---|
name | yes | Must be between 2 and 128 characters long and must start with an alpha character. | The domain name. Must be unique across the entire YOUnite deployment since domains are typically shared. |
description | no | 0 to 255 characters long. If longer, it will be truncated. | A human-readable description of the domain. |
domainType | yes | | FEDERATED domains do not store their data in YOUnite, but reference and update data on the systems in which it resides. Federated domains leverage adaptors and governance configurations. Accessing federated data is covered on Accessing Data Records. |
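As a rough illustration of the name and description limits above, the following Python sketch builds the body for POST /domains. The helper name `build_domain_payload` is hypothetical, and truncating the description on the client side is an assumption that simply mirrors the documented server behavior; it is not part of the YOUnite API.

```python
import json


def build_domain_payload(name, description="", domain_type="FEDERATED"):
    """Build the JSON body for POST /domains (hypothetical helper)."""
    # Documented limit: 2 to 128 characters.
    if not (2 <= len(name) <= 128):
        raise ValueError("name must be between 2 and 128 characters long")
    # Documented rule: must start with an alpha character.
    # Assumption: str.isalpha() is close enough; the server may be stricter.
    if not name[0].isalpha():
        raise ValueError("name must start with an alphabetic character")
    return {
        "name": name,
        # Descriptions longer than 255 characters are truncated.
        "description": description[:255],
        "domainType": domain_type,
    }


body = json.dumps(build_domain_payload("states", "A reference domain of states"))
print(body)
```

The resulting string can be sent as the request body with any HTTP client.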
The model schema is created in the next step. Model schemas are tied to specific domain versions, which is covered below.
POST a Domain Version
With the domain in place, its first version can be created. The domain version defines the properties that make up its model schema.
Domain version numbers are automatically generated, start at 1, and continue in ascending order. The first version of a domain is the default version and will remain the default version if more versions of a given domain are created. The PATCH method on the /domains/<domain-uuid> endpoint can be used to change a domain’s default version. See YOUnite API: PATCH domain version.
A domain version is defined with a domain JSON Object as follows:
POST /domains/<domain-uuid>/versions
{
"modelSchema": {
"properties": {
"<property-name>": {
"type": "<property-type>",
"description": "<optional description>",
...item1 properties....
},
"<property-name2>": {
...
}
},
"required": [
"<property-name>",
...
]
},
"description": "<description>",
"drKeyProperties": ["<property-name>", "<property-name2>", ...],
"matchingAlgorithm": {
...
}
}
Validating a Domain Version Before Saving
To validate, but not save, a domain version, add the query parameter validate-only=true to the end of the POST, i.e. POST /domains/<domain-uuid>/versions?validate-only=true. This will validate the domain version and return errors (if any), or a 200 with the validated domain version if the POST is valid.
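A small sketch of how a client might build the two variants of the versions URL. The function name and the base URL used in the example call are placeholders, not values from YOUnite; only the path and the validate-only parameter come from the text above.

```python
from urllib.parse import urlencode


def versions_url(base_url, domain_uuid, validate_only=False):
    """Return the URL for POSTing a domain version (hypothetical helper)."""
    url = f"{base_url.rstrip('/')}/domains/{domain_uuid}/versions"
    if validate_only:
        # Ask the server to validate the version without saving it.
        url += "?" + urlencode({"validate-only": "true"})
    return url


# "https://younite.example.com/api" is a placeholder base URL.
print(versions_url("https://younite.example.com/api",
                   "7f28180b-7d9f-42b5-b5ed-d4a0e7ec09fc",
                   validate_only=True))
```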
Domain Version Properties Descriptions
property | required | valid values | description |
---|---|---|---|
modelSchema | yes | See Model Schema Definition below for details. | A JSON Schema model describing the schema for the data domain; it defines the properties that make up the domain’s schema. |
description | no | 0 to 255 characters long. If longer, it will be truncated. | A human-readable description of the domain version. |
drKeyProperties | yes | A list of one or more valid properties for the given domain version. Each property must have a type of string, integer, number, or boolean. | |
matchingAlgorithm | no | | The definition used to match records from different systems to each other. This is optional; if not specified, the "exact" matching algorithm is used, wherein each property must match exactly. |
Data Record Key (DR Key Properties or DR Key)
Each domain version must designate one or more of its properties as part of the domain version DR Key Properties. DR Key Properties are domain version properties that identify those fields that, when combined, YOUnite should use to detect whether the data record is unique. In addition, DR Key Properties identify those properties that are used to match a record at one source to a record at another source. See Matching Algorithms below for more information.
Matching Algorithms
Domains may define custom criteria that determine the uniqueness of a record based on some combination of the values of the DrKeyProperties along with SQL-like rules, including case-insensitive and fuzzy matching.
If no custom matching algorithm is supplied, the default "exact" matching algorithm is used, which does a case-sensitive match between all DrKeyProperties values. Matching algorithms are a large topic and are detailed on the Matching Algorithms and Data Issues and Data Event Exceptions pages.
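To make the default behavior concrete, here is a minimal Python sketch of what an "exact" match over DR Key Properties amounts to: every key property must be present in both records with an identical, case-sensitive value. The function name and record shapes are illustrative assumptions; this is not YOUnite's implementation.

```python
def exact_match(record_a, record_b, dr_key_properties):
    """Return True when every DR Key property has an identical value in both records."""
    return all(
        prop in record_a
        and prop in record_b
        and record_a[prop] == record_b[prop]  # case-sensitive for strings
        for prop in dr_key_properties
    )


source_one = {"abbreviation": "CA", "name": "California"}
source_two = {"abbreviation": "CA", "name": "Calif."}
source_three = {"abbreviation": "ca", "name": "California"}

print(exact_match(source_one, source_two, ["abbreviation"]))    # True
print(exact_match(source_one, source_three, ["abbreviation"]))  # False
```

Properties outside the DR Key (here, name) do not affect the result, which is why the first two records match.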
See Posting Data Records and Retrieving Data Records sections for further details on posting/retrieving data records.
Valid Property Names
- Must start with a letter or "_" (underscore).
- Can only contain letters, digits, "-" (dash) and "_" (underscore).
- Can be up to 64 characters in length.
- Are case-insensitive.
- If two properties have the same name at the same level in the model schema JSON structure, then only one will be used. In Example 1 below, only one "name" property will be used. In Example 2, both will be used because name occurs at different levels in the JSON structure.
Example 1
{
"properties": {
...
"name": { "type": "string" },
"city": { "type": "string" },
"state": { "type": "string" },
"name": { "type": "string" },
...
}
}
Example 2
{
"properties": {
...
"owner": {
"type": "object",
"properties": {
"name": { "type": "string" },
"phone": { "type": "string" },
....
}
},
"pet": {
"type": "object",
"properties": {
"name": { "type": "string" },
"classification": { "type": "string" },
....
}
},
....
}
}
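The naming rules above can be approximated with a regular expression and a case-insensitive duplicate check. This is a sketch under the assumption that "letters" means ASCII letters; the helper names are hypothetical.

```python
import re

# Starts with a letter or underscore; then letters, digits, dash or underscore;
# 64 characters at most. Assumption: "letters" means ASCII A-Z and a-z.
PROPERTY_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_-]{0,63}")


def is_valid_property_name(name):
    return PROPERTY_NAME.fullmatch(name) is not None


def duplicate_names(names):
    """Return names that collide with an earlier name at the same level, ignoring case."""
    seen = set()
    duplicates = []
    for name in names:
        key = name.lower()  # property names are case-insensitive
        if key in seen:
            duplicates.append(name)
        seen.add(key)
    return duplicates


print(is_valid_property_name("_customer-id"))
print(duplicate_names(["name", "city", "state", "Name"]))
```

Running `duplicate_names` separately on each nesting level reflects the rule that the same name may appear at different levels, as in Example 2.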
Model Schema Definition
The model schema of a domain version uses the JSON Schema standard. See json-schema.org for more information on this standard. Note that only a subset of this standard is implemented - supported keywords / properties are noted below.
Example:
{
"$id": "https://example.com/person.schema.json",
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Customer",
"description": "Longer description of the Customer schema would go here",
"type": "object",
"properties": {
"customerId": {
"type": "integer",
"description": "The customer's unique identifier",
"minimum": 0
},
"firstName": {
"type": "string",
"description": "The customer's first name",
"maxLength": 255
},
"lastName": {
"type": "string",
"description": "The customer's last name",
"maxLength": 255
},
"active": {
"type": "boolean",
"description": "The customer's active status"
},
"primaryLocation": {
"type": "string",
"description": "Primary location (may be EAST or WEST)",
"enum": [
"EAST",
"WEST"
]
},
"address": {
"type": "object",
"properties": {
"street": {
"type": "string",
"description": "Street portion of address"
},
"city": {
"type": "string",
"description": "City portion of address"
}
},
"required": [
"city"
]
},
"customerRepresentative": {
"description": "The customer's representative",
"$ref": "/domains/employees:v1"
},
"sales": {
"description": "List of sales",
"type": "array",
"items": {
"type": "object",
"properties": {
"saleId": {
"description": "Unique identifier of the sale",
"type": "string"
},
"amount": {
"description": "Amount of the sale",
"type": "number",
"maximum": 999.99
}
},
"required": [
"saleId",
"amount"
]
}
}
},
"required": [
"customerId",
"lastName",
"active"
]
}
Note: The top-level properties $id, $schema, title and description are all optional but may be included for clarity. The only required top-level element is properties (a type of object is implied if not included).
YOUnite model schemas use six JSON Schema data types: string, integer, number, boolean, array, and object. These data types correspond to the types of data that can be transmitted in a JSON-encoded document.
There is also a special-case data type of $ref, which references an external object, i.e. another data record in another domain, for purposes of cross-referencing; these are called Cross-References.
Below are details of each data type with its supported JSON Schema keywords and an example:
String
A string of characters, otherwise known as "text".
The following keywords are allowed for this type:
Keyword | Description | Default |
---|---|---|
minLength | Minimum string length | None |
maxLength | Maximum string length | None |
pattern | Regular Expression (Regex) Pattern | None |
format | Format of the string (see Supported Formats) | None |
enum | List of values that are allowed, expressed as an array of strings | None |
default | Default value to use if none is provided, or if the provided value is null | None |
title | Short description of the property | None |
description | Longer description of the property | None |
Regular Expressions
String properties with a pattern must use the Java Regular Expressions Syntax.
Supported Formats
Many of the built-in formats from JSON Schema are supported, as well as one custom format: uuid.
The following are supported: date, time, date-time, email, uri, uuid, hostname, ipv4, ipv6, regex.
All of the above supported formats perform validation on incoming data, except hostname and regex.
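For intuition about what format validation involves, the following Python sketch checks a few of the listed formats using only the standard library. It is an approximation for illustration: YOUnite's own validators may accept or reject slightly different inputs, and the `check_format` helper is hypothetical.

```python
import ipaddress
import uuid
from datetime import date


def check_format(fmt, value):
    """Return True if value parses under the named format (date, uuid, ipv4, ipv6 only)."""
    parsers = {
        "date": date.fromisoformat,
        "uuid": uuid.UUID,
        "ipv4": ipaddress.IPv4Address,
        "ipv6": ipaddress.IPv6Address,
    }
    if fmt not in parsers:
        raise ValueError(f"format not covered by this sketch: {fmt}")
    try:
        parsers[fmt](value)
        return True
    except ValueError:
        # All four parsers signal bad input with ValueError or a subclass of it.
        return False


print(check_format("date", "2024-02-29"))
print(check_format("ipv4", "256.1.1.1"))
```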
Integer
A number with no decimal component.
Note: YOUnite supports integers up to a maximum size of 64 bits, i.e. between -9,223,372,036,854,775,808 and 9,223,372,036,854,775,807 (inclusive).
The following keywords are allowed for this type:
Property | Description | Default |
---|---|---|
minimum | Minimum value allowed | None |
maximum | Maximum value allowed | None |
enum | List of values that are allowed, expressed as an array of integers | None |
default | Default value to use if none is provided, or if the provided value is null | None |
title | Short description of the property | None |
description | Longer description of the property | None |
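The integer constraints can be pictured with a short Python sketch that checks the 64-bit range together with the minimum, maximum, and enum keywords. The function and its messages are illustrative assumptions rather than YOUnite's actual validation output.

```python
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def check_integer(value, minimum=None, maximum=None, enum=None):
    """Return a list of problems; an empty list means the value is acceptable."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return ["not an integer"]
    problems = []
    if not INT64_MIN <= value <= INT64_MAX:
        problems.append("outside the 64-bit range")
    if minimum is not None and value < minimum:
        problems.append("below minimum")
    if maximum is not None and value > maximum:
        problems.append("above maximum")
    if enum is not None and value not in enum:
        problems.append("not in enum")
    return problems


print(check_integer(5, minimum=0, maximum=10))
print(check_integer(2 ** 63))
```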
Number
A numeric, decimal number with up to 64 bits of precision.
Note: YOUnite supports decimal numbers with up to 64 bits of precision. This data type should never be used where absolute precision is required, such as for currency.
The following keywords are allowed for this type:
Property | Description | Default |
---|---|---|
minimum | Minimum value allowed | None |
maximum | Maximum value allowed | None |
enum | List of values that are allowed, expressed as an array of numbers | None |
default | Default value to use if none is provided, or if the provided value is null | None |
title | Short description of the property | None |
description | Longer description of the property | None |
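The warning about currency follows from how 64-bit binary floating point behaves. The Python sketch below shows the rounding issue and one common workaround, converting amounts to integer cents before sending them; the workaround and the `to_cents` helper are suggestions for illustration, not a YOUnite requirement.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_total = 0.1 + 0.2
print(float_total == 0.3)  # False

# Decimal arithmetic keeps exact decimal digits.
decimal_total = Decimal("0.10") + Decimal("0.20")
print(decimal_total == Decimal("0.30"))  # True


def to_cents(amount_text):
    """Convert a decimal string such as "19.99" to an integer number of cents."""
    return int(Decimal(amount_text) * 100)


print(to_cents("19.99"))  # 1999
```

An amount handled this way could be modeled as an integer property (or a string) instead of a number.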
Boolean
A boolean, allowing one of two values: true or false.
The following keywords are allowed for this type:
Property | Description | Default |
---|---|---|
default | Default value to use if none is provided, or if the provided value is null | None |
title | Short description of the property | None |
description | Longer description of the property | None |
Array
An array of a single type: either an object or a $ref.
Note: Anonymous arrays (where the properties inside the array are not named), which are supported by JSON Schema, are NOT supported by YOUnite.
The following keywords are allowed for this type:
Property | Description | Default |
---|---|---|
items | Either an object or a $ref | None |
minItems | Minimum items allowed in the array | None |
maxItems | Maximum items allowed in the array | None |
title | Short description of the property | None |
description | Longer description of the property | None |
Array Items
The items keyword of an array property indicates what the contents of the array can be. This can be either an object or a $ref. Note that a $ref is a special case of an object, where the schema is contained in another domain.
Example: An array of objects that has a first name and a last name:
{
"properties": {
"people": {
"type": "array",
"items": {
"type": "object",
"properties": {
"firstName": {
"type": "string"
},
"lastName": {
"type": "string"
}
}
}
}
}
}
Example: An array of objects that point to another schema:
{
"properties": {
"people": {
"type": "array",
"items": {
"description": "Link to data in the person schema",
"$ref": "/domains/person:v1"
}
}
}
}
Object
An object that contains sub-properties. For example, address is a container object with the properties city and state.
The following keywords are allowed for this type:
Property | Description | Default |
---|---|---|
properties | A list of child properties. Required. | None |
required | List of required child properties, expressed as an array of strings. | false |
title | Short description of the property | None |
description | Longer description of the property | None |
Example:
{
"properties": {
"address": {
"type": "object",
"properties": {
"city": {
"type": "string"
},
"state": {
"type": "string",
"enum": [
"CA",
"OR",
"NV"
]
}
},
"required": [
"city"
]
}
}
}
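To show how required interacts with nested objects, this Python sketch walks a model schema and reports required properties that are missing or null in a record, including those inside nested objects such as address. The `missing_required` helper and the dotted path notation are illustrative assumptions.

```python
def missing_required(schema, record, path=""):
    """Return dotted paths of required properties that are missing or null."""
    missing = []
    for name in schema.get("required", []):
        if record.get(name) is None:
            missing.append(path + name)
    for name, spec in schema.get("properties", {}).items():
        value = record.get(name)
        # Descend only into nested objects that are actually present.
        if spec.get("type") == "object" and isinstance(value, dict):
            missing.extend(missing_required(spec, value, path + name + "."))
    return missing


schema = {
    "properties": {
        "address": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "state": {"type": "string", "enum": ["CA", "OR", "NV"]},
            },
            "required": ["city"],
        }
    }
}

print(missing_required(schema, {"address": {"state": "CA"}}))
```

In this example the record has an address without a city, so the sketch reports `address.city`.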
Cross-Reference to Another Domain ($ref)
A cross-reference ($ref) points to another domain or domain data element. Cross-references are made by using Lucene queries. The properties referenced in the Lucene queries must be included in the parent or child domain DR Key properties.
Cross-referencing allows "assembling" data records by querying the adaptors that have the data record and compiling the results. If a Federated domain has a cross-reference, when it is assembled, the cross-reference will be assembled as well.
For more information see Cross Domains.
The type and required properties are not used when defining cross-references.
The following keywords are allowed for this type:
Property | Description | Default |
---|---|---|
$ref | Link to another data domain, e.g. /domains/person:v1 | None |
description | Optional description of the property. | None |
query | Cross-references are established by specifying the relationship in Lucene syntax. IMPORTANT! Properties of both the parent and child records referenced in query MUST be included in DR Key Properties. | None |
multivalued | The cross-reference relationship can be one-to-one (false) or one-to-many (true). | false |
sort | Order to sort on, e.g. to sort on the "total" property of the cross-referenced domain:version in descending order: "sort": "-total" | + (ascending order) |
limit | An integer greater than or equal to zero that defines the maximum number of data records to be returned. | None |
Any data events originating in the source system will use the default value if the property does not include the value or is sent in with a null value. Note an empty or blank property is not the same as null.
If a required property that has no default is missing or is null, a data exception is thrown. For more on data exceptions see Data Issues and Data Event Exceptions.
Note that if a domain property has a "default" value, then "required" is implied, as the value will never be null.
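The default-value rule can be sketched in Python as follows: a default is applied when the property is absent or null, while an empty string is left untouched because it is not null. The `apply_defaults` helper is hypothetical and only handles top-level properties.

```python
def apply_defaults(record, properties):
    """Return a copy of record with schema defaults filled in for missing or null values."""
    result = dict(record)
    for name, spec in properties.items():
        # A missing key and an explicit null (None) are treated the same way.
        if "default" in spec and result.get(name) is None:
            result[name] = spec["default"]
    return result


properties = {
    "primaryLocation": {"type": "string", "default": "WEST"},
    "note": {"type": "string", "default": "n/a"},
    "active": {"type": "boolean", "default": True},
}

incoming = {"primaryLocation": None, "note": ""}
print(apply_defaults(incoming, properties))
```

Here primaryLocation and active receive their defaults, while note keeps its empty string.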
An Example of Creating a Domain in Two Steps
For this example, we’ll create a data domain and data domain version.
- POST the Domain
- POST a Domain Version
POST the Domain
POST /domains
{
"name": "states",
"description": "A reference domain of states that can be referenced in the YOUnite Data Fabric",
"domainType": "FEDERATED"
}
The Location header returned provides the URI for POSTing a domain version below,
e.g. Location: /domains/7f28180b-7d9f-42b5-b5ed-d4a0e7ec09fc
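A client typically needs the domain UUID from that header to build the next request. The following Python sketch extracts and validates it; the helper names are hypothetical, and the sketch assumes the UUID is the last path segment, as in the example above.

```python
import uuid


def domain_uuid_from_location(location):
    """Extract the domain UUID from a Location value such as /domains/<uuid>."""
    last_segment = location.rstrip("/").rsplit("/", 1)[-1]
    # uuid.UUID raises ValueError if the segment is not a well-formed UUID.
    return str(uuid.UUID(last_segment))


def versions_path(location):
    """Build the path used to POST a domain version for that domain."""
    return f"/domains/{domain_uuid_from_location(location)}/versions"


print(versions_path("/domains/7f28180b-7d9f-42b5-b5ed-d4a0e7ec09fc"))
```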
POST a Domain Version
This is a basic domain version.
POST /domains/7f28180b-7d9f-42b5-b5ed-d4a0e7ec09fc/versions
{
"description": "A reference list of states in the North American States: USA, Mexico and Canada",
"modelSchema": {
"properties": {
"name": {
"type": "string",
"description": "The state's official name",
"minLength": 2,
"maxLength": 80
},
"abbreviation": {
"type": "string",
"description": "The state's official abbreviation",
"minLength": 2,
"maxLength": 2
}
},
"required": [
"name",
"abbreviation"
]
},
"drKeyProperties": [
"abbreviation"
]
}
The following JSON is another simple example of a domain version.
POST /domains/7f28180b-7d9f-42b5-b5ed-d4a0e7ec09fc/versions
{
"modelSchema": {
"properties": {
"name": {
"type": "string",
"minLength": 2,
"maxLength": 80
},
"countrycode": {
"type": "string",
"description": "ISO Standard 3-character Country Code",
"minLength": 3,
"maxLength": 3
},
"population": {
"type": "integer"
},
"capital": {
"type": "object",
"properties": {
"city": {
"type": "string"
},
"state": {
"type": "string"
}
},
"required": [
"city",
"state"
]
}
},
"required": [
"name",
"countrycode"
]
},
"drKeyProperties": [
"countrycode"
]
}
The response code on success is: 201 CREATED
Updating a Domain Version / Adding Properties to a Domain Version
The modelSchema of a domain version may be modified at any time so long as there are no data records associated with it. Once data records are recorded, new properties may be added to the modelSchema, but existing properties may not be deleted.
To update the modelSchema, use the PATCH method, i.e.:
PATCH /domains/7f28180b-7d9f-42b5-b5ed-d4a0e7ec09fc/versions/8cec38f0-cd61-4feb-8c05-726363cfcea5
{
"modelSchema": {
... new schema here ...
}
}
Note: If the domain version is in use, properties may be added but not removed.
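Before sending a PATCH for a domain version that is in use, a client could check locally that the new schema does not drop any existing property. The Python sketch below is one way to do that; the helper is hypothetical, compares names case-sensitively for simplicity, and does not replace the server's own validation.

```python
def removed_properties(old_schema, new_schema):
    """Return dotted names of properties present in old_schema but absent from new_schema."""
    old_props = old_schema.get("properties", {})
    new_props = new_schema.get("properties", {})
    removed = []
    for name, spec in old_props.items():
        if name not in new_props:
            removed.append(name)
        elif spec.get("type") == "object":
            # Recurse into nested objects so removed child properties are caught too.
            nested = removed_properties(spec, new_props[name])
            removed.extend(f"{name}.{child}" for child in nested)
    return removed


old = {"properties": {"name": {"type": "string"},
                      "capital": {"type": "object",
                                  "properties": {"city": {"type": "string"},
                                                 "state": {"type": "string"}}}}}
new = {"properties": {"name": {"type": "string"},
                      "population": {"type": "integer"},
                      "capital": {"type": "object",
                                  "properties": {"city": {"type": "string"}}}}}

print(removed_properties(old, new))
```

An empty result means the change only adds properties, which is the case the rule above permits.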
Posting Data Records
Standard Data Record Linking
YOUnite does not require that all data records in the customer’s overall IT ecosystem are mapped prior to going operational. Data records can be mapped as data events occur. This takes a big burden off data analysts, architects, and IT staff.
If an adaptor is capable of the POST operation for a given domain version, then when it detects a new source record in the source system (or a change to an existing one), the following happens:
- The adaptor generates a new data event and sends it to YOUnite.
- When YOUnite receives the data event from the adaptor, it inspects the DrKey in the payload and determines whether the data record is new to the YOUnite Data Fabric or an update to an existing one.
- The YOUnite router, now knowing whether the data event is a POST (new record) or a PUT (update to an existing record), considers the domain and the adaptor where the change was detected, applies the appropriate governance, and routes the data event to the other adaptors in the YOUnite Data Fabric.
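The POST-versus-PUT decision described above can be pictured as a lookup on the DR Key values. The Python sketch below is a simplified illustration using an in-memory set; the function name and data structures are assumptions, and governance and routing are deliberately left out.

```python
def classify_data_event(known_dr_keys, payload, dr_key_properties):
    """Return ("POST" or "PUT", key) depending on whether the DR Key is already known."""
    key = tuple(payload[prop] for prop in dr_key_properties)
    if key in known_dr_keys:
        return "PUT", key       # update to an existing data record
    known_dr_keys.add(key)      # remember the newly linked data record
    return "POST", key          # new data record


known = set()
event = {"abbreviation": "CA", "name": "California"}

print(classify_data_event(known, event, ["abbreviation"]))
print(classify_data_event(known, {"abbreviation": "CA", "name": "Calif."}, ["abbreviation"]))
```

The first event is classified as a POST and the second, which carries the same DR Key, as a PUT.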
Data Discovery Record Linking
Data discovery is the process of synchronizing and applying data quality rules to federated data managed through YOUnite for a given data domain version by scanning source systems for source entities and linking them to data records in YOUnite before putting the adaptor in "PLAY" mode. Data records discovered while an adaptor is in data discovery mode are not routed to other adaptors but merely linked to the federated data records maintained by the YOUnite server. See the document The Data Catalog - Linking YOUnite Data Records to Entities in Source Systems.
Retrieving Data Records
Data Record Assembly i.e. Data Virtualization
Since YOUnite does not store the data records for federated data domains, it cannot return the entire data record. The GET /drs request for a federated data record will return just the DrKeyProperties (DrKeys). To return the entire data record, the data record assembler is requested with a POST /drs/<dr-uuid>/assembler request.
Retrieve a Data Record’s DrKeyProperties (DrKeys)
The following two requests will return only the DrKey properties and UUIDs for the data records in a domain.
GET /drs?filters=name:<domain-name>
Retrieving the DrKeys for the data records (and their respective UUIDs) for a given domain and version:
GET /drs?filters=name:<domain-name>,version:<version>
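The two request forms above differ only in the filters value. A client might assemble them as in the following Python sketch; the helper name is hypothetical, and leaving ":" and "," unescaped is an assumption based on how the requests are written above.

```python
from urllib.parse import quote


def drs_filter_path(domain_name, version=None):
    """Build the GET /drs path that filters by domain name and, optionally, version."""
    filters = f"name:{domain_name}"
    if version is not None:
        filters += f",version:{version}"
    # Keep ':' and ',' literal so the filter syntax stays readable.
    return "/drs?filters=" + quote(filters, safe=":,")


print(drs_filter_path("states"))
print(drs_filter_path("states", 1))
```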
Retrieve a Federated Data Record using Data Record Assembly
Data virtualization, or data record assembly, is covered in detail in Accessing Data Records and Assembling Federated Data Records.