
Data domains are the heart of data discovery and cataloging. In traditional database parlance, a data domain is the set of values allowed for an attribute (database column). For example, in the Customer table example below, the timeZone attribute has a data domain of A, P, M, C, E, or null, where the letters represent the Alaska, Pacific, Mountain, Central, and Eastern time zones. In other words, the data for timeZone is limited to this data set, or data domain.

The timeZone data for customer records might appear as in the following example:

image

With YOUnite, data domains refer to versions of a specific data type, such as employee, student, or course, and are defined by the parties responsible for data governance. The goal is to:

  • Create data domains that will normalize data across an organization

  • Manage access to the organization’s disjointed data sets (referred to as data governance or governance).

YOUnite allows data architects to:

  • Create data domains that reflect their organization’s requirements or its unique organizational structure

  • Version their data domains to accommodate new applications and application versions

Once a domain is created and the data in the source systems is linked, it can be referenced by other data domains, data stewards, and API consumers as a source of truth.

YOUnite domains:

  • Are defined in JSON format.

  • Have universally adopted domain versions agreed upon by stakeholders in the data fabric.

  • Can define matching algorithms for identifying duplicate records in different systems where some data may or may not completely match.

Federated Data Domains

In YOUnite’s Data Fabric platform, "Federated" refers to a system where data management and governance are distributed among multiple entities or locations, rather than being centralized. This approach is particularly relevant in scenarios which focus on data integration and permissions.

In a federated data domain, each participating entity maintains control over its own data (i.e. source systems), including how it’s stored, processed, and shared. This model allows for collaboration and data sharing across different organizations and source systems while maintaining a degree of autonomy and privacy for each participant. Federated data domains are increasingly common in sectors that handle sensitive data, like healthcare, finance, and cross-organizational research projects.

With YOUnite, source systems "subscribe" to and receive data updates from other source systems that they consider to hold the accurate data, or "truth" (for example, MIS, ERP, or CRM systems). YOUnite routes the data based on publication and subscription settings configured by each application and ensures data quality. Federated data domains require adaptors, metadata, and governance configurations and allow organizations to maintain a high level of control over their data. See Adaptors and Federated Data for more information on mapping federated domains in YOUnite.

Accessing federated data is covered on Accessing Data Records.

Domain Version Model Schemas

A domain version’s Model Schema refers to the attributes (properties), format, and other metadata that define how a specific domain version expects data to be stored, for the purpose of standardizing how data is exchanged between systems. The Data Governance Steward is responsible for configuring and maintaining domain version model schemas. A domain version model schema is a JSON object that defines the properties of the domain’s schema. The format of the model schema implements a subset of the JSON Schema standard. See json-schema.org for more information.

See Valid Property Names and Model Schema Definition below for more details about the model schema.

Note
A data domain defines the name and data domain type, while the data domain’s versions define the model schemas.

Domain Creation Overview

Data domains and their versions can be created in the YOUnite User Interface. The following is an overview of the domain creation process via the API, followed by an example of creating a domain and then posting records to and retrieving records from it.

POST the Domain

The first step in creating a domain is to define the domain name, the domain’s type, and the zone it is attached to (its "owning zone"). If you do not define the type and zone, then YOUnite will use defaults as described below:

POST /domains

{
    "name": "<domain name>",
    "description": "<description of the data domain>",
    "domainType": "<domain type>"
}

Domain Properties Descriptions

| property | required | valid values | description |
|---|---|---|---|
| name | yes | Must be between 2 and 128 characters long and must start with an alpha character. Can only contain upper/lower case alpha characters, digits, "_" and "-". | The domain name. Must be unique across the entire YOUnite deployment since domains are typically shared. |
| description | no | 0 to 255 characters long. If longer, it will be truncated. | A human-readable description of the domain. |
| domainType | yes | FEDERATED | FEDERATED domains do not store their data in YOUnite, but reference and update data on the systems in which it resides. Federated domains leverage adaptors and governance configurations. Accessing federated data is covered on Accessing Data Records. |
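As a minimal sketch of the rules above, the following Python checks a domain name and builds the POST /domains body before it is sent. The function name build_domain_payload is hypothetical, and treating "alpha character" as an ASCII letter is an assumption; the server remains the authority on what it accepts.

```python
import re

# Assumption: "alpha" means ASCII letters. 2 to 128 characters in total,
# starting with a letter, then letters, digits, "_" or "-".
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]{1,127}")


def build_domain_payload(name, description="", domain_type="FEDERATED"):
    """Return a request body for POST /domains, or raise ValueError."""
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid domain name: {name!r}")
    return {
        "name": name,
        # Descriptions over 255 characters are documented as truncated;
        # trimming here just makes the stored value predictable.
        "description": description[:255],
        "domainType": domain_type,
    }


payload = build_domain_payload("states", "A reference domain of states")
print(payload)
```

Checking the name on the client side only saves a round trip; it does not replace the uniqueness check that the deployment performs.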

The model schema is created in the next step. Model schemas are tied to specific domain versions, which are covered below.

POST a Domain Version

With the domain in place, its first version can be created. The domain version defines the properties that make up its model schema.

Domain version numbers are automatically generated and start at 1 and continue in ascending order. The first version of a domain is the default version and will remain the default version if more versions of a given domain are created. The PATCH method on the /domains/<domain-uuid> endpoint can be used to change a domain’s default version. See YOUnite API: PATCH domain version.

A domain version is defined with a JSON object as follows:

POST /domains/<domain-uuid>/versions

{
    "modelSchema": {
        "properties": {
            "<property-name>": {
                "type": "<property-type>",
                "description": "<optional description>",
                ...item1 properties....
            },
            "<property-name2>": {
                 ...
            }
        },
        "required": [
            "<property-name>",
            ...
        ]
    },
    "description": "<description>",
    "drKeyProperties": ["<property-name 1>", "<property-name2>", ...],
    "matchingAlgorithm": {
        ...
    }
}

Validating a Domain Version Before Saving

To validate, but not save, a domain version, add the query parameter validate-only=true to the end of the POST, i.e. POST /domains/<domain-uuid>/versions?validate-only=true. This validates the domain version and returns errors (if any), or a 200 with the validated domain version if the POST is valid.

Domain Version Properties Descriptions

| property | required | valid values | description |
|---|---|---|---|
| modelSchema | yes | See Model Schema Definition below for details. | A JSON Schema model describing the schema for the data domain; it defines the properties that make up the domain's schema. |
| description | no | 0 to 255 characters long. If longer, it will be truncated. | A human-readable description of the domain version. |
| drKeyProperties | yes | Data Record Key Properties | A list of one or more valid properties for the given domain version. Each property must have a type of string, integer, number, or boolean. |
| matchingAlgorithm | no | A Matching Algorithm | The definition used to match records from different systems to each other. This is optional; if not specified, the "exact" matching algorithm is used, wherein each property must match exactly. |

Data Record Key (DR Key Properties or DR Key)

Each domain version must designate one or more of its properties as part of the domain version DR Key Properties. DR Key Properties are domain version properties that identify those fields that, when combined, YOUnite should use to detect whether the data record is unique. In addition, DR Key Properties identify those properties that are used to match a record at one source to a record at another source. See Matching Algorithms below for more information.
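The DR Key rules (at least one property, each defined in the model schema and typed as string, integer, number, or boolean) can be checked before a domain version is posted. The sketch below is illustrative only: check_dr_key_properties is a hypothetical helper, and it looks only at top-level properties.

```python
SCALAR_TYPES = {"string", "integer", "number", "boolean"}


def check_dr_key_properties(version):
    """Return a list of problems found in a domain version's drKeyProperties."""
    properties = version["modelSchema"]["properties"]
    keys = version.get("drKeyProperties", [])
    errors = []
    if not keys:
        errors.append("drKeyProperties must list at least one property")
    for key in keys:
        if key not in properties:
            errors.append(f"{key}: not defined in modelSchema")
        elif properties[key].get("type") not in SCALAR_TYPES:
            errors.append(f"{key}: type must be string, integer, number or boolean")
    return errors


version = {
    "modelSchema": {
        "properties": {
            "abbreviation": {"type": "string"},
            "capital": {"type": "object", "properties": {}},
        }
    },
    "drKeyProperties": ["abbreviation"],
}
print(check_dr_key_properties(version))
```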

Matching Algorithms

Domains may define custom criteria that determine the uniqueness of a record based on some combination of the values of the DrKeyProperties along with SQL-like rules, including case-insensitive and fuzzy matching.

If no custom matching algorithm is supplied, the default "exact" matching algorithm is used, which does a case-sensitive match between all DrKeyProperties values. Matching algorithms are a large topic and are detailed on the Matching Algorithms and Data Issues and Data Event Exceptions pages.
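To make the default behavior concrete, here is a small sketch of an exact, case-sensitive comparison over the DR key properties. It is an illustration of the idea, not YOUnite's implementation; dr_key and exact_match are hypothetical names, and how missing values are treated is an assumption.

```python
def dr_key(record, dr_key_properties):
    """Tuple of the record's DR key values, in the order the keys are listed."""
    # Assumption: a missing key property is treated as None.
    return tuple(record.get(p) for p in dr_key_properties)


def exact_match(a, b, dr_key_properties):
    """True when every DR key value is identical, including letter case."""
    return dr_key(a, dr_key_properties) == dr_key(b, dr_key_properties)


crm = {"abbreviation": "CA", "name": "California"}
erp = {"abbreviation": "CA", "name": "Calif."}
other = {"abbreviation": "ca", "name": "California"}

print(exact_match(crm, erp, ["abbreviation"]))    # True: keys agree, other fields ignored
print(exact_match(crm, other, ["abbreviation"]))  # False: "CA" != "ca"
```

Only the DR key properties take part in the comparison, which is why the two records with different name values still match.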

See Posting Data Records and Retrieving Data Records sections for further details on posting/retrieving data records.

Valid Property Names

  • Must start with a letter or "_" (underscore).

  • Can only contain letters, digits, "-" (dash) and "_" (underscore).

  • Can be up to 64 characters in length.

  • Are case-insensitive.

  • If two properties have the same name at the same level in the model schema JSON structure, then only one will be used. In Example 1 below, only one "name" property will be used. In Example 2, both will be used because name occurs at different levels in the JSON structure.

Example 1

{
    "properties": {
        ...
        "name": { "type": "string" },
        "city": { "type": "string" },
        "state": { "type": "string" },
        "name": { "type": "string" },
        ...
    }
}

Example 2

{
    "properties": {
        ...
        "owner": {
            "type": "object",
            "properties": {
                "name": { "type": "string" },
                "phone": { "type": "string" },
                ....
            }
        },
        "pet": {
            "type": "object",
            "properties": {
                "name": { "type": "string" },
                "classification": { "type": "string" },
                ....
            }
        },
        ....
    }
}
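The naming rules and the same-level duplicate rule can be checked on the client before a schema is posted. The following is a sketch under assumptions: "letter" is taken to mean an ASCII letter, both helper names are hypothetical, and the duplicate check relies on the object_pairs_hook parameter of Python's json.loads, which sees every key of an object before duplicates are collapsed.

```python
import json
import re

# Starts with a letter or "_", then letters, digits, "-" or "_"; 64 characters at most.
PROPERTY_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_-]{0,63}")


def is_valid_property_name(name):
    return PROPERTY_NAME.fullmatch(name) is not None


def duplicate_names(schema_text):
    """Return names that repeat, ignoring case, inside any single JSON object."""
    duplicates = []

    def hook(pairs):
        seen = set()
        for key, _ in pairs:
            folded = key.lower()  # names are case-insensitive
            if folded in seen:
                duplicates.append(key)
            seen.add(folded)
        return dict(pairs)

    json.loads(schema_text, object_pairs_hook=hook)
    return duplicates


example_1 = '{"properties": {"name": {"type": "string"}, "city": {"type": "string"}, "name": {"type": "string"}}}'
print(duplicate_names(example_1))
```

Because the hook runs once per JSON object, a name that appears in two different nested objects (as in Example 2) is not reported.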

Model Schema Definition

The model schema of a domain version uses the JSON Schema standard. See json-schema.org for more information on this standard. Note that only a subset of this standard is implemented; supported keywords and properties are noted below.

Example:

{
  "$id": "https://example.com/person.schema.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Customer",
  "description": "Longer description of the Customer schema would go here",
  "type": "object",
  "properties": {
    "customerId": {
      "type": "integer",
      "description": "The customer's unique identifier",
      "minimum": 0
    },
    "firstName": {
      "type": "string",
      "description": "The customer's first name",
      "maxLength": 255
    },
    "lastName": {
      "type": "string",
      "description": "The customer's last name",
      "maxLength": 255
    },
    "active": {
      "type": "boolean",
      "description": "The customer's active status"
    },
    "primaryLocation": {
      "type": "string",
      "description": "Primary location (may be EAST or WEST)",
      "enum": [
        "EAST",
        "WEST"
      ]
    },
    "address": {
      "type": "object",
      "properties": {
        "street": {
          "type": "string",
          "description": "Street portion of address"
        },
        "city": {
          "type": "string",
          "description": "City portion of address"
        }
      },
      "required": [
        "city"
      ]
    },
    "customerRepresentative": {
      "description": "The customer's representative",
      "$ref": "/domains/employees:v1"
    },
    "sales": {
      "description": "List of sales",
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "saleId": {
            "description": "Unique identifier of the sale",
            "type": "string"
          },
          "amount": {
            "description": "Amount of the sale",
            "type": "number",
            "maximum": 999.99
          }
        },
        "required": [
          "saleId",
          "amount"
        ]
      }
    }
  },
  "required": [
    "customerId",
    "lastName",
    "active"
  ]
}
Note
The top-level properties $id, $schema, title, and description are all optional but may be included for clarity. The only required top-level element is properties (a type of object is implied if not included).

YOUnite model schemas support six JSON Schema data types: string, integer, number, boolean, array, and object. These data types correspond to the types of data that can be transmitted in a JSON-encoded document.

There is also a special-case data type, $ref, which references an external object (i.e. another data record in another domain) for purposes of cross-referencing; these are called cross domains.

Below are details of each data type, with supported JSON Schema keywords and examples:

String

A string of characters, otherwise known as "text".

The following keywords are allowed for this type:

| Keyword | Description | Default |
|---|---|---|
| minLength | Minimum string length | None |
| maxLength | Maximum string length | None |
| pattern | Regular Expression (Regex) Pattern | None |
| format | Format of the string (such as uuid or uri). See Supported Formats. | None |
| enum | List of values that are allowed, expressed as an array of strings | None |
| default | Default value to use if none is provided, or if the provided value is null | None |
| title | Short description of the property | None |
| description | Longer description of the property | None |

Regular Expressions

String properties with a pattern must use the Java Regular Expressions Syntax.

Supported Formats

Many of the built-in formats from JSON Schema are supported, as well as one custom format: uuid.

The following are supported: date, time, date-time, email, uri, uuid, hostname, ipv4, ipv6, regex.

All of the supported formats above are validated on incoming data, except hostname and regex.
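How YOUnite validates each format is not described here, so the sketch below only illustrates the idea of format validation for a few of the listed formats using Python's standard library. The check_format helper is hypothetical, and these parsers may be more or less lenient than the server's (for example, uuid.UUID also accepts UUIDs written without hyphens).

```python
import ipaddress
import uuid
from datetime import date


def check_format(value, fmt):
    """Return False when a string value clearly violates the named format."""
    try:
        if fmt == "uuid":
            uuid.UUID(value)
        elif fmt == "date":
            date.fromisoformat(value)
        elif fmt == "ipv4":
            ipaddress.IPv4Address(value)
        elif fmt == "ipv6":
            ipaddress.IPv6Address(value)
        else:
            return True  # formats not covered by this sketch are not checked
    except ValueError:
        return False
    return True


print(check_format("7f28180b-7d9f-42b5-b5ed-d4a0e7ec09fc", "uuid"))
print(check_format("2024-02-30", "date"))
```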

Integer

A number with no decimal component.

Note
YOUnite supports integers up to a maximum size of 64 bits, that is, between -9,223,372,036,854,775,808 and 9,223,372,036,854,775,807 (inclusive).
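The stated limits are those of a signed 64-bit integer. In a language with unbounded integers such as Python, a range check along these lines can catch oversized values before they are sent; fits_int64 is a hypothetical helper name.

```python
INT64_MIN = -(2**63)     # -9,223,372,036,854,775,808
INT64_MAX = 2**63 - 1    #  9,223,372,036,854,775,807


def fits_int64(value):
    """True for integers (but not booleans) within the signed 64-bit range."""
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return INT64_MIN <= value <= INT64_MAX


print(fits_int64(9_223_372_036_854_775_807))  # True
print(fits_int64(9_223_372_036_854_775_808))  # False
```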

The following keywords are allowed for this type:

| Keyword | Description | Default |
|---|---|---|
| minimum | Minimum value allowed | None |
| maximum | Maximum value allowed | None |
| enum | List of values that are allowed, expressed as an array of integers | None |
| default | Default value to use if none is provided, or if the provided value is null | None |
| title | Short description of the property | None |
| description | Longer description of the property | None |

Number

A numeric, decimal number with up to 64 bits of precision.

Note
YOUnite supports decimal numbers with up to 64 bits of precision. This data type should never be used where absolute precision is required, such as for currency.
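The warning about currency can be seen with any 64-bit binary floating-point type. The sketch below assumes the number type behaves like an IEEE 754 double (as Python's float does) and shows one common workaround, carrying money as integer minor units; that workaround is a general practice, not something this documentation prescribes, and to_minor_units is a hypothetical helper.

```python
from decimal import Decimal

# Binary floating point cannot represent most decimal fractions exactly.
print(0.1 + 0.2 == 0.3)          # False
print(0.1 + 0.2)                 # 0.30000000000000004

# Decimal arithmetic on string inputs keeps the exact decimal digits.
price = Decimal("0.10") + Decimal("0.20")
print(price == Decimal("0.30"))  # True


def to_minor_units(amount: str) -> int:
    """Convert a decimal string such as "19.99" to integer cents.

    Assumes at most two decimal places; extra digits are truncated.
    """
    return int(Decimal(amount) * 100)


print(to_minor_units("19.99"))   # 1999
```

An amount held as integer cents could then be modeled with the integer type, or as a string, rather than number.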

The following keywords are allowed for this type:

| Keyword | Description | Default |
|---|---|---|
| minimum | Minimum value allowed | None |
| maximum | Maximum value allowed | None |
| enum | List of values that are allowed, expressed as an array of numbers | None |
| default | Default value to use if none is provided, or if the provided value is null | None |
| title | Short description of the property | None |
| description | Longer description of the property | None |

Boolean

A boolean, allowing one of two values: true or false.

The following keywords are allowed for this type:

| Keyword | Description | Default |
|---|---|---|
| default | Default value to use if none is provided, or if the provided value is null | None |
| title | Short description of the property | None |
| description | Longer description of the property | None |

Array

An array of a single type: either an object or $ref.

Note
Anonymous arrays (where the properties inside the array are not named), which are supported by JSON Schema, are NOT supported by YOUnite.

The following keywords are allowed for this type:

| Keyword | Description | Default |
|---|---|---|
| items | Either an object or $ref (see below). Required. | None |
| minItems | Minimum items allowed in the array | None |
| maxItems | Maximum items allowed in the array | None |
| title | Short description of the property | None |
| description | Longer description of the property | None |

Array Items

The items keyword of an array property indicates what the contents of the array can be. This can either be an object or $ref. Note that a $ref is a special case of an object, where the schema is contained in another domain.

Example: An array of objects that each have a first name and a last name:

{
    "properties": {
        "people": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "firstName": {
                        "type": "string"
                    },
                    "lastName": {
                        "type": "string"
                    }
                }
            }
        }
    }
}

Example: An array of objects that point to another schema:

{
    "properties": {
        "people": {
            "type": "array",
            "items": {
                "description": "Link to data in the person schema",
                "$ref": "/domains/person:v1"
            }
        }
    }
}
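The restriction that items must be an object or a $ref can be expressed as a short check. This is a sketch of the rule as described in this section, not the server's validator; check_array_items is a hypothetical name.

```python
def check_array_items(prop):
    """Return None if an array property's items are acceptable, else a message."""
    items = prop.get("items")
    if not isinstance(items, dict):
        return "items is required"
    if "$ref" in items:
        return None  # cross-reference to another domain's schema
    if items.get("type") == "object" and isinstance(items.get("properties"), dict):
        return None  # inline object with named sub-properties
    return "items must be an object or a $ref"


people_inline = {
    "type": "array",
    "items": {"type": "object", "properties": {"firstName": {"type": "string"}}},
}
people_ref = {"type": "array", "items": {"$ref": "/domains/person:v1"}}
anonymous = {"type": "array", "items": {"type": "string"}}

print(check_array_items(people_inline))
print(check_array_items(people_ref))
print(check_array_items(anonymous))
```

The last case corresponds to the anonymous arrays that are valid JSON Schema but are noted above as unsupported.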

Object

An object that contains sub-properties. For example, address is a container object with the properties city and state.

The following keywords are allowed for this type:

| Keyword | Description | Default |
|---|---|---|
| properties | A list of sub-properties. Required. | None |
| required | List of the sub-properties that are required, expressed as an array of strings | false |
| title | Short description of the property | None |
| description | Longer description of the property | None |

Example:

{
    "properties": {
        "address": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string"
                },
                "state": {
                    "type": "string",
                    "enum": [
                        "CA",
                        "OR",
                        "NV"
                    ]
                }
            },
            "required": [
                "city"
            ]
        }
    }
}

Cross-reference ($ref)

A cross-reference ($ref) points to another domain or domain data element.

For more information see Cross Domains.

The following keywords are allowed for this type:

| Keyword | Description | Default |
|---|---|---|
| title | Short description of the property | None |
| description | Longer description of the property | None |

Any data event originating in a source system will use the default value if the property is omitted or is sent with a null value. Note that an empty or blank property is not the same as null.

If a required property that has no default is missing or is null, a data exception is thrown. For more on data exceptions see Data Issues and Data Event Exceptions.

Note that if a domain property has a "default" value, then "required" is implied, as the value will never be null.
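The default-handling rule (use the default when a property is missing or null, but leave empty or blank values alone) can be sketched as follows. apply_defaults is a hypothetical helper that handles top-level properties only; it illustrates the rule rather than reproducing YOUnite's processing.

```python
def apply_defaults(record, properties):
    """Fill in schema defaults for properties that are missing or None."""
    result = dict(record)
    for name, definition in properties.items():
        # None stands in for JSON null; "" is a real value and is kept.
        if "default" in definition and result.get(name) is None:
            result[name] = definition["default"]
    return result


properties = {
    "active": {"type": "boolean", "default": True},
    "note": {"type": "string", "default": "n/a"},
    "city": {"type": "string"},
}

print(apply_defaults({"active": None, "note": ""}, properties))
```

In this run active receives its default because it was null, while note keeps its empty string.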

An Example of Creating a Domain in Two Steps

For this example, we’ll create a data domain and data domain version.

  1. POST the Domain

  2. POST a Domain Version

POST the Domain

POST /domains

{
    "name": "states",
    "description": "A reference domain of states that can be referenced in the YOUnite Data Fabric",
    "domainType": "FEDERATED"
}

The Location header in the response provides the URI for POSTing a domain version below.

e.g. Location: /domains/7f28180b-7d9f-42b5-b5ed-d4a0e7ec09fc
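A client can derive the versions URI from the returned Location value. The sketch below assumes the header value ends with the domain UUID, as in the example above, and accepts either a relative path or an absolute URL; versions_uri_from_location is a hypothetical helper.

```python
import uuid
from urllib.parse import urlparse


def versions_uri_from_location(location):
    """Turn a Location value into the path used to POST a domain version."""
    path = urlparse(location).path.rstrip("/")
    domain_uuid = path.rsplit("/", 1)[-1]
    uuid.UUID(domain_uuid)  # raises ValueError if the last segment is not a UUID
    return f"{path}/versions"


print(versions_uri_from_location("/domains/7f28180b-7d9f-42b5-b5ed-d4a0e7ec09fc"))
```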

POST a Domain Version

This is a basic domain version.

POST /domains/7f28180b-7d9f-42b5-b5ed-d4a0e7ec09fc/versions

{
    "description": "A reference list of states in North America: USA, Mexico and Canada",
    "modelSchema": {
        "properties": {
            "name": {
                "type": "string",
                "description": "The state's official name",
                "minLength": 2,
                "maxLength": 80
            },
            "abbreviation": {
                "type": "string",
                "description": "The state's official abbreviation",
                "minLength": 2,
                "maxLength": 2
            }
        },
        "required": [
            "name",
            "abbreviation"
        ]
    },
    "drKeyProperties": [
        "abbreviation"
    ]
}

The following JSON is another simple example of a domain version.

POST /domains/7f28180b-7d9f-42b5-b5ed-d4a0e7ec09fc/versions

{
    "modelSchema": {
        "properties": {
            "name": {
                "type": "string",
                "minLength": 2,
                "maxLength": 80
            },
            "countrycode": {
                "type": "string",
                "description": "ISO Standard 3-character Country Code",
                "minLength": 3,
                "maxLength": 3
            },
            "population": {
                "type": "integer"
            },
            "capital": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string"
                    },
                    "state": {
                        "type": "string"
                    }
                },
                "required": [
                    "city",
                    "state"
                ]
            }
        },
        "required": [
            "name",
            "countrycode"
        ]
    },
    "drKeyProperties": [
        "countrycode"
    ]
}

The response code on success is: 201 CREATED

Updating a Domain Version / Adding Properties to a Domain Version

The modelSchema of a domain version may be modified at any time so long as there are no data records associated with it. Once data records are recorded, new properties may be added to the modelSchema, but existing properties may not be deleted.

To update the modelSchema, use the PATCH method, i.e.:

PATCH /domains/7f28180b-7d9f-42b5-b5ed-d4a0e7ec09fc/versions/8cec38f0-cd61-4feb-8c05-726363cfcea5

{
    "modelSchema": {
        ... new schema here ...
    }
}
Note
If the domain version is in use, properties may be added but not removed.
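Before PATCHing a schema that already has data records, a client can compare the old and new schemas to confirm that nothing was removed. The sketch below is one way to do that under stated assumptions: both helper names are hypothetical, nested object properties are reported with dotted paths, array items are not inspected, and names are compared case-sensitively for simplicity.

```python
def property_paths(schema, prefix=""):
    """Collect dotted paths for every property, descending into nested objects."""
    paths = set()
    for name, definition in schema.get("properties", {}).items():
        path = prefix + name
        paths.add(path)
        if isinstance(definition, dict) and definition.get("type") == "object":
            paths |= property_paths(definition, path + ".")
    return paths


def removed_properties(old_schema, new_schema):
    """Properties present in the old schema but absent from the new one."""
    return sorted(property_paths(old_schema) - property_paths(new_schema))


old = {"properties": {
    "name": {"type": "string"},
    "capital": {"type": "object", "properties": {
        "city": {"type": "string"}, "state": {"type": "string"}}},
}}
new = {"properties": {
    "name": {"type": "string"},
    "population": {"type": "integer"},
    "capital": {"type": "object", "properties": {"city": {"type": "string"}}},
}}

print(removed_properties(old, new))  # ['capital.state']
```

An empty result means the change only adds properties, which is the case the rule above permits for a domain version that is in use.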

Posting Data Records

Standard Data Record Linking

YOUnite does not require that all data records in the customer’s overall IT ecosystem be mapped prior to going operational. Data records can be mapped as data events occur. This takes a big burden off data analysts, architects, and IT staff.

If an adaptor is capable of the POST operation for a given domain version, then, when it detects a new source record in the source system (or a change to an existing one), the following occurs:

  1. The adaptor generates a new data event and sends it to YOUnite.

  2. When YOUnite receives the data event from the adaptor, it inspects the DrKey in the payload and determines whether the data record is new to the YOUnite Data Fabric or an update to an existing one.

  3. The YOUnite router, now knowing whether the data event is a POST (new record) or a PUT (update to an existing record), considers the domain and the adaptor where the change was detected, applies the appropriate governance, and routes the data event to the other adaptors in the YOUnite Data Fabric.
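The decision in step 2 can be pictured as a lookup on the DR key. The following is a toy, in-memory sketch of that idea using exact matching; LinkIndex and its methods are hypothetical names, and governance and routing are left out entirely.

```python
import uuid


class LinkIndex:
    """Remembers which DR keys have been seen and the data record UUID for each."""

    def __init__(self, dr_key_properties):
        self.dr_key_properties = dr_key_properties
        self.known = {}

    def classify(self, payload):
        """Return ("POST", uuid) for a new record or ("PUT", uuid) for an update."""
        key = tuple(payload[p] for p in self.dr_key_properties)
        if key in self.known:
            return "PUT", self.known[key]
        dr_uuid = str(uuid.uuid4())
        self.known[key] = dr_uuid
        return "POST", dr_uuid


index = LinkIndex(["abbreviation"])
first = index.classify({"abbreviation": "CA", "name": "California"})
second = index.classify({"abbreviation": "CA", "name": "State of California"})
print(first[0], second[0])  # POST PUT
```

Both events resolve to the same data record UUID because they share the DR key value, even though the name differs.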

Data Discovery Record Linking

Data discovery is the process of synchronizing and applying data quality rules to federated data managed through YOUnite for a given data domain version by scanning source systems for source entities and linking them to data records in YOUnite before putting the adaptor in "PLAY" mode. Data records discovered while an adaptor is in data discovery mode are not routed to other adaptors but merely linked to the federated data records maintained by the YOUnite server. See the document The Data Catalog - Linking YOUnite Data Records to Entities in Source Systems.

Retrieving Data Records

Data Record Assembly i.e. Data Virtualization

Since YOUnite does not store the data records for federated data domains, it cannot return the entire data record. The GET /drs request for a federated data record will return just the DrKeyProperties (DrKeys). To return the entire data record, the data record assembler is requested with a POST /drs/<dr-uuid>/assembler request.

Retrieve a Data Record’s DrKeyProperties (DrKeys)

The following two requests will return only the DrKey properties and UUIDs for the data records in a domain.

GET /drs?filters=name:<domain-name>

Retrieving the DrKeys for the data records (and their respective UUIDs) for a given domain and version:

GET /drs?filters=name:<domain-name>,version:<version>
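The two request forms above differ only in the filters value, so a small helper can build either one. This is a sketch: drs_query is a hypothetical name, and leaving ":" and "," unescaped mirrors the requests as written above rather than any documented encoding rule.

```python
from urllib.parse import quote


def drs_query(domain_name, version=None):
    """Build the GET /drs path for a domain, optionally narrowed to one version."""
    filters = f"name:{domain_name}"
    if version is not None:
        filters += f",version:{version}"
    # Keep ":" and "," literal so the result matches the documented form.
    return "/drs?filters=" + quote(filters, safe=":,")


print(drs_query("states"))
print(drs_query("states", 1))
```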

Retrieve a Federated Data Record using Data Record Assembly

Data virtualization, or data record assembly, is covered in detail in Accessing Data Records and Assembling Federated Data Records.