Read Me First!

Before reading this document, read Managing Adaptors, which covers the entire process of configuring and deploying an adaptor. This document deals specifically with the off-the-shelf YOUnite File System Adaptor.

Overview

The File System Adaptor is actually two types of adaptors: a source and a destination. The source adaptor monitors folders for changes and sends them to YOUnite. The destination adaptor receives messages from YOUnite and performs the file operation.

The source and destination cannot be the same adaptor.

Features

  • File system support: everything Apache VFS supports as well as Amazon S3. See http://commons.apache.org/proper/commons-vfs/filesystems.html

    • NOTE: To monitor a file system, the adaptor needs the last modified time and size of each file. Some file systems supported by VFS may not supply this information and therefore cannot be monitored.

  • A fast backing database (H2) runs on the local file system and ensures that, after a restart or recovery from failure, file system monitoring resumes where it left off.

  • Scanning and processing changes are asynchronous, for efficiency.
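
As a rough illustration of why the last modified time and size matter, the sketch below compares two scans of a folder and classifies each file as NEW, UPDATED, or DELETED. The function name diff_snapshots and the snapshot shape (a path mapped to a (last_modified, size) tuple) are assumptions made for illustration, not the adaptor's actual implementation.

```python
def diff_snapshots(previous, current):
    """Compare two scans and return {path: action} for every change.

    Each snapshot maps a file path to a (last_modified_ms, size_bytes) tuple.
    """
    changes = {}
    for path, stat in current.items():
        if path not in previous:
            changes[path] = "NEW"
        elif previous[path] != stat:
            # A different timestamp or size means the file was modified.
            changes[path] = "UPDATED"
    for path in previous:
        if path not in current:
            changes[path] = "DELETED"
    return changes


# Hypothetical scans taken one interval apart.
before = {"a.txt": (1000, 10), "b.txt": (1000, 20)}
after = {"a.txt": (2000, 12), "c.txt": (3000, 5)}
print(diff_snapshots(before, after))
```

A file system that cannot report either value gives the comparison nothing to work with, which is why such file systems cannot be monitored.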

Limitations

  • The entire list of monitored files is held in memory during monitoring. It is backed by a database only for restart or recovery from failure.

  • H2 is currently the only supported database.

File Name Limitations

  • File names can contain nearly any character EXCEPT forward slash "/" and percent sign "%". Otherwise, as long as the file system can handle it, any other character should be acceptable.
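
A pre-flight check for the two forbidden characters could look like the following minimal sketch. The helper name is_supported_file_name is hypothetical; the adaptor itself does not expose such a function.

```python
# The two characters the adaptor does not accept in file names.
FORBIDDEN_CHARACTERS = ("/", "%")


def is_supported_file_name(name):
    """Return True if the name contains neither "/" nor "%"."""
    return not any(character in name for character in FORBIDDEN_CHARACTERS)


print(is_supported_file_name("report 2024 (final).txt"))
print(is_supported_file_name("100%.txt"))
```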

Domain Configuration

For each file system that is monitored for changes, a Domain must be created with exactly this schema. This is the metadata that the Source Adaptor sends to YOUnite, which then passes it on to the Destination Adaptor:

  1. The DR Key Properties MUST be:

    [
      "adaptorUuid",
      "sourceUri"
    ]
  2. The Schema MUST be:

    {
      "adaptorUuid": {
        "type": "string",
        "required": true
      },
      "sourceUri": {
        "type": "string",
        "required": true
      },
      "intermediateUri": {
        "type": "string"
      },
      "destinationUri": {
        "type": "string"
      },
      "action": {
        "type": "enum",
        "required": true,
        "enumType": "string",
        "data": [
          "NEW",
          "UPDATED",
          "DELETED"
        ]
      },
      "contentLength": {
        "type": "int",
        "required": true
      },
      "additionalParameters": {
        "type": "array",
        "items": {
          "key": {
            "type": "string",
            "required": true
          },
          "value": {
            "type": "string",
            "required": true
          }
        }
      }
    }
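
To make the schema concrete, the sketch below checks a record against the required fields, types, and action values listed above. It is an illustration only: validate_record is a hypothetical helper, and the sample record values (URIs and the additional parameter) are made up.

```python
REQUIRED_FIELDS = {
    "adaptorUuid": str,
    "sourceUri": str,
    "action": str,
    "contentLength": int,
}
OPTIONAL_FIELDS = {"intermediateUri": str, "destinationUri": str}
ACTIONS = {"NEW", "UPDATED", "DELETED"}


def validate_record(record):
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing required field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"{name} must be of type {expected.__name__}")
    for name, expected in OPTIONAL_FIELDS.items():
        if name in record and not isinstance(record[name], expected):
            problems.append(f"{name} must be of type {expected.__name__}")
    if isinstance(record.get("action"), str) and record["action"] not in ACTIONS:
        problems.append("action must be one of NEW, UPDATED, DELETED")
    for entry in record.get("additionalParameters", []):
        if not isinstance(entry.get("key"), str) or not isinstance(entry.get("value"), str):
            problems.append("additionalParameters entries need a string key and value")
    return problems


# Hypothetical record for a newly detected file.
record = {
    "adaptorUuid": "3dfcc03d-e5d4-4d57-9e9b-5c5d2db32f9a",
    "sourceUri": "s3://widgets-source/reports/q1.pdf",
    "destinationUri": "reports/q1.pdf",
    "action": "NEW",
    "contentLength": 2048,
    "additionalParameters": [{"key": "owner", "value": "finance"}],
}
print(validate_record(record))
```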

Adaptor Configuration

All configuration items may be specified as environment variables (usually uppercase with underscores delimiting words) or put in a file /adaptor.properties (usually lowercase with periods delimiting words).
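
The usual naming convention can be sketched as follows. Both helpers are hypothetical, and parse_properties handles only simple "key = value" lines and # comments rather than the full Java properties format. Because the convention holds only "usually", the names given in the tables in this document remain the reference.

```python
def env_name(property_name):
    """Map a property name to its usual environment variable name."""
    return property_name.replace(".", "_").upper()


def parse_properties(text):
    """Parse simple `key = value` lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


print(env_name("adaptor.uuid"))
print(parse_properties("# Optional configuration\nconcurrency = 4\n"))
```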

Base Configuration Properties

Before starting on this section, read Managing Adaptors, which discusses how to create and retrieve the adaptor credentials. Once the adaptor credentials have been retrieved, the following properties must be configured either as environment variables or included in the adaptor.properties file.

Required properties:

Property | Environment Variable | Description | Example Value
adaptor.uuid | ADAPTOR_UUID | UUID of the adaptor | 3dfcc03d-e5d4-4d57-9e9b-5c5d2db32f9a
client.id | CLIENT_ID | Username used to connect to the message bus | 8c9167a6-bb83-4f77-bdfc-1947a946f77b
client.secret | CLIENT_SECRET | Password used to connect to the message bus | de02e3fa-4b23-46cb-aed6-5665a16e73d3
message.bus.url | MESSAGE_BUS_URL | Message Bus URL | tcp://message-bus-uri:61617
auth.server.url | AUTH_SERVER_URL | OAuth server used to validate adaptor access credentials. The YOUnite Server runs an embedded OAuth server that your implementation may be using. By default it runs on port 8080, in which case the value would be http://ip-address-of-the-YOUnite-Server:8080 | http://oauth-server-uri

Optional properties:

Property | Environment Variable | Description | Default Value
hostname | HOSTNAME | Name of the host for the health check endpoint. Note that Kubernetes sets this environment variable by default. | localhost
port | PORT | HTTP port to expose for health checks. The health check endpoint is at /health, e.g. GET host:8001/health | 8001
concurrency | CONCURRENCY | Concurrency for incoming data messages, i.e. the number of threads that handle incoming messages. Note that this could lead to messages being processed out of order, though they are guaranteed to be processed in order within their message grouping (the default is the unique ID of the record, i.e. the DR UUID). | 1
ops.concurrency | OPS_CONCURRENCY | Concurrency for incoming ops messages, i.e. the number of threads that handle incoming messages. These are messages with adaptor status information and metadata, requests for stats, and shutdown requests. Expressed as a range, e.g. "1-5", or a single value indicating the maximum (with an implied minimum of 1). | 1
assembly.concurrency | ASSEMBLY_CONCURRENCY | Concurrency for incoming assembly messages, i.e. the number of threads that handle incoming messages. These are federated GET requests. Expressed as a range, e.g. "1-5", or a single value indicating the maximum (with an implied minimum of 1). | 1
session.cache.size | SESSION_CACHE_SIZE | JMS session cache size. | 10
enable.metrics | ENABLE_METRICS | Enable / disable displaying metrics at a given interval. | true
metrics.log.interval | METRIC_LOGS_INTERVAL | Interval in milliseconds to display metrics. | 3600000 (one hour)
n/a | LOG_LEVEL | Log level for YOUnite logback messages. Applies to younite-db-adaptor implementations, but not the Adaptor SDK. Must come from an environment variable. | INFO
n/a | ROOT_LOG_LEVEL | Log level for non-YOUnite logback messages. Applies to younite-db-adaptor implementations, but not the Adaptor SDK. Must come from an environment variable. | INFO

Example:

# Configuration

# Adaptor UUID
adaptor.uuid = 3dfcc03d-e5d4-4d57-9e9b-5c5d2db32f9a

# ClientID and Secret to be used by JMS to connect to the message bus
client.id = 8c9167a6-bb83-4f77-bdfc-1947a946f77b
client.secret = de02e3fa-4b23-46cb-aed6-5665a16e73d3

# Message Bus URL
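message.bus.url = tcp://192.2.200.25:61616

# OAuth server URL (placeholder value from the required properties table)
auth.server.url = http://oauth-server-uri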

# Optional configuration
concurrency = 4
ops.concurrency = 1-4
assembly.concurrency = 2-2
session.cache.size = 10
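
The range syntax accepted by ops.concurrency and assembly.concurrency can be read as in the sketch below. parse_concurrency is a hypothetical helper written for illustration, and rejecting ranges whose minimum is below 1 or above the maximum is an assumption about sensible input, not documented adaptor behavior.

```python
def parse_concurrency(value):
    """Return (minimum, maximum) threads for a value such as "1-5" or "4"."""
    text = str(value).strip()
    if "-" in text:
        low, high = (int(part) for part in text.split("-", 1))
    else:
        # A single value is the maximum, with an implied minimum of 1.
        low, high = 1, int(text)
    if low < 1 or high < low:
        raise ValueError(f"invalid concurrency range: {value!r}")
    return low, high


print(parse_concurrency("1-4"))
print(parse_concurrency("2-2"))
print(parse_concurrency("4"))
```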

Health Check Endpoint

The adaptor includes an HTTP endpoint at /health for checking the health of the adaptor. The default port is 8001.

If the adaptor is healthy, the endpoint returns a 200 OK response. If it is not healthy, it returns a 503 Service Unavailable. For example, a healthy adaptor returns:

{
  "status": "UP"
}

In addition, it will return some information about the adaptor such as the status of each service and the uptime. For example, if the employee v1 domain handler is not working, the adaptor might return something like this:

{
  "status": "DOWN",
  "adaptorService": "UP",
  "upTimeMilliseconds": 100000,
  "customer:1": "UP",
  "employee:1": "DOWN"
}
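
A monitoring script might interpret the endpoint as in the sketch below. The helper names are hypothetical, the default URL assumes the adaptor runs locally on the default port, and the sketch assumes that the 503 response carries a JSON body like the one shown above.

```python
import json
import urllib.error
import urllib.request


def summarize_health(status_code, body):
    """Return (healthy, failing_services) from a /health status code and JSON body."""
    report = json.loads(body)
    healthy = status_code == 200 and report.get("status") == "UP"
    failing = sorted(
        name for name, state in report.items()
        if name != "status" and state == "DOWN"
    )
    return healthy, failing


def check_health(url="http://localhost:8001/health", timeout=5):
    """Query the endpoint; urllib raises HTTPError for the 503 case."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return summarize_health(response.status, response.read().decode("utf-8"))
    except urllib.error.HTTPError as error:
        return summarize_health(error.code, error.read().decode("utf-8"))


down = '{"status": "DOWN", "adaptorService": "UP", "upTimeMilliseconds": 100000, "customer:1": "UP", "employee:1": "DOWN"}'
print(summarize_health(503, down))
```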

Additional Configuration Properties

In addition to the above properties, the following optional properties are specific to the File System and S3 Adaptor.

Most of this configuration is for the two H2 databases that are used to persist data. These will configure themselves by default but their configuration can be changed with the following options:

Property | Environment Variable | Description | Default Value
data.directory | DATA_DIRECTORY | Directory where databases are stored. | /data in the Docker image, or home (~) when run locally
file.monitor.database.name | FILE_MONITOR_DATABASE_NAME | Name of the file, without extension, of the H2 database in DATA_DIRECTORY to store file monitoring data. | file_monitor
file.monitor.database.username | FILE_MONITOR_DATABASE_USERNAME | Username for the H2 file monitoring database. | sa
file.monitor.database.password | FILE_MONITOR_DATABASE_PASSWORD | Password for the H2 file monitoring database. | sa
file.monitor.database.connection.pool.size | FILE_MONITOR_DATABASE_CONNECTION_POOL_SIZE | Size of the connection pool. | 5
n/a | AWS_REGION | When using AWS S3, the region to use. Required for S3. | none
n/a | AWS_ACCESS_KEY_ID | Access Key ID for AWS S3. Required for S3. | none
n/a | AWS_SECRET_ACCESS_KEY | Secret Access Key for AWS S3. Required for S3. | none

Adaptor Metadata Configuration

The Adaptor Metadata Configuration contains the rest of the configuration for the YOUnite File System Adaptor. It is stored in JSON format in the metadata of an adaptor. See Managing Adaptors for how to assign metadata to an adaptor, either when the adaptor is created or later by updating the adaptor metadata in the UI or via the API.

Source Adaptor Metadata

A Source Adaptor monitors one or more paths for changes. Each path that is monitored will belong to a domain and version and be identified by a URI.

Example:

{
  "type": "source",
  "paths": [
    {
      "domain": "widgets",
      "version": "1",
      "uri": "s3://widgets-source/"
    }
  ],
  "queueSize": 10,
  "scanIntervalSeconds": 5
}

Destination Adaptor Metadata

A Destination Adaptor listens for messages from YOUnite regarding file changes and performs the work of copying the file from a source or intermediate location to a destination.

Example:

{
  "type": "destination",
  "sourceUris": [
    "s3://widgets-source/"
  ],
  "destinations": [
    {
      "domain": "widgets",
      "version": "1",
      "destinations": [
        {
          "baseUri": "s3://widgets-destination/",
          "allowRelativeUri": true
        }
      ]
    }
  ]
}

Source Adaptor Metadata Schema

{
    "type": "source",
    "paths": [ {SourcePath}, ... ],
    "queueSize": integer,
    "scanIntervalSeconds": integer
}

Property | Description | Type | Default Value | Required
type | Must be "source". | String | none | Yes
paths | List of path specifications. | List of SourcePath | none | Yes
queueSize | Size of the queue for asynchronous processing, or 0 for synchronous processing. | Integer | 0 | No
scanIntervalSeconds | Interval at which to scan the path for modifications. | Integer | 10 | No

SourcePath

A SourcePath contains the details of a path to be monitored.

{
    "domain": "string",
    "version": "string",
    "uri": "string",
    "filter": {Filter},
    "writeToIntermediate": boolean,
    "baseIntermediateUri": "string",
    "intermediateRewriteRules": [ {Transformation}, ... ],
    "baseDestinationUri": "string",
    "destinationRewriteRules": [ {Transformation}, ... ]
}

Property | Description | Type | Default Value | Required
domain | Domain name. | String | none | Yes
version | Domain version. | String | none | Yes
uri | URI of the folder to be monitored. | String | none | Yes
filter | Filter to limit the files / folders monitored. | Filter | none | No
writeToIntermediate | Write files to an intermediate file system. | Boolean | false | No
baseIntermediateUri | Base URI of the intermediate file system. | String | none | No
intermediateRewriteRules | Transformations to apply to the intermediate URI. | List of Transformation | none | No
baseDestinationUri | Base URI of the destination file system. | String | none | No
destinationRewriteRules | Transformations to apply to the destination URI. | List of Transformation | none | No
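
The required fields and defaults of the source metadata and its SourcePath entries can be summarized in code. The following is a minimal sketch: normalize_source_metadata is a hypothetical helper, and it checks only the fields marked as required and fills in only the defaults documented here.

```python
def normalize_source_metadata(metadata):
    """Validate required fields and fill in the documented defaults."""
    if metadata.get("type") != "source":
        raise ValueError('type must be "source"')
    paths = metadata.get("paths")
    if not paths:
        raise ValueError("paths must list at least one SourcePath")
    normalized_paths = []
    for path in paths:
        for field in ("domain", "version", "uri"):
            if field not in path:
                raise ValueError(f"SourcePath is missing required field: {field}")
        # writeToIntermediate defaults to false when omitted.
        normalized_paths.append({"writeToIntermediate": False, **path})
    return {
        "queueSize": 0,             # 0 means synchronous processing
        "scanIntervalSeconds": 10,  # default scan interval
        **metadata,
        "paths": normalized_paths,
    }


minimal = {
    "type": "source",
    "paths": [{"domain": "widgets", "version": "1", "uri": "s3://widgets-source/"}],
}
print(normalize_source_metadata(minimal))
```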

Filter

A Filter contains specifications to limit the items that are monitored in a path. Certain filter types (AND, OR, NOT) have child Filter entries which allow for creation of complex rules.

{
    "type": "string",
    "value": object,
    "uri": "string",
    "children": [{Filter}, ...]
}

Property | Description | Type | Default Value | Required
type | Filter type. See types below. | String | none | Yes
value | Filter value. Differs depending on type. See types below. | String or Integer | none | Depends on type
children | Child Filter entries. Applicable to AND, OR and NOT. See types below. | List of Filter | none | Depends on type

Filter Types

Type | Description | Value Type | Children?
AND | Evaluates to true if all child entries evaluate to true. | n/a | One or more
OR | Evaluates to true if any child entry evaluates to true. | n/a | One or more
NOT | Evaluates to true if its child entry evaluates to false. | n/a | Exactly one
IS_FILE | Include files only. | n/a | n/a
IS_DIRECTORY | Include directories only. | n/a | n/a
MODIFIED_AFTER | Include files modified after a date, as a Unix timestamp in milliseconds. | Integer | n/a
CAN_READ | Include readable files only. | n/a | n/a
HIDDEN | Include hidden files only. | n/a | n/a
VISIBLE | Include visible files only (not hidden). | n/a | n/a
REGEX | Include files and folders whose name matches a regex pattern. | String | n/a
PREFIX | Include files and folders whose name starts with a prefix. | String | n/a
SUFFIX | Include files and folders whose name ends with a suffix. | String | n/a
MINIMUM_SIZE | Minimum file size. | Integer | n/a
MAXIMUM_SIZE | Maximum file size. | Integer | n/a

WARNING!

Filters are applied to both files AND folders. To find files that end in .txt, for example, you need an OR filter with two children: IS_DIRECTORY and SUFFIX (".txt"). Otherwise, folders would also be checked to see if they end in .txt.
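
The effect of the warning can be seen with a small evaluator. This is a sketch only: the matches function, the entry shape (a dict with "name" and "is_directory"), and the restriction to a handful of filter types are assumptions for illustration. The TXT_FILES dict mirrors the JSON a filter property would contain.

```python
def matches(filter_spec, entry):
    """Evaluate a Filter against an entry with "name" and "is_directory" keys."""
    kind = filter_spec["type"]
    children = filter_spec.get("children", [])
    value = filter_spec.get("value")
    if kind == "AND":
        return all(matches(child, entry) for child in children)
    if kind == "OR":
        return any(matches(child, entry) for child in children)
    if kind == "NOT":
        return not matches(children[0], entry)
    if kind == "IS_DIRECTORY":
        return entry["is_directory"]
    if kind == "IS_FILE":
        return not entry["is_directory"]
    if kind == "PREFIX":
        return entry["name"].startswith(value)
    if kind == "SUFFIX":
        return entry["name"].endswith(value)
    raise ValueError(f"filter type not covered by this sketch: {kind}")


# Folders pass through IS_DIRECTORY; files must end in .txt.
TXT_FILES = {
    "type": "OR",
    "children": [
        {"type": "IS_DIRECTORY"},
        {"type": "SUFFIX", "value": ".txt"},
    ],
}
SUFFIX_ONLY = {"type": "SUFFIX", "value": ".txt"}

folder = {"name": "reports", "is_directory": True}
print(matches(TXT_FILES, folder))    # the folder is kept
print(matches(SUFFIX_ONLY, folder))  # the folder is rejected
```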

Transformation

A Transformation defines rules to transform a String using Regular Expressions:

{
    "filterRegex": "string",
    "findRegex": "string",
    "replaceRegex": "string",
    "replaceAll": boolean
}

Property | Description | Type | Default Value | Required
filterRegex | Regular expression used to determine whether to apply the transformation. The value must fully match the regular expression to be transformed. | String | none | No
findRegex | Regular expression to find the elements in the string to replace. | String | none | Yes
replaceRegex | Regular expression used to replace the elements in the string. | String | none | Yes
replaceAll | True or false indicating whether to replace all or just the first occurrence. | Boolean | true | No

Notes:

  • Regular Expressions are in Java syntax and the methods called are matches(filterRegex), replaceAll(findRegex, replaceRegex) and replaceFirst(findRegex, replaceRegex).

  • filterRegex must fully match the string, not just a part of the string.
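
The sketch below approximates these semantics in Python, which is an assumption made for illustration because the adaptor itself uses Java regular expressions. re.fullmatch stands in for matches(), and re.sub with a count of 0 or 1 stands in for replaceAll() and replaceFirst(). The helper expand_java_replacement is hypothetical and handles only single-digit group references ($1) and backslash escapes, which covers simple rules. Pattern syntax also differs between Java and Python in some advanced cases.

```python
import re


def expand_java_replacement(match, replacement):
    """Expand a Java-style replacement string ($1, \\.) for one match."""
    out = []
    i = 0
    while i < len(replacement):
        ch = replacement[i]
        if ch == "\\" and i + 1 < len(replacement):
            out.append(replacement[i + 1])  # escaped character, taken literally
            i += 2
        elif ch == "$" and i + 1 < len(replacement) and replacement[i + 1].isdigit():
            out.append(match.group(int(replacement[i + 1])) or "")
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def apply_transformation(value, rule):
    """Apply one Transformation dict to a string."""
    filter_regex = rule.get("filterRegex")
    if filter_regex is not None and re.fullmatch(filter_regex, value) is None:
        return value  # the filter did not fully match: leave the value unchanged
    count = 0 if rule.get("replaceAll", True) else 1  # 0 replaces every match
    return re.sub(
        rule["findRegex"],
        lambda m: expand_java_replacement(m, rule["replaceRegex"]),
        value,
        count=count,
    )


# Rename a .log extension to .txt (regexes shown after JSON unescaping).
rule = {"filterRegex": r".+\.log", "findRegex": r"^(.+)\.log$", "replaceRegex": r"$1\.txt"}
print(apply_transformation("logs/app.log", rule))
print(apply_transformation("notes.txt", rule))
```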

Destination Adaptor Metadata Schema

{
    "type": "destination",
    "sourceUris": ["string", ...],
    "destinations": [{DestinationPaths}, ...]
}

Property | Description | Type | Default Value | Required
type | Must be "destination". | String | none | Yes
sourceUris | List of base URIs from which this adaptor can transfer files. | List of String | none | Yes
destinations | List of destinations. | List of DestinationPaths | none | Yes

DestinationPaths

DestinationPaths is a specification of one or more destinations for data events for a domain:

{
    "domain": "string",
    "version": "string",
    "destinations": [{DestinationPath}, ...]
}

Property | Description | Type | Default Value | Required
domain | Domain name. | String | none | Yes
version | Domain version. | String | none | Yes
destinations | List of destinations. | List of DestinationPath | none | Yes

DestinationPath

A DestinationPath is a possible destination for a file:

{
    "baseUri": "string",
    "allowRelativeUri": boolean,
    "filterRegex": "string",
    "rewriteRules": [{Transformation}, ...]
}

Property | Description | Type | Default Value | Required
baseUri | Base URI of the destination, i.e. the top-level folder into which files are copied. | String | none | Yes
allowRelativeUri | Allow relative URIs? See below. | Boolean | false | No
filterRegex | Regular expression used to determine whether to use this destination. See below. | String | none | No
rewriteRules | Destination URI transformations. | List of Transformation | none | No

Notes:

  • The ultimate URI where the file will be written is resolved by the following order of operations:

    • First, the regex filter is checked. If the destination URI does not match, it is not allowed. The destination URI must match the regular expression completely, which corresponds to the Java String.matches() method.

    • Second, the rewrite rules are run to transform the destination URI.

    • Third, if the URI is relative (it has no scheme) and allowRelativeUri = true, it will be resolved against the base URI. For example, if the base URI is file:///destination-folder/ and the destination URI is subfolder/abc.txt, the final URI will be file:///destination-folder/subfolder/abc.txt.
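
The order of operations above can be sketched as follows. This is an illustration under stated assumptions rather than the adaptor's code: resolve_destination is a hypothetical helper, the rewrite argument stands in for the rewrite rules step, Python's re.fullmatch stands in for String.matches(), and returning None for a relative URI when allowRelativeUri is false is an assumption, since the behavior in that case is not described here.

```python
import re

# A URI with a scheme starts with letters followed by ":" (for example "s3:").
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def resolve_destination(destination_uri, path, rewrite=lambda uri: uri):
    """Return the final URI for a DestinationPath dict, or None if not allowed."""
    # 1. The filter must match the whole destination URI.
    filter_regex = path.get("filterRegex")
    if filter_regex is not None and re.fullmatch(filter_regex, destination_uri) is None:
        return None
    # 2. Apply the rewrite rules.
    uri = rewrite(destination_uri)
    # 3. Resolve a relative URI against the base URI, if permitted.
    if SCHEME.match(uri):
        return uri
    if not path.get("allowRelativeUri", False):
        return None  # assumption: relative URIs are rejected when not allowed
    base = path["baseUri"]
    if not base.endswith("/"):
        base += "/"
    return base + uri.lstrip("/")


destination = {"baseUri": "file:///destination-folder/", "allowRelativeUri": True}
print(resolve_destination("subfolder/abc.txt", destination))
```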

Configuration Patterns

The following examples show how to configure the Source and Destination adaptor metadata for common use cases.

Relative URIs from source to destination

The simplest and most typical configuration involves copying files from one system to another without changing anything about them. It is important with this pattern to set allowRelativeUri on the destination to true.

Source Metadata:

{
  "type": "source",
  "paths": [
    {
      "domain": "widgets",
      "version": "1",
      "uri": "s3://widgets-source/"
    }
  ]
}

Destination Metadata:

{
  "type": "destination",
  "sourceUris": [
    "s3://widgets-source/"
  ],
  "destinations": [
    {
      "domain": "widgets",
      "version": "1",
      "destinations": [
        {
          "baseUri": "s3://widgets-destination/",
          "allowRelativeUri": true
        }
      ]
    }
  ]
}

Using a Cloud-based intermediate location

This example is similar to the one above, but uses a cloud-based intermediate location because the destination adaptor does not have access to the source file system. The source adaptor detects changes and pushes them into S3. The destination adaptor receives notifications of changes and reads them from S3 before writing them to their destination.

Source Metadata:

{
  "type": "source",
  "paths": [
    {
      "domain": "widgets",
      "version": "1",
      "uri": "file:///widgets-source/",
      "writeToIntermediate": true,
      "baseIntermediateUri": "s3://widgets-intermediate/"
    }
  ]
}

Destination Metadata:

{
  "type": "destination",
  "sourceUris": [
    "s3://widgets-intermediate/"
  ],
  "destinations": [
    {
      "domain": "widgets",
      "version": "1",
      "destinations": [
        {
          "baseUri": "file:///widgets-destination/",
          "allowRelativeUri": true
        }
      ]
    }
  ]
}

Writing to different folders based on file name

This example adds extra configuration to the destination adaptor to write files to different destination folders based on their extension. .pdf files are written to one location and .txt and .log files to another, with .log files having their extension renamed to .txt.

Source Metadata:

{
  "type": "source",
  "paths": [
    {
      "domain": "widgets",
      "version": "1",
      "uri": "s3://widgets-source/"
    }
  ]
}

Destination Metadata:

{
  "type": "destination",
  "sourceUris": [
    "s3://widgets-source/"
  ],
  "destinations": [
    {
      "domain": "widgets",
      "version": "1",
      "destinations": [
        {
          "baseUri": "s3://widgets-pdf-files/",
          "allowRelativeUri": true,
          "filterRegex": ".+\\.pdf"
        },
        {
          "baseUri": "s3://widgets-txt-files/",
          "allowRelativeUri": true,
          "filterRegex": ".+(\\.txt|\\.log)",
          "rewriteRules": [
              {
                "filterRegex": ".+\\.log",
                "findRegex": "^(.+)\\.log$",
                "replaceRegex": "$1\\.txt"
              }
          ]
        }
      ]
    }
  ]
}

Appendix A: Docker / Kubernetes Setup

Database Persistence

In a production environment, the databases should be persisted so that in the event of a failure, they will continue from where they left off on restart. By default, the databases are stored in /data. This volume should be mounted to a persistent location.

File System Volumes

The Docker image comes with two folders for convenience that may be mounted to a shared file system to monitor changes:

/monitor/sources
/monitor/destinations

For example, if /monitor/sources is mounted to a shared file system that has a folder widgets that needs to be monitored, the source URI would be file:///monitor/sources/widgets/.

These volumes may be useful when the file system that needs to be monitored is in another Docker container / pod.

Other File Sharing Options

  • Samba and NFS are both suitable options for accessing files on remote systems. The base YOUnite File System Adaptor Docker image could be used with some additional commands to mount the necessary systems.

  • Apache VFS has FTP/SFTP/FTPS support built in, which is another simple option (though it will not be as fast as Samba or NFS).