Read Me First!
Before reading this document, read Managing Adaptors, which covers the entire process of configuring and deploying an adaptor. This document deals specifically with the off-the-shelf YOUnite File System Adaptor.
Overview
The File System Adaptor is actually two types of adaptors: a source and a destination. The source adaptor monitors folders for changes and sends them to YOUnite. The destination adaptor receives messages from YOUnite and performs the file operation.
The source and destination cannot be the same adaptor.
Features
- File system support: everything Apache VFS supports as well as Amazon S3. See http://commons.apache.org/proper/commons-vfs/filesystems.html
  - NOTE: For monitoring to succeed, the adaptor needs to know the last modified time and size of each file. Some file systems supported by VFS may not be able to supply this information and therefore cannot be monitored.
- A fast backing database (H2) runs on the local file system and ensures that, on restart or recovery from failure, file system monitoring resumes where it left off.
- Scanning and processing of changes are asynchronous, for efficiency.
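The note on last modified time and size hints at how changes can be told apart between scans. The following is a minimal sketch under stated assumptions, not the adaptor's actual implementation: it compares two hypothetical snapshots (path mapped to last modified time and size) and labels each path with the NEW, UPDATED and DELETED actions used in the Domain schema. The class, the `FileState` type and the `diff` method are names invented for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

class ScanDiffSketch {
    // Hypothetical snapshot entry: the two attributes the monitor needs for each file.
    static final class FileState {
        final long lastModified;
        final long size;

        FileState(long lastModified, long size) {
            this.lastModified = lastModified;
            this.size = size;
        }

        boolean sameAs(FileState other) {
            return lastModified == other.lastModified && size == other.size;
        }
    }

    // Compares the previous scan with the current one and returns path -> action.
    static Map<String, String> diff(Map<String, FileState> previous,
                                    Map<String, FileState> current) {
        Map<String, String> actions = new TreeMap<>();
        for (Map.Entry<String, FileState> entry : current.entrySet()) {
            FileState before = previous.get(entry.getKey());
            if (before == null) {
                actions.put(entry.getKey(), "NEW");
            } else if (!before.sameAs(entry.getValue())) {
                actions.put(entry.getKey(), "UPDATED");
            }
        }
        for (String path : previous.keySet()) {
            if (!current.containsKey(path)) {
                actions.put(path, "DELETED");
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        Map<String, FileState> previous = new TreeMap<>();
        previous.put("a.txt", new FileState(1000L, 10L));
        previous.put("b.txt", new FileState(1000L, 20L));

        Map<String, FileState> current = new TreeMap<>();
        current.put("a.txt", new FileState(2000L, 12L)); // modified since the last scan
        current.put("c.txt", new FileState(3000L, 5L));  // did not exist before

        System.out.println(diff(previous, current));
    }
}
```

A file system that cannot report either attribute gives this kind of comparison nothing to work with, which is consistent with the note that such file systems cannot be monitored.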
Limitations
- The entire list of monitored files is stored in memory during monitoring. It is backed by a database only for restart or recovery from failure.
- H2 is currently the only supported database.
File Name Limitations
- File names can contain nearly any character EXCEPT forward slash "/" and percent sign "%". As long as the file system can handle it, any other character should be acceptable.
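As a small illustration, a pre-flight check for the two excluded characters could look like the sketch below. The helper name is hypothetical, and the check covers only the rule stated above; it does not test what the underlying file system accepts.

```java
class FileNameCheckSketch {
    // Hypothetical helper: true if the name avoids the two characters
    // the adaptor does not support in file names ("/" and "%").
    static boolean isSupportedFileName(String name) {
        return name.indexOf('/') < 0 && name.indexOf('%') < 0;
    }

    public static void main(String[] args) {
        System.out.println(isSupportedFileName("report 2024 (final).txt"));
        System.out.println(isSupportedFileName("100%.txt"));
    }
}
```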
Domain Configuration
For each file system that is being monitored for changes, a Domain must be created with this exact schema. This is the metadata that is sent from the Source Adaptor to YOUnite, which then passes the information on to the Destination Adaptor:
- The DR Key Properties MUST be:
[ "adaptorUuid", "sourceUri" ]
- The Schema MUST be:
{
  "adaptorUuid": { "type": "string", "required": true },
  "sourceUri": { "type": "string", "required": true },
  "intermediateUri": { "type": "string" },
  "destinationUri": { "type": "string" },
  "action": { "type": "enum", "required": true, "enumType": "string", "data": [ "NEW", "UPDATED", "DELETED" ] },
  "contentLength": { "type": "int", "required": true },
  "additionalParameters": {
    "type": "array",
    "items": {
      "key": { "type": "string", "required": true },
      "value": { "type": "string", "required": true }
    }
  }
}
Adaptor Configuration
All configuration items may be specified as environment variables (usually uppercase with underscores delimiting words) or put in a file /adaptor.properties (usually lowercase with periods delimiting words).
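The usual naming convention can be sketched as a simple string mapping. This is an illustration only: the helper name is hypothetical, and because the convention is described as "usually" holding, the tables in the sections that follow remain the authoritative list of variable names (some entries have no property form, and a name may deviate from the pattern).

```java
import java.util.Locale;

class EnvNameSketch {
    // Hypothetical helper applying the usual convention:
    // lowercase.with.periods -> UPPERCASE_WITH_UNDERSCORES.
    static String toEnvName(String property) {
        return property.toUpperCase(Locale.ROOT).replace('.', '_');
    }

    public static void main(String[] args) {
        System.out.println(toEnvName("message.bus.url"));
        System.out.println(toEnvName("file.monitor.database.name"));
    }
}
```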
Base Configuration Properties
Before starting on this section, read Managing Adaptors, which discusses how to create and retrieve the adaptor credentials. Once the adaptor credentials have been retrieved, the following properties must be configured either as environment variables or included in the adaptor.properties file.
Required properties:
Property | Environment Variable | Description | Example Value |
---|---|---|---|
adaptor.uuid | ADAPTOR_UUID | UUID of the adaptor | |
client.id | CLIENT_ID | Username used to connect to the message bus | |
client.secret | CLIENT_SECRET | Password used to connect to the message bus | |
message.bus.url | MESSAGE_BUS_URL | Message Bus URL | |
auth.server.url | AUTH_SERVER_URL | OAuth server used to validate adaptor access credentials. The YOUnite Server runs an embedded OAuth server that your implementation may be using. By default it runs on port 8080, so in this case the value would be http://ip-address-of-the-YOUnite-Server:8080 | |
Optional properties:
Property | Environment Variable | Description | Default Value |
---|---|---|---|
hostname | HOSTNAME | Name of the host for the health check endpoint. Note that Kubernetes sets this environment variable by default. | localhost |
port | PORT | HTTP port to expose for health checks. The health check endpoint is at /health, i.e. GET host:8001/health | 8001 |
concurrency | CONCURRENCY | Concurrency for incoming data messages, i.e. the number of threads that will handle incoming messages. Note that this could lead to messages being processed out of order, though they are guaranteed to be processed in order based on their message grouping (the default is the unique ID of the record, i.e. the DR UUID). | 1 |
ops.concurrency | OPS_CONCURRENCY | Concurrency for incoming ops messages, i.e. the number of threads that will handle incoming messages. These are messages with adaptor status information and metadata, requests for stats, and shutdown requests. Expressed as a range, i.e. "1-5", or a single value indicating the maximum (with an implied minimum of 1). | 1 |
assembly.concurrency | ASSEMBLY_CONCURRENCY | Concurrency for incoming assembly messages, i.e. the number of threads that will handle incoming messages. These are federated GET requests. Expressed as a range, i.e. "1-5", or a single value indicating the maximum (with an implied minimum of 1). | 1 |
session.cache.size | SESSION_CACHE_SIZE | JMS session cache size. | 10 |
enable.metrics | ENABLE_METRICS | Enable / disable displaying metrics at a given interval. | true |
metrics.log.interval | METRIC_LOGS_INTERVAL | Interval in milliseconds to display metrics. | 3600000 (one hour) |
n/a | LOG_LEVEL | Log level for YOUnite logback messages. Applies to | INFO |
n/a | ROOT_LOG_LEVEL | Log level for non-YOUnite logback messages. Applies to | INFO |
Example:
# Configuration
# Adaptor UUID
adaptor.uuid = 3dfcc03d-e5d4-4d57-9e9b-5c5d2db32f9a
# ClientID and Secret to be used by JMS to connect to the message bus
client.id = 8c9167a6-bb83-4f77-bdfc-1947a946f77b
client.secret = de02e3fa-4b23-46cb-aed6-5665a16e73d3
# Message Bus URL
message.bus.url = tcp://192.2.200.25:61616
# Optional configuration
concurrency = 4
ops.concurrency = 1-4
assembly.concurrency = 2-2
session.cache.size = 10
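The range syntax used by ops.concurrency and assembly.concurrency in the example above ("1-4", "2-2", or a single maximum) can be read as in the following sketch. It is an assumption-laden illustration rather than the adaptor's parser: the class and method names are hypothetical, and the validation rule (minimum of at least 1, maximum not below minimum) is added here for the sake of the example.

```java
class ConcurrencyRangeSketch {
    // Returns {min, max}. A single value is treated as the maximum
    // with an implied minimum of 1, as described in the table above.
    static int[] parse(String value) {
        String trimmed = value.trim();
        int dash = trimmed.indexOf('-');
        int min = 1;
        int max;
        if (dash < 0) {
            max = Integer.parseInt(trimmed);
        } else {
            min = Integer.parseInt(trimmed.substring(0, dash).trim());
            max = Integer.parseInt(trimmed.substring(dash + 1).trim());
        }
        // Assumed sanity check, not taken from the adaptor documentation.
        if (min < 1 || max < min) {
            throw new IllegalArgumentException("Invalid concurrency range: " + value);
        }
        return new int[] { min, max };
    }

    public static void main(String[] args) {
        int[] ops = parse("1-4");
        int[] single = parse("4");
        System.out.println(ops[0] + ".." + ops[1]);
        System.out.println(single[0] + ".." + single[1]);
    }
}
```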
Health Check Endpoint
The adaptor includes an HTTP endpoint to check the health of the adaptor at /health. The default port is 8001.
If the adaptor is healthy, it returns a 200 OK response when the endpoint is queried. If it is not healthy, it returns a 503 Service Unavailable. For example, a healthy adaptor returns:
{
"status": "UP"
}
In addition, it will return some information about the adaptor such as the status of each service and the uptime. For example, if the employee v1 domain handler is not working, the adaptor might return something like this:
{
"status": "DOWN",
"adaptorService": "UP",
"upTimeMilliseconds": 100000,
"customer:1": "UP",
"employee:1": "DOWN"
}
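A client can poll this endpoint with any HTTP library. The sketch below uses the JDK's built-in `java.net.http.HttpClient` (Java 11 or later) and assumes plain HTTP on the default host and port; the class and helper names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class HealthCheckSketch {
    // 200 OK means healthy; any other status (503 in particular) is treated as unhealthy.
    static boolean isHealthy(int statusCode) {
        return statusCode == 200;
    }

    // Builds the health URL from the hostname and port settings (plain HTTP assumed).
    static URI healthUri(String host, int port) {
        return URI.create("http://" + host + ":" + port + "/health");
    }

    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(healthUri("localhost", 8001)).GET().build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            String label = isHealthy(response.statusCode()) ? "healthy: " : "unhealthy: ";
            System.out.println(label + response.body());
        } catch (IOException e) {
            // The adaptor is not running or not reachable at this address.
            System.out.println("unreachable: " + e.getMessage());
        }
    }
}
```

Relying on the status code rather than parsing the body keeps the check simple; the JSON body is still printed because it identifies which service is down.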
Additional Configuration Properties
In addition to the above properties, the following optional properties are specific to the File System and S3 Adaptor.
Most of this configuration is for the two H2 databases that are used to persist data. These will configure themselves by default but their configuration can be changed with the following options:
Property | Environment Variable | Description | Default Value |
---|---|---|---|
data.directory | DATA_DIRECTORY | Directory where databases are stored. | /data in the Docker image, or home (~) when run locally |
file.monitor.database.name | FILE_MONITOR_DATABASE_NAME | Name of the file, without extension, of the H2 database in DATA_DIRECTORY to store file monitoring data. | file_monitor |
file.monitor.database.username | FILE_MONITOR_DATABASE_USERNAME | Username for the H2 file monitoring database. | sa |
file.monitor.database.password | FILE_MONITOR_DATABASE_PASSWORD | Password for the H2 file monitoring database. | sa |
file.monitor.database.connection.pool.size | FILE_MONITOR_DATABASE_CONNECTION_POOL_SIZE | Size of the connection pool. | 5 |
n/a | AWS_REGION | When using AWS S3, the region to use. Required for S3. | |
n/a | AWS_ACCESS_KEY_ID | Access Key ID for AWS S3. Required for S3. | |
n/a | AWS_SECRET_ACCESS_KEY | Secret Access Key for AWS S3. Required for S3. | |
Adaptor Metadata Configuration
The Adaptor Metadata Configuration contains the rest of the configuration for the YOUnite File System Adaptor. This configuration is stored in JSON format in the metadata of an adaptor. See Managing Adaptors, which discusses how to assign metadata to an adaptor; this can be done when the adaptor is created or later by updating the adaptor metadata in the UI or via the API.
Source Adaptor Metadata
A Source Adaptor monitors one or more paths for changes. Each path that is monitored will belong to a domain and version and be identified by a URI.
Example:
{
"type": "source",
"paths": [
{
"domain": "widgets",
"version": "1",
"uri": "s3://widgets-source/"
}
],
"queueSize": 10,
"scanIntervalSeconds": 5
}
Destination Adaptor Metadata
A Destination Adaptor listens for messages from YOUnite regarding file changes and performs the work of copying the file from a source or intermediate location to a destination.
Example:
{
"type": "destination",
"sourceUris": [
"s3://widgets-source/"
],
"destinations": [
{
"domain": "widgets",
"version": "1",
"destinations": [
{
"baseUri": "s3://widgets-destination/",
"allowRelativeUri": true
}
]
}
]
}
Source Adaptor Metadata Schema
{
"type": "source",
"paths": [ {SourcePath}, ... ],
"queueSize": integer,
"scanIntervalSeconds": integer
}
Property | Description | Type | Default Value | Required |
---|---|---|---|---|
type | Must be "source". | String | none | Yes |
paths | List of path specifications. | List of SourcePath | none | Yes |
queueSize | Size of the queue for asynchronous processing, or 0 for synchronous processing. | Integer | 0 | No |
scanIntervalSeconds | Interval at which to scan the path for modifications. | Integer | 10 | No |
SourcePath
A SourcePath contains the details of a path to be monitored.
{
"domain": "string",
"version": "string",
"uri": "string",
"filter": {Filter},
"writeToIntermediate": boolean,
"baseIntermediateUri": "string",
"intermediateRewriteRules": [ {Transformation}, ... ],
"baseDestinationUri": "string",
"destinationRewriteRules": [ {Transformation}, ... ]
}
Property | Description | Type | Default Value | Required |
---|---|---|---|---|
domain | Domain name. | String | none | Yes |
version | Domain version. | String | none | Yes |
uri | URI of the folder to be monitored. | String | none | Yes |
filter | Filter to limit the files / folders monitored. | Filter | none | No |
writeToIntermediate | Write files to an intermediate file system. | Boolean | false | No |
baseIntermediateUri | Base URI of the intermediate file system. | String | none | No |
intermediateRewriteRules | Transformations to apply to the intermediate URI. | List of Transformation | none | No |
baseDestinationUri | Base URI of the destination file system. | String | none | No |
destinationRewriteRules | Transformations to apply to the destination URI. | List of Transformation | none | No |
Filter
A Filter contains specifications to limit the items that are monitored in a path. Certain filter types (AND, OR, NOT) have child Filter entries which allow for creation of complex rules.
{
"type": "string",
"value": object,
"uri": "string",
"children": [{Filter}, ...]
}
Property | Description | Type | Default Value | Required |
---|---|---|---|---|
type | Filter type. See types below. | String | none | Yes |
value | Filter value. Differs depending on type. See types below. | String or Integer | none | Depends on type |
children | Child Filter entries. Applicable to AND, OR and NOT. See types below. | List of Filter | none | Depends on type |
Filter Types
Type | Description | Value Type | Children? |
---|---|---|---|
AND | Evaluates to true if all child entries evaluate to true. | n/a | One or more |
OR | Evaluates to true if any child entry evaluates to true. | n/a | One or more |
NOT | Evaluates to true if its child entry evaluates to false. | n/a | Exactly one |
IS_FILE | Include files only. | n/a | n/a |
IS_DIRECTORY | Include directories only. | n/a | n/a |
MODIFIED_AFTER | Include files modified after a date, as a Unix timestamp in milliseconds. | Integer | n/a |
CAN_READ | Include readable files only. | n/a | n/a |
HIDDEN | Include hidden files only. | n/a | n/a |
VISIBLE | Include visible files only (not hidden). | n/a | n/a |
REGEX | Include files and folders whose name matches a regex pattern. | String | n/a |
PREFIX | Include files and folders whose name starts with a prefix. | String | n/a |
SUFFIX | Include files and folders whose name ends with a suffix. | String | n/a |
MINIMUM_SIZE | Minimum file size. | Integer | n/a |
MAXIMUM_SIZE | Maximum file size. | Integer | n/a |
WARNING!
Filters are applied to both files AND folders. So to find files that end in .txt, for example, you would need an OR filter with two children: IS_DIRECTORY and SUFFIX (".txt"). Otherwise, folders would also be checked to see if they end in .txt.
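The effect described in the warning can be illustrated with plain Java predicates. This is a sketch of the logic only, not the adaptor's filter implementation: an item is modelled as a name plus a directory flag, and the `Entry` class and helper methods are hypothetical stand-ins for the IS_DIRECTORY, SUFFIX and OR filter types.

```java
import java.util.function.Predicate;

class FilterSketch {
    // Hypothetical model of a scanned item: just a name and a directory flag.
    static final class Entry {
        final String name;
        final boolean directory;

        Entry(String name, boolean directory) {
            this.name = name;
            this.directory = directory;
        }
    }

    // Stand-in for the IS_DIRECTORY filter type.
    static Predicate<Entry> isDirectory() {
        return entry -> entry.directory;
    }

    // Stand-in for the SUFFIX filter type.
    static Predicate<Entry> suffix(String value) {
        return entry -> entry.name.endsWith(value);
    }

    public static void main(String[] args) {
        Predicate<Entry> suffixOnly = suffix(".txt");
        // Stand-in for an OR filter with two children.
        Predicate<Entry> directoryOrSuffix = isDirectory().or(suffix(".txt"));

        Entry folder = new Entry("reports", true);
        Entry textFile = new Entry("notes.txt", false);
        Entry image = new Entry("logo.png", false);

        // SUFFIX alone rejects the folder "reports"; the OR filter accepts it.
        System.out.println("folder: " + suffixOnly.test(folder) + " vs " + directoryOrSuffix.test(folder));
        // Both filters accept the .txt file and reject the .png file.
        System.out.println("txt:    " + suffixOnly.test(textFile) + " vs " + directoryOrSuffix.test(textFile));
        System.out.println("png:    " + suffixOnly.test(image) + " vs " + directoryOrSuffix.test(image));
    }
}
```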
Transformation
A Transformation defines rules to transform a String using Regular Expressions:
{
"filterRegex": "string",
"findRegex": "string",
"replaceRegex": "string",
"replaceAll": boolean
}
Property | Description | Type | Default Value | Required |
---|---|---|---|---|
filterRegex | Regular expression used to determine whether to apply the transformation. The value must fully match the regular expression to be transformed. | String | none | No |
findRegex | Regular expression to find the elements in the string to replace. | String | none | Yes |
replaceRegex | Replacement for the matched elements. It is passed as the replacement argument to replaceAll / replaceFirst, so it may reference capture groups (for example $1). | String | none | Yes |
replaceAll | True or false indicating whether to replace all or just the first occurrence. | Boolean | true | No |
Notes:
- Regular expressions are in Java syntax and the methods called are matches(filterRegex), replaceAll(findRegex, replaceRegex) and replaceFirst(findRegex, replaceRegex).
- filterRegex must fully match the string, not just a part of the string.
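Since the notes name the Java `String` methods involved, a Transformation can be approximated in a few lines. The following is a minimal sketch under assumptions, not the adaptor's code: the class and method names are hypothetical, and a value that does not match filterRegex is assumed to be returned unchanged.

```java
class TransformationSketch {
    // Applies one Transformation to a value using the String methods named in the notes.
    static String apply(String filterRegex, String findRegex, String replaceRegex,
                        boolean replaceAll, String value) {
        // filterRegex is optional; when present it must match the whole value.
        if (filterRegex != null && !value.matches(filterRegex)) {
            return value; // assumed: a non-matching value passes through untouched
        }
        return replaceAll
                ? value.replaceAll(findRegex, replaceRegex)
                : value.replaceFirst(findRegex, replaceRegex);
    }

    public static void main(String[] args) {
        // Rename a .log extension to .txt, keeping the rest of the path via group $1.
        System.out.println(apply(".+\\.log", "^(.+)\\.log$", "$1\\.txt", true, "reports/app.log"));
        // replaceAll = false only touches the first occurrence.
        System.out.println(apply(null, "-", "_", false, "a-b-c"));
    }
}
```

Note that the Java string literal `"$1\\.txt"` holds the same characters as the JSON value `"$1\\.txt"` once each is unescaped, so patterns can be tried out in Java exactly as they appear in the metadata.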
Destination Adaptor Metadata Schema
{
"type": "destination",
"sourceUris": ["string", ...],
"destinations": [{DestinationPaths}, ...]
}
Property | Description | Type | Default Value | Required |
---|---|---|---|---|
type | Must be "destination". | String | none | Yes |
sourceUris | List of base URIs from which this adaptor can transfer files. | List of String | none | Yes |
destinations | List of destinations. | List of DestinationPaths | none | Yes |
DestinationPaths
DestinationPaths is a specification of one or more destinations for data events for a domain:
{
"domain": "string",
"version": "string",
"destinations": [{DestinationPath}, ...]
}
Property | Description | Type | Default Value | Required |
---|---|---|---|---|
domain | Domain name. | String | none | Yes |
version | Domain version. | String | none | Yes |
destinations | List of destinations. | List of DestinationPath | none | Yes |
DestinationPath
A DestinationPath is a possible destination for a file:
{
"baseUri": "string",
"allowRelativeUri": boolean,
"filterRegex": "string",
"rewriteRules": [{Transformation}, ...]
}
Property | Description | Type | Default Value | Required |
---|---|---|---|---|
baseUri | Base URI of the destination, i.e. the top-level folder to which files are copied. | String | none | Yes |
allowRelativeUri | Allow relative URIs? See below. | Boolean | false | No |
filterRegex | Regular expression used to determine whether to use this destination. See below. | String | none | No |
rewriteRules | Destination URI transformations. | List of Transformation | none | No |
Notes:
- The ultimate URI where the file will be written is resolved by the following order of operations:
  - First, the regex filter is checked; if the destination URI does not match, it will not be allowed. The destination URI must match the regular expression completely. This corresponds to the Java String.matches() method.
  - Second, the rewrite rules are run to transform the destination URI.
  - Third, if the URI is relative (it has no scheme) and allowRelativeUri = true, it will be resolved against the base URI. For example, if the base URI is file:///destination-folder/ and the destination URI is subfolder/abc.txt, the final URI will be file:///destination-folder/subfolder/abc.txt.
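The three steps above can be sketched as follows. This is an illustration under assumptions, not the adaptor's implementation: `java.net.URI` stands in for whatever URI handling the adaptor performs, a single find/replace pair stands in for the list of rewriteRules, and returning an empty result for a relative URI when allowRelativeUri is false is a guess at behaviour the notes do not spell out. All names are hypothetical.

```java
import java.net.URI;
import java.util.Optional;

class DestinationResolveSketch {
    // Resolves a candidate destination URI against one DestinationPath.
    // An empty result means this destination is not used.
    static Optional<String> resolve(String baseUri, boolean allowRelativeUri,
                                    String filterRegex, String findRegex,
                                    String replacement, String destinationUri) {
        // Step 1: the whole URI must match the filter, as with String.matches().
        if (filterRegex != null && !destinationUri.matches(filterRegex)) {
            return Optional.empty();
        }
        // Step 2: apply the rewrite rule (one rule here, for brevity).
        String rewritten = findRegex == null
                ? destinationUri
                : destinationUri.replaceAll(findRegex, replacement);
        // Step 3: a URI without a scheme is relative and is resolved against the base.
        URI uri = URI.create(rewritten);
        if (uri.getScheme() != null) {
            return Optional.of(rewritten);
        }
        if (!allowRelativeUri) {
            return Optional.empty(); // assumed behaviour, not stated in the notes
        }
        return Optional.of(URI.create(baseUri).resolve(uri).toString());
    }

    public static void main(String[] args) {
        // A relative .log path is accepted, renamed to .txt, then resolved against the base.
        System.out.println(resolve("s3://widgets-txt-files/", true,
                ".+(\\.txt|\\.log)", "^(.+)\\.log$", "$1\\.txt", "subfolder/abc.log"));
        // A .pdf path does not match the filter, so this destination is skipped.
        System.out.println(resolve("s3://widgets-txt-files/", true,
                ".+(\\.txt|\\.log)", "^(.+)\\.log$", "$1\\.txt", "report.pdf"));
    }
}
```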
Configuration Patterns
The following are some examples of how to configure the Source and Destination adaptors' metadata for common use cases.
Relative URIs from source to destination
The simplest and most typical configuration involves copying files from one system to another without changing anything about them. With this pattern, it is important to set allowRelativeUri on the destination to true.
Source Metadata:
{
"type": "source",
"paths": [
{
"domain": "widgets",
"version": "1",
"uri": "s3://widgets-source/"
}
]
}
Destination Metadata:
{
"type": "destination",
"sourceUris": [
"s3://widgets-source/"
],
"destinations": [
{
"domain": "widgets",
"version": "1",
"destinations": [
{
"baseUri": "s3://widgets-destination/",
"allowRelativeUri": true
}
]
}
]
}
Using a Cloud-based intermediate location
This example is similar to the one above, but uses a cloud-based intermediate location because the destination adaptor does not have access to the source file system. The source adaptor detects changes and pushes them into S3. The destination adaptor receives notifications of changes and reads them from S3 before writing them to their destination.
Source Metadata:
{
"type": "source",
"paths": [
{
"domain": "widgets",
"version": "1",
"uri": "file:///widgets-source/",
"writeToIntermediate": true,
"baseIntermediateUri": "s3://widgets-intermediate/"
}
]
}
Destination Metadata:
{
"type": "destination",
"sourceUris": [
"s3://widgets-intermediate/"
],
"destinations": [
{
"domain": "widgets",
"version": "1",
"destinations": [
{
"baseUri": "file:///widgets-destination/",
"allowRelativeUri": true
}
]
}
]
}
Writing to different folders based on file name
This example adds extra configuration to the destination adaptor to write files to different destination folders based on their extension. .pdf files are written to one location and .txt and .log files to another, with .log files having their extension renamed to .txt.
Source Metadata:
{
"type": "source",
"paths": [
{
"domain": "widgets",
"version": "1",
"uri": "s3://widgets-source/"
}
]
}
Destination Metadata:
{
"type": "destination",
"sourceUris": [
"s3://widgets-source/"
],
"destinations": [
{
"domain": "widgets",
"version": "1",
"destinations": [
{
"baseUri": "s3://widgets-pdf-files/",
"allowRelativeUri": true,
"filterRegex": ".+\\.pdf"
},
{
"baseUri": "s3://widgets-txt-files/",
"allowRelativeUri": true,
"filterRegex": ".+(\\.txt|\\.log)",
"rewriteRules": [
{
"filterRegex": ".+\\.log",
"findRegex": "^(.+)\\.log$",
"replaceRegex": "$1\\.txt"
}
]
}
]
}
]
}
Appendix A: Docker / Kubernetes Setup
Database Persistence
In a production environment, the databases should be persisted so that, in the event of a failure, monitoring continues from where it left off on restart. By default, the databases are stored in /data. This volume should be mounted to a persistent location.
File System Volumes
The Docker image comes with two folders for convenience that may be mounted to a shared file system to monitor changes:
/monitor/sources
/monitor/destinations
For example, if /monitor/sources is mounted to a shared file system that has a folder widgets that needs to be monitored, the source URI would be file:///monitor/sources/widgets/.
These volumes may be useful when the file system that needs to be monitored is in another Docker container / pod.
Other File Sharing Options
- Samba and NFS are both suitable options for accessing files on remote systems. The base YOUnite File System Adaptor Docker image could be used with some additional commands to mount the necessary systems.
- Apache VFS has FTP/SFTP/FTPS support built in, which is another simple option (though it will not be as fast as Samba or NFS).