Overview

Data in the YOUnite Data Virtualization Service (DVS) can be accessed from external tools such as business intelligence (BI) tools.

In this example, we use the Pentaho Data Integration (PDI, also known as Kettle) BI tool to read customer records from the YOUnite DVS.

Things to Know About Kettle

  • A job typically contains one or more transformations and/or sub-jobs.

  • Values created in a transformation are row fields (referred to here as field names), not variables. To pass one to the parent job, convert it to a variable with the "Set variables" step.

  • The options on a transformation entry in a job can be confusing:

    • Sometimes a transformation works when these are all set:

      • Execute every input row

      • Clear results rows before execution

      • Clear results files before execution

    • Other times, it works only when just these are set:

      • Clear results rows before execution

      • Clear results files before execution

  • Logging: use the "Write to Log" step.

    • Field names: select them in the "Field" section; no special output options are needed.

    • Variables: reference them in the "Write to Log" window using the ${VARIABLE_NAME} syntax.

  • The "Modified JavaScript value" step is very useful, but the field it creates or modifies should have its "Replace value 'Fieldname' or 'Rename to'" option set to "Y"; otherwise, a duplicate field is created with _1 appended to its name.

  • Variable scopes:

    • Valid in the root job: the variable is visible to the root job and all of its sub-jobs and transformations.

    • Valid in the Java Virtual Machine: the variable is visible to ALL jobs running in the same PDI instance, including unrelated parent jobs, so use this scope with care.

  • A variable set or changed in a transformation is not visible in that same transformation, because a transformation reads its variable values when it initializes. You can change a variable's value for the parent or grandparent job, but not for the transformation that sets it.

  • Order matters when setting fields in a transformation: set the local-only fields first (e.g., with a Data Grid step), then read any variables into fields with a follow-on "Get Variables" step.

  • Log levels: set to "Minimal" in nearly all cases.
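The variable rules above (fields versus variables, root-job versus JVM scope, and variables being read when a transformation initializes) interact in ways that are easy to get wrong. The Python sketch below is a toy model of that behavior as described in this section, assuming a much simplified PDI; it is not PDI's API or implementation. The classes Job and Transformation, the method set_variables_step, the scope strings "root" and "jvm", and the JVM_VARIABLES dictionary are all hypothetical, and only the two scopes listed above are modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional

# Hypothetical stand-in for "Valid in Java Virtual Machine" variables (process-wide).
JVM_VARIABLES: dict[str, str] = {}


@dataclass
class Job:
    """Toy model of a PDI job and its variable space (not the real API)."""
    name: str
    parent: Optional[Job] = None
    variables: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # A sub-job starts with a copy of its parent's variables.
        if self.parent is not None:
            self.variables = {**self.parent.variables, **self.variables}

    def chain_to_root(self) -> Iterator[Job]:
        # Yield this job, then each ancestor up to the root job.
        job: Optional[Job] = self
        while job is not None:
            yield job
            job = job.parent

    def get_variable(self, key: str) -> Optional[str]:
        # Job-local variables win; otherwise fall back to JVM-wide ones.
        return self.variables.get(key, JVM_VARIABLES.get(key))


@dataclass
class Transformation:
    """Toy model of a transformation started by a parent job."""
    name: str
    parent: Job
    snapshot: dict[str, str] = field(init=False)

    def __post_init__(self) -> None:
        # Variables are read once, when the transformation initializes.
        self.snapshot = {**JVM_VARIABLES, **self.parent.variables}

    def get_variable(self, key: str) -> Optional[str]:
        return self.snapshot.get(key)

    def set_variables_step(self, row: dict[str, str], field_name: str,
                           var_name: str, scope: str) -> None:
        """Copy a row field into a variable, loosely like "Set variables"."""
        value = row[field_name]
        if scope == "root":
            # "Valid in root job": every job from the parent up to the root.
            for job in self.parent.chain_to_root():
                job.variables[var_name] = value
        elif scope == "jvm":
            # "Valid in Java Virtual Machine": visible to every job in the process.
            JVM_VARIABLES[var_name] = value
        else:
            raise ValueError(f"unsupported scope: {scope}")
        # self.snapshot is deliberately NOT updated: this transformation
        # keeps seeing the values it read at initialization.


# Usage (hypothetical job and field names)
root = Job("root")
child = Job("load_customers", parent=root)
trans = Transformation("read_dvs", parent=child)

trans.set_variables_step({"customer_count": "3"}, "customer_count",
                         "CUSTOMER_COUNT", scope="root")
print(root.get_variable("CUSTOMER_COUNT"))   # 3
print(trans.get_variable("CUSTOMER_COUNT"))  # None: read at initialization
later = Transformation("use_count", parent=child)
print(later.get_variable("CUSTOMER_COUNT"))  # 3: started after the set

unrelated = Job("unrelated")
trans.set_variables_step({"region": "EU"}, "region", "REGION", scope="jvm")
print(unrelated.get_variable("REGION"))      # EU: JVM scope leaks everywhere
```

In this model, the snapshot taken in Transformation.__post_init__ is what makes a newly set variable invisible to the transformation that set it, while transformations started afterwards pick it up.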