Data Load Task

This type of task loads data from an external source into the Selligent database. When a data load task is executed, a temporary table is automatically created.  Per task only one temporary table can be created.  At the end of the job, a Stored Procedure is called that merges the data in the temporary table into a Selligent table.

NOTE: Customers who have used the previous version of the Selligent Dataloader, check the following in the table DATALOADER_FILEPROC: the field SIMJOBID of type int must be present. If this is not the case, add it.

 

1. Select the import source location (Source tab)

Load from file

A selection of media is available. The list is the same as for the Data export: FTP/FTPS/SFTP, Repository and URL. Depending on the selected medium, different parameters are defined.

Repository:

Load file from disk, the file must be in one of the Jobagent repository locations. These are pre-defined locations (Selligent settings) that can be used by the Jobagent to store or load files.

  • Data exchange: repository root location where the file is located.
  • Subfolder: subfolder in the repository. Use the browse button to configure it.
FTP/FTPS/SFTP:
  • Server: the FTP server where the file is located
  • Login
  • Password
  • Subfolder: the subfolder on the FTP server where the file is located. If the subfolder is empty the file will be loaded from the FTP root.
URL:
  • URL: the URL that points to the file that must be loaded
  • Login: the login for the URL
  • Password: the password for the URL

 

It is possible to define a filter on the files imported: an extension is entered to limit import only to these files.

A trigger file can be used. Just check the option and enter the extension of the file. This trigger file serves to check if the import file is complete. If the trigger file exists, the import can start.

NOTE: the trigger file has the same name as the import file; the different lies in the extension.

 

Several additional import options are available:

  • Allow multiple imports of the same file name: a certain file can be uploaded more than once. The list of uploaded files is maintained over job executions so if this option is unchecked you cannot import a file named “my_data.csv” on a daily basis for instance.
  • Fail if no files are found: determines whether the load job fails if no files are found that match the file filter.
  • Delete after processing
  • Keep a backup file: enable this setting to backup the files that have been processed to the backup folder of the Job agent. The backup retention specifies how many backup files are kept in the backup folder (i.e. if the retention is set to 10 and the job runs once per day there will be 10 days of backups). Besides the retention setting there is also a configurable backup quota in the Selligent settings. This backup has priority over retention (i.e., If you specify a retention of 10 executions and a quota of 500MB than if each backup file = approx. 100MB in size there will only be 5 backups in the backup folder).

 

File options

When ‘Load from file’ is selected, additional File options are defined at the bottom of the page:

A file type needs to be selected. A choice can be made between Comma separated, RSS, TAB, XLS, XML and FIXED width. Depending on the selected type, different options are set:

  • RSS: the root node must be indicated. All child nodes within the root are retrieved.
  • Comma separated:  following file options are provided.
  • Tab: when a CSV file is imported with TAB delimiter, a number of file options can be defined for the file structure.
  • XLS:  Use the options below to define how the XLS is structured and should be interpreted by the Jobagent.
    • Sheet name: the name of the sheet in the xls file that is subject of the import
    • Rows to skip: indicate the number of rows in the file that should be excluded
    • First line contains column headers: if the xls file contains column headers, check this option. The row will be interpreted as such.
  • FixedLen: all fields have a fixed length. The options allow setting the length of the different columns in the source. The structure of the column length is as follows: Col1Name;Col1Length|Col2Name;Col2Length (i.e., ID;10|NAME;50|DESC;250)
  • XML: the incoming XML file must be parsed. The XML format descriptor allows defining tables and fields that need to be distracted from the XML file. (technical note on XML parsing)
  • JSON: The incoming JSON must be parsed. The format descriptor allows defining tables and fields that need to be distracted from the JSON.

Some of the above file types have an additional option: Batch size. If configured, the file processor is executed for every batch and data load is performed in batches with a maximum of x rows.

Technical note: when using batches, the temporary sync table does not contain data. When no batch files are used, this table will contain all data.

The option 'Check MD5 from file name' is used to double check the content of the file. This MD5 key is added to the file name and created based on the content of the file. When the option is activated, the MD5 key is checked against the content of the file and enables the detection of changes in the file.

 

2. Set encryption and unzip option (Processing tab)

If the source file is zipped, check the option to unzip the files first.

If the source file is encrypted with PGP, the user must enter the pass phrase and the recipient. The recipient is in fact an email address, used to retrieve the public key. The Passphrase is used as a password to decrypt.

If the option 'Use processing folders' is used, customers will be able to control more and check more what happens during the processing of the file. While being treated files are put in the processing folder, when completed in the Completed folder and when something went wrong these files will be stored in the Failed folder. The folders are located on the (S)FTP(S) server .

 

3. Select the intermediate destination table (Destination tab)

The 'Destination' tab is used to determine the temporary table where data is stored during the loading process. Plus, it allows identifying the Stored Procedure executed afterwards that is in charge of merging data in the temporary table with the Selligent table.

Only one table can be created per task. Enter the name of the table.

NOTE: the name of the table can be entered freely. However, following limitations are in place:
the name of an existing database table is not allowed
The table may not be in use by a Selligent list
The table name cannot start with 'SURVEY_' or 'INTERFACE_'

Include user-identifier MAID column — Check the option to include a 'MAID'column if repeated imports occur. If this checkbox is activated, an extra column 'MAID' is added to the table created by the job agent. This is handy to update the records in the main table with the corresponding records of the temporary table.

To process data, a Stored Procedure must be selected. This procedure is executed after the transfer and updates the Selligent database. Check the option ‘Process data’ and press the 'Details' button to select a Stored Procedure.

NOTE : To change the selected Stored Procedure, you need to press 'Clear' first and then 'Details' to select a different one.

 

4. Export rejected records (Rejection tab)

The 'Reject management' tab is used when the data source needs to be validated. All the records that are not valid are exported in a separate file.

Basic concept:

  • Data is transferred into a temporary table
  • When 'Reject Management' is activated, two more columns are added (OPTI_REJECTED & OPTI_REJECTED_REASON) to the temporary table
  • The Stored Procedure performs checks on data. If some records have corrupted information, the column OPTI_REJECTED has to be set to ‘1’ meaning that the record is rejected. A custom reject message can be set into the OPTI_REJECTED_REASON field.
  • Afterwards, the Stored Procedure transfers all the valid data from the temporary table into the Selligent table.

The list of possible media types comprises repository, FTP, SFTP, FTPS,  and URL. The parameters to fill out, depend on the selected media type.

The user can select a subfolder to store the file. The file type is by default Comma separated.

Back to Tasks