Read File Node
After you have uploaded data into Clario with an FTP client and added it to your project, you still need to add it to the workflow. To do that, use the Read File node, which reads either a delimited or a fixed-length flat file. Files to be read must be associated with the Clario project in which you created your workflow.
Note
All workflows must begin with one or more Read File nodes.
The Read File node has two configuration tabs: Define File and Define Attributes.
Define File tab
The first step in configuring a Read File node is specifying the input file name(s). This can be done in one of two ways: clicking [New] and entering the file name, or clicking [Select] and selecting one or more files with the file browser.
Create a new file name by clicking [New] and entering the file name. Adding file names this way allows you to embed Project Constant references in the file name. To reference a constant in a file name, place the constant name in curly braces. For example, to reference the “AsOfDate” constant, enter a file name such as Contacts-{AsOfDate}.csv (see Figure).
File name with constant reference
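The constant reference can be pictured as a simple placeholder substitution. The Python sketch below is only an illustration of that idea, not Clario's implementation; the function name resolve_file_name and the sample value for AsOfDate are hypothetical.

```python
import re

def resolve_file_name(template, constants):
    """Replace each {Name} reference with its value from `constants`."""
    def lookup(match):
        name = match.group(1)
        if name not in constants:
            # Assumption: an undefined constant is treated as an error.
            raise KeyError(f"undefined project constant: {name}")
        return str(constants[name])
    return re.sub(r"\{(\w+)\}", lookup, template)

# Hypothetical constant value, for illustration only.
print(resolve_file_name("Contacts-{AsOfDate}.csv", {"AsOfDate": "2024-01-31"}))
# prints Contacts-2024-01-31.csv
```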
Select one or more files by clicking [Select], which launches the File Browser. Only files that have been associated with the project are displayed. Select the desired file(s) from the list and click [OK]. If multiple file names are specified, the Read File node appends the files together during execution (in the same fashion as the Append node), in the order in which the file names are specified.
Note
ALL files must have identical structures (e.g. type, enclosure, header rows, attributes) for the append to work.
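As a rough picture of the append behavior, the following Python sketch concatenates several comma-delimited sources in the order given and rejects sources whose header differs. It is an assumption-laden illustration (the function name append_files and the header comparison are hypothetical), not a description of how Clario performs the append.

```python
import csv
import io

def append_files(sources, header_rows=1):
    """Concatenate data rows from same-structured CSV sources, in order."""
    combined = []
    expected_header = None
    for source in sources:
        rows = list(csv.reader(source))
        header, data = rows[:header_rows], rows[header_rows:]
        if expected_header is None:
            expected_header = header
        elif header != expected_header:
            # Assumption: a header mismatch signals a different structure.
            raise ValueError("files do not share the same structure")
        combined.extend(data)
    return combined

first = io.StringIO("id,name\n1,Ann\n2,Bob\n")
second = io.StringIO("id,name\n3,Cy\n")
print(append_files([first, second]))
# prints [['1', 'Ann'], ['2', 'Bob'], ['3', 'Cy']]
```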
Once one or more files have been specified, select a file from the list and click the [Preview] button to display a Raw File Preview of the first 50 rows of the selected file, unformatted. This is a great way to learn the structure of the file. To close the Raw File Preview, click [Close] in the bottom right corner.
The second step is to select a file type. The options are delimited or fixed. After you select a file type, the user interface displays different options depending on your selection.
For delimited data files, select the attribute delimiter from the ‘Delimiter’ drop-down list. The available delimiters are comma, semicolon, pipe, double pipe, tab, and space. Next, if an enclosure (single quote or double quote) is present, it must be specified. Leave the enclosure blank if none is present.
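To see why the enclosure setting matters, consider a value that itself contains the delimiter. The sketch below uses Python's standard csv module as a stand-in parser; read_delimited is a hypothetical helper, and the csv module only accepts single-character delimiters, so the double pipe option is not covered here.

```python
import csv
import io

def read_delimited(text, delimiter=",", enclosure=None):
    """Parse delimited text, honoring an optional enclosure character."""
    if enclosure:
        reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                            quotechar=enclosure)
    else:
        # No enclosure: quote characters are treated as ordinary data.
        reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                            quoting=csv.QUOTE_NONE)
    return list(reader)

text = 'id|note\n1|"a|b"\n'
print(read_delimited(text, delimiter="|", enclosure='"'))
# prints [['id', 'note'], ['1', 'a|b']]
print(read_delimited(text, delimiter="|"))
# prints [['id', 'note'], ['1', '"a', 'b"']]
```

With the enclosure specified, the pipe inside the quoted value stays part of the value; with the enclosure left blank, the same row splits into three fields.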
For either delimited or fixed data files, if header rows are present in the file, check the [Header Rows] box and specify the number of header rows. The header rows are excluded from any further data processing, but are used by the Attribute Guesser on the Define Attributes tab.
To read and process a sequential subset (from the beginning) of the data in the selected data file, uncheck [All] and specify a value in the [# Rows to Read From File] field. Leave the [All] box checked to read all rows in the file.
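The header-row and row-limit settings together amount to skipping a fixed number of leading rows and then taking a sequential subset from the top of the file. A minimal Python sketch of that idea follows; read_rows and its parameters are hypothetical names, with max_rows=None standing in for the checked [All] box.

```python
import csv
import io
from itertools import islice

def read_rows(source, header_rows=0, max_rows=None):
    """Split off header rows, then read at most max_rows data rows."""
    reader = csv.reader(source)
    headers = list(islice(reader, header_rows))   # kept aside, not processed as data
    data = list(islice(reader, max_rows))         # None means read everything
    return headers, data

source = io.StringIO("name,age\nAnn,31\nBob,45\nCy,27\n")
headers, data = read_rows(source, header_rows=1, max_rows=2)
print(headers)  # prints [['name', 'age']]
print(data)     # prints [['Ann', '31'], ['Bob', '45']]
```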
If there are errors in reading data, you may choose to set the affected values to null by checking the [Set Parse Errors to Null] box; otherwise leave the box unchecked. For example, if the defined attribute is numeric but one row contains an alphabetic character, the value for that row and attribute is set to NULL. The next box asks whether you want to discard malformed rows. Checking this box places null values in malformed rows (for example, rows with missing delimiters or errant carriage returns) and returns a count of malformed rows in the Run Log. If this box is not checked and the Read File node encounters malformed rows, the run fails.
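The effect of [Set Parse Errors to Null] on a numeric attribute can be sketched as follows. This is only an illustration in Python, with None standing in for NULL; parse_number is a hypothetical helper, and what happens when the box is unchecked is shown here as a raised error purely as an assumption.

```python
def parse_number(value, set_errors_to_null=True):
    """Convert text to a number; on failure return None or raise."""
    try:
        return float(value)
    except ValueError:
        if set_errors_to_null:
            return None   # stands in for NULL
        raise             # assumption: the error surfaces when the box is unchecked

column = ["12", "3.5", "abc"]
print([parse_number(v) for v in column])
# prints [12.0, 3.5, None]
```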
Define Attributes tab
The Define Attributes user interface changes slightly depending on the file type selected on the Define File tab.
If you specified a delimited file on the Define File tab, there is an Attribute Guesser button in the lower left corner. Clicking this button uses the Define File configuration to guess the Name, Type and, for Date types, Format of each attribute. Name and Type are required for each attribute in the data file. If an attribute’s Type is Date, a Format is also required. If the [Header Rows] check box is checked on the Define File tab, the data in the first header row is used to fill in the attribute names. If the check box is not checked, the attribute names default to attribute1, attribute2, etc. The first 100 data rows (after skipping the number of header rows specified in the # Rows value on the Define File tab) are used to determine each attribute’s Type (String, Number or Date).
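One way to picture this kind of guessing is shown in the Python sketch below. The actual rules used by the Attribute Guesser are not documented here, so everything in the sketch is an assumption: the candidate date formats, the order of the checks (Number, then Date, then String) and the function names are all hypothetical.

```python
from datetime import datetime

# Hypothetical candidate formats; the real list comes from the Format drop-down.
CANDIDATE_DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y"]

def is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False

def find_date_format(values):
    """Return the first candidate format that parses every value, else None."""
    for fmt in CANDIDATE_DATE_FORMATS:
        try:
            for value in values:
                datetime.strptime(value, fmt)
        except ValueError:
            continue
        return fmt
    return None

def guess_attributes(rows, header=None, sample_size=100):
    """Guess (name, type, format) per column from the first sample_size rows."""
    sample = rows[:sample_size]
    width = len(header) if header else max((len(r) for r in sample), default=0)
    guesses = []
    for i in range(width):
        name = header[i] if header else f"attribute{i + 1}"
        values = [r[i] for r in sample if i < len(r) and r[i] != ""]
        if values and all(is_number(v) for v in values):
            guesses.append((name, "Number", None))
            continue
        fmt = find_date_format(values) if values else None
        guesses.append((name, "Date", fmt) if fmt else (name, "String", None))
    return guesses

rows = [["1", "2024-01-05", "Ann"], ["2", "2024-02-10", "Bob"]]
print(guess_attributes(rows, header=["id", "joined", "name"]))
# prints [('id', 'Number', None), ('joined', 'Date', '%Y-%m-%d'), ('name', 'String', None)]
```

Calling guess_attributes(rows) without a header yields the default names attribute1, attribute2 and attribute3.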
If you specified a fixed-length file, you must enter the Name, Type, Start Position and Length of each attribute in the data file; Format is required only for Date types. Each data row starts at position 1. For each attribute, enter a valid Name, select the correct Type, and enter the Start Position and Length. You can group a block of contiguous attributes together by entering the start position of the first attribute and the total length of all the attributes combined. This is helpful if you want to treat a group of data as one attribute and simply pass it through processing. You can also read only selected attributes by entering the Name, Type, Start Position and Length of the attributes you are interested in and ignoring the remainder of the data file. The same data can also be read in multiple times. For example, you may want to read all digits of a zip code (5 digits), all digits of a ZIP+4 (9 digits), and just the sectional center, which is the first three digits of the zip code (3 digits).
Regardless of the file type selected on the Define File tab, there is a Formatted File Preview button in the bottom right that displays the first 50 rows of data in a grid, using the configuration from both the Define File and Define Attributes tabs. This is useful for checking that you have properly defined all attributes.
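The zip code example can be sketched as overlapping slices of a fixed-length row. The Python below is illustrative only: read_fixed, the attribute names and the sample record are hypothetical, and positions are 1-based to match the Start Position convention described above.

```python
def read_fixed(line, layout):
    """Extract attributes from one fixed-length row.

    layout is a list of (name, start, length) tuples; start is 1-based.
    """
    return {name: line[start - 1:start - 1 + length]
            for name, start, length in layout}

# Three overlapping attributes read from the same nine characters.
layout = [("zip5", 1, 5), ("zip9", 1, 9), ("scf", 1, 3)]
record = "606011234Jane Doe  "   # made-up sample row
print(read_fixed(record, layout))
# prints {'zip5': '60601', 'zip9': '606011234', 'scf': '606'}
```

The name portion of the sample row is never mentioned in the layout, so it is simply ignored, which mirrors reading only selected attributes.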
To modify an attribute, double-click the Attribute Name (or Type) in the row you wish to change and make the desired modifications in the resulting popup; click [Save] to accept the changes, or [Cancel] to discard them.
Edit attribute popup
Additionally, the Read File node allows users to import attribute definitions directly from a comma-separated file via the [Import] button. It can also be helpful to save the Read File attribute definitions for future use in another workflow. To do so, click [Export]; you will be prompted to enter a file name before the spreadsheet is downloaded.
Note
Valid Clario data types are String, Number, and Date. For Dates, valid Formats are listed in a drop-down box. If the Date Format is not listed, you can read the attribute in as a string or number and then transform it into the desired format using the Transform node.
Invalid Attribute Error
The Read File node lets you name attributes; see the tips on Valid Characters for Attribute Names. If the Attribute Guesser has been used and produces invalid attribute names, they are highlighted in red and an error occurs. To fix the error, click the specific attribute to highlight it, click once more to access the text box, and then rename the attribute using only valid characters.
See the Read File Node Results. It is assumed the Read File node will be connected to another node to actually process the data. In fact, a valid workflow must have at least two nodes (a Read File node and at least one more node).
The Read File node is designed to stream data into other nodes to explore, manipulate, cleanse, and model the data. The data can be exported at any point in a workflow by using the Write File node.