Outliers

../_images/outliersnodeicon.png

Outliers Node icon

The Clario Outliers node caps outliers among the ten highest and ten lowest values of each attribute by searching for a large percentage difference from one extreme value to the next. For each attribute, standardized scores (z-scores) and an interquartile range (IQR) procedure are applied to the variable's minimum and maximum values to identify possible and problematic outliers. For variables whose minimum and/or maximum standard score is greater than 3 and that have problematic outliers identified by the IQR procedure, suggested recodes are determined by examining the percentage change from one value to the next among the ten most extreme minimum values and the ten most extreme maximum values. The suggested recode is the value with the least extreme percentage change that is greater than the defined threshold. The default Change Threshold Percentage is 200, meaning that if a percentage difference of 200 or more is found, the attribute is capped at the value just before that difference.
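The threshold search described above can be illustrated with a minimal sketch. This is not Clario's implementation: the function name `suggest_upper_cap` and its parameters are hypothetical, it covers only the upper (maximum) side, and it skips the z-score and IQR screening. It also assumes that "least extreme" means scanning the ten largest values from the innermost outward and stopping at the first jump that meets the threshold.

```python
from typing import Optional, Sequence


def suggest_upper_cap(
    values: Sequence[float],
    threshold_pct: float = 200.0,  # mirrors the default Change Threshold Percentage
    n_extreme: int = 10,           # number of extreme values examined
) -> Optional[float]:
    """Return a suggested upper cap, or None if no jump meets the threshold.

    Hypothetical helper for illustration only; the scan order is an assumption.
    """
    # The n largest values, in ascending order (least extreme first).
    top = sorted(values)[-n_extreme:]
    for lower, upper in zip(top, top[1:]):
        if lower == 0:
            continue  # percentage change from zero is undefined
        pct_change = (upper - lower) / abs(lower) * 100.0
        if pct_change >= threshold_pct:
            # Cap at the value just before the large jump.
            return lower
    return None


sample = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 100]
cap = suggest_upper_cap(sample)  # the jump from 20 to 100 is a 400% change
capped = sample if cap is None else [min(v, cap) for v in sample]
```

In this example the value 100 would be recoded to 20, the last value before the jump. Raising the threshold above 400 would leave the data unchanged, because no jump among the ten largest values would qualify.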

The top node connector is reserved for input data from the Univariate node, typically as a direct connection. Alternatively, the top node connector can be connected to a Read File node that reads an output data file generated by a Univariate node in a separate workflow. The bottom node connector receives the input data to which the Outliers logic is to be applied. This data can come from a Read File node or from any node that outputs data that you wish to recode.

Configuration

The Outliers node has only one configuration tab.

../_images/outliersconfig.png

Configuration Tab

Configuration Tab

The configuration tab contains an Available Attributes list box and a Selected Attributes list box. The Available Attributes list box displays all of the numeric attributes in the input data stream connected to the input link node connector. Select the desired attributes, then drag and drop them from the Available Attributes list box to the Selected Attributes list box. The Univariate input is used to determine the recodes for the Outliers node. At least one attribute must be placed in the Selected Attributes list box. If desired, the Change Threshold Percentage (default 200%) can be changed by typing a value in the text box or by clicking the small up and down arrows. See tips on Finding and Selecting Attributes.

Results

The Outliers node produces one result set with two tabs: Summary and Pseudo Code.

Summary Tab

Summary - Contains a summary of outlier processing, with one row for each selected attribute, showing the upper and lower limit thresholds along with the upper and lower limit outliers.

../_images/outlierssummary.png

Summary

Pseudo Code Tab

Pseudo Code - Contains rules for how the outliers will be recoded.

../_images/outlierspseudocode.png

Pseudo Code

Output Stream

The newly cleaned dataset is ready for immediate use in other nodes to explore, manipulate, cleanse, and model the data. The data can be exported by using the Write File node.
