Univariate Node icon
The Clario Univariate node lets you explore and understand your data by examining frequency distributions, graphs, and a variety of statistical metadata for all attributes. You can connect the Univariate node to a cleanse node (Missing or Outliers) to use the univariate output in the file cleansing process. The node's input connector accepts a variety of nodes (e.g., Read File, Aggregate, Append, Missing), but it requires a valid stream of data.
The Univariate node has only one configuration tab.
The Configuration tab contains two list boxes: Available Attributes and Selected Attributes. The Available Attributes list box displays every attribute on the data stream connected to the node's input connector. Univariate results are returned only for the attributes placed in the Selected Attributes list box.
Configuration Tab
The only step is to choose the attribute(s) to analyze. Click them in the Available Attributes list box, which highlights them, then drag and drop them into the Selected Attributes list box. You must select at least one attribute to run through the Univariate node. See tips on Finding and Selecting Attributes.
The Univariate node produces one result set, displayed in two tabs: Numeric Summary and Results Explorer.
Results Dialog Tab
Click the Numeric Summary tab for a high-level summary of all attributes.
Numeric Summary Tab
Each attribute is listed along with several summary statistics: Non-Missing Rows, Mean, Minimum, Maximum, Standard Deviation, Sum, and Median. To sort the display by any of these columns, click its column heading.
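To make the Numeric Summary columns concrete, the following Python sketch computes the same seven statistics for a single attribute outside of Clario. It is only an illustration: the function name `numeric_summary`, the use of `None` to mark a missing value, and the choice of the sample (n - 1) standard deviation are assumptions, and Clario's own calculations may differ.

```python
import statistics

def numeric_summary(values):
    """Summarise one numeric attribute; None marks a missing value (assumption)."""
    present = [v for v in values if v is not None]
    if not present:
        return {"non_missing_rows": 0}
    return {
        "non_missing_rows": len(present),
        "mean": statistics.fmean(present),
        "minimum": min(present),
        "maximum": max(present),
        # Sample standard deviation (n - 1 denominator); undefined for one value.
        "standard_deviation": statistics.stdev(present) if len(present) > 1 else None,
        "sum": sum(present),
        "median": statistics.median(present),
    }

# Hypothetical attribute with one missing value.
ages = [23, 35, None, 41, 35]
summary = numeric_summary(ages)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Missing values are dropped before any statistic is computed, which is why the row count reported here is the non-missing count rather than the total number of rows.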
To export these statistics to a spreadsheet, click on the Export to Spreadsheet button located on the Toolbar, and enter a filename when prompted.
Results Explorer: String
Results Explorer: Number
Click the Results Explorer tab for more detailed statistics and charts for one attribute at a time. To export the statistics for a single attribute to a spreadsheet, click on the Export to Spreadsheet button located on the Toolbar, and enter a filename when prompted. Click on any attribute listed in the left-hand box, and its details will be displayed in three sections:
Summary statistics
Bar graph
Detailed Statistics
For numeric attributes, the detailed statistics include:
| Location | Variability | Moments | Quartile | Extremes |
|---|---|---|---|---|
| Mean | Standard Deviation | Skewness | Minimum | 10 Minimum Values |
| Median | Variance | Kurtosis | Quartile 1 | 10 Maximum Values |
| Mode | Range | Uncorrected Sum of Squares | Median (Quartile 2) | |
| | Interquartile Range | Corrected Sum of Squares | Quartile 3 | |
| | Standard Error of the Mean | | Maximum | |
| | Percent Coefficient of Variation | | | |
| | Variable Sum | | | |
| | z of Minimum | | | |
| | z of Maximum | | | |
For string attributes, the detailed statistics include only the Mode under Location.
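As a rough guide to what the numeric detailed statistics measure, here is a minimal Python sketch that computes most of them for one attribute. The formulas are assumptions chosen for illustration (sample variance, population-form skewness and excess kurtosis, and the "inclusive" quartile method from Python's `statistics` module); the documentation does not state which variants Clario uses, so the product's values may differ slightly. The function name `detailed_statistics` is hypothetical.

```python
import math
import statistics

def detailed_statistics(values):
    """Detailed statistics for one numeric attribute; None marks a missing value."""
    x = sorted(v for v in values if v is not None)
    n = len(x)
    if n < 2 or x[0] == x[-1]:
        raise ValueError("need at least two distinct non-missing values")
    mean = statistics.fmean(x)
    sd = statistics.stdev(x)                 # sample standard deviation (n - 1)
    q1, q2, q3 = statistics.quantiles(x, n=4, method="inclusive")
    css = sum((v - mean) ** 2 for v in x)    # corrected sum of squares
    m2 = css / n                             # central moments, population form
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return {
        "mode": statistics.mode(x),
        "variance": css / (n - 1),
        "range": x[-1] - x[0],
        "interquartile_range": q3 - q1,
        "standard_error_of_mean": sd / math.sqrt(n),
        # Undefined when the mean is zero.
        "percent_coefficient_of_variation": 100 * sd / mean if mean else None,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3,        # excess kurtosis
        "uncorrected_sum_of_squares": sum(v * v for v in x),
        "corrected_sum_of_squares": css,
        "quartiles": (q1, q2, q3),
        # z-scores express the extremes in standard deviations from the mean.
        "z_of_minimum": (x[0] - mean) / sd,
        "z_of_maximum": (x[-1] - mean) / sd,
        "ten_minimum_values": x[:10],
        "ten_maximum_values": x[-10:],
    }

# Hypothetical attribute values.
stats = detailed_statistics([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in stats.items():
    print(f"{name}: {value}")
```

The z of Minimum and z of Maximum values are useful when deciding whether to cap outliers: a large absolute z-score indicates an extreme that lies far from the bulk of the data.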
For both string and numeric attributes, Summary statistics include Processed Rows, Missing Values, Non-Missing Rows, Non-Missing Coverage Percent, Distinct Values, and Percent Highest Frequency Value.
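The summary statistics shared by string and numeric attributes can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Clario's implementation: missing values are taken to be `None` or an empty string, and Percent Highest Frequency Value is computed relative to the non-missing rows. The function name `coverage_summary` is hypothetical.

```python
from collections import Counter

def coverage_summary(values):
    """Row-coverage statistics for one attribute of any type."""
    processed = len(values)
    # Assumption: None and the empty string both count as missing.
    present = [v for v in values if v is not None and v != ""]
    counts = Counter(present)
    top = counts.most_common(1)  # [(value, frequency)] or [] if nothing present
    return {
        "processed_rows": processed,
        "missing_values": processed - len(present),
        "non_missing_rows": len(present),
        "non_missing_coverage_percent":
            100 * len(present) / processed if processed else 0.0,
        "distinct_values": len(counts),
        # Assumption: share of non-missing rows holding the most common value.
        "percent_highest_frequency_value":
            100 * top[0][1] / len(present) if present else 0.0,
    }

# Hypothetical string attribute with two missing entries.
colours = ["red", "blue", None, "red", "", "green", "red", "blue"]
coverage = coverage_summary(colours)
for name, value in coverage.items():
    print(f"{name}: {value}")
```

A low coverage percent suggests the attribute may need the Missing node, while a very high Percent Highest Frequency Value indicates an attribute dominated by a single value.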
Note
Within each of these univariate tabs, you can resize the columns by clicking and dragging the column heading to the right or left. You can also resize the Results Explorer page by dragging the three-vertical-lines button (|||) located between the attributes list and the detailed univariate results, or between the univariate results and the graph.
The Univariate node can be connected to a Missing node or an Outliers node to aid in replacing missing values or capping outliers. The Univariate output can also be written to a file for future use with the Write File node.