Univariate

../_images/univariatenodeicon.png

Univariate Node icon

The Clario Univariate node lets you explore and understand your data by examining frequency distributions, graphs, and a variety of statistical metadata for all attributes. You can connect the Univariate node to a cleanse node (Missing or Outliers) to use the univariate output in the file cleansing process. The node connector can be linked to a variety of nodes (e.g., Read File, Aggregate, Append, Missing), but it requires a valid stream of data.

Configuration

The Univariate node has only one configuration tab.

Configuration Tab

The Configuration tab contains an Available Attributes list box and a Selected Attributes list box. The Available Attributes list box displays all of the attributes on the data stream connected to the node's input connector. Univariate results are returned only for the attributes placed in the Selected Attributes list box.

../_images/univconfig.png

Configuration Tab

The only step is to choose the attribute(s) to be analyzed. Click them in the Available Attributes list box, where they become highlighted, and then drag and drop them into the Selected Attributes list box. You must select at least one attribute to run through the Univariate node. See the tips on Finding and Selecting Attributes.

Results

The Univariate node produces one result set with two tabs: Numeric Summary and Results Explorer.

../_images/univresults.png

Results Dialog Tab

Numeric Summary Tab

Click the Numeric Summary tab for a high-level summary of all attributes.

../_images/univnumericsummary.png

Numeric Summary Tab

Each attribute is listed along with several summary statistics: Non-missing rows, Mean, Minimum, Maximum, Standard Deviation, Sum, and Median. You can sort the display by any of these columns by clicking its column heading.
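To make the meaning of these columns concrete, the following is a minimal Python sketch that computes the same seven statistics for one numeric attribute. It is not Clario's implementation: the function name `numeric_summary`, the use of `None` to mark a missing value, and the choice of the sample (n - 1) standard deviation are all assumptions made for illustration.

```python
import math
import statistics


def numeric_summary(values):
    """Summarize one numeric attribute; None marks a missing value (assumed)."""
    present = [v for v in values if v is not None]
    if not present:
        return {"non_missing_rows": 0}
    return {
        "non_missing_rows": len(present),
        "mean": statistics.fmean(present),
        "minimum": min(present),
        "maximum": max(present),
        # Sample standard deviation (n - 1 divisor) is an assumption.
        "standard_deviation": (
            statistics.stdev(present) if len(present) > 1 else math.nan
        ),
        "sum": math.fsum(present),
        "median": statistics.median(present),
    }


summary = numeric_summary([4, 8, None, 15, 16, 23, 42])
for name, value in summary.items():
    print(f"{name}: {value}")
```

Missing values are dropped before any statistic is computed, which matches the idea of a Non-missing rows count appearing next to the other statistics.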

To export these statistics to a spreadsheet, click on the Export to Spreadsheet button located on the Toolbar, and enter a filename when prompted.

../_images/univresultsexpstring.png

Results Explorer: String

../_images/univresultsexpnum.png

Results Explorer: Number

Results Explorer Tab

Click the Results Explorer tab for more detailed statistics and charts for one attribute at a time. To export the statistics for a single attribute to a spreadsheet, click on the Export to Spreadsheet button located on the Toolbar, and enter a filename when prompted. Click on any attribute listed in the left-hand box, and its details will be displayed in three sections.

Summary statistics

  • The gray box in the upper center portion of the screen contains the summary statistics. For numeric attributes, these include Mean, Minimum, Maximum, Standard Deviation, Sum, Median and Non-missing rows.

Bar graph

  • For numeric attributes, a bar graph representation of the data is displayed along the right-hand side of the screen. As you hover over each bar of the graph, the actual value range and count (non-missing rows) for that bar are displayed. For string attributes, this area contains the distinct values of the attribute and the frequency of each value; there is also a [Copy Values to Clipboard] button to copy this value frequency information.
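The value ranges and counts behind a bar graph, and the value frequencies shown for a string attribute, can be sketched in Python as follows. How Clario actually chooses its bars is not described here, so the equal-width binning, the default of 10 bins, and the helper names `equal_width_bins` and `value_frequencies` are assumptions for illustration only.

```python
from collections import Counter


def equal_width_bins(values, bin_count=10):
    """Return (low, high, count) per bin over the non-missing values."""
    present = [v for v in values if v is not None]
    if not present:
        return []
    low, high = min(present), max(present)
    if low == high:
        # All values identical: a single bar holds every row.
        return [(low, high, len(present))]
    width = (high - low) / bin_count
    counts = [0] * bin_count
    for v in present:
        # The maximum value is placed in the last bin.
        index = min(int((v - low) / width), bin_count - 1)
        counts[index] += 1
    return [
        (low + i * width, low + (i + 1) * width, c)
        for i, c in enumerate(counts)
    ]


def value_frequencies(values):
    """Distinct non-missing values with counts, most frequent first."""
    return Counter(v for v in values if v is not None).most_common()


bins = equal_width_bins([1, 2, 2, 3, 9, None, 10], bin_count=3)
freqs = value_frequencies(["red", "blue", "red", None, "green", "red"])
print(bins)
print(freqs)
```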

Detailed Statistics

  • The lower center portion of the screen contains additional univariate statistics. These statistics are displayed in a series of tabs.

For numeric attributes, these include:

Location  Variability          Moments                           Quartile             Extremes
--------  -------------------  --------------------------------  -------------------  -----------------
Mean      Standard Deviation   Skewness                          Minimum              10 Minimum Values
Median    Variance             Kurtosis                          Quartile 1           10 Maximum Values
Mode      Range                Uncorrected Sum of Squares        Median (Quartile 2)
          Interquartile Range  Corrected Sum of Squares          Quartile 3
                               Standard Error of the Mean        Maximum
                               Percent Coefficient of Variation
                               Variable Sum
                               z of Minimum
                               z of Maximum

For string attributes, these include only the Mode under Location.
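For readers who want to see how the numeric statistics on these tabs relate to one another, here is a hedged Python sketch. The exact formulas Clario uses are not stated in this document, so the following are assumptions: the sample (n - 1) variance, moment-based skewness and excess kurtosis, the default "exclusive" method of `statistics.quantiles` for quartiles, and z values computed as (value - mean) / standard deviation. The function name `detailed_statistics` is hypothetical.

```python
import math
import statistics


def detailed_statistics(values):
    """Detailed statistics for non-missing numbers (n >= 2, not all equal)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = statistics.fmean(values)
    uncorrected_ss = math.fsum(v * v for v in values)          # sum of x^2
    corrected_ss = math.fsum((v - mean) ** 2 for v in values)  # sum of (x - mean)^2
    if corrected_ss == 0:
        raise ValueError("values must not all be equal")
    variance = corrected_ss / (n - 1)  # sample variance (assumed divisor)
    sd = math.sqrt(variance)
    # Moment-based skewness and excess kurtosis (assumed forms).
    m2 = corrected_ss / n
    m3 = math.fsum((v - mean) ** 3 for v in values) / n
    m4 = math.fsum((v - mean) ** 4 for v in values) / n
    # Default "exclusive" quantile method (assumed).
    q1, q2, q3 = statistics.quantiles(values, n=4)
    ordered = sorted(values)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "standard_deviation": sd,
        "variance": variance,
        "range": ordered[-1] - ordered[0],
        "interquartile_range": q3 - q1,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,
        "uncorrected_sum_of_squares": uncorrected_ss,
        "corrected_sum_of_squares": corrected_ss,
        "standard_error_of_mean": sd / math.sqrt(n),
        "percent_coefficient_of_variation": (
            100.0 * sd / mean if mean else math.nan
        ),
        "variable_sum": math.fsum(values),
        "z_of_minimum": (ordered[0] - mean) / sd,
        "z_of_maximum": (ordered[-1] - mean) / sd,
        "quartiles": (ordered[0], q1, q2, q3, ordered[-1]),
        # Whether the extremes list distinct values or rows is not stated.
        "minimum_values": ordered[:10],
        "maximum_values": ordered[::-1][:10],
    }


stats = detailed_statistics([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in stats.items():
    print(f"{name}: {value}")
```

The corrected sum of squares equals the uncorrected sum of squares minus n times the squared mean, which is a quick consistency check on exported results.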

For both string and numeric attributes, Summary statistics include Processed Rows, Missing Values, Non-Missing Rows, Non-Missing Coverage Percent, Distinct Values, and Percent Highest Frequency Value.
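These coverage-style statistics apply to any attribute type and can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Clario's code: `None` stands for a missing value, the function name `coverage_summary` is hypothetical, and the Percent Highest Frequency Value is computed against non-missing rows, which may differ from the denominator Clario uses.

```python
from collections import Counter


def coverage_summary(values):
    """Row-coverage statistics for one attribute; None marks missing (assumed)."""
    processed = len(values)
    present = [v for v in values if v is not None]
    counts = Counter(present)
    top_count = counts.most_common(1)[0][1] if counts else 0
    return {
        "processed_rows": processed,
        "missing_values": processed - len(present),
        "non_missing_rows": len(present),
        "non_missing_coverage_percent": (
            100.0 * len(present) / processed if processed else 0.0
        ),
        "distinct_values": len(counts),
        # Denominator (non-missing rows) is an assumption.
        "percent_highest_frequency_value": (
            100.0 * top_count / len(present) if present else 0.0
        ),
    }


report = coverage_summary(
    ["red", "blue", "red", None, "green", "red", None, "blue"]
)
for name, value in report.items():
    print(f"{name}: {value}")
```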

Note

Within each of these univariate tabs, you can resize the columns by clicking and dragging the column heading to the right or left. You can also resize the Results Explorer page by dragging the three vertical lines button (|||) located between the attributes list and the detailed univariate results, or between the univariate results and the graph.

Output Stream

The Univariate node can be connected to a Missing node or Outliers node to aid in replacing missing values or capping outliers. The Univariate output can also be written out to a file, for future use, using the Write File node.
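As a rough illustration of how univariate statistics can drive cleansing, the Python sketch below fills missing values with the median and caps values that lie more than a chosen number of standard deviations from the mean. The Missing and Outliers nodes are not described in this section, so the replacement rule, the capping rule, the `z_limit` parameter, and the function name `fill_and_cap` are all assumptions, not a description of how those nodes behave.

```python
import statistics


def fill_and_cap(values, z_limit=3.0):
    """Fill None with the median, then cap at mean +/- z_limit * sd (assumed rules)."""
    present = [v for v in values if v is not None]
    # Needs at least two non-missing values for the standard deviation.
    median = statistics.median(present)
    mean = statistics.fmean(present)
    sd = statistics.stdev(present)
    low, high = mean - z_limit * sd, mean + z_limit * sd
    filled = [median if v is None else v for v in values]
    return [min(max(v, low), high) for v in filled]


cleaned = fill_and_cap([10, 12, None, 11, 13, 12, 95], z_limit=1.5)
print(cleaned)
```

The statistics are computed once from the non-missing values and then applied to every row, which mirrors the idea of passing univariate output downstream rather than recomputing it.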
