Missing

../_images/missingnodeicon.png

Missing Node icon

The Clario Missing node replaces missing attribute values with appropriate assignments. The top node connector is reserved for input from the Univariate node. That input can come from a direct connection, or from a Read File node configured to read an output data file generated by a Univariate node in a separate workflow. The node that supplies the data stream whose missing values are to be replaced is connected to the bottom node connector of the Missing node.

Configuration

The Missing node has two configuration tabs: Configure and Summary.

Configure Tab

The Configure tab contains a Treatments box, an Available Attributes list box, and a Selected Attributes list box. The Available Attributes box lists every attribute of the data stream connected to the bottom node connector. Each treatment generates and applies a logical expression that is evaluated against its assigned attribute in every row of data. The Univariate input connected to the top node connector supplies the statistics used by the Mean, Mode, Minimum, and Maximum treatments listed below.
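As a rough illustration of the per-row logic described above, the following Python sketch turns one treatment into a rule of the form "if the attribute is missing, assign the replacement". It is not Clario's implementation. Representing rows as dictionaries, using None for a missing value, and the `make_rule` helper and `univariate` statistics dictionary are all hypothetical.

```python
from typing import Any, Callable

# Hypothetical representation: a row maps attribute name -> value,
# and a missing value is represented as None.
Row = dict[str, Any]


def make_rule(attribute: str, replacement: Any) -> Callable[[Row], Row]:
    """Build a per-row rule equivalent to:
    IF <attribute> IS MISSING THEN <attribute> = <replacement>."""
    def rule(row: Row) -> Row:
        # row.get() returns None for an absent key too, so absent
        # attributes are treated as missing in this sketch.
        if row.get(attribute) is None:
            # Return a new dict so the input row is left unchanged.
            return {**row, attribute: replacement}
        return row
    return rule


# Hypothetical Univariate statistics for one attribute (top connector input).
univariate = {"age": {"mean": 41.5, "mode": 38, "min": 18, "max": 90}}

# A Mean treatment assigned to "age" becomes one rule applied to every row.
fill_age_with_mean = make_rule("age", univariate["age"]["mean"])
```

Because the rule is resolved once per row, the same function can be mapped over an entire data stream.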

  • To apply an existing treatment, select it in the Treatments box (it is highlighted in green), select one or more attributes in the Available Attributes box (they are also highlighted in green), then drag and drop them into the Selected Attributes box.
  • To remove an attribute from a treatment, select the treatment in the Treatments box, select the attribute(s) in the Selected Attributes box, then drag and drop them back into the Available Attributes box, where they reappear.
  • To create a treatment that is not already listed, click the plus [+] at the bottom left of the Treatments box, choose the Type (String, Number, or Date), enter the Value to assign, then click [Save]. The new treatment appears at the bottom of the Treatments box. You can then use it like any other treatment by selecting it and dragging attributes from the Available Attributes box to the Selected Attributes box.
  • To remove a treatment, select it in the Treatments box and click the minus [-] at the bottom left of the Treatments box.
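The steps above can be modeled with a small data structure. The sketch below is hypothetical. The `Treatment` class and the `assign`, `unassign`, and `parse_constant` functions are illustrative names, not part of Clario. The sketch mirrors the described behavior: an attribute leaves the Available Attributes box when it is dropped onto a treatment, and it reappears when it is removed. It also assumes Date values are entered in ISO format (YYYY-MM-DD).

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Treatment:
    """Hypothetical model of one entry in the Treatments box."""
    name: str                                          # e.g. "Mean" or a user-created constant
    selected: list[str] = field(default_factory=list)  # its Selected Attributes


def parse_constant(type_name: str, value: str):
    """Convert a user-entered Value according to its Type.
    Assumes Date values use ISO format (YYYY-MM-DD)."""
    if type_name == "String":
        return value
    if type_name == "Number":
        return float(value)
    if type_name == "Date":
        return date.fromisoformat(value)
    raise ValueError(f"unknown type: {type_name}")


def assign(available: list[str], treatment: Treatment, attrs: list[str]) -> None:
    """Move attributes from Available Attributes to a treatment's Selected Attributes."""
    for a in attrs:
        available.remove(a)          # raises ValueError if the attribute is not available
        treatment.selected.append(a)


def unassign(available: list[str], treatment: Treatment, attrs: list[str]) -> None:
    """Move attributes from a treatment back to Available Attributes."""
    for a in attrs:
        treatment.selected.remove(a)
        available.append(a)
```

Removing an attribute from the available list on assignment keeps each attribute tied to at most one treatment in this model.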

../_images/missingconfig.png

Configure Tab

Treatments

  • Delete Row - deletes any row in which this attribute is missing
  • Mean - uses the Mean value from the Univariate input
  • Mode - uses the Mode value from the Univariate input
  • Minimum - uses the Min value from the Univariate input
  • Maximum - uses the Max value from the Univariate input
  • Value | Type - assigns the constant entered in the Value column, using the chosen Type
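The following sketch shows one way the treatments above could be applied to a single row. It assumes rows are Python dictionaries and that None marks a missing value. It also assumes the Univariate results are available as a hypothetical `stats` dictionary keyed by attribute. `treat_row` and `STAT_KEYS` are illustrative names only, not Clario functions.

```python
from typing import Any, Optional

# Hypothetical mapping from treatment name to a key in the Univariate statistics.
STAT_KEYS = {"Mean": "mean", "Mode": "mode", "Minimum": "min", "Maximum": "max"}


def treat_row(row: dict[str, Any],
              treatments: dict[str, tuple[str, Any]],
              stats: dict[str, dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Apply each attribute's treatment to one row.

    `treatments` maps attribute -> (treatment name, constant or None).
    Returns None when a Delete Row treatment fires; otherwise a new row.
    """
    out = dict(row)
    for attr, (name, constant) in treatments.items():
        if out.get(attr) is not None:
            continue                      # value present: nothing to replace
        if name == "Delete Row":
            return None                   # drop the whole row
        if name in STAT_KEYS:
            out[attr] = stats[attr][STAT_KEYS[name]]  # value from Univariate
        else:
            out[attr] = constant          # "Value | Type": user-entered constant
    return out
```

Returning None for a deleted row lets a caller filter the stream with a simple `if result is not None` check.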

Summary Tab

../_images/missingsummary.png

Summary Tab

The Summary tab lists each attribute, its type, its Missing Value Treatment (e.g., Min, Mean, Number), and the replacement value where it is known. Replacement values for treatments that rely on Univariate results (e.g., Min, Mean, Max) are not shown. To save the Missing node configuration for use in another workflow, click the [Export] button at the bottom of the tab. You are prompted for a filename, and the configuration is then downloaded as a spreadsheet.

Results

The Missing node produces one result set with two tabs: Summary and Pseudo Code.

Summary Tab

The Summary tab shows the missing value processing summary: the number of instances read, updated, and deleted. Each attribute selected for a missing value treatment appears on its own row of the Results grid. Each row lists the attribute type, the number of missing values, and the value used to replace them (a mean, mode, minimum, maximum, or assigned constant).
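To show how the counts on the Summary tab relate to each other, this hypothetical sketch processes a list of rows. It tallies instances read, updated, and deleted, plus missing values per attribute. It assumes None marks a missing value. It also assumes a row counts as updated when at least one of its values was replaced. Clario's exact counting rules are not documented here, and `run_missing` is an illustrative name.

```python
from collections import Counter
from typing import Any


def run_missing(rows: list[dict[str, Any]],
                replacements: dict[str, Any],
                delete_attrs: set[str]):
    """Apply replacement values and Delete Row treatments (None = missing),
    and tally counts similar to those on the Summary results tab."""
    output: list[dict[str, Any]] = []
    missing: Counter = Counter()          # missing values per treated attribute
    read = updated = deleted = 0
    for row in rows:
        read += 1
        # Treated attributes that are missing in this row.
        gaps = [a for a in (*replacements, *delete_attrs) if row.get(a) is None]
        missing.update(gaps)
        if any(a in delete_attrs for a in gaps):
            deleted += 1                  # Delete Row: the row is not output
            continue
        if gaps:
            updated += 1                  # at least one value was replaced
            row = {**row, **{a: replacements[a] for a in gaps}}
        output.append(row)
    summary = {"read": read, "updated": updated, "deleted": deleted}
    return output, summary, dict(missing)
```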

../_images/missingsummaryresults.png

Summary Results Tab

Pseudo Code Tab

The Pseudo Code tab lists the rules the Missing node uses to replace missing values.
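The sketch below renders one rule per treated attribute as text, in an invented IF ... THEN format. It is not the syntax Clario displays on this tab, and `pseudo_code` is a hypothetical helper. It assumes each treatment has already been resolved to a concrete replacement value.

```python
from typing import Any


def pseudo_code(treatments: dict[str, tuple[str, Any]]) -> list[str]:
    """Render one human-readable rule per attribute.
    `treatments` maps attribute -> (treatment name, resolved replacement value)."""
    lines = []
    for attr, (name, value) in treatments.items():
        if name == "Delete Row":
            lines.append(f"IF {attr} IS MISSING THEN DELETE ROW")
        else:
            # repr() quotes strings so constants are unambiguous in the output.
            lines.append(f"IF {attr} IS MISSING THEN SET {attr} = {value!r} ({name})")
    return lines
```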

../_images/missingpseudocode.png

Pseudo Code Tab

Output Stream

The cleansed data stream can be used immediately by other nodes to explore, manipulate, and model the data. To export the data at any point in a workflow, use the Write File node.
