Aggregate Node icon
The Clario Aggregate node allows you to summarize data to a level specified by any attribute(s) on the input data stream. You may aggregate attributes using functions such as min, max, average, and sum. Examples include: summarizing all items purchased to the customer level, summarizing demographics for all residents by state, or summarizing test scores by classroom. The node connector can be connected to a variety of nodes, (e.g. Read File, Outliers, Missing, etc.), but requires a valid stream of data. The incoming data stream must be sorted by the Group-By Attribute(s).
The Aggregate node has two configuration tabs: Configure and Summary.
The Configure tab has four different list boxes: Functions, Available Attributes, Selected/Aggregate Attributes, and Pivot Groups. All of these boxes are used to do one of the following: group by, summarize, or create new attributes.
To select which attribute will define the level of aggregation, do the following:
The selected attribute(s) will be used to set the aggregation level. That is, each new value of the selected ‘Group By’ attributes will correspond to a new row in the output data stream. Attributes in the ‘Group By’ list will not be available for use in other aggregation functions.
Configure Tab
Note
The data must first be sorted using the Group By attribute(s).
To define aggregations, do the following:
When you select a function, only the attributes with the appropriate data type (string, numeric, or date) for that function are listed in the Available Attributes list box. For numeric attributes, the valid aggregate functions are: average, count, count distinct, min, max, sum, first, and last. For string attributes, the valid aggregate functions are: count, count distinct, first, and last. For date attributes, the valid functions aggregate functions are: average, count, count distinct, min, max, first, and last.
To delete defined aggregate attributes, select the function from the Functions list box, then drag and drop the attribute from the Aggregate Attributes list box into the Available Attributes list box.
Note
The Count function in the Aggregate node follows SQL conventions. It will only count non-NULL values.
Pivot Groups
Pivot groups allow you to summarize data by referencing values of two different attributes. Examples include Orders by Region and Transactions by Year. To define a pivot group, do the following:
Data Types
The Data Type tab allows you to specify the data types of the attributes created by the Pivot Groups. To set a data type, click on a data type (Number or String) in the Types box, and drag and drop the attribute(s) from the Available Attributes list box to the Selected Attributes list box. If you need a data type that is not listed, click the [+] at the bottom of the Types box, then select the appropriate dropdown values. To delete a data type, click the [-].
Newly created pivot attributes will now be listed in the Available Attributes box when selecting a function to aggregate attributes by. The naming convention for attributes created by Pivot Groups is ‘Attribute Name | Pivot Group’.
Summary Tab
The Summary tab contains a table summarizing the aggregation. The table includes the name of the source attribute, the role of the attribute, the aggregate function, the name of the outgoing attribute, and the outgoing attribute’s data type. The outgoing attribute names default to (attribute name)_(function). For example, dollars_SUM would be the aggregated sum of the dollars attribute.
The Aggregate node allows users to rename Outgoing Attributes directly from a file via the [Rename] button. Clicking [Rename] brings up a popup dialog in which you may manually edit attribute names (see tips on Valid Characters for Attribute Names), or paste names copied from a file. The Aggregate node also allows users to export the aggregate summary to a comma separated file via the [Export] button; it can be helpful to review this summary later. To do so simply click [Export] and you’ll be prompted to enter a filename before downloading the spreadsheet.
There are no results for the Aggregate node. It is assumed the Aggregate node will be connected to another node to reduce variables, model, or otherwise analyze the newly summarized dataset.
The new summarized dataset is ready for immediate use in various nodes located throughout Clario. Alternatively, the data can be exported using the Write File node and can then be used again by reading in the file with the Read File node.