Aggregate

../_images/aggregatenodeicon.png

Aggregate Node icon

The Clario Aggregate node allows you to summarize data to a level specified by any attribute(s) on the input data stream. You may aggregate attributes using functions such as min, max, average, and sum. Examples include: summarizing all items purchased to the customer level, summarizing demographics for all residents by state, or summarizing test scores by classroom. The node connector can be connected to a variety of nodes, (e.g. Read File, Outliers, Missing, etc.), but requires a valid stream of data. The incoming data stream must be sorted by the Group-By Attribute(s).

Configuration

The Aggregate node has two configuration tabs: Configure and Summary.

Configure Tab

The Configure tab has four different list boxes: Functions, Available Attributes, Selected/Aggregate Attributes, and Pivot Groups. All of these boxes are used to do one of the following: group by, summarize, or create new attributes.

Define Group By

To select which attribute will define the level of aggregation, do the following:

  1. Select ‘Group By’ in the Functions list box.
  2. Drag and drop the desired attribute(s) from the Available Attributes list box to the Selected Attributes list box. The Available Attributes list box displays all of the attributes from the input data stream connected to the input link node connector. See tips on Finding and Selecting Attributes.

The selected attribute(s) will be used to set the aggregation level. That is, each new value of the selected ‘Group By’ attributes will correspond to a new row in the output data stream. Attributes in the ‘Group By’ list will not be available for use in other aggregation functions.

../_images/aggregateconfigure.png

Configure Tab

Note

The data must first be sorted using the Group By attribute(s).

Define Aggregations

To define aggregations, do the following:

  1. Select a function in the Functions list box.
  2. Drag and drop these desired attributes into the Aggregate Attributes list box.
  3. Repeat these steps for additional aggregations.

When you select a function, only the attributes with the appropriate data type (string, numeric, or date) for that function are listed in the Available Attributes list box. For numeric attributes, the valid aggregate functions are: average, count, count distinct, min, max, sum, first, and last. For string attributes, the valid aggregate functions are: count, count distinct, first, and last. For date attributes, the valid functions aggregate functions are: average, count, count distinct, min, max, first, and last.

To delete defined aggregate attributes, select the function from the Functions list box, then drag and drop the attribute from the Aggregate Attributes list box into the Available Attributes list box.

Note

The Count function in the Aggregate node follows SQL conventions. It will only count non-NULL values.

Pivot Groups

../_images/aggregatepivotgroups.png

Pivot Groups

Pivot groups allow you to summarize data by referencing values of two different attributes. Examples include Orders by Region and Transactions by Year. To define a pivot group, do the following:

  1. Click the [+] below the Pivot Group Box.
  2. Enter a name for your Pivot Group in the popup box.
  3. Drag and drop an attribute from the Available Attributes list box into the Key Attribute box. New attributes will be created based on the values of the Key Attribute. For example, if your Key Attribute is Region, a new attribute will be created for each Region you specify in step 4.
  4. Define the values of the Key Attribute (and the new attribute names created from these values) by clicking the [+] on the bottom left of the Key Attribute area. Enter a match value and attribute name (see tips on Valid Characters for Attribute Names). Repeat for each value you wish to summarize. Entering values that are not found in the data doesn’t cause any errors, but these new pivot attributes will not contribute to the aggregation.
  5. Drag and drop another attribute into the Value Attribute box. The values of this attribute will be aggregated and become the values for the new attributes created based on the Key Attributes values.
../_images/aggregatepivotdatatypes.png

Data Types

The Data Type tab allows you to specify the data types of the attributes created by the Pivot Groups. To set a data type, click on a data type (Number or String) in the Types box, and drag and drop the attribute(s) from the Available Attributes list box to the Selected Attributes list box. If you need a data type that is not listed, click the [+] at the bottom of the Types box, then select the appropriate dropdown values. To delete a data type, click the [-].

Newly created pivot attributes will now be listed in the Available Attributes box when selecting a function to aggregate attributes by. The naming convention for attributes created by Pivot Groups is ‘Attribute Name | Pivot Group’.

Summary Tab

../_images/aggregatesummary.png

Summary Tab

The Summary tab contains a table summarizing the aggregation. The table includes the name of the source attribute, the role of the attribute, the aggregate function, the name of the outgoing attribute, and the outgoing attribute’s data type. The outgoing attribute names default to (attribute name)_(function). For example, dollars_SUM would be the aggregated sum of the dollars attribute.

The Aggregate node allows users to rename Outgoing Attributes directly from a file via the [Rename] button. Clicking [Rename] brings up a popup dialog in which you may manually edit attribute names (see tips on Valid Characters for Attribute Names), or paste names copied from a file. The Aggregate node also allows users to export the aggregate summary to a comma separated file via the [Export] button; it can be helpful to review this summary later. To do so simply click [Export] and you’ll be prompted to enter a filename before downloading the spreadsheet.

Results

There are no results for the Aggregate node. It is assumed the Aggregate node will be connected to another node to reduce variables, model, or otherwise analyze the newly summarized dataset.

Output Stream

The new summarized dataset is ready for immediate use in various nodes located throughout Clario. Alternatively, the data can be exported using the Write File node and can then be used again by reading in the file with the Read File node.

Table Of Contents

Previous topic

Node Library

Next topic

Append