Group and Aggregate Node

Description

Use the Group and Aggregate node to generate summary statistics for groups of rows. You'll start with the group columns, which are used to place the rows from the input table into groups:

If you choose one group column, then the rows are grouped by each unique value in that column.
If you choose multiple group columns, then the rows are grouped by each unique combination of values across the columns.
If you don't choose a group column, all rows will go in one group.
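
As a rough illustration of the three cases above, the sketch below runs comparable SQL against a small in-memory SQLite table from Python. The orders table, its columns, and the sample rows are made up for this example and are not part of the product.

```python
import sqlite3

# Hypothetical sample data; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("East", "Widget", 10.0),
        ("East", "Widget", 20.0),
        ("East", "Gadget", 5.0),
        ("West", "Widget", 7.5),
    ],
)

# One group column: one group per unique region.
one_column = conn.execute(
    "SELECT region, COUNT(*) AS group_size FROM orders "
    "GROUP BY region ORDER BY region"
).fetchall()

# Multiple group columns: one group per unique (region, product) combination.
two_columns = conn.execute(
    "SELECT region, product, COUNT(*) AS group_size FROM orders "
    "GROUP BY region, product ORDER BY region, product"
).fetchall()

# No group column: every row falls into a single group.
no_columns = conn.execute("SELECT COUNT(*) AS group_size FROM orders").fetchall()

print(one_column)   # [('East', 3), ('West', 1)]
print(two_columns)  # [('East', 'Gadget', 1), ('East', 'Widget', 2), ('West', 'Widget', 1)]
print(no_columns)   # [(4,)]
```

Adding a second group column splits the East group in two, while dropping the group columns collapses the whole table into a single row.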

Next, choose the metrics to calculate for each group. These are aggregations such as SUM or MEAN, and you can add as many as you like. By default, we'll show the size (COUNT) of each group.
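
To make the idea of several aggregations per group concrete, here is a minimal sketch using SQLite from Python. The orders table and its values are invented for the example, and it assumes that MEAN corresponds to SQL's AVG function.

```python
import sqlite3

# Hypothetical sample data; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("East", 10.0), ("East", 20.0), ("East", 6.0), ("West", 7.5)],
)

# Three aggregations computed per group: COUNT, SUM, and AVG (the mean).
summary = conn.execute(
    """
    SELECT region,
           COUNT(*)    AS group_size,
           SUM(amount) AS total_amount,
           AVG(amount) AS mean_amount
    FROM orders
    GROUP BY region
    ORDER BY region
    """
).fetchall()

for row in summary:
    print(row)
# ('East', 3, 36.0, 12.0)
# ('West', 1, 7.5, 7.5)
```

Each aggregation becomes its own column in the result, next to the group column.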

The output table will have one row per group, showing each group column along with each aggregation.

Tips

If every value in a column is unique, grouping by that column puts each row in its own group, so the output won't summarize anything.
If you are grouping by a numerical column, first use the Convert node to create buckets of numerical ranges (e.g. '0', '1-10', '11-20', '21+') for best results.
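
The effect of bucketing can be sketched in SQL terms as follows. This example uses a CASE expression in SQLite as a stand-in for the bucketing step; it is not a description of how the Convert node works internally. The customers table and its purchases column are hypothetical, and the buckets assume non-negative whole numbers with no missing values.

```python
import sqlite3

# Hypothetical sample data; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (purchases INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?)",
    [(0,), (3,), (10,), (11,), (15,), (42,), (0,)],
)

# Grouping by the raw number: almost every distinct value is its own group.
raw_groups = conn.execute(
    "SELECT purchases, COUNT(*) AS group_size FROM customers "
    "GROUP BY purchases ORDER BY purchases"
).fetchall()

# Grouping by bucket: values are first mapped to ranges, then grouped.
# ELSE '21+' assumes no negative or missing values are present.
bucketed = conn.execute(
    """
    SELECT CASE
               WHEN purchases = 0 THEN '0'
               WHEN purchases BETWEEN 1 AND 10 THEN '1-10'
               WHEN purchases BETWEEN 11 AND 20 THEN '11-20'
               ELSE '21+'
           END AS purchase_bucket,
           COUNT(*) AS group_size
    FROM customers
    GROUP BY purchase_bucket
    ORDER BY MIN(purchases)
    """
).fetchall()

print(len(raw_groups))  # 6
print(bucketed)         # [('0', 2), ('1-10', 2), ('11-20', 2), ('21+', 1)]
```

Seven rows produce six groups when grouped by the raw value, but only four when grouped by bucket.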

SQL Equivalent

select
    {column1}
    ,COUNT(*) as group_size
from {data_source}
group by {column1}

Updated on: 05/10/2024
