Binning
Contents
Introduction
Binning is the process of dividing a continuous measure such as a time duration into discrete intervals called bins, for example we can divide the range of time duration into the following three bins: "less than 4 hours", "between 4 hours and 2 days", and "more than 2 days". In effect, binning converts continuous data into categories.
The example chart shows the frequency distribution of the time duration as a percent of total in each category. You can see that in the "Database" category, 43% of the items took less than 4 hours while in the "Inquiry / Help" category such items made a significantly smaller percentage.
Explore Analytics allows you to apply binning to data fields of the following types: decimal, integer, date/time, duration, and age. The last type, age, is a datetime field that is viewed as age. The age is calculated as the time duration between the date/time value and now.
Binning can be used in Pivot and Charts to group data by bins.
Defining Bins
You can define bins anywhere in a Pivot or Chart view where you select a field to define categories. A single view can have any number of binned fields. You can define bins for any field of type integer, decimal, date/time, or duration.
Bins in a Pivot View
To use binning in a Pivot view, begin by selecting a field into the row area or column area of your pivot. Open the Field Settings Dialog and check the box "Group into Bins". This would bring up the Binning Dialog.
The Binning Dialog is described in the next section. For a field that already has binning defined, you can return to the Binning Dialog using the "Edit Bins" button.
Bins for a Date/Time field
The Settings Dialog for a Date/Time field has various grouping options. To use binning, select the "Group into Bins" grouping. Selecting this grouping will de-select the other groupings.
Bins in Chart Views
Binning is possible in the following places and is configured using the field settings dialog, similar to a Pivot view.
- Timeline Chart: Category field
- Category Chart: Category field and Secondary Category field
- XY Chart: Color Category field, Marker Category field, Category and Tooltip fields
- Map Chart: Category field, Color Category field, Marker Category field, Pie Category field, Category and Tooltip fields
Bins for Chart Date/Time Field
When selecting a date/time field for any of the category fields listed in the previous section, the "Grouped by" selection had an option "Group into bins".
When you make the selection, the Binning Dialog appears. To return to this dialog later, use the field settings dialog.
Bins in Animation
The animation field can also use binning. Defining the bins requires two steps:
- The first step is to create the animation using the Chart Animation dialog. Select a field and click OK to create the animation.
- Return to the Chart Animation dialog and use the "Group into Bins" option. For a Date/Time field, you'll find this option in the grouping drop-down selection.
The Binning Dialog
All the binning options are defined using the Binning Dialog. In this section we describe each option in turn.
Bins Defined Automatically
You can let Explore Analytics define the bins automatically for you. This is useful for quickly defining a contiguous set of bins of regular size. You'll need to specify the desired number of bins.
When defining bins automatically, you have two options:
- Linear – in this option, the range of values for the field is divided into equally-sized bins (some rounding may apply). For example, if the lowest value is zero, the highest is 100, and you select 10 bins, then the first bin will go from 0 to 10, the second from 10 to 20, and so on with the last going from 90 to 100.
- Diverging – in this option, you'll define a "neutral point", to create an equal number of bins on either side of the neutral point. For example, if the lowest value is -8 and the highest value is 38 and you select the neutral point to be 0 and the number of bins to be 4, then the first two bins will be -8 to -4 and -4 to 0 and the last two bins will be 0 to 19 and 19 to 38. In other words, we divide the range on each side of the neutral point into equal-sized bins.
When defining bins automatically several things happen that you should be aware of. If this automatic behavior does not suite your needs, then your best option is to define the bins yourself.
- Each time the view is processed, the range will be calculated by querying for the lowest and highest value. This range will then be dynamically divided into bins. It means that if data changes, so do the bin definitions. If you filter some data out, instead of seeing fewer bins, you may see the remaining data divided into smaller bins to maintain the specified number of bins.
- The labels are automatically generated too. For example, if the boundaries of the bin are 0 to 100, the label will be "0 - 100".
Define the Bins by Yourself
Defining the bins yourself allows you to determine the exact boundaries of the bins, to specify the label exactly and to potentially leave the first and/or last bin open-ended.
You specify the "From" value for the first bin or leave it blank to let the first bin be open-ended. You then specify the "To" value to define the upper boundary of the bin. Each subsequent bin starts exactly where the previous bin ended.
You must specify a label for each bin.
You can remove bins using the button with the x on the right. You must have at least one bin.
You can add bins using the "Add Bin" button at the bottom of the dialog. You can define as many bins as you need.
The last bin can be left open-ended as demonstrated on the image above.
If you make the first bin or last bin closed-ended, data that falls outside all bins will be excluded from the view.
Bin Boundaries
A value cannot belong in more than one bin. For example, if you have a bin that goes from 10 to 20 and another that goes from 20 to 30, then the value "20" must only belong to one of these two bins.
- Bin includes its lower boundary – in our example, "20" will be in the "20 to 30" bin
- Bin includes its upper boundary – in our example, "20" will be in the "10 to 20" bin
Binning a Duration Field
Time duration is measured in days, hours, minutes and seconds. You can apply binning to a duration field as demonstrated in the following image.
Duration values are entered using the duration/age picker.
Binning a Date/Time Field
Date/time can also be grouped into bins. This can allow you to create custom date ranges as demonstrated in the following image.
Treating Date as Age
Sometime it's useful to treat a date such as the date that an invoice was created as an age instead and look at the number of days that the invoice was outstanding. By checking the box "Treat Date as Age" you can specify the bin boundaries in terms of age (time duration between the date and now). This is demonstrated in the following image.
The actual bin boundaries will be recalculated each time the view is processed. For example, "30 days 00:00:00" will become the date and time exactly 30 days before now. If the view is processed again later, the boundaries will shift and remain relative to the time that the view is processed.
Important note: the calculation of the date and time boundaries is subject to the time zone of your data source. To ensure an accurate calculation, please check that the time zone property of your data source definition is set correctly.
Binning of Null/Empty Values
If the field being binned contains null (empty) values, these values go into a separate category labelled "(empty)". Thus if you use 4 bins, you may get 5 categories – the 4 bins plus "(empty)". To eliminate the (empty) category simply use "Filter Data" to remove these values from your view.
Performance Implications of Binning
Explore Analytics does not expect your data source to support binning in its query language or API. Instead, the binning is performed by Explore Analytics.
Explore Analytics employs one of two strategies for performing a binned query.
- For a small number of bins – if the number of bins is small, a separate query is performed for each bin and the results are combined.
- For a large number of bins – if there are many bins, a single query is performed. This query does not request the data source to perform any grouping. The binning and grouping is performed on the results.
Using separate queries has the advantage of performing the grouping at the data source and only fetching the desired aggregated data. It has the disadvantage of potentially take longer to request and execute multiple queries. This is manageable as long as the number of bins is reasonable and it is the preferred method.
When we say "a small number", we mean under about 20. This may be, for example, 20 bins for a single field, or perhaps one field with 4 bins and another field with 5 bins (resulting in 20 bin combinations). The drawback of the second method of performing a single query without grouping is that it may fetch a large number of rows. If there are more than about 100,000 rows coming back, results will be truncated.
Binning and Calculated Columns
You can create a calculated column based on a binned field. Please note the following
- You must define the bins yourself to create bins with static labels. The calculation refers to these labels
- When you turn binning on for a field, any previously defined calculated columns are removed.
Empty Bins are Not Included
Your view may display fewer bins than you have defined. This is because there was no data in those bind. For example, if you have a bin for values between 10 and 20 and the data has no record with such a value, then the bin will be omitted.
Omitting empty bins is consistent with omitting empty categories. For example, if you have a choice field with values of "active" and "inactive" and all the records have the value of "active", you will not see a category of "inactive" in the view. This is true for both Pivot and Chart views.