SQL Theory: What is Cohort analysis

A cohort is a group of people who share a common characteristic over a certain period of time.

For example, let’s look at a group of students. All of these students graduated in 2010. This group of students is a cohort. All of the students graduated in the same year, and this is their commonality.

Cohort analysis is a study that focuses on the activities of a particular cohort. If we were to calculate the average income of these students over the course of a five year period following their graduation, we would be conducting a cohort analysis.
Average Income ($)Average Income for the Graduating Class of 2010Avg Income2011201220132014201520k25k30k35k40kAvg Income2014: $36998Highcharts.com

This graph shows us that the average member of the 2010 graduating class increases their income by approximately $16K over their first five post-graduation years.

Cohort analysis gets more interesting when we compare cohorts over a period of time. Imagine that there is another cohort of students that graduated in 2011.
Graduating Cohorts

  1. Cohort 1

  3. Cohort 2
Cohort analysis allows us to identify relationships between the characteristics of a population and that population’s behavior. Looking at the average income over the five years after graduation in comparison to the income of the 2011 students over the same relative interval allows for a unique apples-to-apples comparison of these groups. In this case, there appears to be a relationship between a student’s year of graduation and their income.
Average Income ($)Average Income for the Graduating Classes of 2010 vs 201120102011Year 1Year 2Year 3Year 4Year 520k30k40k50k60k2011Year 2: $27090Highcharts.com

Here, we can see that both graduating classes increase in their average income per year. However, by the third year out, the 2011 grads make more on average than their 2010 counterparts (by an increasing margin).

Cohort Analysis for Business

Imagine that instead of graduating students, we were studying your customers. We could group them by how they were originally referred to your business and track how much money they spent over time.
Spending ($)Customer Spending by Referral SourceDirectFacebookGoogleBlogQuarter 1Quarter 2Quarter 3Quarter 40255075100125150175Quarter 2: $89.8Highcharts.com

Above, we see that customers referred by the blog deliver strong, consistent long-term spending. Search engines and other channels, however, refer customers who spend a decreasing amount over time.

Perhaps the most popular cohort analysis is one that groups customers based on their “join date,” or the date when they made their first purchase. Studying the spending trends of cohorts from different periods in time can indicate if the quality of the average customer being acquired is increasing or decreasing in over time.
Spending ($)Average Cumulative Customer Spending based on Date of 1st PurchaseJan 2012Feb 2012Mar 2012Apr 2012Month 1Month 2Month 3Month 4Month 5Month 630405060708090100Jan 2012Month 3: $69.45Highcharts.com

In the chart above, the average customer in newer cohorts is spending less as time goes on. This would be a red flag for many investors or acquirers because it implies that the value of recently-acquired customers is less than those acquired in the past.

Perform Your Own Cohort Analysis

Tip: Most professionals use tools like RJMetrics to perform cohort analysis automatically.

Step 1: Pull the Raw Data

Typically, the data required to conduct cohort analysis lives inside of a database of some kind and needs to be exported into spreadsheet software. In this example, we use MySQL and Microsoft Excel.
If you’re studying customer purchase behavior, you want to end up with a table of data that includes one record per customer purchase. Each record contains the customer’s ID (typically either a unique number or an e-mail address), the date and time of the purchase, the amount of the purchase, and the customer’s “cohort date” (this is typically the date of the customer’s first purchase). In a typical “orders” database table, the MySQL query to pull such information might look something like this:

SELECT orders.customerid,




FROM orders

JOIN (SELECT customerid,

Min(transactiondate) AS cohortDate

FROM orders

GROUP BY customerid) AS cohorts

ON orders.customerid = cohorts.customerid;

view rawgistfile1.sql hosted with ❤ by GitHub
Ideally, however, you would want to include additional attributes such as the customer’s referral source, the first product they purchased, geographic and demographic information, and more. The more information about the customer you have, the more ways you’ll be able to segment your cohorts. However, each of these additional attributes may require additional database joins. Tools like RJMetrics make these attributes available to you automatically.

Step 2: Create Cohort Identifiers

Open the data you’ve pulled into Excel. Since we pulled the “cohort date” attribute in the example above, we’ll conduct the popular cohort analysis in which we compare groups of customers based on when they made their first purchase. Assuming we want to group our cohorts based on the month in which they made their first purchase, we’ll need to translate each “cohort date” value into a “bucket” that represents the year and month of their first purchase. Assuming cohort date is in “Column D,” the following Excel formula does the trick:
=YEAR(D2) & “-” & MONTH(D2)

Step 3: Calculate Lifecycle Stages

Once we know the cohort that each customer belongs to, we also need to determine the “lifecycle stage” at which each event happened for that cohort member. For example, if a customer made their first purchase on January 10th, 2012, and their second purchase on March 15th, 2012, they would be in the “January 2012” cohort, their first purchase would be in the “Month 1” lifecycle stage, and their second purchase would be in their “Month 3” lifecycle stage, because it happened in their third month after becoming a customer. To calculate lifecycle stage, we’ll need to determine the amount of time between the customer’s first purchase and the purchase in question. Assuming transaction date is in “Column C” and cohort date is in “Column D,” a function like the one below will do the trick:
When you’re done, you should have a table in Excel that looks like the one below.

Step 4: Create a Pivot Table and Graph

Pivot tables allow you to calculate an aggregation such as a sum or average across multiple dimensions of your data. The pivot table we’d like to create here is one that conducts a SUM of transaction amount, shows one row per cohort and one column per relative time period. Its data can be visualized on a basic Excel line graph.
There you have it: an extremely basic cohort analysis built from the ground up. There are hundreds of variations on cohort analysis that you can run based on your needs, a few of which are described below.

Bonus Step: Data Perspectives

The chart we’ve created is a cohort analysis, but it isn’t very easy to interpret in this format. Another way to look at this chart would be to view each cohort’s spending as a cumulative value over time. This will effectively build a curve that allows you to watch total customer lifetime spending grow over time per cohort.
Even more helpful is to normalize this data by the size of the cohort. In order to do this, each data point for a cohort must be divided by the number of members in that cohort. That way, you can view the average value per cohort member side-by-side without a bias from the size of the cohort. To do this, you’ll have to create a second pivot table to calculate cohort size and then divide one by the other.
These data perspectives are another area where an automated tool like RJMetrics can be extremely valuable. Data perspectives such as “cumulative average per cohort member” can be applied with just a few clicks.

Try it in RJMetrics

RJMetrics is currently offering a free 14 day trial for visitors of CohortAnalysis.com. In this 14 day period, your data will be fully integrated with the RJMetrics system, and you’ll be able to build and explore your cohorts in an easy-to-use interface. Give it a try today!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s