Computing Day-Over-Day Changes With Window Functions

10 Jul 2014

Day/Day Deltas

In most sophisticated analysis, the rate of change is at least as important as the raw values. This makes life tough for a SQL analyst, where adding daily deltas to your result set can be difficult. In this post, we'll show you how to pull results like these:

     dt     |   ct   | ct_yesterday | daily_delta 
------------+--------+--------------+-------------
 2014-07-01 |   6917 |       200816 |     -96.56%
 2014-06-30 | 200816 |        93729 |     114.25%
 2014-06-29 |  93729 |       115240 |     -18.67%
 2014-06-28 | 115240 |       243014 |     -52.58%
 2014-06-27 | 243014 |       219843 |      10.54%
 2014-06-26 | 219843 |       184825 |      18.95%
 2014-06-25 | 184825 |       239193 |     -22.73%
 2014-06-24 | 239193 |       234194 |       2.13%
 2014-06-23 | 234194 |        79145 |     195.90%
 2014-06-22 |  79145 |       131561 |     -39.84%

To start, let's count the daily events for the last 10 days:

select
  date(created_at) dt,
  count(1) ct
from events
group by 1 order by 1 desc
limit 10

 

The Good Kind of Lag

In Postgres and Redshift, we'll use the lag() window function to add our daily_delta column. lag(a, N) grabs the value in column a that is N rows below the current row. For our query we want yesterday's ct:

lag(count(1), 1) over (order by dt) as ct_yesterday

The order by dt tells the query planner how to order the rows in the window, count(1) is the count of events for each dt, and the 1 says to grab one row below the current row. If you wanted to compare to the same day of the week last week, just use 7 instead of 1. Here's the full query so far:

select
  date(created_at) dt,
  count(1) ct,
  lag(count(1), 1) over (order by dt) as ct_yesterday
from events
group by 1 order by 1 desc
limit 10

Now we have ct_yesterday, yesterday's count, in every row. But we really want the change between today and yesterday. To do that, we calculate (ct - ct_yesterday) / ct_yesterday. We can simply add that to an outer query:

select
  dt, 
  ct, 
  ct_yesterday,
  (ct - ct_yesterday) / ct_yesterday as daily_delta
from (
  select
    date(created_at) dt,
    count(1) ct,
    lag(count(1), 1) over (order by dt) as ct_yesterday
  from events
  group by 1 order by 1 desc
  limit 10
) t

That gives us a decimal between 0 and 1, so we'll multiply by 100, round to two decimal places, and tack on a percent for style. (We would be remiss if we didn't mention that these formatting steps are unnecessary in Periscope!)

round(
  100.0 * (ct - ct_yesterday) / ct_yesterday, 2
) || '%' as daily_delta

Here's the full query:

select
  dt, ct, ct_yesterday,
  round(
    100.0 * (ct - ct_yesterday) / ct_yesterday, 2
  ) || '%' as daily_delta
from (
  select
    date(created_at) dt,
    count(1) ct,
    lag(count(1), 1) over (order by dt) as ct_yesterday
  from events
  group by 1
  order by 1 desc
  limit 10
) t

Now, to visualize that day-over-day change, sign up for Periscope and it'll be a breeze to make these kinds of delta charts:

Thank you