Faster Redshift Queries with Materialized Views — Lifetime Daily ARPU

14 May 2014

Want to learn how to write faster SQL?

Get our eBook

The best way to make your SQL queries run faster is to have them do less work, and a great way to do less work is to query a materialized view that's already done the heavy lifting.

Materialized views are particularly nice for analytics queries, where many queries do math on the same basic atoms, data changes infrequently (often as part of hourly or nightly ETLs), and those ETL jobs provide a convenient home for view creation and maintenance logic.

Redshift doesn't yet support materialized views out of the box, but with a few extra lines in your import script (or a tool like Periscope), creating and maintaining materialized views as tables is a breeze.

Lifetime Daily ARPU (average revenue per user) is common metric and often takes a long time to compute. Let's speed it up with materialized views.

Calculating Lifetime Daily ARPU

This common metric shows the changes in how much money you're making per user over the lifetime of the your product.

Lifetime ARPU (Date) = Sum of purchases up to Date
Unique user count up to Date

For that we'll need a purchases table and a gameplays table, and the lifetime accumulated values for each date. Here's the SQL for calculating lifetime gameplays:

with
lifetime_gameplays as (
  select
    dates.d,
    count(distinct gameplays.user_id) as count_users
  from (
    select distinct date(created_at) as d
    from gameplays
  ) as dates
  inner join gameplays
    on date(gameplays.created_at) <= dates.d
  group by d
),

The range join in the correlated subquery lets us recalculate the distinct number of users for each date.

Here's the SQL for lifetime purchases in the same format:

lifetime_purchases as (
  select
    dates.d,
    sum(price) as sum_purchases
  from (
    select distinct date(created_at) as d
    from purchases
  ) as dates
  inner join purchases
    on date(purchases.created_at) <= dates.d
  group by d
)

Now that the setup is done, we can calculate lifetime daily ARPU:

with
lifetime_gameplays as (...),
lifetime_purchases as (...)

select
  lifetime_gameplays.d as date,
  round(
      lifetime_purchases.sum_purchases /
      lifetime_gameplays.count_users
    , 2
  ) as arpu
from lifetime_purchases inner join lifetime_gameplays
  on lifetime_purchases.d = lifetime_gameplays.d
order by lifetime_gameplays.d

That's a monster query and it takes minutes to run on a database with 2B gameplays and 3M purchases. That's way to slow, especially if we want to quickly slice by dimensions like what platform the game was played on. Plus, similar lifetime metrics will need to recalculate the same data over and over again!

Easy View Materialization on Redshift

Conveniently, we wrote our query in a format that makes it obvious which parts can be extracted into materialized views: lifetime_gameplays and lifetime_purchases.

We'll fake view materialization in Redshift by creating tables, and Redshift makes it easy to create tables from snippets of SQL:

create table lifetime_purchases as (
  select
    dates.d,
    sum(price) as sum_purchases
  from (
    select distinct date(created_at) as d
    from purchases
  ) as dates
  inner join purchases
    on date(purchases.created_at) <= dates.d
  group by d
)

Do the same thing for lifetime_gameplays, and and calculating Lifetime Daily ARPU now takes less than a second to complete!

Redshift is especially great for this kind of optimization because data on a cluster usually changes infrequently, often as a result of hourly or nightly ETLs.

Remember to drop and recreate these tables every time you upload data to your Redshift cluster to keep them fresh. Or create views in Periscope instead, and we'll keep them up to date automatically!

Thank you