Here is a list of some useful SQL hacks that I use almost every day to make my work a little bit easier:
- keep queries compact for easy debugging and to save query cost.
- use “session” mode in bigquery, this stores your various query results for a couple of hours
→ this reduces both, query runtime and query cost! - reduce query complexity by creating a “reference” table at logical increments.
→ use CREATE OR REPLACE TABLE, and then referg to the table instead of continuing to build a very long query with many cte’s
→ this has the added benefit that you can easily sanity check stuff in the different tables instead of having to dig through the full query every time - reduce input table size by adding filters where possible.
→ listing only the necessary columns
→ using the WHERE clause right in the beginning
- use “session” mode in bigquery, this stores your various query results for a couple of hours
- think ahead by creating actions that can be adjusted easily.
- start the query with a cte for each table youre using so you can just refer to the cte instead of type the full name of the table each time
- assign an alias to a column that will be used later for joining
→ this enables you to assign a join column with “USING (consistentColumnName)” instead of “ON table1.columnName = table2.differentColumnName” - always start your WHERE clause with “WHERE 1=1”
→ enables you to easily comment out other where filters during testing
- future-proof the query by explaining everything unusual that’s going on in the query and avoiding clutter.
- using table aliases for each table in a join and using these for each column added to the select statement
→ lets you easily identify where a column came from - keep a short note on each cte to state what it does
- add an explanation for every filter in the WHERE clause
- absolutely no cte within a cte! these are very difficult to read quickly
- using table aliases for each table in a join and using these for each column added to the select statement
Here’s a code that covers what I described in the bullet points bk who wants to read text if you can just read code, hehe
WITH
-- in this query we are pulling sales data for the past 90 days
-- on weekly level and seller account segment
-- for very important stakeholder ofc and for important reason
calendar AS (
SELECT
*,
date_day AS amsterdamEventDate
FROM 'bq_project.subfolder.dim_financial_calendar'
WHERE
date_day BETWEEN CURRENT_DATE()-90 AND CURRENT_DATE()
),
accounts AS (
SELECT
*
FROM 'bq_project.subfolder.dim_account_list'
),
sales AS (
SELECT
*
FROM 'bq_project.subfolder.fct_sales'
WHERE
-- only last 90 days are relevant to request
amsterdamEventDate BETWEEN CURRENT_DATE()-90 AND CURRENT_DATE()
),
-- in this CTE we are joining all tables
-- and aggregating on segment and week level
sales_aggregations AS (
SELECT
calendar.date_week AS amsterdamEventWeek,
accounts.segment,
SUM(sales.orderTotal) AS salesSegment,
SUM(sales.orderTotal)
OVER (PARTITION BY calendar.date_week) AS salesAllSegments
FROM calendar
LEFT JOIN sales USING (amsterdamEventDate)
LEFT JOIN accounts USING (accountId)
GROUP BY ALL, calendar.date_week
)
SELECT
*,
ROUND(
SAFE_DIVIDE(salesSegment, salesAllSegments)
,2) AS proportionSegment
FROM sales_aggregations
WHERE 1=1
--AND segment = "Longtail"
ORDER BY amsterdamEventWeek
Image source: one of the first 5 images on google search