In a List
Here is a list of some useful SQL hacks that I use almost every day to make my work a little bit easier:
- keep queries compact for easy debugging and to save query cost.
- use “session” mode in bigquery, this stores your various query results for a couple of hours
→ this reduces both, query runtime and query cost! - reduce query complexity by creating a “reference” table at logical increments.
→ use CREATE OR REPLACE TABLE, and then referg to the table instead of continuing to build a very long query with many cte’s
→ this has the added benefit that you can easily sanity check stuff in the different tables instead of having to dig through the full query every time - reduce input table size by adding filters where possible.
→ listing only the necessary columns
→ using the WHERE clause right in the beginning
- use “session” mode in bigquery, this stores your various query results for a couple of hours
- think ahead by creating actions that can be adjusted easily.
- start the query with a cte for each table youre using so you can just refer to the cte instead of type the full name of the table each time
- assign an alias to a column that will be used later for joining
→ this enables you to assign a join column with “USING (consistentColumnName)” instead of “ON table1.columnName = table2.differentColumnName” - always start your WHERE clause with “WHERE 1=1”
→ enables you to easily comment out other where filters during testing
- future-proof the query by explaining everything unusual that’s going on in the query and avoiding clutter.
- using table aliases for each table in a join and using these for each column added to the select statement
→ lets you easily identify where a column came from - keep a short note on each cte to state what it does
- add an explanation for every filter in the WHERE clause
- absolutely no cte within a cte! these are very difficult to read quickly
- using table aliases for each table in a join and using these for each column added to the select statement
As a query example
Here’s a code that covers what I described in the bullet points bk who wants to read text if you can just read code, hehe
1WITH2-- in this query we are pulling sales data for the past 90 days3-- on weekly level and seller account segment4-- for very important stakeholder ofc and for important reason5
6calendar AS (7 SELECT8 *,9 date_day AS amsterdamEventDate10 FROM 'bq_project.subfolder.dim_financial_calendar'11 WHERE12 date_day BETWEEN CURRENT_DATE()-90 AND CURRENT_DATE()13 ),14
15accounts AS (39 collapsed lines
16 SELECT17 *18 FROM 'bq_project.subfolder.dim_account_list'19 ),20
21sales AS (22 SELECT23 *24 FROM 'bq_project.subfolder.fct_sales'25 WHERE26 -- only last 90 days are relevant to request27 amsterdamEventDate BETWEEN CURRENT_DATE()-90 AND CURRENT_DATE()28 ),29
30-- in this CTE we are joining all tables31-- and aggregating on segment and week level32sales_aggregations AS (33 SELECT34 calendar.date_week AS amsterdamEventWeek,35 accounts.segment,36 SUM(sales.orderTotal) AS salesSegment,37 SUM(sales.orderTotal)38 OVER (PARTITION BY calendar.date_week) AS salesAllSegments39 FROM calendar40 LEFT JOIN sales USING (amsterdamEventDate)41 LEFT JOIN accounts USING (accountId)42 GROUP BY ALL, calendar.date_week43
44)45
46SELECT47 *,48 ROUND(49 SAFE_DIVIDE(salesSegment, salesAllSegments)50 ,2) AS proportionSegment51FROM sales_aggregations52WHERE 1=153--AND segment = "Longtail"54ORDER BY amsterdamEventWeekImage source: one of the first 5 images on google search