Performance Hints for BigQuery

Rahul Raina
Vlad Duta
Christopher Garnatz
Andrew Hannigan

Key takeaways

  • Optimize slot time, not data scanned: Slot time consumed determines cost under capacity pricing, not bytes processed.
  • Encode hex as binary: 50% to 8x size reduction: Convert hex strings to BYTES (~50% savings) or hash them to 64-bit integers for join keys (8x reduction).
  • Always filter on partitions: 75x scan reduction: Partition filtering is the highest-impact optimization—avoid full table scans.
  • Salt hot join keys to eliminate skew: Distribute hot keys across partitions for parallel processing (8x speedups).
  • Avoid inequality joins — they trigger CROSS JOINs: Pre-compute with window functions, then join on equality.

{{horizontal-line}}

Jeff Dean and Sanjay Ghemawat's "Performance Hints" document just got a refresh, and predictably it’s gone viral — again!  In it, they lay out general principles for thinking about performance: the importance of estimation, the value of measurement, and the compounding power of many small improvements. Their key insight? "In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal."

This post applies that same thinking to BigQuery and attempts to share some of our hard-won lessons from over the years. Many of these lessons were built on the shoulders of giants, so please don’t consider them “novel”. Many of these hints are also not specific to BigQuery and can be broadly applied to data processing systems in general (e.g., Spark, Snowflake, Postgres, etc.).

If there are things we are missing, we would love to hear from you. Send us an email at: eng-blog@trmlabs.com.

Why this matters

The scale of blockchain data is exploding. Consider:

  • Bitcoin processed 12 billion transactions over 12 years
  • Solana hit 75 billion in just five years — 15x faster

Regulatory tailwinds are accelerating this trend. In December 2025, SEC Chair Paul Atkins predicted the entire US financial market moving on-chain. That same month, DTCC partnered with Digital Asset to tokenize trillions in US Treasury securities.

At TRM, we need to cover more blockchains, faster, in the next three years than we did in the last eight. We're expecting a 100x increase in raw blockchain data (10x throughput × 10x chains), and are already seeing total bytes processed increase 250x[1] in the last six months alone.

Optimizing BigQuery performance lets us scale to meet this demand without breaking the bank.

Understanding BigQuery costs

Before optimizing, understand what you're optimizing for. Under BigQuery's capacity-based pricing model, there are two metrics that matter:

  • Slot time consumed is what you're actually paying for. It represents the vCPU time used to execute your query. Under slot pricing, the amount of data processed is irrelevant to cost; slot utilization is what affects your bottom line.
  • Execution duration is wall clock time. This affects how long you wait, but not directly what you pay.

The key insight: data processed ≠ cost. A query that scans 1TB but finishes in 10 seconds might cost less than a query that scans 100GB but runs for 10 minutes with complex joins.

To understand query cost, check the Execution Details tab in the BigQuery console and look at "Slot time consumed." Unfortunately, BigQuery doesn't estimate slot time before execution — you need to run the query (or examine similar historical queries) to understand the cost under the capacity compute pricing model.
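
One way to examine similar historical queries in bulk is the INFORMATION_SCHEMA.JOBS views. Below is a minimal sketch, assuming US-region data and project-level access (the region qualifier and lookback window are assumptions; adjust them to your setup):

-- Slot time vs. wall-clock time for recent queries in this project
SELECT
  job_id,
  total_slot_ms / 1000 / 60 AS slot_minutes,                         -- what you pay for
  TIMESTAMP_DIFF(end_time, start_time, SECOND) AS duration_seconds,  -- what you wait for
  total_bytes_processed / POW(1024, 3) AS gib_processed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = 'QUERY'
ORDER BY total_slot_ms DESC
LIMIT 20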

Performance hints

A. Data encoding and minimization

Blockchain data is inherently random: transaction hashes, addresses, and signatures are typically the output of cryptographic functions. This means traditional compression techniques (which exploit patterns and repetition) provide little benefit. Our solution is to encode data more efficiently at the source.

Data minimization provides optimization benefits throughout the stack: smaller storage footprint, reduced network transfer, better memory utilization, and improved CPU register efficiency.

<h4 class="anchor-link-rich-text" id="encode-hex-strings-as-binary">1. Encode hex strings as binary</h4>

Hex strings are a human-readable encoding of binary data, but they're twice as large as necessary. A 64-character hex string (common for transaction hashes) uses 64 bytes as text, but only 32 bytes as binary.

-- Store as BYTES instead of STRING
-- '0xce4d540d...' (67 bytes as string) → '\\xce4d540d...' (36 bytes as BYTES)

-- Encode on write
SELECT FROM_HEX(SUBSTR(tx_hash, 3)) AS tx_hash_bytes  -- strips the '0x' prefix, then decodes hex to BYTES
FROM raw_transactions

-- Decode on read
SELECT CONCAT('0x', TO_HEX(tx_hash_bytes)) AS tx_hash
FROM encoded_transactions

Results: ~50% size reduction for hex-encoded data. For base64-encoded chains (Algorand, Tezos), use FROM_BASE64/TO_BASE64 for ~25% reduction.

<h4 class="anchor-link-rich-text" id="use-integer-hashing-for-join-keys">2. Use integer hashing for join keys</h4>

For columns used primarily as join or group keys (where human readability isn't required), compress to 64-bit integers using MD5 hashing. A 64-character hex string (64 bytes) becomes an 8-byte integer — an 8x improvement.

-- MD5 hash to 64-bit integer
CREATE TEMP FUNCTION md5_int64(s STRING)
RETURNS INT64 AS (
  CAST(CONCAT('0x', SUBSTR(TO_HEX(MD5(s)), 1, 16)) AS INT64)
);

-- Use for grouping/joining (8x smaller keys)
SELECT
  md5_int64(sender_address_chain) AS sender_id,
  chain,
  md5_int64(tx_hash) AS tx_id,
  MAX(sent_amount_usd) AS sent_amount_usd
FROM transfers
GROUP BY 1, 2, 3

Tradeoffs to consider:

  • Usability: Moving between representations adds debugging overhead. Invest in tooling to make this seamless.
  • Collision risk: Compressing to 64-bit creates collision risk via the birthday paradox — at ~5 billion items, you hit 50% collision probability. For datasets approaching this scale, use 128-bit hashes or add collision detection. For smaller datasets (millions to low billions), 64-bit is practical.
  • Even distribution: By leveraging a uniform hashing function, we avoid the buildup of hot shards. However, this can also lead to more reshuffling between query execution stages, since data must be redistributed between workers, potentially negating the benefits of the original or final table's clustering.

Why this matters at scale: Smaller keys mean faster joins. A 64-bit comparison typically completes within one clock cycle, while a 256-bit comparison can take more than three clock cycles (depending on the CPU platform).
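
To gauge collision risk on your own data, one quick check is to compare distinct counts before and after hashing. A sketch below, assuming the md5_int64 function defined above and a transfers table like the one in the example:

-- If the two counts differ, at least two addresses collided on the same 64-bit key
SELECT
  COUNT(DISTINCT sender_address_chain) AS distinct_addresses,
  COUNT(DISTINCT md5_int64(sender_address_chain)) AS distinct_hashed_keys
FROM transfers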

B. Data access patterns

<h4 class="anchor-link-rich-text" id="always-filter-on-partition-columns">3. Always filter on partition columns</h4>

This is the single highest-impact optimization. Without partition filters, BigQuery scans the entire table.

-- Avoid: scans entire table (1.1 TB)
SELECT *
FROM `your-project.blockchain_data.btc_transactions` AS t
WHERE t.hash = '21d6cf816ab9550d8359bb12b14bad5dafd91d82a45122e45144cc63a38c7e64'

-- Do: scans single partition (14.5 GB) — 75x reduction
SELECT *
FROM `your-project.blockchain_data.btc_transactions` AS t
WHERE t.hash = '21d6cf816ab9550d8359bb12b14bad5dafd91d82a45122e45144cc63a38c7e64'
AND t.block_timestamp_month = '2021-11-01'

<h4 class="anchor-link-rich-text" id="use-clustered-tables-for-filtered-joins">4. Use clustered tables for filtered joins</h4>

If you frequently filter on specific columns (sender address, receiver address, transaction hash), use tables clustered on those columns. The performance difference is substantial. The internal table structure is a collection of files, and the clustered columns are sorted within each file, much like pages and words in the Oxford English Dictionary. This allows the query engine to deploy efficient lookup algorithms and data structures like indexes and binary search, so clustering works best with ultra-high-cardinality columns.

-- Avoid: unclustered table
SELECT transaction_hash_chain
FROM `your-project.transfers.account_asset_transfer_eth`
WHERE sender_address_chain = "0x3f5ce5fbfe3e9af3971dd833d26ba9b5c936f0be:eth"

-- Do: use table clustered by the filter column
SELECT transaction_hash_chain
FROM `your-project.transfers.account_asset_transfer_clustered_by_sender_eth`
WHERE sender_address_chain = "0x3f5ce5fbfe3e9af3971dd833d26ba9b5c936f0be:eth"
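
For reference, the clustered variant above could be created with a CLUSTER BY clause. A minimal sketch, with table and column names taken from the example (yours will differ):

-- One-time setup: materialize a copy clustered on the common filter column
CREATE TABLE `your-project.transfers.account_asset_transfer_clustered_by_sender_eth`
CLUSTER BY sender_address_chain
AS
SELECT *
FROM `your-project.transfers.account_asset_transfer_eth`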

<h4 class="anchor-link-rich-text" id="use-tablesample-for-exploration">5. Use TABLESAMPLE for exploration</h4>

When exploring large tables, LIMIT 100 still performs a full scan. Use TABLESAMPLE to sample data without scanning everything.

SELECT *
FROM `your-project.blockchain_data.transactions`
TABLESAMPLE SYSTEM (0.001 PERCENT)

<h4 class="anchor-link-rich-text" id="use-the-preview-feature">6. Use the Preview feature</h4>

For quick table inspection, use BigQuery's Preview tab in the console instead of SELECT * LIMIT 10. The Preview feature reads directly from stored metadata and sample blocks — no query execution, no slot consumption.

Cost: $0. Duration: instant. Click the table name in the explorer, then click "Preview."

C. Join optimization

<h4 class="anchor-link-rich-text" id="order-joins-largest-table-first-then-smallest">7. Order joins: largest table first, then smallest</h4>

When joining multiple tables, place the largest table first, then the smallest, followed by the remaining tables in decreasing order of size. This enables broadcast joins, where BigQuery sends the smaller table to each slot processing the larger table.

-- Do: largest table first
SELECT ...
FROM large_transactions t         -- 10B rows
JOIN mini_properties p ON ...     -- 100K rows
JOIN medium_addresses a ON ...    -- 2B rows
JOIN small_labels l ON ...        -- 1B rows

<h4 class="anchor-link-rich-text" id="avoid-hidden-cross-joins">8. Avoid hidden CROSS JOINs</h4>

Inequalities in JOIN clauses cause BigQuery to perform a CROSS JOIN under the hood, then filter. This can explode slot time and bytes shuffled.

-- Avoid: hidden CROSS JOIN (finds latest external transfer before each transfer)
SELECT t.transfer_index, MAX(ext.transaction_timestamp) AS latest_prior_external
FROM transfers t
JOIN external_transfers ext
  ON ext.transaction_timestamp < t.transaction_timestamp  -- inequality = CROSS JOIN
GROUP BY t.transfer_index

The fix is to pre-compute the answer into a lookup table, avoiding the inequality join entirely:

-- Do: pre-compute a cumulative lookup table
CREATE TEMP TABLE _SESSION.external_transfer_timeline AS
SELECT
  transaction_timestamp,
  MAX(transaction_timestamp) OVER (
    ORDER BY transaction_timestamp
    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
  ) AS latest_prior_external
FROM external_transfers;

-- Then join on equality
SELECT t.transfer_index, timeline.latest_prior_external
FROM transfers t
JOIN _SESSION.external_transfer_timeline timeline
  ON t.transaction_timestamp = timeline.transaction_timestamp

<h4 class="anchor-link-rich-text" id="avoid-or-in-join-conditions">9. Avoid OR in JOIN conditions</h4>

BigQuery optimizes joins based on the join column. Using OR prevents this optimization.

-- Avoid
SELECT t1.*
FROM table_1 AS t1
JOIN table_2 AS t2
ON t1.a = t2.a OR t1.b = t2.b

-- Do: use UNION ALL instead
SELECT t1.*
FROM table_1 AS t1
JOIN table_2 AS t2 ON t1.a = t2.a
UNION ALL
SELECT t1.*
FROM table_1 AS t1
JOIN table_2 AS t2 ON t1.b = t2.b

-- Depending on the contents of table_1 and table_2, the UNION ALL approach may produce duplicate rows.
-- Do: alternatively, apply De Morgan's law: NOT (t1.a <> t2.a AND t1.b <> t2.b) is equivalent to
-- t1.a = t2.a OR t1.b = t2.b, without duplicating rows.

SELECT t1.*
FROM table_1 AS t1
JOIN table_2 AS t2
ON NOT (
  t1.a <> t2.a
  AND t1.b <> t2.b
)

<h4 class="anchor-link-rich-text" id="join-on-unnested-values-not-array-membership">10. JOIN on unnested values, not array membership</h4>

Joining on value IN UNNEST(array) forces BigQuery to evaluate array containment for every row pair — effectively a nested loop. Unnest the array first and join on equality, which enables hash joins.

-- Avoid: array containment check on every row (30 min runtime)
SELECT t.*, labels.category
FROM transfers t
JOIN address_labels labels
  ON t.sender_address IN UNNEST(labels.addresses)  -- slow: checks array for each row

-- Do: unnest early, then join on equality (3 min runtime)
WITH labels_unnested AS (
  SELECT category, address
  FROM address_labels,
  UNNEST(addresses) AS address
)
SELECT t.*, l.category
FROM transfers t
JOIN labels_unnested l
  ON t.sender_address = l.address  -- fast: hash join on equality

<h4 class="anchor-link-rich-text" id="use-salting-to-fix-partition-skew">11. Use salting to fix partition skew</h4>

This is the "Justin Bieber problem" — when a small number of keys account for a disproportionate share of data. Instagram famously encountered this when celebrity accounts caused hot shards in their infrastructure.

Blockchain data exhibits the same power-law distribution. A handful of addresses — major exchanges like Binance, DeFi protocols like Uniswap — account for 90%+ of on-chain transaction activity. When you join on these "hot" addresses, BigQuery's distributed join becomes pathologically inefficient.

Understanding the problem: BigQuery executes joins by shuffling data — redistributing rows across worker slots based on join key hash values. Rows with the same key land on the same slot. When one key dominates, a single slot processes the vast majority of rows while others sit idle. If that slot runs out of memory, data spills to disk, adding I/O overhead. You'll see this in the Execution Details as high "bytes shuffled" and "bytes spilled to disk."

The solution: Salt the join keys to distribute hot values across multiple partitions.

-- Step 1: Explode the smaller table with salt values
CREATE TEMP TABLE _SESSION.addresses_salted AS
SELECT
  address_chain,
  label,
  CONCAT(address_chain, '_', CAST(salt AS STRING)) AS salted_address_chain
FROM addresses,
UNNEST(GENERATE_ARRAY(0, 9)) AS salt;  -- 10 salt values

-- Step 2: Join using salted keys with random distribution
SELECT t.*, a.label
FROM transfers t
JOIN _SESSION.addresses_salted a
  ON CONCAT(t.sender_address_chain, '_', CAST(CAST(FLOOR(RAND() * 10) AS INT64) AS STRING)) = a.salted_address_chain

Why this works: Instead of all Binance transactions hitting one slot, they're randomly distributed across 10 slots. Yes, you've 10x'd the smaller table's size — but you've converted a serial bottleneck into parallel processing.

Results: In production, we've seen salting reduce execution time by 8x in cases where a single hot shard was processing 85% of rows. The total slot time consumed stays roughly the same, but the wall-clock time drops dramatically because work is parallelized.

When to use: Check your Execution Details. If you see one stage with a max slot time far exceeding the average, you likely have partition skew. Salting is the fix.
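
The same per-stage timings are available outside the console via the INFORMATION_SCHEMA.JOBS views, so you can check for skew programmatically. A hedged sketch, with the region qualifier and job ID as placeholders:

-- A stage whose max compute time far exceeds its average points to a hot key
SELECT
  stage.name,
  stage.compute_ms_avg,
  stage.compute_ms_max,
  stage.shuffle_output_bytes_spilled
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT,
UNNEST(job_stages) AS stage
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND job_id = 'your_job_id'  -- placeholder: the job you want to inspect
ORDER BY stage.compute_ms_max DESC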

D. CTEs and temp tables

<h4 class="anchor-link-rich-text" id="use-temp-tables-when-a-cte-is-referenced-multiple-times">12. Use temp tables when a CTE is referenced multiple times</h4>

CTEs are re-computed on each reference. If you reference the same CTE in multiple subqueries (e.g., in a UNION ALL), use a temp table instead.

-- Avoid: CTE referenced twice = computed twice
WITH transfers AS (
    SELECT tx_hash, sender, receiver
    FROM raw_transfers
    WHERE tx_timestamp > '2024-01-01'
)
SELECT * FROM transfers WHERE sender = 'A'
UNION ALL
SELECT * FROM transfers WHERE receiver = 'A'

-- Do: materialize to temp table
CREATE TEMP TABLE _SESSION.transfers AS
SELECT tx_hash, sender, receiver
FROM raw_transfers
WHERE tx_timestamp > '2024-01-01';

SELECT * FROM _SESSION.transfers WHERE sender = 'A'
UNION ALL
SELECT * FROM _SESSION.transfers WHERE receiver = 'A'

<h4 class="anchor-link-rich-text" id="use-session-table-name-for-temp-tables">13. Use _SESSION.table_name for temp tables</h4>

Without the _SESSION prefix, temp tables persist for up to 24 hours and count against your storage quota. The _SESSION qualifier scopes the table to your current session — it's automatically dropped when the session ends.

-- Avoid: persists for 24 hours
CREATE TEMP TABLE my_temp_table AS SELECT ...

-- Do: dropped when session ends
CREATE TEMP TABLE _SESSION.my_temp_table AS SELECT ...

<h4 class="anchor-link-rich-text" id="filter-early-in-ctes-and-temp-tables">14. Filter early in CTEs and temp tables</h4>

Every byte that flows through a CTE or temp table gets shuffled, stored, and read again downstream. Filter aggressively at the earliest stage to minimize this overhead.

-- Avoid: filtering late (processes all transfers, then filters)
WITH all_transfers AS (
  SELECT * FROM raw_transfers
)
SELECT * FROM all_transfers WHERE chain = 'eth' AND amount > 1000

-- Do: filter early (only processes eth transfers > 1000)
WITH filtered_transfers AS (
  SELECT * FROM raw_transfers
  WHERE chain = 'eth' AND amount > 1000
)
SELECT * FROM filtered_transfers

E. Aggregation tricks

<h4 class="anchor-link-rich-text" id="use-array-agg-instead-of-row-number-rank-for-latest-record-patterns">15. Use ARRAY_AGG instead of ROW_NUMBER/RANK for "latest record" patterns</h4>

ARRAY_AGG with ORDER BY ... LIMIT 1 is more efficient than window functions for selecting the most recent record.

-- Avoid
SELECT * FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY created_at DESC) AS rn
  FROM events
) WHERE rn = 1

-- Do
SELECT event.*
FROM (
  SELECT id, ARRAY_AGG(t ORDER BY t.created_at DESC LIMIT 1)[OFFSET(0)] event
  FROM events t
  GROUP BY id
)

<h4 class="anchor-link-rich-text" id="use-approx-count-distinc-or-hyperloglog">16. Use APPROX_COUNT_DISTINCT or HyperLogLog</h4>

When exact counts aren't required, approximate functions are dramatically faster.

-- Exact (slow)
SELECT COUNT(DISTINCT user_id) FROM transactions

-- Approximate (fast, ~1% error)
SELECT APPROX_COUNT_DISTINCT(user_id) FROM transactions

HyperLogLog allows re-aggregation of sketches — useful for pre-computing distinct counts that can be merged later.
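
A minimal sketch of that pattern (table and column names are illustrative): pre-compute one sketch per day, then merge sketches across any date range instead of re-scanning the raw rows.

-- Pre-compute a daily sketch of distinct users
CREATE TEMP TABLE _SESSION.daily_user_sketches AS
SELECT
  DATE(block_timestamp) AS day,
  HLL_COUNT.INIT(user_id) AS user_sketch
FROM transactions
GROUP BY day;

-- Merge the daily sketches into a 30-day approximate distinct count
SELECT HLL_COUNT.MERGE(user_sketch) AS approx_distinct_users
FROM _SESSION.daily_user_sketches
WHERE day >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)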

<h4 class="anchor-link-rich-text" id="avoid-select-distinct-with-union-distinct">17. Avoid SELECT DISTINCT with UNION DISTINCT</h4>

BigQuery evaluates DISTINCT three times: once per SELECT, once for the UNION. Use SELECT ... UNION DISTINCT SELECT ... without the individual DISTINCTs.

-- Avoid: DISTINCT evaluated 3 times
SELECT DISTINCT col FROM table_a
UNION DISTINCT
SELECT DISTINCT col FROM table_b

-- Do: DISTINCT evaluated once
SELECT col FROM table_a
UNION DISTINCT
SELECT col FROM table_b

<h4 class="anchor-link-rich-text" id="reconciling-large-datasets-without-using-join">18. Reconciling large datasets without using JOIN</h4>

A common task is checking for missing keys between two tables. If the expectation is for the two tables to match all the time, then we can avoid a slow and expensive JOIN and use a hash-based comparison instead:

-- Avoid: Full table join
SELECT a.id, b.order_id
FROM orders a
FULL OUTER JOIN order_details b
ON a.id = b.order_id

-- Do: Single table scan per table, comparing aggregate fingerprints
SELECT
  (SELECT SUM(FARM_FINGERPRINT(a.id)) FROM orders a)
  =
  (SELECT SUM(FARM_FINGERPRINT(b.order_id)) FROM order_details b)
  AS tables_match

The same hashing idea can be used to find duplicates in a dataset that has no primary key:

SELECT FARM_FINGERPRINT(TO_JSON_STRING(a)) AS row_hash, COUNT(*) AS cnt
FROM unstructured_table a
GROUP BY 1
HAVING cnt > 1

Expanding from there, we can get creative. If we want to validate that we are not missing any blocks in a sequence of consecutive blockchain blocks, we can use the Gauss formula:

-- Avoid: Full table join (n = expected max block number)
SELECT seq
FROM blocks b
RIGHT JOIN UNNEST(GENERATE_ARRAY(1, n)) seq
ON b.block_number = seq
WHERE b.block_number IS NULL

-- Do: Single table scan (assumes block numbers start at 1 with no duplicates)
SELECT gauss * 2 = n * (n + 1) AS no_missing_blocks FROM (
  SELECT SUM(block_number) AS gauss, COUNT(*) AS n
  FROM blocks
)

F. Query structure

<h4 class="anchor-link-rich-text" id="order-where-expressions-by-selectivity">19. Order WHERE expressions by selectivity</h4>

BigQuery evaluates WHERE expressions in order and doesn't reorder them. Put the most selective filter first.

-- Do: most selective first
WHERE amount > 0                    -- eliminates 90% of rows
AND memo LIKE 'OUT:%'               -- expensive LIKE only runs on 10%

<h4 class="anchor-link-rich-text" id="avoid-transforming-partition-columns">20. Avoid transforming partition columns</h4>

Applying functions to partition columns (CAST, DATETIME) prevents partition pruning.

-- Avoid: no partition pruning (4hr 29min slot time)
WHERE DATETIME(t.transaction_timestamp, "EST") >= '2022-11-11T10:00:00'

-- Do: partition pruning works (2min 56sec slot time)
WHERE t.transaction_timestamp >= '2022-11-11T10:00:00'
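
If the EST boundary matters, a sketch of an equivalent, still-prunable filter is to convert the literal instead of the column:

-- Do: convert the constant, not the partition column
WHERE t.transaction_timestamp >= TIMESTAMP('2022-11-11 10:00:00', 'EST')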

<h4 class="anchor-link-rich-text" id="handle-nulls-explicitly-in-joins-and-filters">21. Handle NULLs explicitly in joins and filters</h4>

NULL values behave unexpectedly in joins and filters. They don't match anything, including other NULLs.

-- Avoid: silently drops rows where bool_value is NULL
SELECT d1.*
FROM data AS d1
JOIN data AS d2 USING(id, bool_value)

-- Do: handle NULLs explicitly
SELECT d1.*
FROM data AS d1
JOIN data AS d2
ON d1.id = d2.id
AND (d1.bool_value = d2.bool_value
     OR (d1.bool_value IS NULL AND d2.bool_value IS NULL))

Similarly, NOT IN and <> exclude NULL rows:

-- Avoid: excludes NULLs
WHERE status NOT IN ('active')

-- Do: include NULLs explicitly
WHERE status NOT IN ('active') OR status IS NULL

Measurement and iteration

You can't optimize what you can't measure. Key metrics to track in the Execution Details tab:

  • Slot time consumed: Your actual cost
  • Bytes shuffled: Data moved between stages (high values indicate join/aggregation issues)
  • Bytes spilled to disk: Memory pressure (indicates query may need restructuring)

Tools we use (custom, built in-house):

  • BigQuery “Job Report Card” Dashboard for tracking slot utilization trends over time
  • SummarizeSlotUsage operator in our DAGs to inject slot usage reports into logs

Pro tip: Don't re-run production queries to measure cost. Fetch the job details via the BigQuery console link to see historical execution details.

Closing

Performance optimization is a continuous process. The improvements described here aren't one-time wins — they're habits to build into how you write queries.

Start with the highest-impact tips for your workload: data encoding for storage efficiency, partition filtering for scan reduction, and join optimization for compute efficiency will likely give you the biggest wins. Then iterate, measure, and compound those gains over time.

The data tsunami is coming. Make sure your queries are ready. 🏄🏽

{{horizontal-line}}

[1] Includes downstream bytes processed for analytical use cases. Includes both production and development workloads.
