How Data Engineering Makes TRM's Blockchain Intelligence Possible
I lead the teams responsible for building and operating our data platform.
TRM delivers blockchain intelligence that helps financial institutions, crypto businesses, and government agencies detect, investigate, and respond to crypto-related financial crime. Our products are used to trace flows tied to scams, hacks, and terrorism financing, and to help investigators move faster when every hour matters.
My team owns and operates TRM's data platform, a self-service system that allows teams across the company to ingest raw blockchain and intelligence data. The platform transforms that data into reliable, queryable datasets at petabyte scale across dozens of blockchains — delivering the freshness, correctness, and performance investigators rely on to trace illicit funds and disrupt active laundering networks.
When we increase blockchain coverage or ship a new data product, we're not just exposing data for business intelligence — we're directly improving the tools investigators rely on to trace illicit activity and respond faster.
At TRM, we "build to protect civilization." For data engineering, that means building systems that deliver accurate, timely data that investigators can trust when it matters most.
What the team shipped in 2025 — and what it means for investigators
Crypto doesn't stand still. In 2025, neither did TRM's data platform.
- Blockchain coverage at scale: The data platform now supports 55+ blockchains, enabling TRM's most complex risk exposure computations — graph traversals that trace flows of funds across addresses, entities, and chains, including cross-chain swaps. We onboarded 20+ new blockchains in 2025 alone, and onboarding a new chain is now largely self-service and measured in days, not quarters — allowing us to expand coverage quickly as the ecosystem evolves.
- New data products for investigators: On top of that foundation, we shipped 25+ new data products, including Universal Wallet Screening, Entity Screening, Portfolio Balance, Unlimited Custom Entities, and Risk Indicator Trends. These aren't small features — they're net-new experiences and APIs our customers use to block risky transactions in real time, monitor high-risk entities, and understand exposure across the full graph of crypto activity.
- Foundational data warehouse with exabyte-scale processing: Our foundational data warehouse stores petabytes of blockchain and intelligence data and processes roughly 1 exabyte per year across pipelines and analytical workloads. An orchestration layer of 750+ Airflow DAGs coordinates millions of tasks every day to keep the platform's datasets fresh, reliable, and ready for investigator workflows (a sketch of what one such ingestion DAG might look like follows this list).
- Petabyte-scale lakehouse serving platform: Under the hood, we launched our next-generation serving platform: a StarRocks + Iceberg lakehouse that lets us run fast, cost-efficient analytics over petabyte-scale blockchain datasets stored in cloud object storage. The serving layer now operates over more than 6 petabytes of blockchain intelligence data, and backfilling large datasets takes hours instead of days — dramatically accelerating how quickly we can launch new blockchains and data products.
- High-throughput infrastructure: We also released high-throughput infrastructure that can handle Solana-scale write throughput (~90K TPS).
- AI agents for data platform operations: We've begun deploying AI agents that assist with data quality monitoring, incident triage, and platform optimization — early steps toward an AI-native data platform where routine operational work can increasingly be automated.
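To make the orchestration layer concrete, here is a minimal sketch of what one per-chain ingestion DAG could look like in Airflow. The chain, task names, schedule, and storage paths are hypothetical placeholders for illustration, not TRM's actual pipeline code (assumes Apache Airflow 2.4+ and Python 3.9+).

```python
# Hypothetical per-chain ingestion DAG: extract new blocks, normalize them into
# a common schema, and load them into the foundational warehouse.
from datetime import datetime, timedelta

from airflow.decorators import dag, task


@dag(
    schedule=timedelta(minutes=15),  # run frequently to keep the dataset fresh
    start_date=datetime(2025, 1, 1),
    catchup=False,
    default_args={"retries": 3, "retry_delay": timedelta(minutes=5)},
    tags=["ingestion", "blockchain"],
)
def ethereum_block_ingestion():
    @task
    def extract_new_blocks() -> list[int]:
        # In practice this would read a watermark from a metadata store and
        # pull only blocks committed since the last successful run.
        return list(range(19_000_000, 19_000_100))

    @task
    def normalize(block_numbers: list[int]) -> str:
        # Decode transactions and transfers into the platform's common schema
        # and stage them as files for the warehouse to pick up.
        return f"s3://staging/ethereum/{min(block_numbers)}-{max(block_numbers)}"

    @task
    def load_to_warehouse(staged_path: str) -> None:
        # Register the staged files with the warehouse table and run data tests.
        print(f"Loading {staged_path} into the warehouse")

    load_to_warehouse(normalize(extract_new_blocks()))


ethereum_block_ingestion()
```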
What's at stake is simple: if our systems fail to keep up with the scale and complexity of blockchain activity, investigators fall behind. If we succeed, we give the good actors a structural advantage.
How the team operates: From raw blockchain data to investigator-ready intelligence
The data engineering and data platform teams build and operate the platform that supports the data lifecycle end-to-end — enabling teams across TRM to ingest raw data and turn it into the analytics and APIs investigators rely on:
- Self-service ingestion and normalization of blockchain, attribution, and intelligence data across the company.
- Large-scale data processing and transformation through our foundational data warehouse.
- Workflow orchestration across hundreds of pipelines and data workflows.
- Observability and service-level objectives that ensure the platform remains reliable, correct, and cost-efficient.
- Analytics and serving through our data lakehouse and serving layer (Iceberg + StarRocks), which powers internal analytics and customer-facing APIs.
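As one concrete illustration of that serving layer, below is a hedged sketch of querying an Iceberg table through StarRocks' MySQL-compatible interface from Python. The host, credentials, catalog, and table names are assumptions made for the example, not our actual deployment or schema.

```python
# Hypothetical query against an Iceberg table served by StarRocks.
# StarRocks speaks the MySQL protocol, so a standard MySQL client works.
import pymysql

conn = pymysql.connect(
    host="starrocks-fe.internal",  # placeholder frontend host
    port=9030,                     # StarRocks' default MySQL-protocol port
    user="analyst",
    password="REDACTED",
)

# Hypothetical three-part name: external Iceberg catalog, database, table.
query = """
SELECT to_address, SUM(amount_usd) AS inflow_usd
FROM iceberg_catalog.chain_data.address_transfers
WHERE chain = 'ethereum'
  AND block_date >= DATE '2025-01-01'
GROUP BY to_address
ORDER BY inflow_usd DESC
LIMIT 100
"""

with conn.cursor() as cur:
    cur.execute(query)
    for to_address, inflow_usd in cur.fetchall():
        print(to_address, inflow_usd)

conn.close()
```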
We operate as an internal platform team with strong product sensibilities. Our direct "customers" are TRM's product engineers, data scientists, analysts, and threat researchers — and indirectly, every investigator and compliance analyst who relies on TRM.
A few principles shape how we work:
- SLOs over vibes. We measure freshness, correctness, and completeness — not just uptime — and we review them continuously. If a pipeline misses its SLO, it's treated as an incident (a small freshness-check sketch follows this list).
- Self-service by default. Onboarding a new blockchain or data product increasingly follows a standardized workflow: define schemas and tests, plug into common pipelines, and let the platform handle the heavy lifting.
- AI + automation first. We're embedding AI-driven workflows into cost management, data quality monitoring, and incident triage so engineers spend more time on architecture and less time babysitting jobs.
- Hard tradeoffs in the open. We operate under explicit goals around cost efficiency, incident management, observability, data quality, and developer velocity. When we prioritize one, we're clear about the tradeoffs with the others.
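To illustrate what "SLOs over vibes" means in practice, here is a minimal, hypothetical freshness check. The dataset name, threshold, and alerting hook are placeholders rather than our actual SLO tooling; the point is that staleness is measured explicitly and a miss is treated as an incident, not inferred from "the job ran."

```python
# Hypothetical freshness SLO check: compare the dataset's last commit time
# against an explicit staleness objective and flag a miss as an incident.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class FreshnessSLO:
    dataset: str
    max_staleness: timedelta  # e.g. transfers must be < 30 minutes behind


def check_freshness(slo: FreshnessSLO, last_committed_at: datetime) -> bool:
    """Return True if the dataset currently meets its freshness objective."""
    staleness = datetime.now(timezone.utc) - last_committed_at
    if staleness > slo.max_staleness:
        # In production this would page on-call and open an incident.
        print(f"SLO MISS: {slo.dataset} is {staleness} stale "
              f"(objective: {slo.max_staleness})")
        return False
    return True


# Example: a transfers dataset whose latest partition committed 45 minutes ago.
slo = FreshnessSLO(dataset="ethereum.address_transfers",
                   max_staleness=timedelta(minutes=30))
check_freshness(slo, datetime.now(timezone.utc) - timedelta(minutes=45))
```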
Our work centers on deep platform-level infrastructure — distributed systems, data modeling, query optimization, and AI-native system design — while collaborating closely with product, data science, and go-to-market teams. We spend as much time understanding investigator workflows and regulatory constraints as we do optimizing the systems that power them.
How we evolved TRM’s data platform — and the principles behind it
Over the last few years, I've had the privilege of helping lead the evolution of TRM's data platform — from a first-generation system originally built 0 → 1 to support product-market fit and scale to our first 50 blockchains, to a next-generation lakehouse that is more scalable, cost-efficient, and built to be AI-native.
One of the most meaningful parts of that journey has been building and growing the team that made the next-generation leap possible.
One example I'm particularly proud of is the Next Gen Address Transfers migration. We moved one of our largest and most business-critical workloads onto StarRocks + Iceberg, eliminating significant legacy storage and operational complexity while maintaining effectively zero customer impact during the transition.
It was a cost, SLO, and developer experience triple win, but it only happened because engineers across the team treated reliability as non-negotiable.
My bias as a leader:
- I care deeply about clarity of outcomes: cost per query, freshness SLAs, time to onboard a new chain, and time to recover from incidents.
- I believe the data platform should be treated as a product — fast, reliable, and intuitive for every internal user, not a leaky abstraction that requires constant babysitting. Our goal is to make the platform increasingly anti-fragile, improving as we scale.
- I believe in strong ownership. Engineers should be able to own and operate meaningful parts of the platform, and collaborate with teammates to continuously evolve and improve the system.
- I believe AI-native tooling will be essential to how data platforms operate in the future. For a team managing systems at this scale, it's the only path to the step-function improvements in productivity and reliability we're aiming for.
The kind of engineer who does their best work here
Engineers who tend to thrive on this team are:
- Mission-motivated: you want your work on query engines, pipelines, and SLOs to show up in stories about recovered funds and disrupted illicit networks — not just lower latency charts.
- Strong owners: you ship systems, you're accountable for their reliability, and you help define the metrics that measure success.
- Drawn to hard migrations and 0 → 1 builds: re-platforming to a lakehouse, designing AI-native APIs, or making onboarding a new chain a self-service, low-friction workflow.
- Systems thinkers who enjoy the intersection of distributed systems, databases, and data modeling — and who can reason about how a decision deep in the data platform ultimately affects what investigators see on screen.
People who generally don't thrive here are:
- Those who want a narrow, stable slice of the stack. Our scope spans ingestion, storage, compute, and serving, and the architecture is evolving as fast as the ecosystem.
- Engineers who are allergic to ambiguity or on-call. We're deliberate about SLOs and incident response, but this is a live-fire environment: things will break, and we expect engineers to help fix and improve them.
- Engineers who prefer purely theoretical work. We value strong fundamentals, but at TRM the bar is simple: does this help investigators and customers do their jobs better this quarter?
What makes this role different from most data engineering jobs
You could spend the next chapter of your career optimizing ad auctions or feed ranking. Or you could help build a petabyte-scale data platform investigators rely on to trace illicit funds and disrupt money laundering, terrorism financing, and large-scale fraud on the blockchain.
As a data engineer here, you'll:
- Design and operate systems that process data across dozens of blockchains, including the cross-chain activity criminals use to obscure the movement of funds.
- Build and evolve the lakehouse and serving platform that powers the tools investigators and compliance teams use every day.
- Help define what an AI-native data platform looks like in a domain where correctness, explainability, and safety are non-negotiable.
Compared to many "AI" or "data" roles you might be considering, TRM gives you a few unique things:
- A mission with real-world stakes, where the systems you build directly support investigations into scams, hacks, and illicit finance.
- Petabyte-scale data and infrastructure challenges typically seen at hyperscalers, but inside a startup where engineers can still meaningfully shape the architecture and the platform.
- A team investing heavily in AI agents and automated workflows to multiply engineers' impact, not replace it.
How to apply
If you're a data engineer who wants to build an AI-powered data platform at petabyte scale — and you care that your work makes it harder for illicit actors to operate — this might be your next mission.
Take a look at our open roles on the TRM careers page: https://www.trmlabs.com/careers.
If you see something that fits, apply or reach out and tell us how you think you could help move this platform — and this mission — forward.