Smart Scaling: How a Healthcare Provider Governed 100K+ Columns of Sensitive Data in Two Weeks — at 10x Less Than SaaS

Discovery

Scoping Governance Requirements

The engagement started with scoping sessions across the data platform team, IT security, and compliance to map the full governance landscape.

Key Findings

  • 10K+ tables and 100K+ columns of data in Snowflake, much of it containing PII/PHI subject to HIPAA/HITECH
  • Existing table-level access controls couldn't restrict at the column level without creating thousands of roles
  • Off-the-shelf SaaS solutions quoted at multiple six figures annually — beyond the budget constraint
  • The data team was lean and couldn't absorb additional governance overhead without new hires
  • Self-service analytics and new platform development were blocked until governance was solved

Approach Decision

Tag-based dynamic data masking was chosen over extending traditional role-based access control, which would have required thousands of roles to restrict access column by column. Tags in dbt YAML files would define policies declaratively, and the existing CI/CD pipeline would enforce them, fitting naturally into the team's existing Snowflake + dbt workflow without introducing new tools.
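
To make the declarative idea concrete, here is a minimal Python sketch of how column tags declared in dbt YAML could be turned into Snowflake tag assignments. The `masking_tag` key under `meta`, the `governance.tags.sensitivity` tag name, and the model and column names are illustrative assumptions, not the client's actual configuration. The dict mirrors what a YAML parser would return for a dbt properties file.

```python
# Parsed form of a dbt properties file (what a YAML loader would return).
# The `masking_tag` meta key is a hypothetical convention for this sketch.
SCHEMA_YML = {
    "models": [
        {
            "name": "patients",
            "columns": [
                {"name": "patient_id", "meta": {"masking_tag": "identifier"}},
                {"name": "ssn", "meta": {"masking_tag": "phi"}},
                {"name": "created_at"},  # no tag declared
            ],
        }
    ]
}

TAG_NAME = "governance.tags.sensitivity"  # hypothetical Snowflake tag object


def tag_statements(schema_yml: dict, tag_name: str = TAG_NAME) -> list[str]:
    """Render one ALTER TABLE ... SET TAG statement per tagged column."""
    statements = []
    for model in schema_yml.get("models", []):
        for column in model.get("columns", []):
            value = column.get("meta", {}).get("masking_tag")
            if value is None:
                continue  # untagged columns are skipped in this sketch
            statements.append(
                f"ALTER TABLE {model['name']} MODIFY COLUMN {column['name']} "
                f"SET TAG {tag_name} = '{value}';"
            )
    return statements
```

Calling `tag_statements(SCHEMA_YML)` yields two statements, one for `patient_id` and one for `ssn`. Keeping the tags next to the model definitions means a policy change is reviewed like any other code change.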

Build Phase 1

Dynamic Data Masking & Developer Experience

Two parallel workstreams launched: the core masking infrastructure and the developer tools that would make it maintainable.

Tag-Based Dynamic Data Masking

Custom infrastructure-as-code and dbt macros deployed a tag-based masking system in Snowflake. Database, schema, table, and column-level tags in dbt YAML files controlled what was masked and for whom. The CI/CD pipeline enforced masking automatically on every deployment.
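
One way to picture how tags at four levels combine: the most specific tag wins, and a rule keyed on the resulting tag decides what each role sees. The sketch below simulates that in plain Python. The tag values, role names, and object names are assumptions for illustration; in Snowflake the equivalent logic would live in masking policies attached to tags rather than in application code.

```python
from typing import Optional

# Tags declared at each level, keyed by qualified name (hypothetical data).
TAGS = {
    "clinical": "internal",                   # database level
    "clinical.ehr": "phi",                    # schema level
    "clinical.ehr.lookup_codes": "public",    # table level
    "clinical.ehr.patients.zip_code": "pii",  # column level
}

# Roles allowed to see unmasked values for each tag value (assumed policy).
UNMASKED_ROLES = {
    "phi": {"CLINICAL_ANALYST"},
    "pii": {"CLINICAL_ANALYST", "BILLING"},
    "internal": {"CLINICAL_ANALYST", "BILLING", "ANALYST"},
}


def effective_tag(database: str, schema: str, table: str, column: str) -> Optional[str]:
    """Walk from the column up to the database and return the first tag found."""
    parts = [database, schema, table, column]
    for depth in range(len(parts), 0, -1):
        key = ".".join(parts[:depth])
        if key in TAGS:
            return TAGS[key]
    return None


def read_value(role: str, value: str, tag: Optional[str]) -> str:
    """Return the value as the given role would see it."""
    if tag is None or tag == "public":
        return value  # no tag means no masking rule applies in this sketch
    if role in UNMASKED_ROLES.get(tag, set()):
        return value
    return "***MASKED***"
```

Here `clinical.ehr.patients.ssn` has no column or table tag, so it inherits `phi` from the schema, while `zip_code` overrides that with its own `pii` tag.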

Why this was fast: The system built on Snowflake's native dynamic data masking features and the client's existing dbt workflow. No new platform to learn, no new infrastructure to provision. Two weeks from kickoff to masking terabytes of data.

VS Code Devcontainer

A custom development environment for analytics engineers shipped with pre-commit hooks that caught incorrectly defined masking policies before code was committed. Compliance errors surfaced immediately in the developer's local environment, not in production.
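
A pre-commit check of this kind can be quite small. The following Python sketch validates that every column in a parsed dbt model declares a known masking tag and returns a non-zero exit code otherwise, which is what causes a hook to block the commit. The `masking_tag` meta key and the allowed tag vocabulary are assumptions for this example; the client's actual hooks are not described in the source.

```python
import sys

ALLOWED_TAGS = {"phi", "pii", "internal", "public"}  # assumed vocabulary


def validate_model(model: dict) -> list[str]:
    """Return one error message per column with a missing or unknown tag."""
    errors = []
    for column in model.get("columns", []):
        tag = column.get("meta", {}).get("masking_tag")
        where = f"{model['name']}.{column['name']}"
        if tag is None:
            errors.append(f"{where}: missing masking_tag")
        elif tag not in ALLOWED_TAGS:
            errors.append(f"{where}: unknown masking_tag '{tag}'")
    return errors


def main(models: list[dict]) -> int:
    """Print all errors to stderr; return 1 if any were found, else 0."""
    errors = [err for model in models for err in validate_model(model)]
    for err in errors:
        print(err, file=sys.stderr)
    return 1 if errors else 0
```

A hook script would load the changed YAML files, pass the parsed models to `main`, and hand the result to `sys.exit`.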

Why devcontainers: The goal was to make compliance the path of least resistance. If defining a masking policy was harder than skipping it, engineers would skip it. The devcontainer made correct policies the default.

Build Phase 2

CI/CD Enforcement

With masking deployed and developer tools in place, the next step was automated enforcement — ensuring compliance improved with every deployment, not just when someone remembered to check.

Custom dbt Application

A custom dbt application was built to run in the CI/CD pipeline with three capabilities:

  • Auto-enforce default policies on new or untagged assets
  • Correct misconfigurations where tags conflicted with masking rules
  • Track coverage gaps by flagging assets not yet covered by the governance system
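
The three capabilities above can be sketched as a single reconciliation pass. This Python example is an illustration under stated assumptions: the default tag `restricted`, the policy names, and the reading of "coverage gap" as a tag the system has no policy for are all hypothetical, and the real application runs inside dbt rather than as standalone Python.

```python
from dataclasses import dataclass, field

DEFAULT_TAG = "restricted"  # assumed default for untagged assets
# Masking policy expected for each known tag (hypothetical names).
EXPECTED_POLICY = {"restricted": "mask_all", "phi": "mask_phi", "public": None}


@dataclass
class Report:
    defaulted: list = field(default_factory=list)  # default policy applied
    corrected: list = field(default_factory=list)  # tag/policy mismatch fixed
    uncovered: list = field(default_factory=list)  # tag unknown to the system


def reconcile(columns: dict) -> Report:
    """columns maps a column name to {'tag': ..., 'policy': ...}; updated in place."""
    report = Report()
    for name, state in columns.items():
        tag = state.get("tag")
        if tag is None:
            # Capability 1: enforce the default on untagged assets.
            state["tag"] = DEFAULT_TAG
            state["policy"] = EXPECTED_POLICY[DEFAULT_TAG]
            report.defaulted.append(name)
        elif tag not in EXPECTED_POLICY:
            # Capability 3: flag assets the governance system does not cover.
            report.uncovered.append(name)
        elif state.get("policy") != EXPECTED_POLICY[tag]:
            # Capability 2: correct a policy that conflicts with the tag.
            state["policy"] = EXPECTED_POLICY[tag]
            report.corrected.append(name)
    return report
```

Running a pass like this on every deployment is what lets coverage move in one direction: each run either fixes an asset or records it as a known gap.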

Why automated enforcement: Manual compliance reviews don't scale. With 100K+ columns, any manual process would either miss gaps or consume engineering hours the team didn't have. Automated enforcement meant the system got more compliant with every commit.

Deploy

Phased Rollout

Rolling out dynamic data masking across a live warehouse with dependent applications required controlled enrollment — breaking an application because of a misapplied mask would destroy trust in the new system.

Controlled Enrollment

The team mapped every application interacting with the warehouse, then enrolled them incrementally. Each iteration included testing to verify that masking worked correctly for each application's access patterns.
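
As a rough illustration of incremental enrollment, the Python sketch below processes applications in small waves and stops at the first wave where any verification fails, so a problem affects at most one wave. The wave size, application names, and the `verify` callback are assumptions; the source does not describe how the team's tests were implemented.

```python
from typing import Callable


def enroll_in_waves(
    apps: list[str],
    verify: Callable[[str], bool],
    wave_size: int = 2,
) -> tuple[list[str], list[str]]:
    """Enroll apps wave by wave; halt at the first wave with a failed check.

    Returns (enrolled apps, apps whose verification failed in the halting wave).
    """
    enrolled: list[str] = []
    for start in range(0, len(apps), wave_size):
        wave = apps[start:start + wave_size]
        failures = [app for app in wave if not verify(app)]
        if failures:
            return enrolled, failures  # the failing wave is not enrolled
        enrolled.extend(wave)
    return enrolled, []
```

With five applications, a wave size of two, and a check that fails only for the fourth, the first wave is enrolled and the rollout halts on the second wave for investigation.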

Why phased: A big-bang rollout would have been faster but riskier. Any masking error affecting a production application would have created an incident — and given governance skeptics ammunition to push back on the entire system. Phased rollout traded speed for confidence.

Deliver

Full Handoff

By the end of the quarter, all systems were fully migrated and the client was operating independently.

Metric | Before | After
Data masking coverage | Manual, table-level | 100K+ columns, 10K+ tables
Compliance posture | Manual checks | 24/7/365 automated
Time to deployment | | 2 weeks initial
Cost vs. SaaS alternatives | Multiple six figures/year | ~10x less

What Made It Work

Three factors combined:

  1. Building on native Snowflake features — Dynamic data masking was already available; the engineering challenge was making it declarative and automated at scale through dbt and CI/CD
  2. Making compliance the easy path — Devcontainers, pre-commit hooks, and automated enforcement meant doing the right thing was easier than doing the wrong thing
  3. Phased rollout — Controlled enrollment built confidence with stakeholders and prevented incidents that could have derailed adoption

The client now has continuous HIPAA/HITECH compliance across their entire data warehouse — maintained automatically, without dedicated headcount.
