// Project 01 · 2022–2024
Enterprise Cloud Modernization Program
// Headline number
~1 TB
processed / day
// Architecture
live · 9 nodes · 9 edges
Company-wide modernization from legacy Alteryx and MySQL workflows to a cloud-native GCP ecosystem using Python, BigQuery, Airflow, PySpark, Terraform, Dataproc, Cloud Composer, GCS, CI/CD, and metadata-driven automation. Started as a proof of concept and became the architecture used to scale data processing, reduce runtime and cost, and support self-service operations across the business.
// The problem
Legacy Alteryx and MySQL workflows couldn't keep pace with growing data volume. ETL ran for hours, gave teams no operational visibility, and required engineering to onboard every new data source by hand.
// My approach
Designed and built a cloud-native architecture on GCP: BigQuery for warehousing, Cloud Composer + Airflow for orchestration, Dataproc + PySpark for heavy transforms. The core innovation was a metadata-driven framework that lets analysts add new pipelines by editing config files instead of writing code. Terraform managed all infrastructure as code; CI/CD shipped changes safely. Migrations ran incrementally, so production never stopped.
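The metadata-driven idea above can be sketched in miniature. This is a hypothetical illustration, not the program's actual framework: the config format, field names, and `build_pipelines` factory are assumptions. The point is that a new data source is a config entry, and a small factory validates it and turns it into a pipeline spec the orchestrator can pick up.

```python
from dataclasses import dataclass

@dataclass
class PipelineSpec:
    """One pipeline the orchestrator will run (illustrative shape)."""
    source: str
    target_table: str
    schedule: str
    transform: str

# An analyst adds a dict like this instead of writing pipeline code.
# Paths, table names, and transform labels here are made up for the sketch.
PIPELINE_CONFIG = [
    {"source": "gs://raw/sales/*.csv", "target_table": "analytics.sales",
     "schedule": "@daily", "transform": "dedupe"},
    {"source": "gs://raw/inventory/*.json", "target_table": "analytics.inventory",
     "schedule": "@hourly", "transform": "flatten"},
]

def build_pipelines(config):
    """Validate each config entry and return pipeline specs."""
    required = {"source", "target_table", "schedule", "transform"}
    specs = []
    for entry in config:
        missing = required - entry.keys()
        if missing:
            raise ValueError(f"config entry missing keys: {missing}")
        specs.append(PipelineSpec(**entry))
    return specs

if __name__ == "__main__":
    for spec in build_pipelines(PIPELINE_CONFIG):
        print(f"{spec.source} -> {spec.target_table} ({spec.schedule})")
```

In a real deployment the same pattern typically drives dynamic DAG generation in Airflow: one factory iterates the config at parse time and emits one DAG per entry, so onboarding a source never touches engineering code.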
// Stack
Python · BigQuery · Airflow · Cloud Composer · PySpark · Dataproc · GCS · Terraform · CI/CD
// Outcome
- Pipeline runtimes cut from hours to minutes
- ~1 TB processed daily across business domains
- Right-sized Dataproc clusters reduced cost per workload
- Self-service onboarding for non-engineering teams