04/Projects

Selected work.

A deeper look at the systems I've built — what was broken, the architecture I chose, and what changed because of it.

// Project 01·2022 — 2024

Enterprise Cloud Modernization Program

// Headline number

~1 TB

processed / day

// Architecture

[live pipeline graph · 9 nodes · 9 edges · active jobs: ingest_users_mysql · transform_price_history · load_to_bigquery · backfill_alteryx_exports]

Company-wide modernization from legacy Alteryx and MySQL workflows to a cloud-native GCP ecosystem using Python, BigQuery, Airflow, PySpark, Terraform, Dataproc, Cloud Composer, GCS, CI/CD, and metadata-driven automation. Started as a proof of concept and became the architecture used to scale data processing, reduce runtime and cost, and support self-service operations across the business.

// The problem

Legacy Alteryx and MySQL workflows couldn't keep pace with growing data volume. ETL ran for hours, gave teams no operational visibility, and required engineering to onboard every new data source by hand.

// My approach

Designed and built a cloud-native architecture on GCP: BigQuery for warehousing, Cloud Composer + Airflow for orchestration, Dataproc + PySpark for heavy transforms. The core innovation was a metadata-driven framework that lets analysts add new pipelines by editing config files — not writing code. Terraform managed everything as code; CI/CD shipped changes safely. Migrations ran incrementally so production never stopped.
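The metadata-driven idea above can be sketched as a small generator: a pipeline is declared as config, and the framework expands it into an ordered task list. This is a minimal illustration, not the real framework; all names and config keys here are hypothetical.

```python
# Hypothetical config entry an analyst would add; in practice this
# would live in a version-controlled config file rather than code.
PIPELINE_CONFIG = {
    "name": "ingest_users_mysql",
    "source": {"type": "mysql", "table": "users"},
    "target": {"dataset": "analytics", "table": "users"},
    "schedule": "0 2 * * *",
}

def build_tasks(cfg: dict) -> list[str]:
    """Expand one config entry into an ordered task chain:
    extract -> stage to GCS -> load to BigQuery -> validate."""
    name = cfg["name"]
    return [
        f"{name}.extract_{cfg['source']['type']}",
        f"{name}.stage_to_gcs",
        f"{name}.load_to_bq_{cfg['target']['dataset']}.{cfg['target']['table']}",
        f"{name}.validate_row_counts",
    ]

print(build_tasks(PIPELINE_CONFIG))
```

In an Airflow deployment, a generator like this would emit operator instances inside a DAG file instead of task-name strings; the point is that adding a source means adding config, not code.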

// Stack

gcp · bigquery · airflow · cloud-composer · pyspark · terraform · dataproc · gcs · python · cloud-migration · alteryx · mysql

// Outcome

  • Pipeline runtimes cut from hours to minutes
  • ~1 TB processed daily across business domains
  • Right-sized Dataproc clusters reduced cost per workload
  • Self-service onboarding for non-engineering teams

// Project 02·2023 — 2024

Cortex

// Headline number

~90%

routine investigations · self-serve

// Operations app

[live ops dashboard · cortex · pipeline ops · DAGs: user_etl · price_sync · refresh_cache · tenant_metrics]

End-user App Engine application built with React, Python, BigQuery, MySQL, GCS, Airflow, and other GCP services. It connects DAGs, metadata, BigQuery processes, MySQL state, logs, errors, and pipeline status into a single operational interface, giving internal users real-time tracking, process snapshots, monitoring views, data checks, reporting utilities, and metadata controls so they can manage complex cloud workflows without engineering intervention.

// The problem

Engineering owned all operational visibility — logs, pipeline state, data checks, errors. Business teams had to file tickets to see what was happening with their own data. This created bottlenecks, slowed incident response, and eroded trust in the platform.

// My approach

Built a React + App Engine app that consolidates Airflow DAGs, BigQuery state, MySQL metadata, and pipeline logs into one operational interface. Real-time updates for status changes, structured search across logs and errors, one-click data validation, and controls to rerun, skip, or override pipelines without engineering tickets.
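The consolidation step can be illustrated in miniature: merge orchestrator run state and error logs into one status row per pipeline, the kind of unified view the app surfaces. Field names and schemas below are assumptions for the sketch, not Cortex's real data model.

```python
def pipeline_status(runs: list[dict], errors: list[dict]) -> list[dict]:
    """One consolidated row per pipeline: latest run state + error count."""
    by_pipeline: dict[str, dict] = {}
    # Later runs overwrite earlier ones, so the latest state wins.
    for run in sorted(runs, key=lambda r: r["started_at"]):
        by_pipeline[run["pipeline"]] = {
            "pipeline": run["pipeline"],
            "state": run["state"],
            "errors": 0,
        }
    for err in errors:
        if err["pipeline"] in by_pipeline:
            by_pipeline[err["pipeline"]]["errors"] += 1
    return list(by_pipeline.values())

runs = [
    {"pipeline": "user_etl", "state": "success", "started_at": 1},
    {"pipeline": "user_etl", "state": "running", "started_at": 2},
    {"pipeline": "price_sync", "state": "failed", "started_at": 1},
]
errors = [{"pipeline": "price_sync", "msg": "schema drift"}]
print(pipeline_status(runs, errors))
```

In the real app the inputs would come from the Airflow metadata database, BigQuery job history, and structured log queries rather than in-memory lists.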

// Stack

app-engine · react · python · bigquery · mysql · gcs · airflow · dags · gcp · internal-tools · metadata-driven · metadata-management

// Outcome

  • Engineering tickets for operational issues dropped sharply
  • Business teams self-serve ~90% of routine investigations
  • Mean-time-to-detect on data issues materially improved
  • Designed, built, and maintained as a single-engineer project

// Project 03·2021 — 2022

Automated Mover Modeling System

// Headline number

100s

models trained in parallel

// ML pipeline · stage 1/6

[pipeline animation · stage 1/6: split data into train (70%) / test / val · 8,420 rows · stratified split]

Metadata-driven machine learning infrastructure supporting parallel model training, retraining, and scoring across large client portfolios with PySpark, Airflow, imbalance correction, and automated orchestration.

// The problem

Modeling client portfolios at scale meant training, retraining, and scoring hundreds of models in parallel. The old workflow lived in notebooks — slow, error-prone, manually triggered, and impossible to audit reliably.

// My approach

Built a metadata-driven ML platform on PySpark + Airflow. Data scientists configure new model runs via YAML; the system handles class imbalance correction, parallel training across client segments, model versioning, and automated scoring. Every run is fully reproducible from its logged config and metrics.
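Two pieces of the approach can be sketched together: a config entry describing a run, and inverse-frequency class weighting as one simple form of imbalance correction. The config keys and names are illustrative, not the platform's real schema, and the production system applies this at PySpark scale rather than over Python lists.

```python
from collections import Counter

# Hypothetical run config a data scientist would submit (as YAML in
# practice); one run fans out across the listed client segments.
RUN_CONFIG = {
    "model": "mover_propensity",
    "segments": ["client_a", "client_b"],
    "target": "moved_within_90d",
}

def class_weights(labels: list[int]) -> dict[int, float]:
    """Inverse-frequency weights so minority-class errors cost more."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}

# 9:1 imbalance -> weights {0: ~0.56, 1: 5.0}
labels = [0] * 90 + [1] * 10
print(class_weights(labels))
```

Logging the config and the derived weights alongside metrics is what makes each run reproducible from its metadata alone.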

// Stack

machine-learning · pyspark · airflow · metadata-driven · model-orchestration · imbalanced-classification · retraining · scoring

// Outcome

  • Hundreds of models trained in parallel runs
  • Manual notebook work eliminated from the loop
  • Imbalance correction built in by default
  • Reproducibility through metadata-versioned configs

// Project 04·2020 — 2022

ROI and QBR Reporting Automation

// Headline number

€60k

annual time savings unlocked

// Report sheet · Q4 ROI

[animated spreadsheet · fx=SUM(B2:E4) · quarterly ROI by account: ACME CORP · BETA INC · GAMMA LTD]

Automated reporting systems that reduced ROI report generation from minutes or hours to seconds and produced fully formatted QBR PowerPoint reports in under one minute.

// The problem

ROI reports took 30 minutes to hours to generate manually and were prone to formatting errors. Quarterly business reviews (QBRs) ate entire days of slide assembly. Account managers spent more time formatting than analyzing.

// My approach

Built Django + React systems backed by BigQuery to generate ROI reports on demand (seconds) and assemble fully formatted QBR PowerPoint decks programmatically. The QBR engine reads from analytics tables and writes branded slides with consistent typography, charts, and narrative templates.
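The deck-assembly step can be sketched as a pure function: turn quarterly metrics into a list of slide specs that a rendering layer (python-pptx, say) would then draw with the branded templates. The account name, fields, and layout names below are illustrative assumptions.

```python
def build_qbr_slides(account: str, quarterly_roi: dict[str, float]) -> list[dict]:
    """One title slide plus one metric slide per quarter, in order."""
    slides = [{"layout": "title", "text": f"QBR · {account}"}]
    for quarter, roi in quarterly_roi.items():
        slides.append({
            "layout": "metric",
            "title": f"{quarter} ROI",
            "body": f"{account} delivered €{roi:,.0f} in {quarter}",
        })
    return slides

deck = build_qbr_slides("Example Co", {"Q1": 42_000, "Q2": 58_000})
print(len(deck), deck[1]["title"])  # 3 slides: title + Q1 + Q2
```

Separating the slide plan from rendering is what keeps the engine fast: the data query and the template pass each stay simple and independently testable.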

// Stack

reporting · automation · roi · qbr · powerpoint · django · react · bigquery · python · analytics

// Outcome

  • ROI generation: 30 min to hours → seconds
  • QBR PowerPoint decks: a full day → under one minute
  • ~€60k annual time savings unlocked
  • Account teams refocus on analysis, not assembly