top of page
Modern Workspace Layout

Real-Time Financial ETL Pipeline

Databricks | PySpark | Spark SQL | Delta Lake | GCP

📌 Business Problem

Financial transaction data was stored in raw format and not optimized for analytics or reporting. The goal was to build a scalable ETL pipeline that transforms raw transactional data into analytics-ready datasets using modern data engineering best practices.

🛠 Tools & Technologies

  • Databricks

  • PySpark

  • Spark SQL

  • Delta Lake

  • Google Cloud Platform (GCP)

  • Medallion Architecture (Bronze/Silver/Gold)

🔄 Solution & Architecture

I designed and implemented a Medallion Architecture pipeline:

Bronze Layer

  • Ingested raw financial transaction data

  • Applied schema enforcement and basic validation

  • Stored immutable raw data for traceability

Silver Layer

  • Cleaned and standardized data

  • Applied deduplication and null handling

  • Performed data type casting and validation checks

  • Built structured tables optimized for transformations

Gold Layer

  • Created curated tables for reporting and analytics

  • Aggregated transaction metrics

  • Optimized datasets for BI tool consumption

📊 Key Outcomes

  • Built scalable ETL pipeline following industry-standard data architecture

  • Reduced transformation complexity for downstream analytics

  • Enabled analytics-ready structured datasets for reporting

  • Demonstrated end-to-end data engineering workflow from ingestion to consumption

I/O Architecture Diagram

architecture.png

Pipeline flow:

  1. Synthetic transaction generator writes JSON batches to GCS (Raw).

  2. Databricks Auto Loader ingests into Bronze (streaming + checkpointing).

  3. Silver cleans and enriches data (dates, hour, weekday, high-value flag).

  4. Gold builds business KPIs (daily/hourly/category/top accounts/high-value table).

  5. Databricks SQL Warehouse serves curated Gold tables to Power BI.

Data Modeling (Power BI)

  • Star schema with a shared DateTable dimension and Merchant Category dimension

  • Measures include:

    • Total Transactions

    • Total Spend

    • High Value Transaction Count

    • DoD % Change (time intelligence)​

Executive.png
insights.png

Gallery

bottom of page