
# 🛍 Retail Performance & Customer Analytics

Databricks | PySpark | BigQuery | Looker Studio | SQL
## 📌 Business Problem
Retail transaction data lacked structured reporting, making it difficult to analyze customer behavior, revenue trends, and product performance. The objective was to build a data pipeline and dashboard solution to generate business KPIs and customer insights.
## 🛠 Tools & Technologies

- Databricks
- PySpark
- Spark SQL
- Google BigQuery
- Looker Studio
- RFM Analysis
## 🔄 Data Process

### Data Engineering Layer

- Ingested and transformed raw retail transaction datasets
- Cleaned missing values and standardized formats
- Built structured Silver tables for transformation
### Analytics Layer

- Created Gold tables for:
  - Net Sales
  - Average Order Value (AOV)
  - Discount Impact
  - Customer segmentation
## 📊 Key Outcomes

- Implemented RFM segmentation to classify customer behavior
- Modeled revenue KPIs for executive reporting
- Built a 3-page interactive dashboard showing:
  - Sales performance trends
  - Product-level performance
  - Customer segments & revenue contribution
## I/O Architecture Diagram

Pipeline flow:

- Raw: create the schema and upload the raw CSV files.
- Bronze: ingest data via streaming with checkpointing.
- Silver: clean and enrich data (dates, hour, weekday, high-value flag).
- Gold: build business KPIs (daily/hourly/category/top accounts/high-value table).
- Dashboard: build the reporting layer in Looker Studio.
## Data Modeling (Looker Studio)

The Gold layer follows a business-centric modeling approach:

- Fact-style aggregated tables for reporting
- Pre-calculated KPIs (Net Sales, Units Sold, AOV)
- Promotion impact aggregation
- Product performance metrics
- RFM scoring for customer segmentation

The model prioritizes:

- Performance
- Simplicity for BI tools
- Clear business definitions