🏦 Data Bank — SQL Case Study

Overview

Data Bank is a fictional neo-bank that ties customer cloud data storage limits directly to their account balance. Launched as Case Study #4 of Danny Ma's 8 Week SQL Challenge, this project analyses 500 customers across three tables — node assignments, regions, and transactions — to answer business questions about distributed network security, transaction behaviour, and data storage provisioning under four different allocation models (including a daily compounding interest variant). The project also includes a PowerPoint presentation built from the query results, targeting both investors and the internal management team.

Dataset

Table	Description	Rows
`regions`	The 5 geographic regions where Data Bank nodes operate	5
`customer_nodes`	Each period a customer was assigned to a specific node; `end_date = '9999-12-31'` marks their current active assignment	3,500
`customer_transactions`	All deposits, withdrawals, and purchases made via Data Bank debit card	5,868

Data source note: the case study webpage's sample table shows region 1 = Africa. The canonical dataset (the embedded DB Fiddle linked from the case study page) has region 1 = Australia and region 3 = Africa. This repo uses the canonical data throughout; a comment in databank_setup.sql flags the discrepancy.

Repo Structure

data-bank-sql-case-study/
├── README.md
├── .gitignore
├── LICENSE
├── sql/
│   ├── databank_setup.sql         # schema, full dataset, data quality notes
│   └── databank_solutions.sql     # all case study questions answered and commented
├── screenshots/
└── assets/
    ├── extension-questions.md          # written analysis for the Extension section
    └── data-bank-presentation.pptx    # investor + management PowerPoint

Questions Answered

A. Customer Nodes Exploration

How many unique nodes are there on the Data Bank system?
What is the number of nodes per region?
How many customers are allocated to each region?
How many days on average are customers reallocated to a different node?
Median, 80th and 95th percentile for reallocation days, per region

B. Customer Transactions

Unique count and total amount for each transaction type
Average total deposit counts and amounts per customer
Monthly count of customers making more than 1 deposit and at least 1 purchase or withdrawal
Closing balance for each customer at the end of each month
Percentage of customers who increased their closing balance by more than 5%

C. Data Allocation Challenge

Three provisioning models built and compared:

Option 1: allocation based on end-of-previous-month closing balance
Option 2: allocation based on average running balance during the month
Option 3: real-time allocation based on running balance at each transaction

Plus the three sub-components required to build them: running balance column, monthly closing balance, and per-customer min/avg/max running balance.

D. Extra Challenge — Daily Compounding Interest

A fourth provisioning model applying 6% annual interest compounded daily, using a recursive CTE date spine (118 days × 500 customers) with a daily POWER() calculation. Simple (non-compounding) interest is excluded by design.

Extension

Open-ended analysis for two audiences:

Headline security insights for investors — drawn from Part A node data
Data provisioning comparison for management — comparing all four options

See assets/extension-questions.md and assets/data-bank-presentation.pptx.

SQL Concepts Demonstrated

Window functions: ROW_NUMBER(), LAG(), SUM() OVER(), AVG() OVER()
Manual percentile calculation in MySQL 8 (workaround for missing PERCENTILE_CONT WITHIN GROUP)
Recursive CTEs for generating a date spine across a date range
CROSS JOIN for building a customer × date grid
LEFT JOIN with aggregated CTEs for sparse time-series data
POWER() for daily compound interest calculation
GREATEST() for non-negative data allocation floor
Sentinel value handling (end_date = '9999-12-31')
Conditional aggregation with CASE and boolean SUM

Findings

All 5 nodes are active in every region — the architecture has no node concentration risk by design. Customers are reallocated every 15 days on average, with a 95th-percentile maximum of 28 days, meaning even the slowest-rotating customers move within a month.
Australia leads with 110 customers across the 5 regions, while Europe has the fewest at 88 — no single region holds more than 22% of the customer base.
Deposits dominate transaction activity: 2,671 deposits totalling $1,359,168, versus 1,617 purchases and 1,580 withdrawals. The average customer makes 5.3 deposits at $508.61 each.
Only 23.4% of customers (117 out of 500) grew their closing balance by more than 5% across their active period — suggesting the majority of customers are net spending down their balances over time.
The three provisioning options differ by an order of magnitude in their monthly storage demands: Option 1 peaks at ~$240,602 (March), Option 3 peaks at ~$962,883 (February), and the compounding interest model peaks at ~$8,412,729 (March) — making Option 3 roughly 4× and the compounding model roughly 35× more infrastructure-intensive than the conservative Option 1.

What I Learned

The data allocation challenge (Part C) required thinking carefully about what each option actually means computationally, not just analytically. Option 3 in particular — summing all positive running balance snapshots rather than a single period-end figure — produces numbers that look alarming until you understand why: it's accumulating storage demand across every transaction event in the month, not per customer per month. Getting that semantics right in the query required tracing through a few customers manually before trusting the aggregate result.

Part D's recursive CTE approach also surfaced a MySQL-vs-PostgreSQL portability gap worth noting: the case study's default DB Fiddle runs PostgreSQL, which has PERCENTILE_CONT WITHIN GROUP built in. MySQL 8 doesn't, so A5 required a manual ROW_NUMBER + CEIL/FLOOR approach instead. It produces the same result but is meaningfully more verbose — the kind of thing worth flagging in a technical interview if asked why the query looks more complex than it needs to be.

How to Run

Clone this repo:

git clone https://github.com/raugan/data-bank-sql-case-study.git
cd data-bank-sql-case-study

Load the schema and data:

mysql -u root -p < sql/databank_setup.sql

Run the solutions:

mysql -u root -p < sql/databank_solutions.sql

Both scripts were tested end-to-end on a fresh MySQL 8.0 instance before being committed.

Source

Case study by Danny Ma — 8 Week SQL Challenge, Case Study #4: Data Bank

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🏦 Data Bank — SQL Case Study

Overview

Dataset

Repo Structure

Questions Answered

A. Customer Nodes Exploration

B. Customer Transactions

C. Data Allocation Challenge

D. Extra Challenge — Daily Compounding Interest

Extension

SQL Concepts Demonstrated

Findings

What I Learned

How to Run

Source

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
assets		assets
screenshots		screenshots
sql		sql
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

🏦 Data Bank — SQL Case Study

Overview

Dataset

Repo Structure

Questions Answered

A. Customer Nodes Exploration

B. Customer Transactions

C. Data Allocation Challenge

D. Extra Challenge — Daily Compounding Interest

Extension

SQL Concepts Demonstrated

Findings

What I Learned

How to Run

Source

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages