Data Bank is a fictional neo-bank that ties customer cloud data storage limits directly to their account balance. Launched as Case Study #4 of Danny Ma's 8 Week SQL Challenge, this project analyses 500 customers across three tables — node assignments, regions, and transactions — to answer business questions about distributed network security, transaction behaviour, and data storage provisioning under four different allocation models (including a daily compounding interest variant). The project also includes a PowerPoint presentation built from the query results, targeting both investors and the internal management team.
| Table | Description | Rows |
|---|---|---|
regions |
The 5 geographic regions where Data Bank nodes operate | 5 |
customer_nodes |
Each period a customer was assigned to a specific node; end_date = '9999-12-31' marks their current active assignment |
3,500 |
customer_transactions |
All deposits, withdrawals, and purchases made via Data Bank debit card | 5,868 |
Data source note: the case study webpage's sample table shows region 1 = Africa. The canonical dataset (the embedded DB Fiddle linked from the case study page) has region 1 = Australia and region 3 = Africa. This repo uses the canonical data throughout; a comment in databank_setup.sql flags the discrepancy.
data-bank-sql-case-study/
├── README.md
├── .gitignore
├── LICENSE
├── sql/
│ ├── databank_setup.sql # schema, full dataset, data quality notes
│ └── databank_solutions.sql # all case study questions answered and commented
├── screenshots/
└── assets/
├── extension-questions.md # written analysis for the Extension section
└── data-bank-presentation.pptx # investor + management PowerPoint
- How many unique nodes are there on the Data Bank system?
- What is the number of nodes per region?
- How many customers are allocated to each region?
- How many days on average are customers reallocated to a different node?
- Median, 80th and 95th percentile for reallocation days, per region
- Unique count and total amount for each transaction type
- Average total deposit counts and amounts per customer
- Monthly count of customers making more than 1 deposit and at least 1 purchase or withdrawal
- Closing balance for each customer at the end of each month
- Percentage of customers who increased their closing balance by more than 5%
Three provisioning models built and compared:
- Option 1: allocation based on end-of-previous-month closing balance
- Option 2: allocation based on average running balance during the month
- Option 3: real-time allocation based on running balance at each transaction
Plus the three sub-components required to build them: running balance column, monthly closing balance, and per-customer min/avg/max running balance.
A fourth provisioning model applying 6% annual interest compounded daily, using a recursive CTE date spine (118 days × 500 customers) with a daily POWER() calculation. Simple (non-compounding) interest is excluded by design.
Open-ended analysis for two audiences:
- Headline security insights for investors — drawn from Part A node data
- Data provisioning comparison for management — comparing all four options
See assets/extension-questions.md and assets/data-bank-presentation.pptx.
- Window functions:
ROW_NUMBER(),LAG(),SUM() OVER(),AVG() OVER() - Manual percentile calculation in MySQL 8 (workaround for missing
PERCENTILE_CONT WITHIN GROUP) - Recursive CTEs for generating a date spine across a date range
CROSS JOINfor building a customer × date gridLEFT JOINwith aggregated CTEs for sparse time-series dataPOWER()for daily compound interest calculationGREATEST()for non-negative data allocation floor- Sentinel value handling (
end_date = '9999-12-31') - Conditional aggregation with
CASEand booleanSUM
- All 5 nodes are active in every region — the architecture has no node concentration risk by design. Customers are reallocated every 15 days on average, with a 95th-percentile maximum of 28 days, meaning even the slowest-rotating customers move within a month.
- Australia leads with 110 customers across the 5 regions, while Europe has the fewest at 88 — no single region holds more than 22% of the customer base.
- Deposits dominate transaction activity: 2,671 deposits totalling $1,359,168, versus 1,617 purchases and 1,580 withdrawals. The average customer makes 5.3 deposits at $508.61 each.
- Only 23.4% of customers (117 out of 500) grew their closing balance by more than 5% across their active period — suggesting the majority of customers are net spending down their balances over time.
- The three provisioning options differ by an order of magnitude in their monthly storage demands: Option 1 peaks at ~$240,602 (March), Option 3 peaks at ~$962,883 (February), and the compounding interest model peaks at ~$8,412,729 (March) — making Option 3 roughly 4× and the compounding model roughly 35× more infrastructure-intensive than the conservative Option 1.
The data allocation challenge (Part C) required thinking carefully about what each option actually means computationally, not just analytically. Option 3 in particular — summing all positive running balance snapshots rather than a single period-end figure — produces numbers that look alarming until you understand why: it's accumulating storage demand across every transaction event in the month, not per customer per month. Getting that semantics right in the query required tracing through a few customers manually before trusting the aggregate result.
Part D's recursive CTE approach also surfaced a MySQL-vs-PostgreSQL portability gap worth noting: the case study's default DB Fiddle runs PostgreSQL, which has PERCENTILE_CONT WITHIN GROUP built in. MySQL 8 doesn't, so A5 required a manual ROW_NUMBER + CEIL/FLOOR approach instead. It produces the same result but is meaningfully more verbose — the kind of thing worth flagging in a technical interview if asked why the query looks more complex than it needs to be.
- Clone this repo:
git clone https://github.com/raugan/data-bank-sql-case-study.git cd data-bank-sql-case-study - Load the schema and data:
mysql -u root -p < sql/databank_setup.sql - Run the solutions:
mysql -u root -p < sql/databank_solutions.sql
Both scripts were tested end-to-end on a fresh MySQL 8.0 instance before being committed.
Case study by Danny Ma — 8 Week SQL Challenge, Case Study #4: Data Bank