The primary objective of this Capstone project is to develop an end-to-end data analysis pipeline that assists the campus facilities team in tracking and optimizing electricity usage. By automating the processing of raw meter data, this dashboard provides actionable insights to support administrative decision-making regarding energy efficiency.
Key goals include:
- Ingesting and cleaning raw data from multiple building sources.
- Implementing Object-Oriented Programming (OOP) to model real-world building systems.
- Visualizing trends, peak loads, and comparisons to identify high-consumption areas.
The dataset used for this project consists of synthetic meter readings generated to simulate realistic campus energy usage.
- Location: The data is located in the `/data/` directory of this repository.
- Format: The data consists of multiple `.csv` files (e.g., `building_A.csv`, `building_B.csv`).
- Structure: Each file contains the following columns:
  - `timestamp`: The date and time of the recording (YYYY-MM-DD HH:MM:SS).
  - `kwh`: The electricity consumption in kilowatt-hours.
- Note: The script is designed to handle missing files or corrupt data lines during ingestion.
The solution is implemented in Python and follows a modular structure divided into four main stages:
- Used the `pathlib` library to dynamically detect and loop through all CSV files in the data directory.
- Used `pandas` to read the files into a master DataFrame, with exception handling (try-except blocks) to manage missing files or invalid data formats.
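The ingestion stage described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `load_meter_data`, the added `building` column (taken from the file name), and the choice to drop unparseable rows are all assumptions.

```python
from pathlib import Path

import pandas as pd


def load_meter_data(data_dir="data"):
    """Read every CSV in data_dir into one DataFrame, skipping unreadable files.

    Hypothetical helper: the name and the 'building' column are assumptions.
    """
    frames = []
    for csv_path in sorted(Path(data_dir).glob("*.csv")):
        try:
            # on_bad_lines="skip" drops rows with the wrong number of fields
            df = pd.read_csv(csv_path, on_bad_lines="skip")
            # Unparseable values become NaT/NaN so they can be dropped below
            df["timestamp"] = pd.to_datetime(df["timestamp"], errors="coerce")
            df["kwh"] = pd.to_numeric(df["kwh"], errors="coerce")
        except (FileNotFoundError, KeyError,
                pd.errors.EmptyDataError, pd.errors.ParserError) as exc:
            print(f"Skipping {csv_path.name}: {exc}")
            continue
        df = df.dropna(subset=["timestamp", "kwh"])
        # Tag each row with its source file, e.g. "building_A"
        frames.append(df.assign(building=csv_path.stem))
    if not frames:
        return pd.DataFrame(columns=["timestamp", "kwh", "building"])
    return pd.concat(frames, ignore_index=True)
```

Catching specific exception types rather than a bare `except` keeps genuine programming errors visible while still tolerating empty or malformed files.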
To ensure code scalability and organization, the system is modeled using three primary classes:
- `MeterReading`: Stores individual timestamp and energy data points.
- `Building`: Represents a specific building, managing a list of readings and providing methods to calculate total consumption.
- `BuildingManager`: Orchestrates multiple building objects to generate campus-wide summaries.
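One possible shape for these three classes is sketched below. Only the class names come from the description above; the attribute and method names (`add_reading`, `total_consumption`, `campus_summary`) and the use of dataclasses are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class MeterReading:
    """A single timestamped energy data point."""
    timestamp: datetime
    kwh: float


@dataclass
class Building:
    """A building and the readings recorded for it."""
    name: str
    readings: list = field(default_factory=list)

    def add_reading(self, reading):
        self.readings.append(reading)

    def total_consumption(self):
        return sum(r.kwh for r in self.readings)


class BuildingManager:
    """Holds Building objects keyed by name and summarizes them."""

    def __init__(self):
        self.buildings = {}

    def add_reading(self, building_name, reading):
        # Create the Building on first use, then attach the reading
        building = self.buildings.setdefault(building_name, Building(building_name))
        building.add_reading(reading)

    def campus_summary(self):
        return {name: b.total_consumption() for name, b in self.buildings.items()}


# Usage with made-up readings
manager = BuildingManager()
manager.add_reading("building_A", MeterReading(datetime(2024, 1, 1, 14), 12.5))
manager.add_reading("building_A", MeterReading(datetime(2024, 1, 1, 15), 7.5))
manager.add_reading("building_B", MeterReading(datetime(2024, 1, 1, 14), 4.0))
print(manager.campus_summary())  # {'building_A': 20.0, 'building_B': 4.0}
```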
- Used pandas functions such as `.groupby()` and `.resample()` to transform raw data into meaningful time-series insights.
- Calculated metrics including Daily Totals, Weekly Averages, and Peak Hour Loads.
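As a rough sketch of how these metrics could be derived with `.groupby()` and `.resample()`: the function name `summarize` and the `building` column are assumptions, and "weekly average" is interpreted here as the mean of each building's weekly totals, which may differ from the project's definition.

```python
import pandas as pd


def summarize(df):
    """Compute daily totals, weekly averages, and hourly load.

    Assumes df has 'timestamp' (datetime64), 'kwh', and 'building' columns.
    """
    indexed = df.set_index("timestamp")
    # Daily total kWh per building
    daily = indexed.groupby("building")["kwh"].resample("D").sum()
    # Weekly totals per building, averaged over the weeks present
    weekly_avg = (
        indexed.groupby("building")["kwh"]
        .resample("W")
        .sum()
        .groupby(level="building")
        .mean()
    )
    # Campus-wide kWh summed by hour of day; idxmax() gives the peak hour
    hourly = df.groupby(df["timestamp"].dt.hour)["kwh"].sum()
    return daily, weekly_avg, hourly
```

Both `daily` and `weekly_avg` come back indexed by building, so they can be passed straight to a plotting or export step.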
- Created a unified dashboard using `matplotlib.pyplot` with `plt.subplots()`.
- The dashboard includes three distinct visualizations:
  - Trend Line: Displays daily consumption fluctuations over time.
  - Bar Chart: Compares average weekly usage across different buildings.
  - Scatter Plot: Maps consumption against the hour of the day to identify peak load times.
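A three-panel figure of this kind might be assembled roughly as below. The function name `build_dashboard`, the 1x3 layout, the figure size, and the `Agg` backend are assumptions chosen for the sketch; the input is assumed to have `timestamp`, `kwh`, and `building` columns.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed
import matplotlib.pyplot as plt
import pandas as pd


def build_dashboard(df, out_path):
    """Draw the trend line, bar chart, and scatter plot, then save to out_path."""
    fig, (ax_trend, ax_bar, ax_scatter) = plt.subplots(1, 3, figsize=(18, 5))
    indexed = df.set_index("timestamp")

    # Trend line: campus-wide daily totals
    daily = indexed["kwh"].resample("D").sum()
    ax_trend.plot(daily.index, daily.values)
    ax_trend.set_title("Daily consumption")
    ax_trend.set_ylabel("kWh")

    # Bar chart: mean of weekly totals per building
    weekly = (
        indexed.groupby("building")["kwh"]
        .resample("W").sum()
        .groupby(level="building").mean()
    )
    ax_bar.bar(weekly.index, weekly.values)
    ax_bar.set_title("Average weekly usage by building")
    ax_bar.set_ylabel("kWh")

    # Scatter plot: each reading against its hour of day
    ax_scatter.scatter(df["timestamp"].dt.hour, df["kwh"], alpha=0.5)
    ax_scatter.set_title("Consumption by hour of day")
    ax_scatter.set_xlabel("Hour")
    ax_scatter.set_ylabel("kWh")

    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
```

Drawing on the individual `Axes` objects returned by `plt.subplots()` keeps each chart independent, which makes it easier to rearrange or replace a panel later.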
Upon execution, the script generates a textual summary and data exports in the /output/ folder. Key insights derived from the analysis include:
- Total Campus Consumption: A consolidated metric of energy use across all facilities.
- Highest Consumer: Identification of the building with the highest total energy draw.
- Peak Load Time: Analysis of the specific hour (e.g., 14:00 or 15:00) when campus energy demand is highest.
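For illustration, the three insights could be computed from the merged DataFrame along these lines. The function name `key_insights` and the dictionary keys are hypothetical, and the peak hour is taken here as the hour of day with the largest summed consumption across the whole dataset, which is only one way to read "peak load".

```python
import pandas as pd


def key_insights(df):
    """Return the campus total, the highest-consuming building, and the peak hour.

    Assumes df has 'timestamp' (datetime64), 'kwh', and 'building' columns.
    """
    per_building = df.groupby("building")["kwh"].sum()
    per_hour = df.groupby(df["timestamp"].dt.hour)["kwh"].sum()
    return {
        "total_campus_kwh": float(df["kwh"].sum()),
        "highest_consumer": per_building.idxmax(),
        "peak_hour": int(per_hour.idxmax()),
    }
```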
Generated Files:
- `dashboard.png`: The visual dashboard.
- `cleaned_energy_data.csv`: The merged and processed dataset.
- `building_summary.csv`: Aggregated statistics per building.
- `summary.txt`: A concise executive report.
- Ensure you have Python installed along with the required libraries: `pip install pandas matplotlib`
- Place your CSV data files in the `data/` folder.
- Run the main script: `python dashboard.py`
- Check the `output/` folder for the results.