DataScientest/total-deep-projects

Deep Learning Projects - Wednesday Session

Overview

You've got three solid options to choose from. Pick one, work through it methodically, and you'll have a complete end-to-end deep learning project by the end of the afternoon. The key here is to move fast on data acquisition and cleaning so you have real time to experiment with the model itself.

Project 1: Handwritten Digit Recognition (MNIST)

Why This One

This is the classic. You're working with 42k training images of handwritten digits (0-9), each one 28×28 pixels. The data comes pre-cleaned and CSV-formatted, which means you skip a ton of messy data wrangling. That's time back in your pocket for actually building and tuning the CNN. You'll hit good accuracy quickly, which feels like a real win.

Get the Data

Go to Kaggle's Digit Recognizer competition: https://www.kaggle.com/c/digit-recognizer

Download train.csv and test.csv. That's it. No unzipping folders or dealing with image files scattered everywhere.
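Kaggle's train.csv has a `label` column followed by 784 pixel columns (each row is a flattened 28×28 image). A minimal sketch of turning that into model-ready arrays, using a tiny synthetic array in place of the real file:

```python
import numpy as np

# Sketch of preparing Kaggle's train.csv: column 0 is the label,
# columns 1..784 are pixel values of a flattened 28x28 image.
# A tiny zero-filled array stands in for the real data here.
# In practice: data = pd.read_csv("train.csv").to_numpy()
data = np.zeros((3, 785), dtype=np.uint8)
data[:, 0] = [0, 1, 2]  # fake labels in the first column

y = data[:, 0]
# Reshape to (samples, height, width, channels) and scale to [0, 1]
X = data[:, 1:].reshape(-1, 28, 28, 1).astype("float32") / 255.0

print(X.shape)  # (3, 28, 28, 1)
```

The trailing `1` in the shape is the channel axis most CNN layers expect for grayscale input.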

Project 2: Scene Classification (Intel Images)

Why This One

Real natural-world images. You're classifying scenes into 6 categories: buildings, forest, glacier, mountain, sea, street. Each image is 150×150 RGB, bigger and more complex than MNIST, but still manageable in a couple of hours. You handle actual folder structures and image directories, which is how real projects work. More interesting visually, too.

Get the Data

Kaggle dataset: https://www.kaggle.com/datasets/puneet6060/intel-image-classification

Download the zip. You get three folders: seg_train, seg_test, and seg_pred. Extract everything to your working directory.
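After extraction, seg_train contains one subfolder per class, and Keras utilities such as `image_dataset_from_directory` infer labels from exactly that layout. A stdlib-only sketch of how folder names map to labels, using a temporary directory with empty placeholder files in place of the real images:

```python
import tempfile
from pathlib import Path

# Build a fake seg_train/ with one subfolder per class and one
# placeholder file each, mimicking the extracted Intel dataset.
root = Path(tempfile.mkdtemp()) / "seg_train"
for cls in ["buildings", "forest", "glacier", "mountain", "sea", "street"]:
    (root / cls).mkdir(parents=True)
    (root / cls / "0.jpg").touch()

# Class labels come from the (sorted) subfolder names.
classes = sorted(p.name for p in root.iterdir() if p.is_dir())
labels = {name: i for i, name in enumerate(classes)}
samples = [(str(f), labels[f.parent.name]) for f in sorted(root.rglob("*.jpg"))]

print(len(classes), len(samples))  # 6 6
```

Running a check like this before training catches a common mistake: extracting the zip one level too deep, so the class folders aren't where the loader expects them.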

Project 3: Sentiment Analysis (IMDB Reviews)

Why This One

This one bridges the text mining you've already learned into deep learning territory. Binary classification: is this movie review positive or negative? You work with sequences instead of images, which is a different beast entirely. Good for understanding how embeddings and LSTMs handle text. The smaller data size means fast training.

Get the Data

Option A (faster): Use the Keras built-in dataset

from tensorflow.keras.datasets import imdb
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)

Option B (more hands-on): Download from Kaggle: https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
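Either way, `imdb.load_data` gives you reviews as variable-length lists of word indices, and an LSTM needs fixed-length batches. This plain-Python sketch shows roughly what `keras.preprocessing.sequence.pad_sequences(..., maxlen=...)` does with its defaults: truncate from the front and left-pad with zeros.

```python
# Roughly what Keras pad_sequences does by default: keep the last
# maxlen tokens of each sequence, then left-pad short ones with zeros.
def pad(seqs, maxlen, value=0):
    out = []
    for s in seqs:
        s = s[-maxlen:]  # pre-truncation: keep the last maxlen tokens
        out.append([value] * (maxlen - len(s)) + s)  # pre-padding
    return out

batch = pad([[1, 2, 3, 4, 5], [7, 8]], maxlen=4)
print(batch)  # [[2, 3, 4, 5], [0, 0, 7, 8]]
```

In your project, use the real `pad_sequences` utility; this sketch just makes the transformation visible.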

Project Methodology (Pick Any Project Above)

The workflow is the same no matter which project you choose. Here's how to structure your afternoon:

Phase 1: Understand & Explore (0:00 - 0:20)

Load your data, take a breath, and look at what you're working with.

  • Load the dataset into memory
  • Check shape and data types
  • Look at 5-10 actual examples (images or reviews)
  • Plot distributions—how many samples per class? Are they balanced?
  • Calculate any basic statistics (mean pixel value for images, review length for text)

Goal: You should feel comfortable with your data before touching any models. Know what you're dealing with.

20 minutes is plenty here. Don't overthink it.
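The class-balance check from the list above can be done in a couple of lines with the stdlib; toy labels stand in here for whichever `y_train` your project produces:

```python
from collections import Counter

# Phase 1 sanity check: count samples per class and compute the
# ratio between the largest and smallest class. Toy labels only.
y_train = [0, 1, 1, 2, 2, 2, 0, 1]
counts = Counter(y_train)
ratio = max(counts.values()) / min(counts.values())

print(dict(counts))     # {0: 2, 1: 3, 2: 3}
print(round(ratio, 2))  # 1.5
```

If the ratio is far from 1, note it now; it affects how you read accuracy later.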

Phase 2: Preprocess & Visualize (0:20 - 0:40)

Get data in shape for training.

  • Normalize/scale your inputs (divide image pixels by 255, pad text sequences to a fixed length, etc.)
  • Split into train/validation/test if not already done
  • Create a few visualizations—show distributions, sample predictions, whatever makes sense for your data type
  • Make sure shapes are correct for your model (channels, sequence lengths, etc.)

This is where small mistakes bite you. Spend a few minutes double-checking dimensions.

20 minutes for this too. Most of it is just running code that's pretty mechanical.
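If your dataset doesn't come pre-split, a manual train/validation split is a few lines: shuffle indices once, then slice, so inputs and labels stay aligned. A sketch with toy data (in practice you'd use `sklearn.model_selection.train_test_split` or a Keras `validation_split`):

```python
import random

# Manual 80/20 train/validation split. Shuffling indices (not the
# arrays separately) keeps X and y aligned. Toy data only.
X = [[i] for i in range(10)]
y = list(range(10))

idx = list(range(len(X)))
random.seed(0)  # reproducible split
random.shuffle(idx)
cut = int(0.8 * len(idx))

X_train = [X[i] for i in idx[:cut]]
X_val = [X[i] for i in idx[cut:]]
y_train = [y[i] for i in idx[:cut]]
y_val = [y[i] for i in idx[cut:]]

print(len(X_train), len(X_val))  # 8 2
```

The seed is there so a rerun gives the same split; shuffling `X` and `y` independently is a classic silent bug.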

Phase 6: Results & Wrap-Up (2:25 - 2:30)

Document what you built.

  • Save your best model
  • Write down final test accuracy
  • Note what worked and what didn't
  • One sentence on what you'd try next

This isn't for a report. It's for you to remember what you did.

5 minutes to write this down.
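The wrap-up note above can be a tiny JSON record next to your saved model, so the summary survives past the session. Field names and values here are placeholders, not a required format:

```python
import json
import os
import tempfile

# Sketch of the Phase 6 note as a small JSON record. All fields
# are placeholders; write whatever you'd actually want to reread.
results = {
    "project": "mnist",
    "test_accuracy": 0.0,  # fill in your real number
    "worked": "normalizing inputs helped",
    "next": "try one more conv block",
}

path = os.path.join(tempfile.gettempdir(), "results.json")
with open(path, "w") as f:
    json.dump(results, f, indent=2)

print(json.load(open(path))["project"])  # mnist
```

Pair it with `model.save(...)` and you can pick the project back up cold.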

Quick Tips

  • Don't get stuck on preprocessing. If it's taking more than 20 minutes, move on. Good enough beats perfect when you're learning.
  • Errors are normal. Shape mismatches, type errors, missing values—these happen. Read the error, fix it, move on.
  • Your first model won't be perfect. That's fine. You're learning the workflow, not winning a Kaggle competition.
  • Keep your code simple. One file, straightforward logic. Save fancy architectures for later.
  • Check dimensions obsessively. Most bugs are shape problems. Add print(X.shape) liberally.

Timeline at a Glance

0:00 - 0:20  | Data exploration & visualization
0:20 - 0:40  | Preprocessing
0:40 - 1:00  | Build & test model (quick run)
1:00 - 1:20  | Full training run
1:20 - 2:00  | Monitor, plot, evaluate
2:00 - 2:25  | Iterate on one thing
2:25 - 2:30  | Document results

You've got this. Pick one, follow the flow, and you'll have a working deep learning pipeline from download to evaluation.

About

Project template for Day 3
