You've got three solid options to choose from. Pick one, work through it methodically, and you'll have a complete end-to-end deep learning project by the end of the afternoon. The key here is to move fast on data acquisition and cleaning so you have real time to experiment with the model itself.
This is the classic. You're working with 42k training images of handwritten digits (0-9), each one 28×28 pixels in grayscale. The data comes pre-cleaned and CSV-formatted, which means you skip a ton of messy data wrangling. That's time back in your pocket for actually building and tuning the CNN. Even a small CNN will get you into the high-90s on accuracy quickly, which feels like a real win.
Go to Kaggle's Digit Recognizer competition: https://www.kaggle.com/c/digit-recognizer
Download train.csv and test.csv. That's it. No unzipping folders or dealing with image files scattered everywhere.
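A minimal loading sketch, assuming train.csv sits in your working directory; the label column and 784 pixel columns follow the competition's CSV layout:

```python
import pandas as pd

# Load the Kaggle CSV: first column is the label, the rest are 784 pixel values
train = pd.read_csv("train.csv")

y = train["label"].values                 # digit labels 0-9
X = train.drop(columns=["label"]).values  # shape (42000, 784)

# Reshape to 28x28 grayscale images with a channel axis, scale to [0, 1]
X = X.reshape(-1, 28, 28, 1).astype("float32") / 255.0

print(X.shape, y.shape)  # (42000, 28, 28, 1) (42000,)
```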
Real natural-world images. You're classifying scenes into 6 categories: buildings, forest, glacier, mountain, sea, street. Each image is 150×150 RGB, bigger and more complex than MNIST but still manageable in a couple of hours. You handle actual folder structures and image directories, which is how real projects work. More interesting visually, too.
Kaggle dataset: https://www.kaggle.com/datasets/puneet6060/intel-image-classification
Download the zip. You get three folders: seg_train, seg_test, and seg_pred. Extract everything to your working directory.
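A rough loading sketch using Keras directory utilities, assuming the usual layout where each folder contains one subfolder per class (the zip often extracts with an extra nesting level, so adjust the paths to match what you see on disk):

```python
import tensorflow as tf

IMG_SIZE = (150, 150)

# seg_train/ and seg_test/ each hold one subfolder per scene class
# (buildings, forest, glacier, mountain, sea, street)
train_ds = tf.keras.utils.image_dataset_from_directory(
    "seg_train/seg_train",  # adjust if your extraction isn't nested like this
    image_size=IMG_SIZE,
    batch_size=32,
)
test_ds = tf.keras.utils.image_dataset_from_directory(
    "seg_test/seg_test",
    image_size=IMG_SIZE,
    batch_size=32,
)

print(train_ds.class_names)  # should list the six scene categories
```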
This one bridges what you learned about text mining and brings it into deep learning territory. Binary classification—is this movie review positive or negative? You work with sequences instead of images, which is a different beast entirely. Good for understanding how embeddings and LSTMs handle text. Smaller data size means fast training.
Option A (faster): Use the Keras built-in dataset

```python
from tensorflow.keras.datasets import imdb

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)
```

Option B (more hands-on): Download from Kaggle: https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
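Whichever route you take, the reviews need to become fixed-length integer sequences before they hit an Embedding/LSTM layer. A minimal sketch for Option A; the 200-token cutoff is an arbitrary choice:

```python
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Reviews arrive as lists of word indices with varying lengths
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)

# Pad/truncate every review to the same length so batches line up
MAX_LEN = 200  # assumed cutoff; tune as needed
x_train = pad_sequences(x_train, maxlen=MAX_LEN)
x_test = pad_sequences(x_test, maxlen=MAX_LEN)

print(x_train.shape, x_test.shape)  # (25000, 200) (25000, 200)
```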
The workflow is the same no matter which project you choose. Here's how to structure your afternoon:
Load your data, take a breath, and look at what you're working with.
- Load the dataset into memory
- Check shape and data types
- Look at 5-10 actual examples (images or reviews)
- Plot distributions—how many samples per class? Are they balanced?
- Calculate any basic statistics (mean pixel value for images, review length for text)
Goal: You should feel comfortable with your data before touching any models. Know what you're dealing with.
20 minutes is plenty here. Don't overthink it.
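Here's a minimal exploration sketch for the MNIST option, assuming the X and y arrays from the loading snippet above; the same checks (shapes, a few raw examples, class balance) carry over to the other two datasets:

```python
import matplotlib.pyplot as plt
import numpy as np

print(X.shape, X.dtype, y.shape)        # confirm dimensions and types
print("mean pixel value:", X.mean())    # basic statistic

# Look at a handful of actual examples
fig, axes = plt.subplots(1, 5, figsize=(10, 2))
for ax, img, label in zip(axes, X[:5], y[:5]):
    ax.imshow(img.squeeze(), cmap="gray")
    ax.set_title(str(label))
    ax.axis("off")
plt.show()

# Class balance: how many samples per digit?
labels, counts = np.unique(y, return_counts=True)
plt.bar(labels, counts)
plt.xlabel("digit")
plt.ylabel("count")
plt.show()
```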
Get data in shape for training.
- Normalize/scale your inputs (divide image pixels by 255, pad text sequences to a fixed length, etc.)
- Split into train/validation/test if not already done
- Create a few visualizations—show distributions, sample predictions, whatever makes sense for your data type
- Make sure shapes are correct for your model (channels, sequence lengths, etc.)
This is where small mistakes bite you. Spend a few minutes double-checking dimensions.
20 minutes for this too. Most of it is just running code that's pretty mechanical.
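For the MNIST option, a short preprocessing sketch, assuming X and y from the loading step; the 80/20 split and one-hot encoding are my choices, not requirements:

```python
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

# Pixels were already scaled to [0, 1] during loading.
# Hold out 20% as a validation set (Kaggle's test.csv has no labels).
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# One-hot encode labels for a softmax output layer
y_train = to_categorical(y_train, num_classes=10)
y_val = to_categorical(y_val, num_classes=10)

# Double-check the shapes the model will actually see
print(X_train.shape, y_train.shape)  # (33600, 28, 28, 1) (33600, 10)
print(X_val.shape, y_val.shape)      # (8400, 28, 28, 1) (8400, 10)
```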
Document what you built.
- Save your best model
- Write down final test accuracy
- Note what worked and what didn't
- One sentence on what you'd try next
This isn't for a report. It's for you to remember what you did.
5 minutes to write this down.
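If you want a concrete starting point, a tiny sketch; model and val_accuracy stand in for whatever your training run produced, and the filenames are just examples:

```python
# Save the trained model so it can be reloaded later without retraining
# (the .keras format assumes a reasonably recent TensorFlow/Keras version)
model.save("digit_cnn.keras")

# Plain text is enough for notes-to-self
with open("notes.txt", "w") as f:
    f.write(f"Final validation accuracy: {val_accuracy:.4f}\n")
    f.write("What worked: ...\n")
    f.write("Try next: ...\n")
```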
- Don't get stuck on preprocessing. If it's taking more than 20 minutes, move on. Good enough beats perfect when you're learning.
- Errors are normal. Shape mismatches, type errors, missing values—these happen. Read the error, fix it, move on.
- Your first model won't be perfect. That's fine. You're learning the workflow, not winning a Kaggle competition.
- Keep your code simple. One file, straightforward logic. Save fancy architectures for later.
- Check dimensions obsessively. Most bugs are shape problems. Add `print(X.shape)` liberally.
0:00 - 0:20 | Data exploration & visualization
0:20 - 0:40 | Preprocessing
0:40 - 1:00 | Build & test model (quick run)
1:00 - 1:20 | Full training run
1:20 - 2:00 | Monitor, plot, evaluate
2:00 - 2:25 | Iterate on one thing
2:25 - 2:30 | Document results
You've got this. Pick one, follow the flow, and you'll have a working deep learning pipeline from download to evaluation.