A tool for splitting image datasets into training and testing sets for classification tasks. Useful for preparing data for machine learning models.
- Splits image datasets into training and testing subsets.
- Supports organizing images by classification labels.
- Generates a YAML configuration file for dataset management.
- Clone the repository:
    git clone https://github.com/Gulciha-n/image-classification-data-split.git
    cd image-classification-data-split
- Install required packages:
    pip install pyyaml
- Prepare your dataset:
Ensure your source directory is structured as follows:
    source/
    └── train/
        ├── belirsiz/
        ├── sifir/
        ├── on/
        ├── yirmi/
        ├── otuz/
        ├── kirk/
        ├── elli/
        ├── altmis/
        ├── yetmis/
        ├── seksen/
        ├── doksan/
        └── yuz/
Here, source is your dataset directory, and each subdirectory (e.g., belirsiz, sifir) contains the images for the corresponding classification label.
- Update the script with your paths:
Edit the main.py file and set the source_dir and target_dir variables to your directory paths.
    source_dir = r"PATH_TO_YOUR_SOURCE_DIRECTORY"  # source directory
    target_dir = r"PATH_TO_YOUR_TARGET_DIRECTORY"  # target directory
- Run the script:
Execute the following command:
    python main.py
This will organize and split your image dataset into training and testing sets, and generate a dataset.yaml file in the target directory.

After running the script, your target directory structure will be:
    train_test_images/
    ├── train/
    │   ├── belirsiz/
    │   ├── sifir/
    │   ├── on/
    │   ├── yirmi/
    │   ├── otuz/
    │   ├── kirk/
    │   ├── elli/
    │   ├── altmis/
    │   ├── yetmis/
    │   ├── seksen/
    │   ├── doksan/
    │   └── yuz/
    └── test/
        ├── belirsiz/
        ├── sifir/
        ├── on/
        ├── yirmi/
        ├── otuz/
        ├── kirk/
        ├── elli/
        ├── altmis/
        ├── yetmis/
        ├── seksen/
        ├── doksan/
        └── yuz/
A dataset.yaml file will also be created in the train_test_images directory.
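
For readers who want to see what the splitting step might look like, here is a minimal sketch. It is not the code in main.py. The function name split_dataset, the 80/20 ratio, the fixed seed, and the set of image extensions are all assumptions made for illustration. It also assumes that source_dir points at the folder that directly contains the label subdirectories.

```python
import random
import shutil
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def split_dataset(source_dir, target_dir, labels, test_ratio=0.2, seed=0):
    """Copy each label's images into target_dir/train/<label> and target_dir/test/<label>.

    Hypothetical helper: returns {label: (n_train, n_test)}.
    Raises FileNotFoundError if a label folder is missing from source_dir.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    counts = {}
    for label in labels:
        # Collect the image files for this label in a stable order.
        images = sorted(
            p for p in (source_dir / label).iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
        rng.shuffle(images)
        n_test = int(len(images) * test_ratio)
        subsets = {"test": images[:n_test], "train": images[n_test:]}
        for subset, files in subsets.items():
            out_dir = target_dir / subset / label
            out_dir.mkdir(parents=True, exist_ok=True)
            for path in files:
                shutil.copy2(path, out_dir / path.name)  # copy, so the source stays intact
        counts[label] = (len(subsets["train"]), len(subsets["test"]))
    return counts
```

Calling split_dataset(source_dir, target_dir, ["belirsiz", "sifir"]) would fill the train and test folders for those two labels and return the number of images placed in each.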
- source_dir: Directory where your image data is located.
- target_dir: Directory where the split data will be saved.
- labels: List of classification labels used for organizing images.
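
The contents of dataset.yaml are not shown here, so the following is only a sketch of how such a file could be generated from target_dir and labels with PyYAML. The key names (path, train, test, names) and both function names are assumptions for illustration, not the layout that main.py necessarily writes.

```python
from pathlib import Path


def build_dataset_config(target_dir, labels):
    """Return a plain dict describing the split dataset (hypothetical layout)."""
    return {
        "path": Path(target_dir).as_posix(),  # root of the split dataset
        "train": "train",                     # training subfolder, relative to path
        "test": "test",                       # testing subfolder, relative to path
        "names": {index: label for index, label in enumerate(labels)},
    }


def write_dataset_yaml(target_dir, labels):
    """Write the config to target_dir/dataset.yaml and return the file path."""
    import yaml  # PyYAML; imported here so build_dataset_config works without it

    yaml_path = Path(target_dir) / "dataset.yaml"
    with open(yaml_path, "w", encoding="utf-8") as handle:
        yaml.safe_dump(build_dataset_config(target_dir, labels), handle)
    return yaml_path
```

With this layout, each label is mapped to an integer index in the order given by the labels list, so keeping that list in a fixed order keeps the class indices stable between runs.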
This project is licensed under the MIT License.