Commit c32a834

doc: image folder example
1 parent 1027655 commit c32a834

1 file changed
+43 -5 lines changed

README.rst

@@ -28,6 +28,7 @@ tensorflow 2's ``tf.data.Dataset``.
 It provides a simple solution to oversampling / stratification, weighted
 sampling, and finally converting to a ``torch.utils.data.DataLoader``.
 
+
 Install
 =======
 
@@ -41,6 +42,7 @@ Or, for the old-timers:
 
     pip install pytorch-datastream
 
+
 Usage
 =====
 
@@ -72,6 +74,45 @@ a more extensive list on API and usage.
     .state_dict
     .load_state_dict
 
+
+Simple image dataset example
+----------------------------
+Here's a basic example of loading images from a directory:
+
+.. code-block:: python
+
+    from datastream import Dataset
+    from pathlib import Path
+    from PIL import Image
+
+    # Assuming images are in a directory structure like:
+    # tests/images/
+    #   class1/
+    #     image1.jpg
+    #     image2.jpg
+    #   class2/
+    #     image3.jpg
+    #     image4.jpg
+
+    image_dir = Path("images")
+    image_paths = list(image_dir.glob("**/*.jpg"))
+
+    dataset = (
+        Dataset.from_paths(image_paths, pattern=r".*/(?P<class_name>\w+)/(?P<image_name>\w+).jpg")
+        .map(lambda row: dict(
+            image=Image.open(row["path"]),
+            class_name=row["class_name"],
+            image_name=row["image_name"],
+        ))
+    )
+
+    # Access an item from the dataset
+    first_item = dataset[0]
+    print(f"Class: {first_item['class_name']}, Image name: {first_item['image_name']}")
+
+This example demonstrates how to create a dataset from image files, extracting class names and image names from the file paths, and loading the images when accessed.
+
+
 Merge / stratify / oversample datastreams
 -----------------------------------------
 The fruit datastreams given below repeatedly yields the string of its fruit
@@ -87,6 +128,7 @@ type.
     >>> next(iter(datastream.data_loader(batch_size=8)))
     ['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']
 
+
 Zip independently sampled datastreams
 -------------------------------------
 The fruit datastreams given below repeatedly yields the string of its fruit
@@ -101,12 +143,8 @@ type.
     >>> next(iter(datastream.data_loader(batch_size=4)))
     [('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
 
+
 More usage examples
 -------------------
 See the `documentation <https://pytorch-datastream.readthedocs.io/en/latest/>`_
 for more usage examples.
-
-Install from source
-===================
-
-.. pip install -e .
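
The ``pattern`` argument in the added README example can be checked on its own with the standard ``re`` module. The sketch below is only an illustration: ``parse_image_path`` is a hypothetical helper, and the use of ``re.search`` is an assumption, since the diff does not show how ``Dataset.from_paths`` applies the pattern internally. It shows which fields the named groups capture and two edge cases: ``\w+`` rejects file names containing a hyphen, and the unescaped ``.`` before ``jpg`` matches any character.

```python
import re

# Pattern copied verbatim from the README example in this commit.
PATTERN = r".*/(?P<class_name>\w+)/(?P<image_name>\w+).jpg"


def parse_image_path(path):
    """Hypothetical helper: return the named groups for ``path``, or None.

    Assumption: matching is done with re.search; the library may differ.
    """
    match = re.search(PATTERN, str(path))
    return match.groupdict() if match else None


# A path following the layout described in the README comment.
fields = parse_image_path("tests/images/class1/image1.jpg")
print(fields)

# \w+ does not match "-", so a hyphenated file name is not captured.
print(parse_image_path("images/class1/image-1.jpg"))

# The "." before "jpg" is unescaped, so it matches any single character.
print(parse_image_path("images/class1/image1xjpg"))
```

If stricter matching is wanted, escaping the dot (``\.jpg``) and widening the character class for file names would be one option, at the cost of diverging from the pattern shown in the commit.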
