@@ -28,6 +28,7 @@ tensorflow 2's ``tf.data.Dataset``.
It provides a simple solution to oversampling / stratification, weighted
sampling, and finally converting to a ``torch.utils.data.DataLoader``.

+
Install
=======
@@ -41,6 +42,7 @@ Or, for the old-timers:
    pip install pytorch-datastream

+
Usage
=====
@@ -72,6 +74,45 @@ a more extensive list on API and usage.
.state_dict
.load_state_dict
+
+ Simple image dataset example
+ ----------------------------
+ Here's a basic example of loading images from a directory:
+
+ .. code-block:: python
+
+     from datastream import Dataset
+     from pathlib import Path
+     from PIL import Image
+
+     # Assuming images are in a directory structure like:
+     # images/
+     #     class1/
+     #         image1.jpg
+     #         image2.jpg
+     #     class2/
+     #         image3.jpg
+     #         image4.jpg
+
+     image_dir = Path("images")
+     image_paths = list(image_dir.glob("**/*.jpg"))
+
+     dataset = (
+         Dataset.from_paths(image_paths, pattern=r".*/(?P<class_name>\w+)/(?P<image_name>\w+).jpg")
+         .map(lambda row: dict(
+             image=Image.open(row["path"]),
+             class_name=row["class_name"],
+             image_name=row["image_name"],
+         ))
+     )
+
+     # Access an item from the dataset
+     first_item = dataset[0]
+     print(f"Class: {first_item['class_name']}, Image name: {first_item['image_name']}")
+
+ This example demonstrates how to create a dataset from image files, extracting class names and image names from the file paths, and loading the images when accessed.
+
+
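To see what the ``pattern`` argument in the example above captures, the regular expression can be tried on its own with the standard library. This is only a sketch: the helper ``parse_path`` and the sample paths are hypothetical, and the use of ``re.match`` is an assumption, since the diff does not show how the library applies the pattern internally.

```python
import re

# Same regular expression as in the README example; the paths used below are made up.
PATTERN = re.compile(r".*/(?P<class_name>\w+)/(?P<image_name>\w+).jpg")


def parse_path(path):
    """Return the named groups captured from a POSIX-style path, or None if it does not match."""
    match = PATTERN.match(path)
    return match.groupdict() if match else None


# A path with a parent directory, a class directory, and a file name matches.
fields = parse_path("images/class1/image1.jpg")
print(fields)

# Without a directory in front of the class directory, the leading ".*/" cannot match.
no_parent = parse_path("class1/image1.jpg")
print(no_parent)
```

Checking the pattern this way before building the dataset can catch paths that would silently fail to yield a ``class_name`` or ``image_name``.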
Merge / stratify / oversample datastreams
-----------------------------------------
Each of the fruit datastreams given below repeatedly yields the string of its fruit
>>> next(iter(datastream.data_loader(batch_size=8)))
['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']
+
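The batch shown above follows a repeating pattern of two apples, then one pear and one banana. The lines that build the merged datastream are not visible in this diff, so the following is a plain-Python analogy rather than the library's implementation: a hypothetical ``merge`` generator that takes a fixed number of items from each source in turn. The counts and the nesting are assumptions chosen to reproduce the batch above; the library's own sampling may behave differently.

```python
from itertools import cycle, islice


def merge(streams_with_counts):
    """Endlessly take `count` items from each stream in turn (round-robin).

    Hypothetical helper for illustration; it assumes every stream is infinite.
    """
    iterators = [(iter(stream), count) for stream, count in streams_with_counts]
    while True:
        for iterator, count in iterators:
            for _ in range(count):
                yield next(iterator)


# Stand-ins for datastreams that repeatedly yield the name of their fruit.
apple = cycle(["apple"])
pear = cycle(["pear"])
banana = cycle(["banana"])

# Two apples, then two items from an inner merge that alternates pear and banana.
merged = merge([
    (apple, 2),
    (merge([(pear, 1), (banana, 1)]), 2),
])

batch = list(islice(merged, 8))
print(batch)
```

Nesting one merge inside another is what keeps the rarer classes evenly represented: the inner merge balances pear against banana, and the outer merge balances apples against that pair.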
Zip independently sampled datastreams
-------------------------------------
Each of the fruit datastreams given below repeatedly yields the string of its fruit
@@ -101,12 +143,8 @@ type.
>>> next(iter(datastream.data_loader(batch_size=4)))
[('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
+
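The zipped batch above pairs one item from each source at every step. As a rough analogy in plain Python (not the library's code, and with hypothetical stand-in streams), two independent iterators can be combined with the built-in ``zip``:

```python
from itertools import cycle, islice

# Stand-ins for two independently advancing sources.
apple = cycle(["apple"])
pear_or_banana = cycle(["pear", "banana"])

# Each step draws one item from each source and pairs them.
pairs = list(islice(zip(apple, pear_or_banana), 4))
print(pairs)
```

Each source keeps its own position, so the two never need to have the same length or the same sampling scheme.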
More usage examples
-------------------
See the `documentation <https://pytorch-datastream.readthedocs.io/en/latest/>`_
for more usage examples.
-
- Install from source
- ===================
-
- .. pip install -e .