Commit 64f74e9

doc: link to new docs

1 parent ce4fbed
1 file changed: +42 −41 lines

1 file changed

+42
-41
lines changed

README.md (+42 −41)
````diff
@@ -10,6 +10,8 @@ This is a simple library for creating readable dataset pipelines and reusing bes
 
 `Datastream` combines a `Dataset` and a sampler into a stream of examples. It provides a simple solution to oversampling / stratification, weighted sampling, and finally converting to a `torch.utils.data.DataLoader`.
 
+See the [documentation](https://nextml-code.github.io/pytorch-datastream) for more information.
+
 ## Install
 
 ```bash
````
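The weighted-sampling behavior the hunk above describes (a `Dataset` plus a sampler yielding a stream of examples) can be illustrated with the standard library alone. Everything here, names included, is a toy stand-in for the idea, not `pytorch-datastream`'s implementation:

```python
import random

def weighted_stream(sources, seed=0):
    """Toy stand-in for weighted sampling over several datasets.

    `sources` is a list of (dataset, weight) pairs; each draw picks a
    dataset with probability proportional to its weight, then a random
    example from it.
    """
    rng = random.Random(seed)
    datasets, weights = zip(*sources)
    while True:
        dataset = rng.choices(datasets, weights=weights)[0]
        yield dataset[rng.randrange(len(dataset))]

# Oversample one class 2:1 relative to the other (made-up data)
stream = weighted_stream([(["apple"], 2), (["pear"], 1)])
batch = [next(stream) for _ in range(8)]
```

On average, "apple" appears twice as often as "pear" in the stream.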
````diff
@@ -30,23 +32,23 @@ The list below is meant to showcase functions that are useful in most standard a
 Dataset.from_subscriptable
 Dataset.from_dataframe
 Dataset
-    .map
-    .subset
-    .split
-    .cache
-    .with_columns
+    .map
+    .subset
+    .split
+    .cache
+    .with_columns
 
 Datastream.merge
 Datastream.zip
 Datastream
-    .map
-    .data_loader
-    .zip_index
-    .update_weights_
-    .update_example_weight_
-    .weight
-    .state_dict
-    .load_state_dict
+    .map
+    .data_loader
+    .zip_index
+    .update_weights_
+    .update_example_weight_
+    .weight
+    .state_dict
+    .load_state_dict
 ```
 
 ### Simple image dataset example
````
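The fluent, chained style that the showcase list above suggests (`.map`, `.subset`, and so on, each returning a new dataset) can be mimicked with a small toy class. The method names match the list, but none of this is the library's code:

```python
class ToyDataset:
    """Toy stand-in for a chainable, lazily-mapped dataset."""

    def __init__(self, items, fn=lambda x: x):
        self.items = list(items)
        self.fn = fn

    def map(self, fn):
        # Compose lazily: fn only runs when an item is accessed.
        prev = self.fn
        return ToyDataset(self.items, lambda x: fn(prev(x)))

    def subset(self, mask):
        # Keep only the items whose mask entry is True.
        kept = [x for x, keep in zip(self.items, mask) if keep]
        return ToyDataset(kept, self.fn)

    def __getitem__(self, index):
        return self.fn(self.items[index])

    def __len__(self):
        return len(self.items)

dataset = (
    ToyDataset(range(5))
    .map(lambda x: x * 10)
    .subset([True, False, True, True, False])
)
# Kept raw items are [0, 2, 3]; the map applies on access.
```

Here `len(dataset)` is 3 and `dataset[1]` is 20: the subset keeps raw items 0, 2, and 3, and the mapping multiplies by 10 only when an item is read.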
````diff
@@ -71,15 +73,15 @@ image_dir = Path("images")
 image_paths = list(image_dir.glob("**/*.jpg"))
 
 dataset = (
-    Dataset.from_paths(
-        image_paths,
-        pattern=r".*/(?P<class_name>\w+)/(?P<image_name>\w+).jpg"
-    )
-    .map(lambda row: dict(
-        image=Image.open(row["path"]),
-        class_name=row["class_name"],
-        image_name=row["image_name"],
-    ))
+    Dataset.from_paths(
+        image_paths,
+        pattern=r".*/(?P<class_name>\w+)/(?P<image_name>\w+).jpg"
+    )
+    .map(lambda row: dict(
+        image=Image.open(row["path"]),
+        class_name=row["class_name"],
+        image_name=row["image_name"],
+    ))
 )
 
 # Access an item from the dataset
````
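The `pattern` argument in the hunk above relies on Python's named regex groups; the extraction it performs can be checked in isolation with the standard library (the example path below is made up):

```python
import re

# Hypothetical path following the layout images/<class_name>/<image_name>.jpg
path = "images/apple/green_apple_001.jpg"
pattern = r".*/(?P<class_name>\w+)/(?P<image_name>\w+).jpg"

match = re.match(pattern, path)
row = match.groupdict()
# row == {"class_name": "apple", "image_name": "green_apple_001"}
```

Each `(?P<name>...)` group becomes a key in `groupdict()`, which is how the directory and file names end up as columns of the dataset row.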
````diff
@@ -92,32 +94,31 @@ print(f"Class: {first_item['class_name']}, Image name: {first_item['image_name']
 
 The fruit datastreams below each repeatedly yield the string of their fruit type.
 
-```python
-
->>> datastream = Datastream.merge([
-...     (apple_datastream, 2),
-...     (pear_datastream, 1),
-...     (banana_datastream, 1),
-... ])
->>> next(iter(datastream.data_loader(batch_size=8)))
-['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']
-```
+```python
+>>> datastream = Datastream.merge([
+...     (apple_datastream, 2),
+...     (pear_datastream, 1),
+...     (banana_datastream, 1),
+... ])
+>>> next(iter(datastream.data_loader(batch_size=8)))
+['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']
+```
 
 ### Zip independently sampled datastreams
 
 The fruit datastreams below each repeatedly yield the string of their fruit type.
 
 ```python
-
->>> datastream = Datastream.zip([
-...     apple_datastream,
-...     Datastream.merge([pear_datastream, banana_datastream]),
-... ])
->>> next(iter(datastream.data_loader(batch_size=4)))
-[('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
-```
+>>> datastream = Datastream.zip([
+...     apple_datastream,
+...     Datastream.merge([pear_datastream, banana_datastream]),
+... ])
+>>> next(iter(datastream.data_loader(batch_size=4)))
+[('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
+```
 
 ### More usage examples
 
 See the [documentation](https://nextml-code.github.io/pytorch-datastream) for more usage examples.
-````
````
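The two doctest outputs in the hunk above follow from interleaving streams by weight. A standard-library sketch reproduces them, with toy generators standing in for the datastreams and a deterministic round-robin standing in for the library's sampler:

```python
from itertools import cycle, islice

def repeat_value(value):
    # Toy datastream: yields its fruit name forever.
    while True:
        yield value

def merge(streams_with_weights):
    # Round-robin by weight: each stream contributes `weight`
    # items per pass through the schedule.
    schedule = []
    for stream, weight in streams_with_weights:
        schedule.extend([stream] * weight)
    for stream in cycle(schedule):
        yield next(stream)

merged = merge([
    (repeat_value("apple"), 2),
    (repeat_value("pear"), 1),
    (repeat_value("banana"), 1),
])
batch = list(islice(merged, 8))
# ['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']

zipped = zip(
    repeat_value("apple"),
    merge([(repeat_value("pear"), 1), (repeat_value("banana"), 1)]),
)
pairs = list(islice(zipped, 4))
# [('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
```

With weights 2, 1, 1 the schedule is apple, apple, pear, banana, so a batch of eight cycles through it twice — matching the first doctest. Zipping the apple stream against a merge of pear and banana alternates the second element of each pair, matching the second.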
