Commit c6f758b

Merge pull request #62 from polaris-hub/feat/improve-code-examples
Updated code-example to work out-of-the-box
2 parents 6f86530 + 86d7546 commit c6f758b

11 files changed: +195 -93 lines changed

README.md

Lines changed: 15 additions & 9 deletions
@@ -49,20 +49,26 @@ This library is a Python client to interact with the [Polaris Hub](https://polar
4949
```python
5050
import polaris as po
5151

52-
# Download a benchmark (the associated dataset will be transparently downloaded)
53-
benchmark = po.load_benchmark("org_or_user/name")
52+
# Load the benchmark from the Hub
53+
benchmark = po.load_benchmark("polaris/hello_world_benchmark")
5454

55-
# Retrieve the splits
55+
# Get the train and test data-loaders
5656
train, test = benchmark.get_train_test_split()
5757

58-
# Work your magic!
59-
y_pred = ...
58+
# Use the training data to train your model
59+
# Get the inputs and targets as arrays with 'train.inputs' and 'train.targets'
60+
# Or simply iterate over the train object.
61+
for x, y in train:
62+
...
6063

61-
# Run the evaluation procedure
62-
results = benchmark.evaluate(y_pred)
64+
# Work your magic to accurately predict the test set
65+
predictions = [0.0 for x in test]
6366

64-
# Upload your results to the hub
65-
results.upload_to_hub()
67+
# Evaluate your predictions
68+
results = benchmark.evaluate(predictions)
69+
70+
# Submit your results
71+
results.upload_to_hub(owner="dummy-user")
6672
```
6773

6874
## Documentation

docs/api/load.md

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
1+
2+
::: polaris.load_dataset
3+
4+
---
5+
6+
::: polaris.load_benchmark
7+
8+
---

docs/quickstart.md

Lines changed: 17 additions & 4 deletions
@@ -24,13 +24,26 @@ If all you care about is to partake in a benchmark that is hosted on the hub, it
2424
```python
2525
import polaris as po
2626

27-
benchmark = po.load_benchmark("org_or_user/name")
27+
# Load the benchmark from the Hub
28+
benchmark = po.load_benchmark("polaris/hello_world_benchmark")
29+
30+
# Get the train and test data-loaders
2831
train, test = benchmark.get_train_test_split()
2932

30-
y_pred = ... # Work your magic!
33+
# Use the training data to train your model
34+
# Get the inputs and targets as arrays with 'train.inputs' and 'train.targets'
35+
# Or simply iterate over the train object.
36+
for x, y in train:
37+
...
38+
39+
# Work your magic to accurately predict the test set
40+
predictions = [0.0 for x in test]
41+
42+
# Evaluate your predictions
43+
results = benchmark.evaluate(predictions)
3144

32-
results = benchmark.evaluate(y_pred)
33-
results.upload_to_hub()
45+
# Submit your results
46+
results.upload_to_hub(owner="dummy-user")
3447
```
3548

3649
That's all it takes to partake in a benchmark. No complicated, custom data-loaders or evaluation protocols. With just a few lines of code, you can feel confident that you are properly evaluating your model and focus on what you do best: solving the hard problems in our domain!
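
To make the flow above concrete without network access, here is a minimal sketch of a mean-value baseline. `ToySplit` is a hypothetical stand-in for the objects returned by `benchmark.get_train_test_split()`. It only mimics what the code comments above describe (`inputs`, `targets`, and iteration), and it assumes the test split does not expose its targets. It is not the Polaris implementation.

```python
from statistics import fmean


class ToySplit:
    """Hypothetical stand-in for a Polaris train/test split (not the real class)."""

    def __init__(self, inputs, targets=None):
        self.inputs = list(inputs)
        # Assumption: the test split carries no targets, so they may be None.
        self.targets = None if targets is None else list(targets)

    def __len__(self):
        return len(self.inputs)

    def __iter__(self):
        # With targets, yield (x, y) pairs; without, yield inputs only.
        if self.targets is None:
            return iter(self.inputs)
        return iter(zip(self.inputs, self.targets))


def mean_baseline(train, test):
    """Predict the mean training target for every test input."""
    mean_target = fmean([y for _, y in train])
    return [mean_target for _ in test]


train = ToySplit(inputs=["CCO", "CCN", "CCC"], targets=[1.0, 2.0, 3.0])
test = ToySplit(inputs=["CCCl", "CCBr"])
predictions = mean_baseline(train, test)
```

With the real library, a list like `predictions` would be passed to `benchmark.evaluate(predictions)` as in the snippet above.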

docs/tutorials/basics.ipynb

Lines changed: 12 additions & 19 deletions
@@ -63,7 +63,7 @@
6363
"name": "stderr",
6464
"output_type": "stream",
6565
"text": [
66-
"\u001b[32m2023-11-06 17:37:18.375\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mpolaris.hub.client\u001b[0m:\u001b[36mlogin\u001b[0m:\u001b[36m262\u001b[0m - \u001b[1mYou are already logged in to the Polaris Hub as lu-valencelabs (lu@valencediscovery.com). Set `overwrite=True` to force re-authentication.\u001b[0m\n"
66+
"\u001b[32m2023-11-27 14:54:08.788\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mpolaris.hub.client\u001b[0m:\u001b[36mlogin\u001b[0m:\u001b[36m262\u001b[0m - \u001b[1mYou are already logged in to the Polaris Hub as cwognum (cas@valencediscovery.com). Set `overwrite=True` to force re-authentication.\u001b[0m\n"
6767
]
6868
}
6969
],
@@ -285,7 +285,7 @@
285285
{
286286
"data": {
287287
"text/html": [
288-
"<table border=\"1\"><tr><th>name</th><td>None</td></tr><tr><th>description</th><td></td></tr><tr><th>tags</th><td></td></tr><tr><th>user_attributes</th><td></td></tr><tr><th>owner</th><td>None</td></tr><tr><th>benchmark_name</th><td>hello_world_benchmark</td></tr><tr><th>benchmark_owner</th><td><table border=\"1\"><tr><th>slug</th><td>polaris</td></tr><tr><th>organization_id</th><td>org_2WG9hRFgKNIRtGw4orsMPcr1F4S</td></tr><tr><th>user_id</th><td>None</td></tr><tr><th>owner</th><td>org_2WG9hRFgKNIRtGw4orsMPcr1F4S</td></tr></table></td></tr><tr><th>github_url</th><td>None</td></tr><tr><th>paper_url</th><td>None</td></tr><tr><th>contributors</th><td>None</td></tr><tr><th>results</th><td><table border=\"1\"><thead><tr><th>Test set</th><th>Target label</th><th>Metric</th><th>Score</th></tr></thead><tbody><tr><td>test</td><td>SOL</td><td>mean_squared_error</td><td>2.6875139821</td></tr><tr><td>test</td><td>SOL</td><td>mean_absolute_error</td><td>1.2735690161</td></tr></tbody></table></td></tr></table>"
288+
"<table border=\"1\"><tr><th>name</th><td>None</td></tr><tr><th>description</th><td></td></tr><tr><th>tags</th><td></td></tr><tr><th>user_attributes</th><td></td></tr><tr><th>owner</th><td>None</td></tr><tr><th>benchmark_name</th><td>hello_world_benchmark</td></tr><tr><th>benchmark_owner</th><td><table border=\"1\"><tr><th>slug</th><td>polaris</td></tr><tr><th>external_id</th><td>org_2WG9hRFgKNIRtGw4orsMPcr1F4S</td></tr><tr><th>type</th><td>organization</td></tr></table></td></tr><tr><th>github_url</th><td>None</td></tr><tr><th>paper_url</th><td>None</td></tr><tr><th>contributors</th><td>None</td></tr><tr><th>artifact_id</th><td>None</td></tr><tr><th>benchmark_artifact_id</th><td>polaris/hello-world-benchmark</td></tr><tr><th>results</th><td><table border=\"1\"><thead><tr><th>Test set</th><th>Target label</th><th>Metric</th><th>Score</th></tr></thead><tbody><tr><td>test</td><td>SOL</td><td>mean_squared_error</td><td>2.6875139821</td></tr><tr><td>test</td><td>SOL</td><td>mean_absolute_error</td><td>1.2735690161</td></tr></tbody></table></td></tr></table>"
289289
],
290290
"text/plain": [
291291
"{\n",
@@ -297,13 +297,14 @@
297297
" \"benchmark_name\": \"hello_world_benchmark\",\n",
298298
" \"benchmark_owner\": {\n",
299299
" \"slug\": \"polaris\",\n",
300-
" \"organization_id\": \"org_2WG9hRFgKNIRtGw4orsMPcr1F4S\",\n",
301-
" \"user_id\": null,\n",
302-
" \"owner\": \"org_2WG9hRFgKNIRtGw4orsMPcr1F4S\"\n",
300+
" \"external_id\": \"org_2WG9hRFgKNIRtGw4orsMPcr1F4S\",\n",
301+
" \"type\": \"organization\"\n",
303302
" },\n",
304303
" \"github_url\": null,\n",
305304
" \"paper_url\": null,\n",
306305
" \"contributors\": null,\n",
306+
" \"artifact_id\": null,\n",
307+
" \"benchmark_artifact_id\": \"polaris/hello-world-benchmark\",\n",
307308
" \"results\": [\n",
308309
" {\n",
309310
" \"Test set\": \"test\",\n",
@@ -341,7 +342,7 @@
341342
},
342343
{
343344
"cell_type": "code",
344-
"execution_count": 17,
345+
"execution_count": 15,
345346
"id": "a601f415-c563-4efe-94c3-0d44f3fd6576",
346347
"metadata": {},
347348
"outputs": [],
@@ -362,7 +363,7 @@
362363
},
363364
{
364365
"cell_type": "code",
365-
"execution_count": 18,
366+
"execution_count": 16,
366367
"id": "60cbf4b9-8514-480d-beda-8a50e5f7c9a6",
367368
"metadata": {
368369
"scrolled": true
@@ -372,16 +373,16 @@
372373
"name": "stderr",
373374
"output_type": "stream",
374375
"text": [
375-
"/Users/lu.zhu/miniconda3/envs/pov3/lib/python3.11/site-packages/pydantic/main.py:309: UserWarning: Pydantic serializer warnings:\n",
376+
"/home/cas/micromamba/envs/polaris/lib/python3.12/site-packages/pydantic/main.py:308: UserWarning: Pydantic serializer warnings:\n",
376377
" Expected `url` but got `str` - serialized value may not be as expected\n",
377378
" Expected `url` but got `str` - serialized value may not be as expected\n",
378379
" return self.__pydantic_serializer__.to_python(\n",
379-
"\u001b[32m2023-11-06 17:38:06.152\u001b[0m | \u001b[32m\u001b[1mSUCCESS \u001b[0m | \u001b[36mpolaris.hub.client\u001b[0m:\u001b[36mupload_results\u001b[0m:\u001b[36m413\u001b[0m - \u001b[32m\u001b[1mYour result has been successfully uploaded to the Hub. View it here: https://polarishub.io/benchmarks/polaris/hello_world_benchmark/YYH033LKM1BaT8byAC5Jc\u001b[0m\n"
380+
"\u001b[32m2023-11-27 14:54:46.649\u001b[0m | \u001b[32m\u001b[1mSUCCESS \u001b[0m | \u001b[36mpolaris.hub.client\u001b[0m:\u001b[36mupload_results\u001b[0m:\u001b[36m428\u001b[0m - \u001b[32m\u001b[1mYour result has been successfully uploaded to the Hub. View it here: https://polarishub.io/benchmarks/polaris/hello_world_benchmark/ns4JrC3hQNK9M1hbVPchy\u001b[0m\n"
380381
]
381382
}
382383
],
383384
"source": [
384-
"client.upload_results(results)\n",
385+
"client.upload_results(results, owner=\"cwognum\")\n",
385386
"client.close()"
386387
]
387388
},
@@ -396,14 +397,6 @@
396397
"\n",
397398
"---"
398399
]
399-
},
400-
{
401-
"cell_type": "code",
402-
"execution_count": null,
403-
"id": "0868ff53-7a42-4e4c-bae4-29fb04c513c7",
404-
"metadata": {},
405-
"outputs": [],
406-
"source": []
407400
}
408401
],
409402
"metadata": {
@@ -422,7 +415,7 @@
422415
"name": "python",
423416
"nbconvert_exporter": "python",
424417
"pygments_lexer": "ipython3",
425-
"version": "3.11.5"
418+
"version": "3.12.0"
426419
}
427420
},
428421
"nbformat": 4,

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@ nav:
2222
- Custom Datasets and Benchmarks: tutorials/custom_dataset_benchmark.ipynb
2323
# - Creating Datasets with zarr: tutorials/dataset_zarr.ipynb
2424
- API Reference:
25+
- Load: api/load.md
2526
- Core:
2627
- Dataset: api/dataset.md
2728
- Benchmark: api/benchmark.md

polaris/benchmark/_base.py

Lines changed: 3 additions & 2 deletions
@@ -22,7 +22,7 @@
2222
from polaris.utils.dict2html import dict2html
2323
from polaris.utils.errors import InvalidBenchmarkError, PolarisChecksumError
2424
from polaris.utils.misc import listit
25-
from polaris.utils.types import AccessType, DataFormat, PredictionsType, SplitType
25+
from polaris.utils.types import AccessType, DataFormat, HubOwner, PredictionsType, SplitType
2626

2727
ColumnsType = Union[str, list[str]]
2828

@@ -371,6 +371,7 @@ def upload_to_hub(
371371
settings: Optional[PolarisHubSettings] = None,
372372
cache_auth_token: bool = True,
373373
access: Optional[AccessType] = "private",
374+
owner: Optional[Union[HubOwner, str]] = None,
374375
**kwargs: dict,
375376
):
376377
"""
@@ -382,7 +383,7 @@ def upload_to_hub(
382383
with PolarisHubClient(
383384
env_file=env_file, settings=settings, cache_auth_token=cache_auth_token, **kwargs
384385
) as client:
385-
return client.upload_benchmark(self, access)
386+
return client.upload_benchmark(self, access=access, owner=owner)
386387

387388
def to_json(self, destination: str) -> str:
388389
"""Save the benchmark to a destination directory as a JSON file.

polaris/dataset/_dataset.py

Lines changed: 3 additions & 2 deletions
@@ -23,7 +23,7 @@
2323
from polaris.utils.dict2html import dict2html
2424
from polaris.utils.errors import InvalidDatasetError, PolarisChecksumError
2525
from polaris.utils.io import get_zarr_root, robust_copy
26-
from polaris.utils.types import AccessType, HttpUrlString, License
26+
from polaris.utils.types import AccessType, HttpUrlString, HubOwner, License
2727

2828
# Constants
2929
_SUPPORTED_TABLE_EXTENSIONS = ["parquet"]
@@ -201,6 +201,7 @@ def upload_to_hub(
201201
settings: Optional[PolarisHubSettings] = None,
202202
cache_auth_token: bool = True,
203203
access: Optional[AccessType] = "private",
204+
owner: Optional[Union[HubOwner, str]] = None,
204205
**kwargs: dict,
205206
):
206207
"""
@@ -212,7 +213,7 @@ def upload_to_hub(
212213
with PolarisHubClient(
213214
env_file=env_file, settings=settings, cache_auth_token=cache_auth_token, **kwargs
214215
) as client:
215-
return client.upload_dataset(self, access)
216+
return client.upload_dataset(self, access=access, owner=owner)
216217

217218
@classmethod
218219
def from_zarr(cls, path: str) -> "Dataset":
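
The switch from `client.upload_dataset(self, access)` to keyword arguments is more than cosmetic. Per the `polaris/hub/client.py` change in this commit, `upload_dataset` takes `timeout` between `access` and `owner`. The sketch below uses a hypothetical stand-in function that mirrors only that parameter order (it is not the real client) to show how a positional call would misroute the owner:

```python
def upload_dataset(dataset, access="private", timeout=(10, 200), owner=None):
    """Hypothetical stand-in that mirrors only the parameter order of the client method."""
    return {"access": access, "timeout": timeout, "owner": owner}


# Positional forwarding: the owner silently lands in the `timeout` slot.
misrouted = upload_dataset("my-dataset", "private", "dummy-user")

# Keyword forwarding, as in the updated `upload_to_hub`: each value reaches its parameter.
routed = upload_dataset("my-dataset", access="private", owner="dummy-user")
```

Passing `access=access, owner=owner` by name keeps the wrapper correct even if the client signature gains or reorders parameters later.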

polaris/evaluate/_results.py

Lines changed: 2 additions & 1 deletion
@@ -182,6 +182,7 @@ def upload_to_hub(
182182
settings: Optional[PolarisHubSettings] = None,
183183
cache_auth_token: bool = True,
184184
access: Optional[AccessType] = "private",
185+
owner: Optional[Union[HubOwner, str]] = None,
185186
**kwargs: dict,
186187
):
187188
"""
@@ -193,7 +194,7 @@ def upload_to_hub(
193194
with PolarisHubClient(
194195
env_file=env_file, settings=settings, cache_auth_token=cache_auth_token, **kwargs
195196
) as client:
196-
return client.upload_results(self, access)
197+
return client.upload_results(self, access=access, owner=owner)
197198

198199
def _repr_dict_(self) -> dict:
199200
"""Utility function for pretty-printing to the command line and jupyter notebooks"""

polaris/hub/client.py

Lines changed: 48 additions & 3 deletions
@@ -373,7 +373,12 @@ def get_benchmark(self, owner: Union[str, HubOwner], name: str) -> BenchmarkSpec
373373
)
374374
return benchmark_cls(**response)
375375

376-
def upload_results(self, results: BenchmarkResults, access: AccessType = "private"):
376+
def upload_results(
377+
self,
378+
results: BenchmarkResults,
379+
access: AccessType = "private",
380+
owner: Optional[Union[HubOwner, str]] = None,
381+
):
377382
"""Upload the results to the Polaris Hub.
378383
379384
Info: Owner
@@ -395,9 +400,19 @@ def upload_results(self, results: BenchmarkResults, access: AccessType = "privat
395400
Args:
396401
results: The results to upload.
397402
access: Grant public or private access to result
403+
owner: Which Hub user or organization owns the artifact.
404+
Optional if and only if the `results.owner` attribute is set.
398405
"""
399406

400407
# Get the serialized model data-structure
408+
409+
if results.owner is None:
410+
if owner is None:
411+
raise ValueError(
412+
"The `owner` argument must be specified if the `results.owner` attribute is not set."
413+
)
414+
results.owner = owner if isinstance(owner, HubOwner) else HubOwner(slug=owner)
415+
401416
result_json = results.model_dump(by_alias=True, exclude_none=True)
402417

403418
# Make a request to the hub
@@ -414,7 +429,11 @@ def upload_results(self, results: BenchmarkResults, access: AccessType = "privat
414429
return response
415430

416431
def upload_dataset(
417-
self, dataset: Dataset, access: AccessType = "private", timeout: TimeoutTypes = (10, 200)
432+
self,
433+
dataset: Dataset,
434+
access: AccessType = "private",
435+
timeout: TimeoutTypes = (10, 200),
436+
owner: Optional[Union[HubOwner, str]] = None,
418437
):
419438
"""Upload the dataset to the Polaris Hub.
420439
@@ -432,8 +451,21 @@ def upload_dataset(
432451
dataset: The dataset to upload.
433452
access: Grant public or private access to result
434453
timeout: Request timeout values. User can modify the value when uploading large dataset as needed.
454+
This can be a single value with the timeout in seconds for all IO operations, or a more granular
455+
tuple with (connect_timeout, write_timeout). The type of the timeout parameter comes from `httpx`.
456+
Since datasets can get large, it may be necessary to increase the write timeout.
457+
See also: https://www.python-httpx.org/advanced/#timeout-configuration
458+
owner: Which Hub user or organization owns the artifact.
459+
Optional if and only if the `dataset.owner` attribute is set.
435460
"""
436461

462+
if dataset.owner is None:
463+
if owner is None:
464+
raise ValueError(
465+
"The `owner` argument must be specified if the `dataset.owner` attribute is not set."
466+
)
467+
dataset.owner = owner if isinstance(owner, HubOwner) else HubOwner(slug=owner)
468+
437469
# Get the serialized data-model
438470
# We exclude the table as it handled separately and the cache_dir as it is user-specific
439471
dataset_json = dataset.model_dump(exclude={"cache_dir", "table"}, exclude_none=True, by_alias=True)
@@ -500,7 +532,12 @@ def upload_dataset(
500532

501533
return response
502534

503-
def upload_benchmark(self, benchmark: BenchmarkSpecification, access: AccessType = "private"):
535+
def upload_benchmark(
536+
self,
537+
benchmark: BenchmarkSpecification,
538+
access: AccessType = "private",
539+
owner: Optional[Union[HubOwner, str]] = None,
540+
):
504541
"""Upload the benchmark to the Polaris Hub.
505542
506543
Info: Owner
@@ -520,7 +557,15 @@ def upload_benchmark(self, benchmark: BenchmarkSpecification, access: AccessType
520557
Args:
521558
benchmark: The benchmark to upload.
522559
access: Grant public or private access to result
560+
owner: Which Hub user or organization owns the artifact.
561+
Optional if and only if the `benchmark.owner` attribute is set.
523562
"""
563+
if benchmark.owner is None:
564+
if owner is None:
565+
raise ValueError(
566+
"The `owner` argument must be specified if the `benchmark.owner` attribute is not set."
567+
)
568+
benchmark.owner = owner if isinstance(owner, HubOwner) else HubOwner(slug=owner)
524569

525570
# Get the serialized data-model
526571
# We exclude the dataset as we expect it to exist on the hub already.
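
The same owner check is repeated in `upload_results`, `upload_dataset`, and `upload_benchmark`. As a minimal sketch of that shared logic, the snippet below factors it into one function. `resolve_owner` and `Artifact` are hypothetical names introduced for illustration, and `HubOwner` is a simplified stand-in for the class in `polaris.utils.types`; none of this is the library's code.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class HubOwner:
    """Simplified stand-in; the real class lives in `polaris.utils.types`."""

    slug: str


@dataclass
class Artifact:
    """Hypothetical stand-in for a dataset, benchmark, or results object."""

    owner: Optional[HubOwner] = None


def resolve_owner(artifact: Artifact, owner: Optional[Union[HubOwner, str]] = None) -> HubOwner:
    """Mirror the check added in this commit: fill in the owner only when it is unset."""
    if artifact.owner is None:
        if owner is None:
            raise ValueError("The `owner` argument must be specified if the artifact has no owner.")
        # A plain string is treated as the owner's slug.
        artifact.owner = owner if isinstance(owner, HubOwner) else HubOwner(slug=owner)
    return artifact.owner


resolved = resolve_owner(Artifact(), owner="dummy-user")
kept = resolve_owner(Artifact(owner=HubOwner(slug="polaris")), owner="dummy-user")
```

Note that, as in the diff, an owner already set on the artifact takes precedence and the `owner` argument is then ignored.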
