Merge pull request #35 from openweathermap/dev
Release v1.1.3
SerGeRybakov authored Dec 29, 2023
2 parents 5db9db8 + 1591bfb commit 4e9008d
Showing 30 changed files with 1,032 additions and 580 deletions.
26 changes: 13 additions & 13 deletions README.md
@@ -1,4 +1,4 @@
# Deker
# DEKER™

![image](docs/deker/images/logo_50.png)

@@ -8,22 +8,22 @@
[![codecov](https://codecov.io/gh/openweathermap/deker/branch/main/graph/badge.svg?token=Z040BQWIOR)](https://codecov.io/gh/openweathermap/deker)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

Deker is pure Python implementation of petabyte-scale highly parallel data storage engine for
DEKER™ is pure Python implementation of petabyte-scale highly parallel data storage engine for
multidimensional arrays.

Deker name comes from term *dekeract*, the [10-cube](https://en.wikipedia.org/wiki/10-cube).
DEKER™ name comes from term *dekeract*, the [10-cube](https://en.wikipedia.org/wiki/10-cube).

Deker was made with the following major goals in mind:
DEKER™ was made with the following major goals in mind:

* provide intuitive interface for storing and accessing **huge data arrays**
* support **arbitrary number of data dimensions**
* be **thread and process safe** and as **lean on RAM** use as possible

Deker empowers users to store and access a wide range of data types, virtually anything that can be
DEKER™ empowers users to store and access a wide range of data types, virtually anything that can be
represented as arrays, like **geospacial data**, **satellite images**, **machine learning models**,
**sensors data**, graphs, key-value pairs, tabular data, and more.

Deker does not limit your data complexity and size: it supports virtually unlimited number of data
DEKER™ does not limit your data complexity and size: it supports virtually unlimited number of data
dimensions and provides under the hood mechanisms to **partition** huge amounts of data for
**scalability**.

@@ -40,7 +40,7 @@ dimensions and provides under the hood mechanisms to **partition** huge amounts

## Code and Documentation

Open source implementation of Deker storage engine is published at
Open source implementation of DEKER™ storage engine is published at

* https://github.com/openweathermap/deker

@@ -52,9 +52,9 @@ API documentation and tutorials for the current release could be found at

### Dependencies

Minimal Python version for Deker is 3.9.
Minimal Python version for DEKER™ is 3.9.

Deker depends on the following third-party packages:
DEKER™ depends on the following third-party packages:

* `numpy` >= 1.18
* `attrs` >= 23.1.0
@@ -63,7 +63,7 @@ Deker depends on the following third-party packages:
* `h5py` >= 3.8.0
* `hdf5plugin` >= 4.0.1

Also please not that for flexibility few internal Deker components are published as separate
Also please not that for flexibility few internal DEKER™ components are published as separate
packages:

* [`deker-local-adapters`](https://github.com/openweathermap/deker-local-adapters)
@@ -72,17 +72,17 @@ packages:

### Install

To install Deker run:
To install DEKER™ run:

```bash
pip install deker
```
Please refer to documentation for advanced topics such as running on Apple silicone or using Xarray
with Deker API.
with DEKER™ API.

### First Steps

Now you can write simple script to jump into Deker development:
Now you can write simple script to jump into DEKER™ development:

```python
from deker import Client, ArraySchema, DimensionSchema, TimeDimensionSchema
2 changes: 1 addition & 1 deletion deker/ABC/base_adapters.py
@@ -230,7 +230,7 @@ def create_collection_from_meta(
schema_class = SchemaTypeEnum[collection_data.get("type")].value

try:
dtype = DTypeEnum[data["dtype"].lstrip("numpy.")].value
dtype = DTypeEnum[data["dtype"].split("numpy.")[-1]].value
fill_value = (
dtype(data["fill_value"]) if data["fill_value"] is not None else data["fill_value"]
)
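
Why the one-line change above matters: `str.lstrip` takes a set of characters, not a prefix, so `lstrip("numpy.")` keeps removing leading characters as long as they are one of `n`, `u`, `m`, `p`, `y` or `.`. A minimal sketch of the difference follows; the helper names are illustrative and not part of Deker, and whether a given name such as `uint8` is actually present in `DTypeEnum` is an assumption.

```python
def strip_prefix_buggy(name: str) -> str:
    # Old behaviour: strips any leading run of the characters n, u, m, p, y, "."
    return name.lstrip("numpy.")


def strip_prefix_fixed(name: str) -> str:
    # New behaviour from the diff: keep whatever follows the last "numpy."
    return name.split("numpy.")[-1]


for name in ("numpy.int64", "numpy.uint8", "numpy.float32", "uint16"):
    print(f"{name!r}: buggy={strip_prefix_buggy(name)!r} fixed={strip_prefix_fixed(name)!r}")
```

With the old call, an unsigned name such as `numpy.uint8` loses its leading `u` and would be looked up as `int8`. Since the README states a minimum of Python 3.9, `name.removeprefix("numpy.")` would be another way to express the same intent.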
15 changes: 10 additions & 5 deletions deker/ABC/base_schemas.py
@@ -27,7 +27,7 @@
from deker.errors import DekerInvalidSchemaError, DekerValidationError
from deker.tools.schema import get_default_fill_value
from deker.types.private.enums import DTypeEnum
from deker.types.private.typings import Numeric
from deker.types.private.typings import Numeric, NumericDtypes


@dataclass(repr=True)
@@ -90,7 +90,7 @@ class BaseArraysSchema:
dtype: Type[Numeric]
fill_value: Union[Numeric, type(np.nan), None] # type: ignore[valid-type]
dimensions: Union[List[BaseDimensionSchema], Tuple[BaseDimensionSchema, ...]]
attributes: Union[List[BaseAttributeSchema], Tuple[BaseAttributeSchema, ...]]
attributes: Union[List[BaseAttributeSchema], Tuple[BaseAttributeSchema, ...], None]

@property
def primary_attributes(self) -> Optional[Tuple[BaseAttributeSchema, ...]]:
@@ -121,6 +121,9 @@ def __attrs_post_init__(self) -> None:
if len({d.name for d in self.dimensions}) < len(self.dimensions):
raise DekerValidationError("Dimensions shall have unique names")

if self.dtype not in NumericDtypes:
raise DekerValidationError(f"Invalid dtype {self.dtype}")

try:
if self.dtype == int:
self.dtype = np.int64
@@ -163,6 +166,10 @@ def named_shape(self) -> Tuple[Tuple[str, int], ...]:
@property
def as_dict(self) -> dict:
"""Serialize as dict."""
error = f'Schema "{self.__class__.__name__}" is invalid/corrupted: '

if self.dtype not in NumericDtypes:
raise DekerInvalidSchemaError(error + f"wrong dtype {self.dtype}")
try:
dtype = DTypeEnum.get_name(DTypeEnum(self.dtype))
fill_value = None if np.isnan(self.fill_value) else str(self.fill_value) # type: ignore[arg-type]
@@ -174,6 +181,4 @@ def as_dict(self) -> dict:
"fill_value": fill_value,
}
except (KeyError, ValueError) as e:
raise DekerInvalidSchemaError(
f'Schema "{self.__class__.__name__}" is invalid/corrupted: {e}'
)
raise DekerInvalidSchemaError(error + str(e))
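
The `as_dict` hunks above build the error prefix once and reject an unsupported dtype before any serialization is attempted. A rough sketch of that pattern is shown here. The contents of `NumericDtypes` are not visible in this diff, so the tuple below is a hypothetical stand-in, as are the exception class and the function name; `np.dtype(...).name` replaces the `DTypeEnum` lookup for illustration only.

```python
import numpy as np

# Hypothetical stand-in for deker's NumericDtypes (real contents not shown in the diff).
NUMERIC_DTYPES = (int, float, complex, np.int32, np.int64, np.float32, np.float64)


class InvalidSchemaError(Exception):
    """Stand-in for DekerInvalidSchemaError."""


def serialize_dtype(schema_name: str, dtype: type) -> str:
    # Build the common error prefix once, as the diff does.
    error = f'Schema "{schema_name}" is invalid/corrupted: '
    # Fail early with a specific message instead of relying on a later KeyError/ValueError.
    if dtype not in NUMERIC_DTYPES:
        raise InvalidSchemaError(error + f"wrong dtype {dtype}")
    # Illustrative replacement for DTypeEnum.get_name(DTypeEnum(dtype)).
    return np.dtype(dtype).name


print(serialize_dtype("ArraySchema", np.float32))
try:
    serialize_dtype("ArraySchema", str)
except InvalidSchemaError as exc:
    print(exc)
```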
16 changes: 11 additions & 5 deletions deker/client.py
@@ -28,6 +28,7 @@

from deker_tools.data import convert_size_to_human
from deker_tools.path import is_path_valid
from deker_tools.log import set_logger
from psutil import swap_memory, virtual_memory
from tqdm import tqdm

@@ -42,7 +43,7 @@
)
from deker.integrity import IntegrityChecker
from deker.locks import META_DIVIDER
from deker.log import SelfLoggerMixin, set_logging_level
from deker.log import SelfLoggerMixin, set_logging_level, format_string
from deker.schemas import ArraySchema, VArraySchema
from deker.tools import convert_human_memory_to_bytes
from deker.types import ArrayLockMeta, CollectionLockMeta, LocksExtensions, LocksTypes, StorageSize
@@ -212,18 +213,23 @@ def __init__(
:param kwargs: a wildcard, reserved for any extra parameters
"""
try:
set_logger(format_string)
set_logging_level(loglevel.upper())
self.__get_plugins()
mem_limit = convert_human_memory_to_bytes(memory_limit)
total_available_mem = virtual_memory().total + swap_memory().total
memory_limit = convert_human_memory_to_bytes(memory_limit)
if memory_limit >= total_available_mem or memory_limit <= 0:
mem_limit = total_available_mem
else:
mem_limit = memory_limit

self.__config = DekerConfig( # type: ignore[call-arg]
uri=uri,
workers=workers if workers is not None else cpu_count() + 4,
write_lock_timeout=write_lock_timeout,
write_lock_check_interval=write_lock_check_interval,
loglevel=loglevel.upper(),
memory_limit=(
virtual_memory().total + swap_memory().total if mem_limit <= 0 else mem_limit
),
memory_limit=mem_limit,
)
self.__uri: Uri = Uri.create(self.__config.uri)
self.__is_closed: bool = True
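
The memory-limit change in this hunk can be read as a clamp: previously only a non-positive limit fell back to total RAM plus swap, while the new code also falls back when the requested limit meets or exceeds that total. A minimal sketch of the new rule, using a hypothetical helper name and plain integers in place of the `psutil` and `convert_human_memory_to_bytes` calls:

```python
def resolve_memory_limit(requested_bytes: int, total_available_bytes: int) -> int:
    """Return the effective limit: the request if it is sensible, else the total."""
    # Non-positive or oversized requests both fall back to everything available.
    if requested_bytes <= 0 or requested_bytes >= total_available_bytes:
        return total_available_bytes
    return requested_bytes


GIB = 1024 ** 3
total = 16 * GIB  # hypothetical RAM + swap for the example

for requested in (0, 4 * GIB, 64 * GIB):
    print(requested, "->", resolve_memory_limit(requested, total))
```

In the actual hunk the total is `virtual_memory().total + swap_memory().total`, computed once before the comparison instead of inside the `DekerConfig` call.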