Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Binary storage of selected types. #27

Merged
merged 22 commits into from
Nov 19, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
d40a6bc
Database: Store IP addresses as packed binary.
xsedla1o Oct 25, 2024
5c25b90
Config: Can specify EID data type.
xsedla1o Nov 5, 2024
ac4957b
Support validation and enforce eid data_type.
xsedla1o Nov 6, 2024
f81080c
Added MAC address type as eid type and binary subtype.
xsedla1o Nov 8, 2024
464724e
API & DB: Search and replace for generic filter.
xsedla1o Nov 6, 2024
cb8befb
DB: Separate modules for handling snapshot collection logic.
xsedla1o Nov 7, 2024
97e17e8
DB: Use modularized snapshots.
xsedla1o Nov 8, 2024
fdee8d9
Config: Fix Link destination eid data type.
xsedla1o Nov 12, 2024
1265fab
Snapshot: Fix types in entity tuple messages.
xsedla1o Nov 12, 2024
f93af1f
Fix typehints and docstrings.
xsedla1o Nov 12, 2024
169ddf3
Ensure runtime type checking.
xsedla1o Nov 12, 2024
50ab7ce
DB: Fix deleting and Link cache access.
xsedla1o Nov 13, 2024
32027b7
API: Fix serialization of nested types.
xsedla1o Nov 14, 2024
e73efdf
Docs: Updated info about types.
xsedla1o Nov 18, 2024
c6386c8
SchemaUpdate: Bumped schema version, added migration.
xsedla1o Nov 18, 2024
5ba3dda
Docs: Updated list entities endpoint doc.
xsedla1o Nov 18, 2024
66e658d
Chore: Bump Ruff version.
xsedla1o Nov 18, 2024
6fdf16f
API tests: Loosen timing requirements.
xsedla1o Nov 18, 2024
59f1bd9
API tests: Reduce debug logging.
xsedla1o Nov 18, 2024
6a9b3c3
Tests: add missing terminating newlines
dbnk0 Nov 19, 2024
ed9522a
Tests: Dump logs on failure
xsedla1o Nov 19, 2024
251b5bc
Tests: Fixed flaky test.
xsedla1o Nov 19, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions .github/workflows/tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,10 @@ jobs:
--network container:dp3_api
dp3_interpreter python -m unittest discover -s tests/test_api -v

- name: Dump Logs on Failure
if: failure()
run: docker compose logs

- name: Check worker errors
run: docker compose logs worker | grep "WARNING\|ERROR\|exception" | grep -v "RabbitMQ\|it's\ OK\ now,\ we're\ successfully\ connected" || true

Expand Down
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ repos:
- id: black

- repo: https://github.com/charliermarsh/ruff-pre-commit
rev: 'v0.6.1'
rev: 'v0.7.4'
hooks:
- id: ruff
args: [ "--fix", "." ]
31 changes: 25 additions & 6 deletions docs/api.md
Original file line number Diff line number Diff line change
Expand Up @@ -150,20 +150,20 @@ More details depends on the particular type of the attribute.
Can be represented using both **plain** attributes and **observations**. The difference will be only
in time specification. Two examples using observations:

**no data - `link<mac>`**: just the eid is sent
**no data - `link<mac>`**: Sent as a dictionary with a single `"eid"` key.

```json
{
"type": "ip",
"id": "192.168.0.1",
"attr": "mac_addrs",
"v": "AA:AA:AA:AA:AA",
"v": {"eid": "AA:AA:AA:AA:AA"},
"t1": "2022-08-01T12:00:00",
"t2": "2022-08-01T12:10:00"
}
```

**with additional data - `link<ip, int>`**: The eid and the data are sent as a dictionary.
**with additional data - `link<ip, int>`**: Sent as a dictionary with `"eid"` and `"data"` keys.

```json
{
Expand Down Expand Up @@ -198,11 +198,27 @@ v -> some_embedded_dict_field

## List entities

List latest snapshots of all ids present in database under entity type.
List latest snapshots of all ids present in database under entity type,
filtered by `generic_filter` and `fulltext_filters`.
Contains only the latest snapshot per entity.

Contains only latest snapshot.
Uses pagination, default limit is 20, setting to 0 will return all results.

Uses pagination.
Fulltext filters are interpreted as regular expressions.
Only string values may be filtered this way. There's no validation that queried attribute
can be fulltext filtered.
Only plain and observation attributes with string-based data types can be queried.
Array and set data types are supported as well as long as they are not multi value
at the same time.
If you need to filter EIDs, use attribute `eid`.

Generic filter allows filtering using generic MongoDB query (including `$and`, `$or`,`$lt`, etc.).
For querying non-JSON-native types, you can use the following magic strings,
as are defined by the search & replace [`magic`][dp3.database.magic] module.

There are no attribute name checks (may be added in the future).

Generic and fulltext filters are merged - fulltext overrides conflicting keys.

### Request

Expand All @@ -212,6 +228,8 @@ Uses pagination.

- skip: how many entities to skip (default: 0)
- limit: how many entities to return (default: 20)
- fulltext_filters: dictionary of fulltext filters (default: no filters)
- generic_filter: dictionary of generic filters (default: no filters)

### Response

Expand Down Expand Up @@ -410,6 +428,7 @@ Returns dictionary containing all entity types configured -- their simplified co
{
"<entity_id>": {
"id": "<entity_id>",
"id_data_type": "<entity_spec.id_data_type>",
"name": "<entity_spec.name>",
"attribs": "<MODEL_SPEC.attribs(e_id)>",
"eid_estimate_count": "<DB.estimate_count_eids(e_id)>"
Expand Down
19 changes: 16 additions & 3 deletions docs/configuration/db_entities.md
Original file line number Diff line number Diff line change
Expand Up @@ -86,10 +86,23 @@ Entity is described simply by:
| Parameter | Data-type | Default value | Description |
|----------------|---------------------|---------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
| **`id`** | string (identifier) | *(mandatory)* | Short string identifying the entity type, it's machine name (must match regex `[a-zA-Z_][a-zA-Z0-9_-]*`). Lower-case only is recommended. |
| `id_data_type` | string | "string" | Data type of the entity id (`eid`) value, see [Supported eid data types](#supported-entity-id-data-types). |
| **`name`** | string | *(mandatory)* | Attribute name for humans. May contain any symbols. |
| **`snapshot`** | bool | *(mandatory)* | Whether to create snapshots of the entity. See [Architecture](../architecture.md#data-flow) for more details. |
| `lifetime` | `Lifetime Spec` | `Immortal Lifetime` | Defines the lifetime of the entitiy, entities are never deleted by default. See the [Entity Lifetimes](lifetimes.md) for details. |

### Supported entity id data types

Only a subset of [primitive data types](#primitive-types) is supported for entity ids. The supported data types are:

- `string` (default)
- `int`: 32-bit signed integer (range from -2147483648 to +2147483647)
- `ipv4`: IPv4 address, represented as [IPv4Address][ipaddress.IPv4Address] (passed as dotted-decimal string)
- `ipv6`: IPv6 address, represented as [IPv6Address][ipaddress.IPv6Address] (passed as string in short or full format)
- `mac`: MAC address, represented as [MACAddress][dp3.common.mac_address.MACAddress] (passed as string)

Whenever writing a piece of code independent of a specific configuration,
the [`AnyEidT`][dp3.common.datatype.AnyEidT] type alias should be used.

## Attributes

Expand Down Expand Up @@ -196,9 +209,9 @@ List of supported values for parameter `data_type`:
- `int64`: 64-bit signed integer (use when the range of normal `int` is not sufficent)
- `float`
- `time`: Timestamp in `YYYY-MM-DD[T]HH:MM[:SS[.ffffff]][Z or [±]HH[:]MM]` format or timestamp since 1.1.1970 in seconds or milliseconds.
- `ip4`: IPv4 address (passed as dotted-decimal string)
- `ip6`: IPv6 address (passed as string in short or full format)
- `mac`: MAC address (passed as string)
- `ipv4`: IPv4 address, represented as [IPv4Address][ipaddress.IPv4Address] (passed as dotted-decimal string)
- `ipv6`: IPv6 address, represented as [IPv6Address][ipaddress.IPv6Address] (passed as string in short or full format)
- `mac`: MAC address, represented as [MACAddress][dp3.common.mac_address.MACAddress] (passed as string)
- `json`: Any JSON object can be stored, all processing is handled by user's code. This is here for special cases which can't be mapped to any other data type.

#### Composite types
Expand Down
67 changes: 55 additions & 12 deletions docs/modules.md
Original file line number Diff line number Diff line change
Expand Up @@ -122,6 +122,19 @@ and [`on_new_attr`](#attribute-hooks) callbacks,
and you can enable it by passing the `refresh` keyword argument
to the callback registration. See the Callbacks section for more details.

## Type of `eid`

!!! tip "Specifying the `eid` type"

At runtime, the `eid` will be exactly the type as specified in the entity specification.

All the examples on this page will show the `eid` as a string, as that is the default type.
The type of the `eid` is can be configured in the entity specification, as is
detailed [here](configuration//db_entities.md#entity).

The typehint of the `eid` used in callback registration definitions is the [
`AnyEidT`][dp3.common.datatype.AnyEidT] type, which is a type alias of Union of all the allowed
types of the `eid` in the entity specification.

## Callbacks

Expand Down Expand Up @@ -184,27 +197,36 @@ registrar.register_task_hook("on_task_start", task_hook)
Receives eid and Task, may prevent entity record creation (by returning False).
The callback is registered using the
[`register_allow_entity_creation_hook`][dp3.common.callback_registrar.CallbackRegistrar.register_allow_entity_creation_hook] method.
Required signature is `Callable[[str, DataPointTask], bool]`.
Required signature is `Callable[[AnyEidT, DataPointTask], bool]`.

```python
def entity_creation(eid: str, task: DataPointTask) -> bool:
def entity_creation(
eid: str, # (1)!
task: DataPointTask,
) -> bool:
return eid.startswith("1")

registrar.register_allow_entity_creation_hook(
entity_creation, "test_entity_type"
)
```

1. `eid` may not be string, depending on the entity configuration, see [Type of
`eid`](#type-of-eid).

#### Entity `on_entity_creation` hook

Receives eid and Task, may return new DataPointTasks.

Callbacks which are called once when an entity is created are registered using the
[`register_on_entity_creation_hook`][dp3.common.callback_registrar.CallbackRegistrar.register_on_entity_creation_hook] method.
Required signature is `Callable[[str, DataPointTask], list[DataPointTask]]`.
Required signature is `Callable[[AnyEidT, DataPointTask], list[DataPointTask]]`.

```python
def processing_function(eid: str, task: DataPointTask) -> list[DataPointTask]:
def processing_function(
eid: str, # (1)!
task: DataPointTask
) -> list[DataPointTask]:
output = does_work(task)
return [DataPointTask(
model_spec=task.model_spec,
Expand All @@ -224,6 +246,9 @@ registrar.register_on_entity_creation_hook(
)
```

1. `eid` may not be string, depending on the entity configuration, see [Type of
`eid`](#type-of-eid).

The `register_on_entity_creation_hook` method also allows for refreshing of values derived
by the registered hook. This can be done using the `refresh` keyword argument, (expecting a [`SharedFlag`][dp3.common.state.SharedFlag] object, which is created by default for all modules)
and the `may_change` keyword argument, which lists all the attributes that the hook may change.
Expand All @@ -243,10 +268,13 @@ registrar.register_on_entity_creation_hook(
Callbacks that are called on every incoming datapoint of an attribute are registered using the
[`register_on_new_attr_hook`][dp3.common.callback_registrar.CallbackRegistrar.register_on_new_attr_hook] method.
The callback allways receives eid, attribute and Task, and may return new DataPointTasks.
The required signature is `Callable[[str, DataPointBase], Union[None, list[DataPointTask]]]`.
The required signature is `Callable[[AnyEidT, DataPointBase], Union[None, list[DataPointTask]]]`.

```python
def attr_hook(eid: str, dp: DataPointBase) -> list[DataPointTask]:
def attr_hook(
eid: str, # (1)!
dp: DataPointBase,
) -> list[DataPointTask]:
...
return []

Expand All @@ -255,6 +283,9 @@ registrar.register_on_new_attr_hook(
)
```

1. `eid` may not be string, depending on the entity configuration, see [Type of
`eid`](#type-of-eid).

This hook can be refreshed on configuration changes if you feel like the attribute value may change too slowly
to catch up naturally.
This can be done using the `refresh` keyword argument, (expecting a [`SharedFlag`][dp3.common.state.SharedFlag] object, which is created by default for all modules)
Expand Down Expand Up @@ -290,7 +321,6 @@ def timeseries_hook(
) -> list[DataPointTask]:
...
return []


registrar.register_timeseries_hook(
timeseries_hook, "test_entity_type", "test_attr_type",
Expand Down Expand Up @@ -371,8 +401,8 @@ Learn more about the updater module in the [updater configuration](configuration
#### Periodic Update Hook

The [`register_periodic_update_hook`][dp3.common.callback_registrar.CallbackRegistrar.register_periodic_update_hook]
method expects a callable with the following signature:
`Callable[[str, str, dict], list[DataPointTask]]`, where the arguments are the entity type,
method expects a callable with the following signature:
`Callable[[str, AnyEidT, dict], list[DataPointTask]]`, where the arguments are the entity type,
entity ID and master record.
The callable should return a list of DataPointTask objects to perform (possibly empty).

Expand All @@ -382,7 +412,11 @@ The following example shows how to register a periodic update hook for an entity
The hook will be called for all entities of this type every day.

```python
def periodic_update_hook(entity_type: str, eid: str, record: dict) -> list[DataPointTask]:
def periodic_update_hook(
entity_type: str,
eid: str, # (1)!
record: dict,
) -> list[DataPointTask]:
...
return []

Expand All @@ -391,6 +425,9 @@ registrar.register_periodic_update_hook(
)
```

1. `eid` may not be string, depending on the entity configuration, see [Type of
`eid`](#type-of-eid).

!!! warning "Set a Realistic Update Period"

Try to configure the period to match the real execution time of the registered hooks,
Expand All @@ -405,12 +442,15 @@ This hook is useful when the entity record is not needed for the update, meaning

The [`register_periodic_eid_update_hook`][dp3.common.callback_registrar.CallbackRegistrar.register_periodic_eid_update_hook]
method expects a callable with the following signature:
`Callable[[str, str], list[DataPointTask]]`, where the first argument is the entity type and the second is the entity ID.
`Callable[[str, AnyEidT], list[DataPointTask]]`, where the first argument is the entity type and the second is the entity ID.
The callable should return a list of DataPointTask objects to perform (possibly empty).
All other arguments are the same as for the [periodic update hook](#periodic-update-hook).

```python
def periodic_eid_update_hook(entity_type: str, eid: str) -> list[DataPointTask]:
def periodic_eid_update_hook(
entity_type: str,
eid: str, # (1)!
) -> list[DataPointTask]:
...
return []

Expand All @@ -419,6 +459,9 @@ registrar.register_periodic_eid_update_hook(
)
```

1. `eid` may not be string, depending on the entity configuration, see [Type of
`eid`](#type-of-eid).

## Running module code in a separate thread

The module is free to run its own code in separate threads or processes.
Expand Down
6 changes: 5 additions & 1 deletion dp3/api/internal/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,11 @@
import sys
from enum import Enum

from pydantic import BaseModel, ValidationError, field_validator
from pydantic import (
BaseModel,
ValidationError,
field_validator,
)

from dp3.api.internal.dp_logger import DPLogger
from dp3.common.config import ModelSpec, read_config_dir
Expand Down
39 changes: 26 additions & 13 deletions dp3/api/internal/entity_response_models.py
Original file line number Diff line number Diff line change
@@ -1,9 +1,10 @@
from datetime import datetime
from typing import Annotated, Any, Optional
from typing import Annotated, Any, Optional, Union

from pydantic import BaseModel, Field, NonNegativeInt, RootModel
from pydantic import BaseModel, Field, NonNegativeInt, PlainSerializer

from dp3.common.attrspec import AttrSpecType, AttrType
from dp3.common.datapoint import to_json_friendly


class EntityState(BaseModel):
Expand All @@ -14,11 +15,28 @@ class EntityState(BaseModel):
"""

id: str
id_data_type: str
name: str
attribs: dict[str, Annotated[AttrSpecType, Field(discriminator="type")]]
eid_estimate_count: NonNegativeInt


# This is necessary to allow for non-JSON-serializable types in the model
JsonVal = Annotated[Any, PlainSerializer(to_json_friendly, when_used="json")]

LinkVal = dict[str, JsonVal]
PlainVal = Union[LinkVal, JsonVal]
MultiVal = list[PlainVal]
HistoryVal = list[dict[str, PlainVal]]

Dp3Val = Union[HistoryVal, MultiVal, PlainVal]

EntityEidMasterRecord = dict[str, Dp3Val]

SnapshotType = dict[str, Dp3Val]
EntityEidSnapshots = list[SnapshotType]


class EntityEidList(BaseModel):
"""List of entity eids and their data based on latest snapshot

Expand All @@ -31,12 +49,7 @@ class EntityEidList(BaseModel):
time_created: Optional[datetime] = None
count: int
total_count: int
data: list[dict]


EntityEidMasterRecord = RootModel[dict]

EntityEidSnapshots = RootModel[list[dict]]
data: EntityEidSnapshots


class EntityEidData(BaseModel):
Expand All @@ -48,8 +61,8 @@ class EntityEidData(BaseModel):
"""

empty: bool
master_record: dict
snapshots: list[dict]
master_record: EntityEidMasterRecord
snapshots: EntityEidSnapshots


class EntityEidAttrValueOrHistory(BaseModel):
Expand All @@ -62,8 +75,8 @@ class EntityEidAttrValueOrHistory(BaseModel):
"""

attr_type: AttrType
current_value: Any = None
history: list[dict] = []
current_value: Dp3Val = None
history: HistoryVal = []


class EntityEidAttrValue(BaseModel):
Expand All @@ -72,4 +85,4 @@ class EntityEidAttrValue(BaseModel):
The value is fetched from master record.
"""

value: Any = None
value: JsonVal = None
Loading