support of metadata #32

Open
untereiner opened this issue Aug 1, 2022 · 9 comments

Comments

@untereiner

untereiner commented Aug 1, 2022

Hi,

I have Avro schemas in which metadata has been added to records.
As stated in the spec:

A JSON object, of the form:

{"type": "typeName" ...attributes...}

where typeName is either a primitive or derived type name, as defined below. Attributes not defined in this document are permitted as metadata, but must not affect the format of serialized data.
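To make that rule concrete, here is a minimal std-only sketch that splits a record's top-level attributes into the ones the spec defines for records (`type`, `name`, `namespace`, `doc`, `aliases`, `fields`) and everything else, which counts as metadata. The function `split_attributes` is a hypothetical name, and values are kept as raw JSON text instead of being parsed.

```rust
use std::collections::BTreeMap;

/// Attribute names the Avro spec defines for a record schema.
/// Anything else on the record object is metadata.
const RECORD_ATTRIBUTES: [&str; 6] = ["type", "name", "namespace", "doc", "aliases", "fields"];

/// Hypothetical helper: splits raw (key, JSON text) pairs into (standard, metadata).
fn split_attributes<'a>(
    attrs: &[(&'a str, &'a str)],
) -> (BTreeMap<&'a str, &'a str>, BTreeMap<&'a str, &'a str>) {
    let mut standard = BTreeMap::new();
    let mut metadata = BTreeMap::new();
    for &(key, value) in attrs {
        if RECORD_ATTRIBUTES.contains(&key) {
            standard.insert(key, value);
        } else {
            metadata.insert(key, value);
        }
    }
    (standard, metadata)
}

fn main() {
    // Top-level attributes of the `Ping` schema, values as raw JSON text.
    let ping = [
        ("type", "\"record\""),
        ("namespace", "\"Core\""),
        ("name", "\"Ping\""),
        ("protocol", "\"0\""),
        ("messageType", "\"8\""),
        ("senderRole", "\"client,server\""),
        ("protocolRoles", "\"client, server\""),
        ("multipartFlag", "false"),
        ("fields", "[{\"name\": \"currentDateTime\", \"type\": \"long\"}]"),
    ];
    let (standard, metadata) = split_attributes(&ping);
    println!("standard: {:?}", standard.keys().collect::<Vec<_>>());
    println!("metadata: {:?}", metadata.keys().collect::<Vec<_>>());
}
```

Note that `protocol` also lands in the metadata bucket: it is a key of Avro protocol declarations, not of record schemas.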

But does anyone have an idea how I could generate this metadata as part of an Avro trait without serializing it?
Maybe with #[serde(skip)], which skips both serialization and deserialization?

Would you accept a PR adding such a feature?

@lerouxrgd
Owner

Could you provide a more specific example of such a feature, with an example schema and the expected generated struct?

@untereiner
Author

untereiner commented Aug 6, 2022

Here is an example:

{
	"type": "record",
	"namespace": "Core",
	"name": "Ping",
	"protocol": "0",
	"messageType": "8",
	"senderRole": "client,server",
	"protocolRoles": "client, server",
	"multipartFlag": false,
  
	"fields":
	[
		{ "name": "currentDateTime", "type": "long" }
	]
}

{
	"type": "record",
	"namespace": "Core",
	"name": "Pong",
	"protocol": "0",
	"messageType": "9",
	"senderRole": "client,server",
	"protocolRoles": "client, server",
	"multipartFlag": false,
  
	"fields":
	[
		{ "name": "currentDateTime", "type": "long" }
	]
}

messageType, senderRole, protocolRoles and multipartFlag are metadata, I think.
They are in the schema, but they are constants: unlike the fields, their values cannot change.

I don't know exactly how to represent them in Rust. Maybe something like:

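// Module named after the Avro namespace "Core".
#[allow(non_snake_case)]
pub mod Core {
    /// Record `Core.Ping`; the Avro field `currentDateTime` becomes `current_date_time`.
    #[derive(Debug, Clone, PartialEq)]
    pub struct Ping {
        pub current_date_time: i64,
    }

    // Schema metadata as associated constants: fixed per type, never serialized.
    impl Ping {
        pub const MESSAGE_TYPE: &'static str = "8";
        pub const SENDER_ROLE: &'static str = "client,server";
        pub const PROTOCOL_ROLES: &'static str = "client, server";
        pub const MULTIPART_FLAG: bool = false;
    }

    /// Record `Core.Pong`, same shape as `Ping`.
    #[derive(Debug, Clone, PartialEq)]
    pub struct Pong {
        pub current_date_time: i64,
    }

    impl Pong {
        pub const MESSAGE_TYPE: &'static str = "9";
        pub const SENDER_ROLE: &'static str = "client,server";
        pub const PROTOCOL_ROLES: &'static str = "client, server";
        pub const MULTIPART_FLAG: bool = false;
    }
}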
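A sketch of the "Avro trait" idea built on those constants: if every generated type implemented a shared trait with associated constants, generic code could read the metadata from the type alone, while a serialized value would still carry only `current_date_time`. The trait `MessageMetadata` and the helpers `message_type_of` and `describe` are hypothetical names, not part of rsgen-avro or apache-avro.

```rust
/// Hypothetical trait: schema metadata exposed as associated constants.
pub trait MessageMetadata {
    const MESSAGE_TYPE: &'static str;
    const SENDER_ROLE: &'static str;
    const PROTOCOL_ROLES: &'static str;
    const MULTIPART_FLAG: bool;
}

#[derive(Debug, Clone, PartialEq)]
pub struct Ping {
    pub current_date_time: i64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Pong {
    pub current_date_time: i64,
}

impl MessageMetadata for Ping {
    const MESSAGE_TYPE: &'static str = "8";
    const SENDER_ROLE: &'static str = "client,server";
    const PROTOCOL_ROLES: &'static str = "client, server";
    const MULTIPART_FLAG: bool = false;
}

impl MessageMetadata for Pong {
    const MESSAGE_TYPE: &'static str = "9";
    const SENDER_ROLE: &'static str = "client,server";
    const PROTOCOL_ROLES: &'static str = "client, server";
    const MULTIPART_FLAG: bool = false;
}

/// Reads the metadata from the type alone; no instance is needed.
fn message_type_of<T: MessageMetadata>() -> &'static str {
    T::MESSAGE_TYPE
}

/// Formats some of a type's metadata, e.g. for logging or routing.
fn describe<T: MessageMetadata>() -> String {
    format!(
        "type={} roles={} multipart={}",
        T::MESSAGE_TYPE,
        T::SENDER_ROLE,
        T::MULTIPART_FLAG
    )
}

fn main() {
    let ping = Ping { current_date_time: 1_659_312_000_000 };
    println!("{:?} has {}", ping, describe::<Ping>());
    println!("Pong message type: {}", message_type_of::<Pong>());
}
```

Because the constants live on the type rather than in the struct, they cost nothing at runtime and cannot drift between two values of the same message.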

I also think this could be a reason to reopen #23, because I am not sure this case is handled by the schema generation.

@lerouxrgd
Owner

lerouxrgd commented Aug 7, 2022

Sadly those are non-standard fields, so there is no way to know their (potentially nested) type.
Moreover, there is no "catch-all" variant for such metadata in the underlying apache-avro Schema enum, so I don't think there is a way to handle such a use case.

@untereiner
Author

untereiner commented Aug 11, 2022

I understand your point. These attributes are not part of the Avro spec, but the spec does allow them to appear in a schema.
As for their types, I think it would be reasonable to limit them to the types defined by the Avro spec.
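One way to picture "limit the types to the Avro ones" for code generation, assuming metadata values are restricted to JSON literals that map onto Avro's `boolean`, `long`, `double` and `string`: the generator would emit one associated constant per metadata attribute. `MetadataValue`, `const_name` and `render_const` are hypothetical names in this sketch.

```rust
/// Hypothetical metadata value, restricted to types with an Avro primitive counterpart.
#[derive(Debug, Clone, PartialEq)]
enum MetadataValue {
    Boolean(bool),
    Long(i64),
    Double(f64),
    String(String),
}

/// Converts an attribute name such as `messageType` into a constant name such as `MESSAGE_TYPE`.
fn const_name(attr: &str) -> String {
    let mut out = String::new();
    for (i, c) in attr.chars().enumerate() {
        if c.is_uppercase() && i > 0 {
            out.push('_');
        }
        out.extend(c.to_uppercase());
    }
    out
}

/// Renders one metadata entry as an associated-constant declaration.
fn render_const(attr: &str, value: &MetadataValue) -> String {
    let (ty, literal) = match value {
        MetadataValue::Boolean(b) => ("bool", b.to_string()),
        MetadataValue::Long(n) => ("i64", n.to_string()),
        // `{:?}` keeps the decimal point, so `1.0` stays a float literal.
        MetadataValue::Double(x) => ("f64", format!("{:?}", x)),
        // `{:?}` quotes and escapes the string like a Rust literal.
        MetadataValue::String(s) => ("&'static str", format!("{:?}", s)),
    };
    format!("pub const {}: {} = {};", const_name(attr), ty, literal)
}

fn main() {
    let ping_metadata = [
        ("messageType", MetadataValue::String("8".to_string())),
        ("senderRole", MetadataValue::String("client,server".to_string())),
        ("multipartFlag", MetadataValue::Boolean(false)),
    ];
    for (attr, value) in &ping_metadata {
        println!("{}", render_const(attr, value));
    }
}
```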

I have a data exchange protocol based on Avro schemas that uses this possibility to define constants (without any fields) at the protocol level.

@martin-g
Contributor

As mentioned by @lerouxrgd, custom attributes are not yet supported by apache-avro.

We've just had a big headache with the new implementation of those in the C++ SDK:

In the end we agreed to make the custom attributes' values string-only. The user application can parse the value if needed.
Please create a new JIRA ticket at https://issues.apache.org/jira/browse/AVRO for adding support for custom attributes in the Rust SDK.
A PR with the actual implementation would be awesome too! :-)
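With string-only values, typing moves into the application. A minimal sketch, assuming the attributes are exposed as a `BTreeMap<String, String>` (the alias `CustomAttributes` and the helper `attribute` are illustrative, not the apache-avro API) and using the standard `FromStr` trait to parse on demand:

```rust
use std::collections::BTreeMap;
use std::str::FromStr;

/// Illustrative container for string-only custom attributes.
type CustomAttributes = BTreeMap<String, String>;

/// Parses one attribute into the type the application expects.
/// Returns `None` if the key is missing or the value does not parse.
fn attribute<T: FromStr>(attrs: &CustomAttributes, key: &str) -> Option<T> {
    attrs.get(key)?.parse().ok()
}

fn main() {
    let mut attrs = CustomAttributes::new();
    attrs.insert("messageType".to_string(), "8".to_string());
    attrs.insert("multipartFlag".to_string(), "false".to_string());

    let message_type: Option<u32> = attribute(&attrs, "messageType");
    let multipart: Option<bool> = attribute(&attrs, "multipartFlag");
    println!("messageType={:?} multipartFlag={:?}", message_type, multipart);
}
```

The trade-off is that a malformed value surfaces only when the application reads it, not when the schema is parsed.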

@untereiner
Author

@martin-g I took a quick look at those issues. They mention "at field level". Is this still a general implementation of custom attributes at any level?

I will open a ticket and try an implementation next week.

@martin-g
Contributor

According to the spec, attributes/metadata can appear next to "type", so I understand that as both top-level and field-level.

But top-level looks very much like file metadata.
File metadata is supported in Rust SDK 0.14.0+!
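For context, Avro file metadata lives in the header of an Object Container File: the magic bytes `Obj` followed by `0x01`, a map from string keys to `bytes` values (keys starting with `avro.`, such as `avro.schema` and `avro.codec`, are reserved by the spec), and a 16-byte sync marker. A minimal std-only sketch of that header; `container_header` is a hypothetical helper, the sync marker is fixed here instead of random, and the whole map is written as a single block.

```rust
/// Avro `long`: zig-zag, then base-128 varint, low bits first.
fn encode_long(n: i64, out: &mut Vec<u8>) {
    let mut z = ((n << 1) ^ (n >> 63)) as u64;
    while z >= 0x80 {
        out.push((z & 0x7f) as u8 | 0x80);
        z >>= 7;
    }
    out.push(z as u8);
}

/// Avro `bytes` / `string`: length as a `long`, then the raw bytes.
fn encode_bytes(b: &[u8], out: &mut Vec<u8>) {
    encode_long(b.len() as i64, out);
    out.extend_from_slice(b);
}

/// Hypothetical helper: magic, metadata map (one block), end-of-map marker, sync marker.
fn container_header(metadata: &[(&str, &[u8])], sync: [u8; 16]) -> Vec<u8> {
    let mut out = b"Obj\x01".to_vec();
    if !metadata.is_empty() {
        // Block count, then key/value pairs.
        encode_long(metadata.len() as i64, &mut out);
        for (key, value) in metadata {
            encode_bytes(key.as_bytes(), &mut out);
            encode_bytes(value, &mut out);
        }
    }
    encode_long(0, &mut out); // a zero count ends the map
    out.extend_from_slice(&sync);
    out
}

fn main() {
    let schema = r#"{"type":"record","namespace":"Core","name":"Ping","fields":[{"name":"currentDateTime","type":"long"}]}"#;
    let header = container_header(
        &[
            ("avro.schema", schema.as_bytes()),
            ("avro.codec", "null".as_bytes()),
            // Hypothetical user key: top-level metadata could be stored like this.
            ("protocolRoles", "client, server".as_bytes()),
        ],
        [0u8; 16],
    );
    println!("header is {} bytes", header.len());
}
```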

@untereiner
Author

First a question: the spec calls these "metadata", so why call them "custom attributes" instead of metadata in the implementation?

I do not know what "Object Container Files" are or what they are used for.
As I understand it, there are two things:

  • The schema
  • The actual data to be serde-d

To me, the metadata belongs in the schema only, because of this part of the spec:

but must not affect the format of serialized data
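This rule can be checked against Avro's binary encoding: a record is serialized as its fields in schema order, and a `long` is zig-zag encoded and then written as a base-128 varint. A std-only sketch (`encode_long` and `encode_ping` are illustrative names) showing that a `Ping` serializes to nothing but its `currentDateTime` value; `messageType` and the other metadata never reach the bytes, so a `Pong` with the same timestamp would produce identical output.

```rust
/// Avro `long`: zig-zag maps small magnitudes to small unsigned values,
/// then 7 bits per byte, low bits first, high bit = "more bytes follow".
fn encode_long(n: i64, out: &mut Vec<u8>) {
    let mut z = ((n << 1) ^ (n >> 63)) as u64;
    while z >= 0x80 {
        out.push((z & 0x7f) as u8 | 0x80);
        z >>= 7;
    }
    out.push(z as u8);
}

/// A record is just its fields in schema order; `Ping` has a single `long`.
fn encode_ping(current_date_time: i64) -> Vec<u8> {
    let mut out = Vec::new();
    encode_long(current_date_time, &mut out);
    out
}

fn main() {
    let bytes = encode_ping(1_659_312_000_000);
    println!("{} bytes: {:02x?}", bytes.len(), bytes);
}
```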

@martin-g
Contributor

Better ask these questions in the dev@ mailing list.

First a question: the spec calls these "metadata", so why call them "custom attributes" instead of metadata in the implementation?

Not sure, but for me the answer is: consistency with the other SDKs.

I have started working on this and I will create a draft PR soon!
