AVRO-3601: CustomAttributes#getAttribute() now returns boost::optional #1826

Merged
merged 2 commits into from
Aug 15, 2022

martin-g
Member

Add unit tests for CustomAttributes#getAttribute(string)

Jira

Tests

  • My PR adds unit tests

Commits

  • My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and classes in the PR contain Javadoc that explains what they do

Add unit tests for CustomAttributes#getAttribute(string)

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>
martin-g requested a review from thiru-mg on August 12, 2022 08:43
github-actions bot added the C++ label (Pull Requests for C++ binding) on Aug 12, 2022
Comment on lines +460 to +466
"[{\"name\": \"f1\", \"type\": \"long\", "
"\"arrayField\": \"[1]\", "
"\"booleanField\": \"true\", "
"\"mapField\": \"{\\\"key1\\\":\\\"value1\\\", \\\"key2\\\":\\\"value2\\\"}\", "
"\"nullField\": \"null\", "
"\"numberField\": \"1.23\", "
"\"stringField\": \"\\\"field value with \\\"double quotes\\\"\\\"\""
Contributor

As in #1821 (comment), I think this should be

Suggested change (proposed replacement for the lines above):
"[{\"name\": \"f1\", \"type\": \"long\", "
"\"arrayField\": [1], "
"\"booleanField\": true, "
"\"mapField\": {\"key1\":\"value1\", \"key2\":\"value2\"}, "
"\"nullField\": null, "
"\"numberField\": 1.23, "
"\"stringField\": \"field value with \\\"double quotes\\\"\""

i.e. CustomAttributes.printJson should assume that the std::string values are already in JSON format, and write them out without adding any quotation marks around them or backslashes within them. Likewise, callers of CustomAttributes::addAttribute (especially in Compiler.cc) should provide a JSON-format std::string.

Contributor

If CustomAttributes worked that way, then it would be able to use just std::string rather than boost::optional<std::string>, because an empty std::string could mean that the attribute is not present, while an std::string containing two quotation marks "" would mean that the value is an empty JSON string literal.

Contributor

Would the JSON representation of custom attributes be compatible with Avro IDL? The IDL Language spec is not clear on whether the thing between parentheses in an annotation is always a JSON value.

Member Author

martin-g, Aug 12, 2022

At the moment the content is preserved exactly as the user provided it. It could be JSON, XML, base64, ...
It is up to the user's application to encode/decode the values.
You might be right about the non-optional representation (""), but IMO the optional one is clearer. Other opinions are also welcome!

Contributor

In an Avro schema file, must all custom attributes of fields have string values? I.e. is this invalid:

{
    "type": "record",
    "name": "Demo",
    "fields": [
        {
            "name": "field",
            "type": "string",
            "custom_flag": true
        }
    ]
}

If this schema is not invalid, then is the Avro C++ library able to load it from a file and then write it to another file, preserving the custom attribute?

Member Author

martin-g, Aug 15, 2022

The Avro spec does not say anything about the possible value types of the custom attributes/metadata.

Until AVRO-3547 the C++ SDK didn't support custom attributes at all. (The Rust SDK does not support them either; I expect a user to open a ticket/PR this week.)
With AVRO-3601 we found out that using JsonDom.hh for the custom attributes is not recommended, hence the string-based approach.

I guess 1.11.2/1.12.0 will be released in several months, so whoever is interested in better handling of the custom attributes should step up and do it. Here I just tried to fix the broken installation of C++ SDK 1.11.1.

Contributor

I think, minimally, the library should be able to read a schema that contains custom attributes with arbitrary value types, but not necessarily able to preserve the values in memory and write them out again. That would help compatibility with future versions of Avro, e.g. new standard logical types.

If CustomAttributes::attributes returns a reference to a map that contains the string values, then that makes it harder for a future version of the library to add support for other types without a breaking change.

Member Author

> I guess 1.11.2/1.12.0 will be released in several months, so whoever is interested in better handling of the custom attributes should step up and do it. Here I just tried to fix the broken installation of C++ SDK 1.11.1.

Let me re-phrase the above: PRs are very welcome!

martin-g merged commit d70b847 into master on Aug 15, 2022
martin-g added a commit that referenced this pull request Aug 15, 2022
AVRO-3601: CustomAttributes#getAttribute() now returns boost::optional (#1826)

* AVRO-3601: CustomAttributes#getAttribute() now returns boost::optional

Add unit tests for CustomAttributes#getAttribute(string)

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

* AVRO-3601: Add unit tests for writing CustomAttributes's values as JSON strings

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>
(cherry picked from commit d70b847)
martin-g deleted the avro-3601-c++-simplify-custom-attributes-2 branch on August 15, 2022 06:57
Labels: C++ (Pull Requests for C++ binding)
2 participants