AVRO-3601: CustomAttributes#getAttribute() now returns boost::optional #1826

Merged: 2 commits, merged on Aug 15, 2022

lang/c++/api/CustomAttributes.hh (3 changes: 2 additions & 1 deletion)

```diff
@@ -19,6 +19,7 @@
 #ifndef avro_CustomAttributes_hh__
 #define avro_CustomAttributes_hh__
 
+#include <boost/optional.hpp>
 #include <iostream>
 #include <map>
 #include <string>
@@ -33,7 +34,7 @@ class AVRO_DECL CustomAttributes {
 public:
     // Retrieves the custom attribute json entity for that attributeName, returns an
     // null if the attribute doesn't exist.
-    std::string getAttribute(const std::string &name) const;
+    boost::optional<std::string> getAttribute(const std::string &name) const;
 
     // Adds a custom attribute. If the attribute already exists, throw an exception.
     void addAttribute(const std::string &name, const std::string &value);
```
lang/c++/impl/CustomAttributes.cc (8 changes: 5 additions & 3 deletions)

```diff
@@ -23,13 +23,15 @@
 
 namespace avro {
 
-std::string CustomAttributes::getAttribute(const std::string &name) const {
+boost::optional<std::string> CustomAttributes::getAttribute(const std::string &name) const {
+    boost::optional<std::string> result;
     std::map<std::string, std::string>::const_iterator iter =
         attributes_.find(name);
     if (iter == attributes_.end()) {
-        return NULL;
+        return result;
     }
-    return iter->second;
+    result = iter->second;
+    return result;
 }
 
 void CustomAttributes::addAttribute(const std::string& name,
```
lang/c++/test/unittest.cc (28 changes: 24 additions & 4 deletions)

```diff
@@ -442,7 +442,12 @@ struct TestSchema {
         concepts::MultiAttribute<CustomAttributes> customAttributes;
 
         CustomAttributes cf;
-        cf.addAttribute("extra field", std::string("1"));
+        cf.addAttribute("stringField", std::string("\\\"field value with \\\"double quotes\\\"\\\""));
+        cf.addAttribute("booleanField", std::string("true"));
+        cf.addAttribute("numberField", std::string("1.23"));
+        cf.addAttribute("nullField", std::string("null"));
+        cf.addAttribute("arrayField", std::string("[1]"));
+        cf.addAttribute("mapField", std::string("{\\\"key1\\\":\\\"value1\\\", \\\"key2\\\":\\\"value2\\\"}"));
         fieldNames.add("f1");
         fieldValues.add(NodePtr( new NodePrimitive(Type::AVRO_LONG)));
         customAttributes.add(cf);
@@ -452,7 +457,14 @@ struct TestSchema {
             customAttributes);
         std::string expectedJsonWithCustomAttribute =
             "{\"type\": \"record\", \"name\": \"Test\",\"fields\": "
-            "[{\"name\": \"f1\", \"type\": \"long\",\"extra field\": \"1\"}]}";
+            "[{\"name\": \"f1\", \"type\": \"long\", "
+            "\"arrayField\": \"[1]\", "
+            "\"booleanField\": \"true\", "
+            "\"mapField\": \"{\\\"key1\\\":\\\"value1\\\", \\\"key2\\\":\\\"value2\\\"}\", "
+            "\"nullField\": \"null\", "
+            "\"numberField\": \"1.23\", "
+            "\"stringField\": \"\\\"field value with \\\"double quotes\\\"\\\"\""
```
Comment on lines +460 to +466
Contributor:

As in #1821 (comment), I think this should be

Suggested change

```diff
-            "[{\"name\": \"f1\", \"type\": \"long\", "
-            "\"arrayField\": \"[1]\", "
-            "\"booleanField\": \"true\", "
-            "\"mapField\": \"{\\\"key1\\\":\\\"value1\\\", \\\"key2\\\":\\\"value2\\\"}\", "
-            "\"nullField\": \"null\", "
-            "\"numberField\": \"1.23\", "
-            "\"stringField\": \"\\\"field value with \\\"double quotes\\\"\\\"\""
+            "[{\"name\": \"f1\", \"type\": \"long\", "
+            "\"arrayField\": [1], "
+            "\"booleanField\": true, "
+            "\"mapField\": {\"key1\":\"value1\", \"key2\":\"value2\"}, "
+            "\"nullField\": null, "
+            "\"numberField\": 1.23, "
+            "\"stringField\": \"field value with \\\"double quotes\\\"\""
```

i.e. CustomAttributes.printJson should assume that the std::string values are already in JSON format, and write them out without adding any quotation marks around them or backslashes within them. Likewise, callers of CustomAttributes::addAttribute (especially in Compiler.cc) should provide a JSON-format std::string.

Contributor:

If CustomAttributes worked that way, then it would be able to use just std::string rather than boost::optional<std::string>, because an empty std::string could mean that the attribute is not present, while an std::string containing two quotation marks "" would mean that the value is an empty JSON string literal.

Contributor:

Would the JSON representation of custom attributes be compatible with Avro IDL? The IDL Language spec is not clear on whether the thing between parentheses in an annotation is always a JSON value.

Member Author (@martin-g, Aug 12, 2022):

At the moment the content is preserved as whatever the user provided. It could be JSON, XML, base64, ...
It is up to the user application to encode/decode the values.
You might be right about the non-optional representation ("") but IMO this way is clearer. Other opinions are also welcome!

Contributor:

In an Avro schema file, must all custom attributes of fields have string values? I.e. is this invalid:

```json
{
    "type": "record",
    "name": "Demo",
    "fields": [
        {
            "name": "field",
            "type": "string",
            "custom_flag": true
        }
    ]
}
```

If this schema is not invalid, then is the Avro C++ library able to load it from a file and then write it to another file, preserving the custom attribute?

Member Author (@martin-g, Aug 15, 2022):

The Avro spec does not say anything about the possible value types of the custom attributes/metadata.

Until AVRO-3547 the C++ SDK didn't support it at all. (The Rust SDK does not support it yet either; I expect a user to open a ticket/PR this week.)
With AVRO-3601 we found out that using JsonDom.hh for the custom attributes is not recommended, thus the string-based approach.

I guess 1.11.2/1.12.0 will be released in several months, so whoever is interested in better handling of the custom attributes should step up and do it. Here I just tried to fix the broken installation of C++ SDK 1.11.1.

Contributor:

I think, minimally, the library should be able to read a schema that contains custom attributes with arbitrary value types, but not necessarily able to preserve the values in memory and write them out again. That would help compatibility with future versions of Avro, e.g. new standard logical types.

If CustomAttributes::attributes returns a reference to a map that contains the string values, then that makes it harder for a future version of the library to add support for other types without a breaking change.

Member Author (@martin-g):

> I guess 1.11.2/1.12.0 will be released in several months, so whoever is interested in better handling of the custom attributes should step up and do it. Here I just tried to fix the broken installation of C++ SDK 1.11.1.

Let me rephrase the above: PRs are very welcome!

```diff
+            "}]}";
         testNodeRecord(nodeRecordWithCustomAttribute,
                        expectedJsonWithCustomAttribute);
     }
```
```diff
@@ -467,8 +479,6 @@ struct TestSchema {
         concepts::MultiAttribute<NodePtr> fieldValues;
         std::vector<GenericDatum> defaultValues;
 
-        CustomAttributes cf;
-        cf.addAttribute("extra field", std::string("1"));
         fieldNames.add("f1");
         fieldValues.add(NodePtr( new NodePrimitive(Type::AVRO_LONG)));
 
@@ -481,6 +491,15 @@ struct TestSchema {
                        expectedJsonWithoutCustomAttribute);
     }
 
+    void checkCustomAttributes_getAttribute()
+    {
+        CustomAttributes cf;
+        cf.addAttribute("field1", std::string("1"));
+
+        BOOST_CHECK_EQUAL(std::string("1"), *cf.getAttribute("field1"));
+        BOOST_CHECK_EQUAL(false, cf.getAttribute("not_existing").is_initialized());
+    }
+
     void test() {
         std::cout << "Before\n";
         schema_.toJson(std::cout);
@@ -505,6 +524,7 @@ struct TestSchema {
 
         checkNodeRecordWithoutCustomAttribute();
         checkNodeRecordWithCustomAttribute();
+        checkCustomAttributes_getAttribute();
     }
 
     ValidSchema schema_;
```