-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Encode column names with encoding of context #210
base: main
Are you sure you want to change the base?
Encode column names with encoding of context #210
Conversation
contrib/ruby/ext/trilogy-ruby/cext.c
Outdated
@@ -811,10 +811,12 @@ static VALUE read_query_response(VALUE vargs) | |||
} | |||
} | |||
|
|||
rb_encoding *conn_enc = rb_to_encoding(ctx->encoding); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rb_to_encoding
is quite costly. We probably should it only once after connect or something, and keep a rb_encoding *
in ctx
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the very least, it should be moved outside the loop, so that if you have 20 fields you don't do the conversion 20 times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I moved the call to rb_to_encoding
to the rb_trilogy_connect
method and store it in rb_encoding *conn_encoding
in the ctx
. I also replaced the call to the rb_to_encoding
method in rb_trilogy_query
to use this value in the ctx
struct. Let me know if you'd need any other adaptions for this.
contrib/ruby/ext/trilogy-ruby/cext.c
Outdated
@@ -811,10 +811,12 @@ static VALUE read_query_response(VALUE vargs) | |||
} | |||
} | |||
|
|||
rb_encoding *conn_enc = rb_to_encoding(ctx->encoding); | |||
|
|||
#ifdef HAVE_RB_INTERNED_STR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be updated to HAVE_RB_ENC_INTERNED_STR
and extconf.rb
should also be updated accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I changed this to use HAVE_RB_ENC_INTERNED_STR
and added the corresponding have_func
to extconf.rb
contrib/ruby/ext/trilogy-ruby/cext.c
Outdated
@@ -456,6 +457,7 @@ static VALUE rb_trilogy_connect(VALUE self, VALUE encoding, VALUE charset, VALUE | |||
|
|||
RB_OBJ_WRITE(self, &ctx->encoding, encoding); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RB_OBJ_WRITE(self, &ctx->encoding, encoding); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you're keeping both. I think we can get rid of the reference to encoding
. Just turn encoding
into a rb_encoding *
, so we don't need to mark it or antyhing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I remove encoding
from the ctx
(and also from the trilogy_ctx
struct) alltogether?
It's only used in the RB_OBJ_WRITE
call (which, if I understand correctly, writes the encoding
value to the ctx->encoding
) and in the rb_to_encoding
call, which probably makes storing this obsolete as we have the rb_encoding
pointer in the ctx
struct which can be used to encode values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I'm suggesting yes. That member is no longer used, so no point keeping it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, didn't see your other comment in the conversation tab 😅 I applied these changes, now encoding
in the ctx
struct is the rb_encoding *
value which can be directly used :)
Looks good to me now, but I'm not a maintainer so I can't merge. |
Co-authored-by: Jean Boussier <[email protected]>
@byroot Great, thanks for reviewing, in that case let's wait for someone from the maintainers to review this as well :) |
Makes sense to me. It's still not quite the same as mysql2, which uses the default internal encoding, but that might also be fine—we are not obligated to match mysql2 behavior. |
Not matching the exact mysql2 behaviour is totally fine, I only mentioned this because it broke some code that relied on non-ascii encoding of column names 😅 Let me know if you need me to change anything else for you to be able to accept this PR 😄 |
We detected some behaviour which might lead to some unexpected failures when migrating from the
mysql2
Adapter to thetrilogy
adapter.In the result set returned by a query, one can access the column names using the
fields
attribute, however the encoding of these strings is different from the encoding of the database.Behaviour in mysql2:
When a query is executed via the database adapter (or rather its client), the column names (in the attribute
fields
of the result) respect the encoding set for the database connection. Running a query on a database withencoding: utf8mb4
specified in the client creation respects this encoding for the column names, returning the strings withEncoding:UTF-8
encoding.Behaviour in trilogy:
Running the same query on the same database using the trilogy adapter returns the column names encoded with
Encoding:US-ASCII
, ignoring the database encoding.This might lead to some unexpected encoding issues when the client uses the names from the query to further process the data.
Tests:
Behaviour with
mysql2
Adapter:Result:
UTF-8, UTF-8, UTF-8, UTF-8, UTF-8, UTF-8, UTF-8
Behaviour with
trilogy
2.9.0:Result:
US-ASCII, US-ASCII, US-ASCII, US-ASCII, US-ASCII, US-ASCII, US-ASCII
Behaviour after patch:
Result with patch:
UTF-8, UTF-8, UTF-8, UTF-8, UTF-8, UTF-8, UTF-8
With this patch, trilogy returns the strings in the same encoding as the database, which matches the behaviour of the previous used mysql2 adapter.
I am not sure whether simply returning the strings in ASCII encoding is a deliberate choice or not, therefore feel free to accept or reject this PR, both is fine, I wanted to bring this to attention.
Let me know if you need any other informations, and I'd appreciate someone looking at my usage of Ruby C-API methods to ensure everything is used in its intended way, as I have only limited experience with the C-API.
The CI passed in my fork, see here and here.
Have a pleasant day!