Encode column names with encoding of context #210

Open · wants to merge 6 commits into base: main
Conversation

Adrian-Hirt

We noticed some behaviour that might lead to unexpected failures when migrating from the mysql2 adapter to the trilogy adapter.
In the result set returned by a query, the column names can be accessed via the fields attribute; however, the encoding of these strings differs from the encoding of the database.

Behaviour in mysql2:

When a query is executed via the database adapter (or rather its client), the column names (in the fields attribute of the result) respect the encoding set for the database connection. Running a query on a database with encoding: utf8mb4 specified at client creation respects this encoding for the column names, returning the strings with Encoding::UTF-8.

Behaviour in trilogy:

Running the same query on the same database using the trilogy adapter returns the column names encoded as Encoding::US-ASCII, ignoring the database encoding.
This can lead to unexpected encoding issues when the client uses the names from the query to further process the data.
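To illustrate the kind of breakage this can cause, here is a small sketch with a hypothetical non-ASCII column name (not taken from the PR itself): a UTF-8 byte sequence mislabelled as US-ASCII becomes an invalid string and no longer compares equal to the properly encoded name.

```ruby
# Hypothetical column name containing non-ASCII characters. If the
# driver tags the raw UTF-8 bytes as US-ASCII, the string is invalid
# and stops comparing equal to the correctly encoded name.
utf8_name  = "größe"
ascii_name = utf8_name.dup.force_encoding(Encoding::US_ASCII)

puts ascii_name.valid_encoding?   # false: bytes above 0x7F are not ASCII
puts(ascii_name == utf8_name)     # false: encodings differ, neither is ASCII-only
```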

Tests:

Behaviour with mysql2 Adapter:

require 'mysql2'

client = Mysql2::Client.new(host: '127.0.0.1', port: 3306, username: 'root', encoding: 'utf8mb4', collation: 'utf8mb4_unicode_520_ci', database: 'mydatabase')

if client.ping
  result = client.query('SELECT * FROM users LIMIT 10')
  puts result.fields.map(&:encoding).join(', ')
end

Result: UTF-8, UTF-8, UTF-8, UTF-8, UTF-8, UTF-8, UTF-8

Behaviour with trilogy 2.9.0:

require 'trilogy'

client = Trilogy.new(host: '127.0.0.1', port: 3306, username: 'root', encoding: 'utf8mb4', collation: 'utf8mb4_unicode_520_ci', database: 'mydatabase')

if client.ping
  result = client.query('SELECT * FROM users LIMIT 10')
  puts result.fields.map(&:encoding).join(', ')
end

Result: US-ASCII, US-ASCII, US-ASCII, US-ASCII, US-ASCII, US-ASCII, US-ASCII
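For reference, a possible client-side workaround until a fix lands (illustrative only, simulated here with plain strings rather than a live Trilogy result) is to retag the field names with the connection's encoding; this is safe because the byte content itself is unchanged, only the encoding label is wrong.

```ruby
# Simulate field names as returned with the wrong encoding label,
# then retag them with the connection's encoding. force_encoding
# changes only the label, never the bytes.
fields = ["id", "name"].map { |f| f.dup.force_encoding(Encoding::US_ASCII) }
fixed  = fields.map { |f| f.dup.force_encoding(Encoding::UTF_8) }

puts fixed.map(&:encoding).join(', ')   # UTF-8, UTF-8
```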

Behaviour after patch:

require 'trilogy'

client = Trilogy.new(host: '127.0.0.1', port: 3306, username: 'root', encoding: 'utf8mb4', collation: 'utf8mb4_unicode_520_ci', database: 'mydatabase')

if client.ping
  result = client.query('SELECT * FROM users LIMIT 10')
  puts result.fields.map(&:encoding).join(', ')
end

Result with patch: UTF-8, UTF-8, UTF-8, UTF-8, UTF-8, UTF-8, UTF-8


With this patch, trilogy returns the strings in the same encoding as the database, which matches the behaviour of the previously used mysql2 adapter.
I am not sure whether returning the strings in US-ASCII encoding is a deliberate choice, so feel free to accept or reject this PR; either is fine, I mainly wanted to bring this to your attention.

Let me know if you need any other information. I'd also appreciate someone looking over my usage of the Ruby C API methods to ensure everything is used as intended, as I have only limited experience with the C API.

The CI passed in my fork, see here and here.

Have a pleasant day!

@@ -811,10 +811,12 @@ static VALUE read_query_response(VALUE vargs)
}
}

rb_encoding *conn_enc = rb_to_encoding(ctx->encoding);
Contributor:

rb_to_encoding is quite costly. We should probably do it only once after connect or something, and keep a rb_encoding * in ctx.

Contributor:

At the very least, it should be moved outside the loop, so that if you have 20 fields you don't do the conversion 20 times.

Author:

Makes sense, I moved the call to rb_to_encoding into the rb_trilogy_connect method and store the result as rb_encoding *conn_encoding in the ctx. I also replaced the call to rb_to_encoding in rb_trilogy_query with this value from the ctx struct. Let me know if you need any other adaptations.
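The pattern being discussed, resolving the encoding once at connect time and reusing it for every field instead of once per field in the parsing loop, can be sketched at the Ruby level. The class and method names here are illustrative stand-ins for the C-level trilogy_ctx struct and rb_to_encoding, not real trilogy API.

```ruby
# Illustrative sketch: the costly encoding lookup happens once, at
# "connect" time, and the cached object is reused for every field.
class FakeCtx
  attr_reader :conn_encoding, :lookups

  def initialize(encoding_name)
    @lookups = 0
    @conn_encoding = resolve_encoding(encoding_name)  # done once at connect
  end

  # Stand-in for rb_to_encoding(): the expensive call we want to avoid
  # repeating per field.
  def resolve_encoding(name)
    @lookups += 1
    Encoding.find(name)
  end

  # Per-field work reuses the cached encoding object.
  def tag_field(raw_name)
    raw_name.dup.force_encoding(conn_encoding)
  end
end

ctx = FakeCtx.new('UTF-8')
fields = Array.new(20) { |i| ctx.tag_field("col_#{i}") }
puts fields.first.encoding  # UTF-8
puts ctx.lookups            # 1: twenty fields, a single lookup
```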

@@ -811,10 +811,12 @@ static VALUE read_query_response(VALUE vargs)
}
}

rb_encoding *conn_enc = rb_to_encoding(ctx->encoding);

#ifdef HAVE_RB_INTERNED_STR
Contributor:

This should be updated to HAVE_RB_ENC_INTERNED_STR and extconf.rb should also be updated accordingly.
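For context on why interning matters here: rb_enc_interned_str returns a frozen, deduplicated string in the given encoding, so repeated field names can share one object across queries. The closest Ruby-level counterpart to the deduplication is String#-@, which this sketch uses to show the effect.

```ruby
# Interned (frozen, deduplicated) strings: two occurrences of the same
# field name resolve to the very same object instead of two allocations.
a = -"user_name"
b = -"user_name"

puts a.frozen?    # true
puts a.equal?(b)  # true: same object from the interned-string table
```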

Author:

Thanks, I changed this to use HAVE_RB_ENC_INTERNED_STR and added the corresponding have_func to extconf.rb.

@@ -456,6 +457,7 @@ static VALUE rb_trilogy_connect(VALUE self, VALUE encoding, VALUE charset, VALUE

RB_OBJ_WRITE(self, &ctx->encoding, encoding);
Contributor:

Suggested change (remove this line):
RB_OBJ_WRITE(self, &ctx->encoding, encoding);

Contributor:

Ah, you're keeping both. I think we can get rid of the reference to encoding. Just turn encoding into a rb_encoding *, so we don't need to mark it or anything.

Author:

Should I remove encoding from the ctx (and also from the trilogy_ctx struct) altogether?

It's only used in the RB_OBJ_WRITE call (which, if I understand correctly, writes the encoding value to ctx->encoding) and in the rb_to_encoding call; storing it is probably obsolete now, since we have the rb_encoding pointer in the ctx struct that can be used to encode values.

Contributor:

That's what I'm suggesting yes. That member is no longer used, so no point keeping it.

Author:

Ah sorry, didn't see your other comment in the conversation tab 😅 I applied these changes, now encoding in the ctx struct is the rb_encoding * value which can be directly used :)

@byroot (Contributor)

byroot commented Dec 5, 2024

Looks good to me now, but I'm not a maintainer so I can't merge.

@Adrian-Hirt (Author)

@byroot Great, thanks for reviewing, in that case let's wait for someone from the maintainers to review this as well :)

@composerinteralia (Contributor)

Makes sense to me. It's still not quite the same as mysql2, which uses the default internal encoding, but that might also be fine—we are not obligated to match mysql2 behavior.
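The mysql2 convention referenced above, transcoding result strings into Encoding.default_internal when one is configured, can be sketched as follows. This is a simplified, hypothetical helper, not mysql2's actual code path; real drivers also handle invalid bytes and binary columns.

```ruby
# Simplified sketch of the mysql2 convention: tag the raw bytes with
# the connection encoding, then transcode to the default internal
# encoding if the application has configured one.
def tag_and_transcode(bytes, conn_encoding)
  str = bytes.dup.force_encoding(conn_encoding)
  internal = Encoding.default_internal
  internal ? str.encode(internal) : str
end

name = tag_and_transcode("größe".b, Encoding::UTF_8)
puts name.encoding  # UTF-8, unless Encoding.default_internal is set
```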

@Adrian-Hirt (Author)

Adrian-Hirt commented Dec 6, 2024

> Makes sense to me. It's still not quite the same as mysql2, which uses the default internal encoding, but that might also be fine—we are not obligated to match mysql2 behavior.

Not matching the exact mysql2 behaviour is totally fine; I only mentioned it because it broke some code that relied on the non-ASCII encoding of column names 😅

Let me know if you need me to change anything else for you to be able to accept this PR 😄
