-
Notifications
You must be signed in to change notification settings - Fork 1.5k
fix: (#3387) use collation to determine if string type is compatible #3400
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: (#3387) use collation to determine if string type is compatible #3400
Conversation
DrewMcArthur
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abonander @alu I'd love your feedback on this, whenever you get a chance 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these changes are all a result of checking against SELECT ID, COLLATION_NAME FROM INFORMATION_SCHEMA.COLLATIONS ORDER BY ID; locally - let me know if any of them should be thrown out.
- There was an additional ~100 collations on my machine, which I added and then commented out, rather than implementing all the other pieces necessary. let me know if you'd like me to finish that out, I'd be happy to, if that's something we want. otherwise, we can remove the comments of those extra collations.
uft8_*was displayed asutf8mb3_*on my machine. let me know if those changes should be reverted- also added
impl TryFrom<u16> for Collationfor use here
| // https://dev.mysql.com/doc/internals/en/com-query-response.html#column-type | ||
| // dead link ^ | ||
| // https://dev.mysql.com/doc/dev/mysql-server/9.0.0/page_protocol_com_query_response_text_resultset_column_definition.html | ||
| // the "type of the column as defined in enum_field_types" | ||
| // https://dev.mysql.com/doc/dev/mysql-server/9.0.0/field__types_8h.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feel free to suggest which of these links we should keep - the original link here was dead when i tried it.
| MySqlTypeInfo { | ||
| r#type: ColumnType::VarString, // VARCHAR | ||
| flags: ColumnFlags::empty(), | ||
| collation: Collation::utf8mb3_bin, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using utf8mb3_bin regardless of session state is misleading.
In my opinion, the only difference needed in this context is whether it is binary or string.
How about using an enum like this one?
CollationType {
Binary,
String
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was one of the pieces I wasn't sure about, what the default should be for some of these. I think you're right, _bin probably isn't right & I'll update this to utf8mb3_general_ci if you think that'd be correct.
As for changing the type of MySqlTypeInfo's collation field to an enum, I think the source of the issue on #3387 is that there are three cases being compressed into the ColumnFlag::binary flag:
- a
VARCHARhas a collation likeutf8_general_ci, and theColumnFlag::binaryis false, so this is compatible with a RustString - a
BLOBhas a collation ofbinary,ColumnFlag::binaryis true, so it's not compatible with a RustString, which is also correct - a
VARCHARhas a collation of*_bin, soColumnFlag::binaryis true, so it's marked incompatible when we should be able to decode this into aString. This is the case that needs to be fixed.
This could be handled with that enum, if the value is String for both VARCHAR cases, but I feel like that could also be confusing to have a CollationType::String (as opposed to binary) while simultaneously having ColumnFlag::binary = true.
I think my preference is to keep track of each unique collation here, since it's only a byte, but I'll defer to you and @abonander to decide what makes more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| collation: Collation::utf8mb3_bin, | |
| collation: Collation::utf8mb3_general_ci, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ @alu (sorry, I'm terrible at remembering to directly @ people in replies here)
|
@alu This fixes only for MySQL, not for SQLite one. I understand that original question is for MySQL, but SQLite and presumably PostgreSQL are also affected. Please, tell me if i need to create a separate issue for SQLite. |
This comment was marked as spam.
This comment was marked as spam.
|
@eirnym oh good catch - my bandwidth is pretty low right now, so if you wanted to open a PR into my branch with those changes, or even just the tests for them that'd be awesome! @alvico I'm not sure their release schedule, but you can target this branch in your |
This comment was marked as spam.
This comment was marked as spam.
This comment was marked as spam.
This comment was marked as spam.
|
Sorry it took so long to get to this. I really wanted to get #3383 done for 0.9.0 so this had to wait. Conflating the method of sorting stored data and the character set being used over the wire is one of several design decisions that has made me come to really loathe MySQL. They could have sent the character set ID instead, for not much extra overhead (it's just one more lookup, probably entirely in-memory), and saved the rest of us a ton of headache. On the surface, I don't see much point in cataloguing collations. We can't treat them as a closed set because new ones are being added all the time, and they're not always UTF-8 either, so we can't assume anything about a collation we don't recognize. The I was thinking there's only one collation we should care about, and that's Ultimately, it's a pretty clear logic error if the user tries to encode or decode a string using a completely incompatible encoding. I think we could catch this if we really wanted to, but I'm wondering about the ratio of value to effort there. I guess one thing we should worry about is when a given sequence of bytes is valid for two different encodings, but doesn't actually represent the same character. That's tantamount to silent memory corruption, which would be pretty bad. We could probably do a similar thing to what we do in Postgres, and query the metadata for a given collation when we see it. Then we can actually check if the character set is compatible, since that's a significantly smaller set of possibilities, and doesn't change much. We can't do that in every situation, however; during the simple query flow (not using a prepared statement), the column data immediately precedes the data we need to decode, so there's no opportunity to stop and issue another query first. So ultimately, we probably need to catalogue collations anyway so we don't always have to query them. But Given that this is a major departure from the current PR and that your last message mentioned you didn't have the bandwidth to finish this, I'm going to close. However, I'm going to follow up with my own PR cause this really needs to be fixed. By the way, this is not an issue for Postgres or SQLite. SQLite defines strings to always be UTF-8, and Postgres transcodes to/from the character set requested by the client. Any issue in other databases which seems to be related is almost certainly not. Please open a search for an existing issue specific to that database or open a new one. |
cc #3400 (comment) comment in `sqlx-mysql/src/collation.rs` for explanation fixes #3200 fixes #3387 fixes #3390 fixes #3409
cc #3400 (comment) comment in `sqlx-mysql/src/collation.rs` for explanation fixes #3200 fixes #3387 fixes #3390 fixes #3409
cc #3400 (comment) comment in `sqlx-mysql/src/collation.rs` for explanation fixes #3200 fixes #3387 fixes #3390 fixes #3409
cc #3400 (comment) comment in `sqlx-mysql/src/collation.rs` for explanation fixes #3200 fixes #3387 fixes #3390 fixes #3409
cc #3400 (comment) comment in `sqlx-mysql/src/collation.rs` for explanation fixes #3200 fixes #3387 fixes #3390 fixes #3409
cc #3400 (comment) comment in `sqlx-mysql/src/collation.rs` for explanation fixes #3200 fixes #3387 fixes #3390 fixes #3409
cc #3400 (comment) comment in `sqlx-mysql/src/collation.rs` for explanation fixes #3200 fixes #3387 fixes #3390 fixes #3409
cc #3400 (comment) comment in `sqlx-mysql/src/collation.rs` for explanation fixes #3200 fixes #3387 fixes #3390 fixes #3409
cc #3400 (comment) comment in `sqlx-mysql/src/collation.rs` for explanation fixes #3200 fixes #3387 fixes #3390 fixes #3409
cc #3400 (comment) comment in `sqlx-mysql/src/collation.rs` for explanation fixes #3200 fixes #3387 fixes #3390 fixes #3409
cc #3400 (comment) comment in `sqlx-mysql/src/collation.rs` for explanation fixes #3200 fixes #3387 fixes #3390 fixes #3409
cc #3400 (comment) comment in `sqlx-mysql/src/collation.rs` for explanation fixes #3200 fixes #3387 fixes #3390 fixes #3409
cc #3400 (comment) comment in `sqlx-mysql/src/collation.rs` for explanation fixes #3200 fixes #3387 fixes #3390 fixes #3409
cc #3400 (comment) comment in `sqlx-mysql/src/collation.rs` for explanation fixes #3200 fixes #3387 fixes #3390 fixes #3409
…r` (launchbadge#3924) cc launchbadge#3400 (comment) comment in `sqlx-mysql/src/collation.rs` for explanation fixes launchbadge#3200 fixes launchbadge#3387 fixes launchbadge#3390 fixes launchbadge#3409
Does your PR solve an issue?
fixes #3387
fixes #3390
impl TryFrom<u16> for Collationcollationfield toTypeInfostrcompatibility check to compare withCollation::binaryinstead ofColumnFlags::BinaryVARCHARwith*_bincollation incorrectly interpreted asVARBINARYColumn #3387, failed onmain, and passes now. (let me know if we'd like to make this test more robust)todo
<uuid::fmt::Hyphenated as sqlx::Type<MySql>::compatible()is wrong? #3409