Background
I'm creating a TOML config parser for SQLx to override various aspects of the library's operation, namely the query macros, as well as migrations. (launchbadge/sqlx#3383)
One big need is to provide a way to override or augment the default SQL -> Rust type mappings in the macros, so I'm adding a section like this to the TOML format:
[macros.type_overrides]
# SQL type `foo` will map to `crate::types::Foo`
foo = "crate::types::Foo"
This is going to be most useful with PostgreSQL, which has a rich type system supporting user-created types (as well as a dynamically loaded extension system which can also add types).
Making it easier to use custom types is one of the main goals of this feature.
Type names in Postgres can be schema-qualified, and to support this in TOML, I'm just allowing type names to be arbitrary paths (using a type with a custom Deserialize implementation to read nested tables):
[macros.type_overrides]
foo.foo = "crate::types::Foo"
This would also make it easier to define multiple types for the same schema, as the user can just create a new table:
[macros.type_overrides.foo]
# foo.foo
foo = "crate::types::Foo"
# foo.bar
bar = "crate::types::Bar"
Problem
In Postgres, both the name of the type as well as the name of the schema can be quoted identifiers, which are case-sensitive whereas normal unquoted identifiers are not. Thus, it is important to distinguish between the two.
From my experience, I know with some certainty that a typical user's first instinct will be to just do this:
[macros.type_overrides]
# *NOT* `foo."foo"` in SQLx (parses as `foo.foo`)
foo."foo" = "crate::types::Foo"
But this will not result in the expected behavior because toml will parse out the quotes, and SQLx won't know the difference.
This is an issue with or without a schema qualifier:
[macros.type_overrides]
# *NOT* `"foo"` in SQLx (parses as `foo`)
"foo" = "crate::types::Foo"
Proposed Solution
I think the most versatile solution would be to add a type analogous to serde_json::value::RawValue.
When parsing, I could use something like this to get the raw key, then decide whether to parse it or use it raw.
A simpler solution would be to just add a specific analogue just for keys, maybe implementing Deserialize for toml_edit::Key itself. That would satisfy my use-case but little else.
I'm willing to put some work into a PR if we agree on a solution here.
Alternatives
Fix It in Post Documentation
This could easily be written off as a simple documentation issue on SQLx's part. If the key is wrapped in a second pair of quotes, SQLx will see the inner quotes:
[macros.type_overrides]
# `foo."foo"` in SQLx
foo.'"foo"' = "crate::types::Foo"
This is what I'm going with for now so I'm not blocked on this proposal. However, it's not a satisfying solution on its own because there's always users who don't fully read documentation and just go with their gut, then open an issue when things don't work as expected. I'd prefer SQLx to be able to handle this intelligently.
Spanned
While reading the source, I did notice the support for serde_spanned::Spanned which would theoretically allow me to implement this myself.
I could deserialize a key with Spanned<IgnoredAny>, then go back to the TOML string and get the raw key from the given span.
The problem is, how do I thread the TOML string that deep in the call tree?
- Thread-local storage?
- Would work, but TLS is not particularly fast to access. I'm not overly worried about performance, but I'm trying to avoid any huge pitfalls.
- I might try this if my main proposal goes nowhere and quoted identifiers prove to be a significant problem.
DeserializeSeed?
- Non-starter. I'd have to replace
#[derive(serde::Deserialize)] with manual implementations, which would make it more cumbersome for maintenance and future additions.
Also, is Spanned support even part of toml/toml_edit's SemVer guarantees?
Deserialize using the parsed TOML structure
This is a non-starter for the same reasons DeserializeSeed is.
Lint the TOML or Two-Pass Deserializtion
These are basically the same thing and have mostly the same problems.
Alongside the documentation alternative, I could lint for improperly quoted names in the TOML by parsing to a toml_edit::DocumentMut, inspecting the structure and then deserializing the config structure if I don't find any problems.
Alternatively, I could parse to a toml_edit::DocumentMut, deserialize the config structure, then traverse the TOML structure and go back and fix-up quoted names.
Either way, it requires duplicating some of the work that deserialization is doing and having two different places in the code where semantics are defined, which is not great for maintainability.
Background
I'm creating a TOML config parser for SQLx to override various aspects of the library's operation, namely the query macros, as well as migrations. (launchbadge/sqlx#3383)
One big need is to provide a way to override or augment the default SQL -> Rust type mappings in the macros, so I'm adding a section like this to the TOML format:
This is going to be most useful with PostgreSQL, which has a rich type system supporting user-created types (as well as a dynamically loaded extension system which can also add types).
Making it easier to use custom types is one of the main goals of this feature.
Type names in Postgres can be schema-qualified, and to support this in TOML, I'm just allowing type names to be arbitrary paths (using a type with a custom
Deserializeimplementation to read nested tables):This would also make it easier to define multiple types for the same schema, as the user can just create a new table:
Problem
In Postgres, both the name of the type as well as the name of the schema can be quoted identifiers, which are case-sensitive whereas normal unquoted identifiers are not. Thus, it is important to distinguish between the two.
From my experience, I know with some certainty that a typical user's first instinct will be to just do this:
But this will not result in the expected behavior because
tomlwill parse out the quotes, and SQLx won't know the difference.This is an issue with or without a schema qualifier:
Proposed Solution
I think the most versatile solution would be to add a type analogous to
serde_json::value::RawValue.When parsing, I could use something like this to get the raw key, then decide whether to parse it or use it raw.
A simpler solution would be to just add a specific analogue just for keys, maybe implementing
Deserializefortoml_edit::Keyitself. That would satisfy my use-case but little else.I'm willing to put some work into a PR if we agree on a solution here.
Alternatives
Fix It in
PostDocumentationThis could easily be written off as a simple documentation issue on SQLx's part. If the key is wrapped in a second pair of quotes, SQLx will see the inner quotes:
This is what I'm going with for now so I'm not blocked on this proposal. However, it's not a satisfying solution on its own because there's always users who don't fully read documentation and just go with their gut, then open an issue when things don't work as expected. I'd prefer SQLx to be able to handle this intelligently.
SpannedWhile reading the source, I did notice the support for
serde_spanned::Spannedwhich would theoretically allow me to implement this myself.I could deserialize a key with
Spanned<IgnoredAny>, then go back to the TOML string and get the raw key from the given span.The problem is, how do I thread the TOML string that deep in the call tree?
DeserializeSeed?#[derive(serde::Deserialize)]with manual implementations, which would make it more cumbersome for maintenance and future additions.Also, is
Spannedsupport even part oftoml/toml_edit's SemVer guarantees?Deserialize using the parsed TOML structure
This is a non-starter for the same reasons
DeserializeSeedis.Lint the TOML or Two-Pass Deserializtion
These are basically the same thing and have mostly the same problems.
Alongside the documentation alternative, I could lint for improperly quoted names in the TOML by parsing to a
toml_edit::DocumentMut, inspecting the structure and then deserializing the config structure if I don't find any problems.Alternatively, I could parse to a
toml_edit::DocumentMut, deserialize the config structure, then traverse the TOML structure and go back and fix-up quoted names.Either way, it requires duplicating some of the work that deserialization is doing and having two different places in the code where semantics are defined, which is not great for maintainability.