@withinboredom
Contributor

Description

To support something like protobuf, Serde requires more type tags than PHP currently has support for. This adds multiple new type tags that currently do nothing:

  • Binary: marks a string as raw binary data rather than text in UTF-8 (or whatever other encoding it uses).
  • Fixed32: marks an integer or string as a fixed-width unsigned 32-bit integer. The full range fits in a native integer on a 64-bit system, but not on a 32-bit system.
  • Fixed64: marks an integer or string as a fixed-width unsigned 64-bit integer. Even on a 64-bit system, a native (signed) integer covers only part of this range, so larger values need a string representation.
  • Float32: marks a float as a 32-bit float.
  • Float64: marks a string or float as a 64-bit float. A string representation may be required on 32-bit systems.
  • Int32: marks an integer as a 32-bit integer.
  • Int64: marks an integer or string as a 64-bit integer. A string representation may be required on 32-bit systems.
  • SFixed32: marks an integer as a fixed-width signed 32-bit integer. In binary representations, this always occupies 32 bits, including padding.
  • SFixed64: marks an integer as a fixed-width signed 64-bit integer. In binary representations, this always occupies 64 bits, even if the value would fit in a 32-bit number.
  • SInt32: marks an integer as a signed 32-bit integer (zig-zag encoded in protobuf).
  • SInt64: marks an integer or string as a signed 64-bit integer (zig-zag encoded in protobuf).
  • UInt32: marks an integer or string as an unsigned 32-bit integer.
  • UInt64: marks an integer or string as an unsigned 64-bit integer.

Motivation and context

These type tags are required for protobuf support to work correctly, because protobuf encodes each of these types differently. For example, SInt values use zig-zag encoding, which represents small negative numbers more compactly.
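As a rough illustration of the zig-zag encoding mentioned above, here is a minimal sketch for 32-bit values. The function names are hypothetical and not part of this PR, and the code assumes a 64-bit PHP build so that the intermediate shift does not overflow.

```php
<?php

// Hypothetical helper names; a sketch, not this PR's implementation.
// Assumes a 64-bit PHP build (PHP_INT_SIZE === 8).

/**
 * Maps a signed 32-bit integer to an unsigned one so that values
 * close to zero (positive or negative) become small numbers.
 */
function zigZagEncode32(int $n): int
{
    // ($n >> 31) is 0 for non-negative input and -1 (all ones) for negative input.
    return (($n << 1) ^ ($n >> 31)) & 0xFFFFFFFF;
}

/**
 * Reverses zigZagEncode32().
 */
function zigZagDecode32(int $z): int
{
    // The lowest bit carries the sign; the remaining bits carry the magnitude.
    return ($z >> 1) ^ -($z & 1);
}
```

With this mapping, 0, -1, 1, -2 encode to 0, 1, 2, 3, so a varint encoder needs only one byte for small negative numbers.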

Additionally, other formatters could make use of these tags. For example, the YAML and JSON formatters could base64-encode strings marked Binary. For numbers held as strings, a formatter could emit the string's contents as a number instead of as a quoted string.
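To make the Binary case concrete, here is a small sketch of why a JSON formatter would want base64. The function name is hypothetical, and nothing here reflects how Serde's formatters are actually wired up.

```php
<?php

// Hypothetical helper; a sketch of the idea only.

/**
 * Encodes a string field for JSON output, base64-encoding it when
 * the field is tagged as binary.
 */
function encodeStringFieldForJson(string $value, bool $isBinary): string|false
{
    $exported = $isBinary ? base64_encode($value) : $value;

    // json_encode() returns false when given malformed UTF-8.
    return json_encode($exported);
}

$raw = "\x00\xFF\x10"; // not valid UTF-8

$untagged = encodeStringFieldForJson($raw, false); // false: cannot be encoded as-is
$tagged   = encodeStringFieldForJson($raw, true);  // "AP8Q" as a JSON string
```

A deformatter would need the same tag to know that the incoming string should be passed through `base64_decode()`.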

How has this been tested?

These type tags currently have no effect on the output generated. Additional behaviour will come in a followup PR.

Screenshots (if appropriate)

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

Go over all the following points, and put an x in all the boxes that apply.

Please, please, please, don't send your pull request until all of the boxes are ticked. Once your pull request is created, it will trigger a build on our continuous integration server to make sure your tests and code style pass.

  • I have read the CONTRIBUTING document.
  • My pull request addresses exactly one patch/feature.
  • I have created a branch for this patch/feature.
  • Each individual commit in the pull request is meaningful.
  • I have added tests to cover my changes.
  • If my change requires a change to the documentation, I have updated it accordingly.

If you're unsure about any of these, don't hesitate to ask. We're here to help!


public function validate(mixed $value): bool
{
    return is_numeric($value);
Owner

An unsigned int can probably be validated as non-negative. If provided with a negative value, that's a validation error, I'd think.
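A stricter check along those lines might look like the sketch below. The function name and the `$max` parameter are hypothetical, and it compares decimal strings so that values above `PHP_INT_MAX` can still be range-checked. It assumes the ctype extension is available.

```php
<?php

// Hypothetical helper; not the validate() method from this PR.

/**
 * Returns true if $value is a non-negative whole number, given as an
 * int or a decimal string, that does not exceed $max (a decimal string).
 */
function isUnsignedInRange(mixed $value, string $max): bool
{
    if (is_int($value)) {
        // Negative ints become "-123" and fail the digit check below.
        $value = (string) $value;
    }

    // Rejects floats, signs, whitespace, and empty strings.
    if (!is_string($value) || !ctype_digit($value)) {
        return false;
    }

    $digits = ltrim($value, '0');
    if ($digits === '') {
        $digits = '0';
    }

    // Without leading zeros, a shorter decimal string is always the smaller number.
    if (strlen($digits) !== strlen($max)) {
        return strlen($digits) < strlen($max);
    }

    // Same length: lexicographic order matches numeric order.
    return strcmp($digits, $max) <= 0;
}

const UINT32_MAX = '4294967295';
const UINT64_MAX = '18446744073709551615';
```

For example, `isUnsignedInRange(-1, UINT32_MAX)` and `isUnsignedInRange('4294967296', UINT32_MAX)` are both rejected, while `isUnsignedInRange('18446744073709551615', UINT64_MAX)` is accepted.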

@Crell
Owner

Crell commented May 30, 2025

Oh my...

A couple of thoughts:

  • I'd rather not merge anything until a final complete version is ready. That way we don't leave early designs lying about in the code before it's done.
  • A TypeField generally matches up with a corresponding handler pair. (Importer/Exporter.) Does this mean we're going to need 13 more handler pairs? As currently implemented I can see that being a performance problem, as the full list is rescanned for every property. I'd likely need to devise a different, faster matching process.
  • I am concerned that this will necessitate adding a bunch more output types to Formatters/Deformatters as well. I don't know what the impact of those would be on the rest of the system. Something to think through.
  • If we go this route, we'll need a lot of thinking through how these types behave on existing formats. Turning a binary string into base64 is certainly one option, though that would likely involve adding a FormatField attribute to control behavior only under certain formats. (Something I've considered in the past and even started doing, but haven't gotten around to finishing. It was part of my never-finished attempt at XML support, because XML is annoying.) I'm not sure yet what all of the implications are.
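On the performance concern in the second point, one possible direction (purely a sketch, with hypothetical class and method names that do not exist in Serde) is to index handlers by type tag so that matching is a single array lookup rather than a scan of the full handler list for every property:

```php
<?php

// Hypothetical registry; none of these names are Serde's real API.

final class TagHandlerRegistry
{
    /** @var array<string, callable> Handlers keyed by type-tag name. */
    private array $handlersByTag = [];

    public function register(string $tagName, callable $handler): void
    {
        $this->handlersByTag[$tagName] = $handler;
    }

    /**
     * Constant-time lookup; returns null when no handler claims the tag,
     * so the caller can fall back to the existing scan.
     */
    public function find(string $tagName): ?callable
    {
        return $this->handlersByTag[$tagName] ?? null;
    }
}

$registry = new TagHandlerRegistry();
$registry->register('UInt32', fn (int $value): string => (string) $value);

$handler = $registry->find('UInt32');
$exported = $handler !== null ? $handler(42) : null; // "42"
```

This only helps for handlers that match on exactly one tag; handlers with more involved matching rules would still need the existing scan as a fallback.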

This may make sense to try and sort through in chat at some point. It's also worth looking at how Rust Serde handles this. I know it supports more types than mine, but so does Rust itself. I trimmed it down to just those types that PHP reasonably cares about.

@withinboredom withinboredom mentioned this pull request May 30, 2025