
Add Text/Integer Converter operation #2213

Merged
GCHQDeveloper581 merged 6 commits into gchq:master from p-leriche:string-integer
Mar 8, 2026
Conversation

@p-leriche p-leriche commented Feb 28, 2026

Description

Converts between text strings and large integers (decimal or hexadecimal), interpreting text as big-endian byte sequences.

Changes

  • Add TextIntegerConverter.mjs operation
  • Update Categories.json to include Text/Integer Converter in Data format category

Functionality

Bidirectional conversion between text and numeric representations:

  • Text → Integer: Interprets each character's code as bytes in big-endian order
  • Integer → Text: Decodes integer as big-endian bytes to characters

Examples:

  • "ABC" → 4276803 (decimal) or 0x414243 (hex)
  • 0x48656C6C6F → "Hello"
  • 123456789 → corresponding text representation

Character limitations:

  • Only ASCII and Latin-1 characters are accepted. Code points > 255 generate an error.
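The conversion described above can be sketched as follows. This is a minimal illustration, not the code from TextIntegerConverter.mjs: the function names `textToInteger` and `integerToText` and the error wording are hypothetical, and the sketch does not preserve leading NUL characters on a round trip.

```javascript
// Hypothetical helpers illustrating big-endian text <-> BigInt conversion.
// These are NOT taken from TextIntegerConverter.mjs.

// Pack each character code (0..255) into a BigInt, most significant byte first.
function textToInteger(text) {
    let n = 0n;
    for (let i = 0; i < text.length; i++) {
        const code = text.charCodeAt(i);
        if (code > 255) {
            // Assumed error wording; the merged operation's message may differ.
            throw new Error(`Character at index ${i} has code ${code}, which is > 255`);
        }
        n = (n << 8n) | BigInt(code);
    }
    return n;
}

// Unpack a non-negative BigInt into characters, most significant byte first.
function integerToText(n) {
    if (n < 0n) throw new Error("Negative integers are not handled in this sketch");
    let text = "";
    while (n > 0n) {
        text = String.fromCharCode(Number(n & 0xffn)) + text;
        n >>= 8n;
    }
    return text;
}

console.log(textToInteger("ABC").toString());   // 4276803
console.log(textToInteger("ABC").toString(16)); // 414243
console.log(integerToText(0x48656C6C6Fn));      // Hello
```

Using BigInt rather than Number keeps the result exact once the text is longer than a few bytes.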

Use Cases

  • Cryptographic operations requiring numeric text representations
  • Encoding/decoding text as numeric values
  • Data format conversions
  • Understanding byte-level text encoding

Features

  • Automatic input format detection:
    • Quoted strings ("text" or 'text')
    • Hexadecimal (0x... prefix)
    • Decimal (plain numbers)
    • Unquoted text (treated as string)
  • Output in string, decimal, or hexadecimal format
  • Fully pipeable for use in operation chains
  • Round-trip validation (string→integer→string preserves original)
  • Handles arbitrarily large text using BigInt
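The detection order listed above could look something like the sketch below. It is an assumption-laden illustration only: `detectInputFormat` and its return shape are hypothetical, and the merged operation relies on parseBigInt from BigIntUtils rather than on this code.

```javascript
// Hypothetical classifier following the detection order in the list above.
// Not the merged implementation, which uses parseBigInt from BigIntUtils.
function detectInputFormat(input) {
    const s = input.trim();

    // Quoted strings: matching double or single quotes at both ends.
    const quoted = s.match(/^(["'])([\s\S]*)\1$/);
    if (quoted) return { kind: "string", value: quoted[2] };

    // Hexadecimal: 0x prefix followed by hex digits. BigInt() accepts this form.
    if (/^0x[0-9a-f]+$/i.test(s)) return { kind: "hex", value: BigInt(s) };

    // Decimal: digits only.
    if (/^\d+$/.test(s)) return { kind: "decimal", value: BigInt(s) };

    // Anything else is treated as unquoted text.
    return { kind: "string", value: s };
}

console.log(detectInputFormat('"123"').kind);      // string
console.log(detectInputFormat("0x414243").kind);   // hex
console.log(detectInputFormat("123456789").kind);  // decimal
console.log(detectInputFormat("Hello").kind);      // string
```

Quoting is what lets a digits-only string such as "123" be treated as text instead of as a number.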

Dependencies

Depends on BigIntUtils PR #2205

This operation uses parseBigInt from the shared BigIntUtils library.

Satisfies

Feature Request #2141


Testing

  • Tested text to decimal conversion
  • Tested text to hexadecimal conversion
  • Tested decimal to text conversion
  • Tested with quoted and unquoted input
  • Tested hexadecimal to text conversion
  • Verified round-trip conversions (string→int→string)
  • Verified handling of Latin-1 with round-trip conversion
  • Built successfully with npm run build

@GCHQDeveloper581

Produces (not entirely) unexpected results when the input comprises characters outside the normal ASCII set, and in particular round-tripping won't work.

E.g. for input "aΓa" (Γ being capital gamma from the ancient Greek UTF set), the output for the recipe
[
    { "op": "Text-Integer Conversion",
      "args": ["Decimal"] },
    { "op": "Text-Integer Conversion",
      "args": ["String"] }
]
is "c�a"

This is inherent in the difference between "characters" and "character codes", so I don't see any way to "fix" this while maintaining the desired functionality. I suggest adding something to the description to the effect that the input string must be pure ASCII for the output to be consistent.
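The effect can be reproduced in a few lines. The pre-fix implementation is not shown in this thread, so the packing below (shift left by one byte, then OR in the unchecked character code) is an assumption; it happens to give the same result as reported. The names `packUnchecked` and `unpack` are hypothetical.

```javascript
// Assumed packing: shift left one byte, then OR in the character code
// without checking that it fits in 8 bits.
function packUnchecked(text) {
    let n = 0n;
    for (const ch of text) {
        n = (n << 8n) | BigInt(ch.codePointAt(0));
    }
    return n;
}

// Decode a non-negative BigInt as big-endian bytes, one character per byte.
function unpack(n) {
    let text = "";
    while (n > 0n) {
        text = String.fromCharCode(Number(n & 0xffn)) + text;
        n >>= 8n;
    }
    return text;
}

// "a" is 0x61 and "Γ" is 0x393, so the high bits of Γ spill into the byte for "a":
// 0x6100 | 0x0393 = 0x6393, and the first byte becomes 0x63 ("c").
const packed = packUnchecked("aΓa");
console.log(packed.toString(16));            // 639361
console.log(unpack(packed) === "c\u0093a");  // true
```

The middle byte 0x93 is a control character in Latin-1, which many displays render as a replacement glyph.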

@p-leriche

p-leriche commented Mar 7, 2026

Rather than saying "don't do it" in the description for code points > 255 whilst giving a result which might puzzle the user, I decided to throw an informative error to make it clear they weren't supported. "ASCII + Latin-1 only" is also added to the description.

I considered converting input strings to UTF-8, but that's making an assumption about how any real-world algorithm would treat Unicode text. Who knows - it might pack it as UTF-16. A UTF-8 or UTF-16 option could be added later as an enhancement if a use case arose.

Incidentally, I couldn't make a test for a failure case work, but since I couldn't find another test doing that to take as a pattern, it may be that I was doing it wrong. Or it may be (perish the thought) that the testing infrastructure has a bug. But evidently, testing failure cases doesn't seem to be a priority for you.

@GCHQDeveloper581

GCHQDeveloper581 commented Mar 8, 2026

Error case test added to counter aspersions about testing failure cases not being a priority. :-)

(I agree that the way of doing this is not entirely intuitive, and indeed in this case I wrote the test first and added the output once I worked out what would actually happen. And searching for them to copy isn't easy as there isn't a pattern to search for that's obviously different from a normal test)


@GCHQDeveloper581 GCHQDeveloper581 left a comment


Looks good! (and adding a soft failure for the non-Latin use case is much better than my suggestion of documenting that it won't work).

Thanks for your contribution.

@GCHQDeveloper581 GCHQDeveloper581 merged commit f759f4c into gchq:master Mar 8, 2026
2 checks passed
@p-leriche

p-leriche commented Mar 8, 2026

> Error case test added to counter aspersions about testing failure cases not being a priority. :-)
>
> (I agree that the way of doing this is not entirely intuitive, and indeed in this case I wrote the test first and added the output once I worked out what would actually happen. And searching for them to copy isn't easy as there isn't a pattern to search for that's obviously different from a normal test)

Yay! Thanks for the merge.
Error case testing is easy when you know how. Claude was suggesting I needed to add expectedError: true - not sure whether it was hallucinating, or whether it might have picked that up in a different context.

@GCHQDeveloper581

Every day is a school day, and Claude wins a coconut!

Looking here:

if (result.error) {
    if (test.expectedError) {
        if (result.error.displayStr === test.expectedOutput) {
            ret.status = "passing";
        } else {
            ret.status = "failing";
            ret.output = [
                "Expected",
                "\t" + test.expectedOutput.replace(/\n/g, "\n\t"),
                "Received",
                "\t" + result.error.displayStr.replace(/\n/g, "\n\t"),
            ].join("\n");
        }
    } else {
        ret.status = "erroring";
        ret.output = result.error.displayStr;
    }
} else {
    if (test.expectedError) {
        ret.status = "failing";
        ret.output = "Expected an error but did not receive one.";
    } else if (result.result === test.expectedOutput) {
        ret.status = "passing";
    } else if ("expectedMatch" in test && test.expectedMatch.test(result.result)) {
        ret.status = "passing";
    } else if ("unexpectedMatch" in test && !test.unexpectedMatch.test(result.result)) {
        ret.status = "passing";
    } else {
        ret.status = "failing";
        const expected = test.expectedOutput ? test.expectedOutput :
            test.expectedMatch ? test.expectedMatch.toString() :
                test.unexpectedMatch ? "to not find " + test.unexpectedMatch.toString() :
                    "unknown";
        ret.output = [
            "Expected",
            "\t" + expected.replace(/\n/g, "\n\t"),
            "Received",
            "\t" + result.result.replace(/\n/g, "\n\t"),
        ].join("\n");
    }
}

we can see Claude is entirely correct. However, because of the nature of the if statements used, it tends to work even if you don't specify expectedError: true (and since the difference between "normal" and "error" output is generally pretty obvious, the simpler version is good enough for most purposes).

Furthermore there is only one test in the entire codebase that actually makes correct use of this:

{
    name: "From Base85",
    input: wpOutput + "v",
    expectedError: true,
    expectedOutput: "From Base85 - Invalid character 'v' at index 337",
    recipeConfig: [
        { "op": "From Base85",
            "args": ["!-u", false] }
    ]
},

(and no, being relatively new to this codebase myself, I didn't know this until ... now).
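To see how expectedError changes the outcome, the error-related branches of the excerpt above can be condensed into a small standalone function. This is a sketch rather than the test runner's code: `classifyErrorCase` is a hypothetical name, the object shapes mirror only the fields used in the excerpt, and the sample messages are made up.

```javascript
// Hypothetical condensation of the error-related branches in the excerpt above.
// Returns null when the normal output comparison would apply instead.
function classifyErrorCase(test, result) {
    if (result.error) {
        if (!test.expectedError) return "erroring";
        return result.error.displayStr === test.expectedOutput ? "passing" : "failing";
    }
    return test.expectedError ? "failing" : null;
}

// Made-up sample results: one carrying an error, one carrying normal output.
const thrown = { error: { displayStr: "Sample error message" } };
const normal = { result: "Sample output" };

// Error received, expected, and the message matches.
console.log(classifyErrorCase({ expectedError: true, expectedOutput: "Sample error message" }, thrown)); // passing

// Error received but the test did not declare expectedError.
console.log(classifyErrorCase({ expectedOutput: "Sample error message" }, thrown)); // erroring

// Test declared expectedError but no error was received.
console.log(classifyErrorCase({ expectedError: true, expectedOutput: "x" }, normal)); // failing
```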

Development

Successfully merging this pull request may close these issues.

Feature request: Decimal to bytes

2 participants