JSON Parser Fails to Handle UTF-16 Surrogate Pairs (Emojis) #768

## Description

The JSON string parser currently fails when it encounters UTF-16 surrogate pairs that represent Unicode characters outside the Basic Multilingual Plane (such as emojis). The parser reads only the first `\uXXXX` escape of the pair and attempts to convert it directly to a `char`, which fails because a lone surrogate is not a valid Unicode scalar value.

## Expected Behavior

The parser should correctly handle UTF-16 surrogate pairs like `\ud83d\udc51` (👑 crown emoji) by:
1. Detecting high surrogates (0xD800-0xDBFF)
2. Reading the corresponding low surrogate (0xDC00-0xDFFF)
3. Converting the pair to the proper Unicode code point
4. Successfully parsing the character
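
As a reference point for the steps above, the standard library's `char::decode_utf16` already applies the same pairing rule to raw code units. The sketch below only illustrates the expected result and is not a proposed change to the lexer; `decode_units` is a hypothetical helper name.

```rust
use std::char::{decode_utf16, DecodeUtf16Error};

/// Hypothetical helper: decodes UTF-16 code units into a `String`,
/// returning an error on the first unpaired surrogate.
fn decode_units(units: &[u16]) -> Result<String, DecodeUtf16Error> {
    // Collecting an iterator of `Result<char, _>` into `Result<String, _>`
    // stops at the first error.
    decode_utf16(units.iter().copied()).collect()
}
```

With this helper, `decode_units(&[0xD83D, 0xDC51])` yields the crown emoji, while a lone `0xD83D` or a lone `0xDC51` yields an error.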

## Current Behavior

The parser fails with an error when it encounters the high surrogate `\ud83d` because it tries to convert it directly to a `char`. Surrogate code points (0xD800-0xDFFF) are not Unicode scalar values, so they cannot be represented as a Rust `char`.
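
The failure can be reproduced with the standard library alone. This is a minimal sketch, assuming the lexer's conversion ultimately behaves like `char::from_u32`; `is_scalar_value` is a hypothetical name used only for illustration.

```rust
/// Returns true when `v` is a Unicode scalar value,
/// i.e. when it can be represented as a Rust `char`.
fn is_scalar_value(v: u32) -> bool {
    // `char::from_u32` returns `None` for surrogates (0xD800..=0xDFFF)
    // and for values above 0x10FFFF.
    char::from_u32(v).is_some()
}
```

Both halves of the pair (`0xD83D` and `0xDC51`) are rejected individually, while the combined code point `0x1F451` is accepted.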

## Minimal Example

```rust
// Test function
fn parse_json_string(input: &str) -> Result<String, LexerError> {
    let mut lexer = Lexer::new(input, ParserLanguage::Json);
    let mut r = String::new();
    while !lexer.eof() {
        r.push(lexer.next_json_char_value()?);
    }
    Ok(r)
}

// This should parse successfully but currently fails
let json_string = r#""\ud83d\udc51""#;  // Crown emoji: 👑
let result = parse_json_string(json_string);

// Currently: Err(LexerError::IncorrectUnicodeChar)
// Expected: Ok("👑")
```

### Test Cases

```rust
// These should all work:
assert_eq!(parse_json_string(r#""\ud83d\ude00""#).unwrap(), "😀");  // Grinning face
assert_eq!(parse_json_string(r#""\ud83d\udc36""#).unwrap(), "🐶");  // Dog face
assert_eq!(parse_json_string(r#""\ud83d\udc51""#).unwrap(), "👑");  // Crown

// These should still fail (lone surrogates):
let str = parse_json_string(r#""\ud83d""#); // Lone high surrogate
assert!(str.is_err());
assert!(matches!(str.unwrap_err(), LexerError::ExpectedLowSurrogate));

let str = parse_json_string(r#""\ud83d\""#); // Lone high surrogate with incomplete low surrogate
assert!(str.is_err());
assert!(matches!(str.unwrap_err(), LexerError::ExpectedLowSurrogate));

let str = parse_json_string(r#""\ud83d\a""#); // Lone high surrogate with invalid low surrogate
assert!(str.is_err());
assert!(matches!(str.unwrap_err(), LexerError::ExpectedLowSurrogate));

let str = parse_json_string(r#""\udc51""#); // Lone low surrogate
assert!(str.is_err());
assert!(matches!(str.unwrap_err(), LexerError::ExpectedHighSurrogate));

let str = parse_json_string(r#""\ud83d\u0181""#); // Invalid low surrogate
assert!(str.is_err());
assert!(matches!(str.unwrap_err(), LexerError::InvalidLowSurrogate));
```

## Root Cause

In the `next_json_char_value` function, the `'u'` branch only handles single 16-bit Unicode escapes:

```rust
'u' => {
    let mut v = 0;
    for _ in 0..4 {
        let digit = self.next_hex_digit()?;
        v = v * 16 + digit;
    }
    Self::char_try_from(v)  // ← Fails for surrogates
}
```

## Suggested Fix

The fix involves detecting surrogate pairs and handling them properly:

1. Check if the parsed value is a high surrogate (0xD800-0xDBFF)
2. If so, read the next `\uXXXX` sequence as the low surrogate
3. Validate the low surrogate is in range (0xDC00-0xDFFF)
4. Convert the pair using: `0x10000 + ((high & 0x3FF) << 10) + (low & 0x3FF)`
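
The validation and conversion steps can be sketched as a standalone function. This is an illustration under stated assumptions, not the library's actual code: `SurrogateError` and `combine_surrogates` are hypothetical names, with variants chosen to mirror the errors used in the test cases above.

```rust
/// Hypothetical stand-in for the lexer's error type; the variant names
/// mirror the ones used in the test cases of this issue.
#[derive(Debug, PartialEq)]
enum SurrogateError {
    ExpectedHighSurrogate,
    InvalidLowSurrogate,
}

/// Combines a high and a low surrogate into one Unicode scalar value.
fn combine_surrogates(high: u32, low: u32) -> Result<char, SurrogateError> {
    if !(0xD800..=0xDBFF).contains(&high) {
        return Err(SurrogateError::ExpectedHighSurrogate);
    }
    if !(0xDC00..=0xDFFF).contains(&low) {
        return Err(SurrogateError::InvalidLowSurrogate);
    }
    // Each surrogate contributes 10 bits; the pair encodes an offset from 0x10000.
    let code_point = 0x10000 + ((high & 0x3FF) << 10) + (low & 0x3FF);
    // After the range checks the result always lies in 0x10000..=0x10FFFF.
    Ok(char::from_u32(code_point).expect("valid supplementary code point"))
}
```

For `\ud83d\udc51`: `0xD83D & 0x3FF = 0x3D`, shifted left by 10 gives `0xF400`; `0xDC51 & 0x3FF = 0x51`; and `0x10000 + 0xF400 + 0x51 = 0x1F451`, which is the crown emoji.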

## Environment

- **Rust version**: 1.86.0
- **Library version**: 3.7.2
- **Platform**: Linux

## Additional Context

This issue affects any JSON containing emoji or other Unicode characters outside the Basic Multilingual Plane that are encoded as UTF-16 surrogate pairs. This is common in JSON data from web APIs, especially social media platforms.

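The JSON specification (RFC 8259, which obsoletes RFC 7159) supports `\uXXXX` escape sequences and states that a character outside the Basic Multilingual Plane is escaped as a 12-character sequence encoding its UTF-16 surrogate pair. Many JSON generators emit emojis this way when targeting ASCII-safe output, for example Python's `json.dumps` with its default `ensure_ascii=True`.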
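
To show where such input comes from, here is a rough sketch of how an ASCII-safe generator might produce these escapes. `escape_non_ascii` is a hypothetical helper written for this issue; it deliberately ignores quotes, backslashes, and control characters, which a real JSON encoder must also escape.

```rust
/// Hypothetical helper: escapes every non-ASCII character as `\uXXXX`,
/// emitting a surrogate pair for characters above U+FFFF.
fn escape_non_ascii(input: &str) -> String {
    let mut out = String::new();
    let mut buf = [0u16; 2];
    for c in input.chars() {
        if c.is_ascii() {
            out.push(c);
        } else {
            // `encode_utf16` writes one unit for BMP characters and two
            // (high surrogate, low surrogate) for everything above U+FFFF.
            for unit in c.encode_utf16(&mut buf).iter() {
                out.push_str(&format!("\\u{:04x}", unit));
            }
        }
    }
    out
}
```

Passing the crown emoji through this helper produces the exact escape sequence that the parser currently rejects.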
