Oj blindly serialises strings containing invalid Unicode sequences, which I expected it to reject, when using Rails mode with ActiveSupport::JSON::Encoding.escape_html_entities_in_json = false. I'm happy for literal <, &, etc. characters to appear in my JSON, but I do not want to emit JSON with invalid UTF-8 bytes!
Oj correctly mimics the JSON gem's default behaviour of throwing an exception here:
irb(main):001> require 'json'
=> true
irb(main):002> broken_string = "very\xAEbroken"
=> "very\xAEbroken"
irb(main):003> JSON.dump broken_string
/Users/ktsanaktsidis/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/json-2.7.1/lib/json/common.rb:305:in `generate': source sequence is illegal/malformed utf-8 (JSON::GeneratorError)
irb(main):004> require 'oj'
=> true
irb(main):005> Oj.mimic_JSON
=> JSON
irb(main):006> JSON.dump broken_string
(irb):6:in `dump': Invalid Unicode [ae 62 72 6f 6b] at 4 (JSON::GeneratorError)
So this is good. However, it's not correctly mimicking ActiveSupport's behaviour:
irb(main):001> require 'active_support/all'
=> true
irb(main):002> broken_string = "very\xAEbroken"
=> "very\xAEbroken"
irb(main):003> ActiveSupport::JSON::Encoding.escape_html_entities_in_json = true
=> true
irb(main):004> broken_string.to_json
/Users/ktsanaktsidis/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/json-2.7.1/lib/json/common.rb:305:in `generate': source sequence is illegal/malformed utf-8 (JSON::GeneratorError)
irb(main):005> ActiveSupport::JSON::Encoding.escape_html_entities_in_json = false
=> false
irb(main):006> broken_string.to_json
/Users/ktsanaktsidis/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/json-2.7.1/lib/json/common.rb:305:in `generate': source sequence is illegal/malformed utf-8 (JSON::GeneratorError)
irb(main):007> require 'oj'
=> true
irb(main):010> Oj.optimize_rails
=> nil
irb(main):011> ActiveSupport::JSON::Encoding.escape_html_entities_in_json = true
=> true
irb(main):012> broken_string.to_json
/Users/ktsanaktsidis/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/activesupport-7.1.2/lib/active_support/json/encoding.rb:23:in `encode': Invalid Unicode [ae 62 72 6f 6b] at 4 (JSON::GeneratorError)
irb(main):013> ActiveSupport::JSON::Encoding.escape_html_entities_in_json = false
=> false
irb(main):014> broken_string.to_json
=> "\"very\xAEbroken\""
Oops! That last example should raise JSON::GeneratorError to be consistent with the rest, I think.
The reason seems to be that this check for Unicode correctness is only done for RailsXEsc mode, not for RailsEsc mode:
Line 825 in 0032fbb
if ((JXEsc == out->opts->escape_mode || RailsXEsc == out->opts->escape_mode) && check_start <= str) {
Is this an accurate diagnosis? If you a) agree that this is a bug, and b) agree that that line is the right place to fix it, I'm happy to send a PR.
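As a stopgap, independent of any fix in Oj, the expected behaviour can be approximated on the caller side by checking strings before they are serialised. This is a minimal sketch, not part of Oj or ActiveSupport: `assert_valid_encoding!` is a hypothetical helper name, and it relies only on Ruby's built-in `String#valid_encoding?`.

```ruby
# Hypothetical helper (not part of Oj or ActiveSupport): return the string
# unchanged if its bytes are valid for its declared encoding, otherwise raise.
def assert_valid_encoding!(str)
  return str if str.valid_encoding?

  raise ArgumentError, "invalid byte sequence in #{str.encoding} string: #{str.inspect}"
end

broken_string = "very\xAEbroken"

begin
  # In the real application this would wrap the value passed to to_json.
  assert_valid_encoding!(broken_string)
rescue ArgumentError => e
  warn e.message
end
```

This only covers a single top-level string; strings nested inside hashes or arrays would need to be walked separately, which is one reason a fix in the encoder itself seems preferable.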
Thank you for your excellent work on Oj!