Description
I'd like to propose an alternative to #53.
#53 assumes that “invalid” characters (i.e. characters that cannot be encoded with the given encoding) should be dealt with individually. Sometimes, however, it is more useful to deal with whole substrings of such characters. For such cases I propose a method that would split any given string into an array of alternating encodable and non-encodable substrings.
Example:
var iconvLite = require('iconv-lite');
console.log(
iconvLite.split('Хлѣбъ です。', 'cp866')
); // output: ['Хл', 'ѣ', 'бъ ', 'です。']
The method suggested above is inspired by the behaviour of String.prototype.split when it is given a regular expression enclosed in a single set of capturing parentheses:
console.log(
'foo-bar'.split(/(-+)/)
); // output: [ 'foo', '-', 'bar' ]
console.log(
'--foo-bar'.split(/(-+)/)
); // output: [ '', '--', 'foo', '-', 'bar' ]
The proposed method should remind its users of String.prototype.split (hence the name .split) and thus be understood by analogy.
To complete the analogy, it should also behave similarly, i.e. the even array indices (0, 2, 4…) should always correspond to encodable substrings while the odd array indices (1, 3, 5…) should always correspond to non-encodable substrings. (To achieve that, the first substring in the returned array could sometimes be intentionally left empty, as String.prototype.split does in the [ '', '--', 'foo', '-', 'bar' ] example above, to preserve the meaning of odd and even indices.)
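To make the even/odd rule concrete, here is a minimal sketch of the grouping logic. It is not iconv-lite code: splitByEncodability and isAscii are hypothetical names, and the "can this character be encoded?" check is passed in as a caller-supplied predicate (an ASCII test stands in for a real encoding here), since how the library would answer that question is left open.

```javascript
// Hypothetical helper, not part of iconv-lite.
// Splits `str` into alternating runs: even indices hold encodable
// substrings, odd indices hold non-encodable ones.
// `isEncodable` is an assumed predicate: (character) => boolean.
function splitByEncodability(str, isEncodable) {
  // Start with an empty encodable run so that index 0 is always "encodable".
  const parts = [''];
  // for...of iterates by code point, so surrogate pairs are kept together.
  for (const ch of str) {
    // An odd length means the last run sits at an even index (encodable).
    const lastRunIsEncodable = parts.length % 2 === 1;
    if (isEncodable(ch) !== lastRunIsEncodable) {
      parts.push(''); // the kind of character changed: open a new run
    }
    parts[parts.length - 1] += ch;
  }
  return parts;
}

// Stand-in predicate for illustration only: treats ASCII as "encodable".
const isAscii = (ch) => ch.codePointAt(0) < 0x80;

console.log(splitByEncodability('na\u00EFve caf\u00E9', isAscii));
// output: [ 'na', 'ï', 've caf', 'é' ]
console.log(splitByEncodability('\u00E9t\u00E9', isAscii));
// output: [ '', 'é', 't', 'é' ]
```

As in the 'Хлѣбъ です。' example, this sketch does not append a trailing empty string when the input ends with a non-encodable run; only the leading position is padded, which is enough to keep the parity of the indices meaningful.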