add a method to split a string into an array of encodable and non-encodable substrings #73

@Mithgol

I'd like to propose an alternative to #53.

#53 assumes that “invalid” characters (i.e. characters that cannot be encoded with the given encoding) should be dealt with individually. Sometimes, however, it is more useful to deal with whole substrings of such characters. For such cases I propose a method that would split any given string into an array of alternating encodable and non-encodable substrings.

Example:

var iconvLite = require('iconv-lite');
console.log(
   iconvLite.split('Хлѣбъ です。', 'cp866')
); // output: ['Хл', 'ѣ', 'бъ ', 'です。']

The method suggested above is inspired by the behaviour of String.prototype.split when it is given a regular expression enclosed in a single set of capturing parentheses:

console.log(
   'foo-bar'.split(/(-+)/)
); // output: [ 'foo', '-', 'bar' ]
console.log(
   '--foo-bar'.split(/(-+)/)
); // output: [ '', '--', 'foo', '-', 'bar' ]

The proposed method should remind its users of String.prototype.split (hence the name .split) and thus be understood by analogy.

For the analogy to be complete, it should also behave similarly: the even array indices (0, 2, 4…) should always correspond to encodable substrings, while the odd array indices (1, 3, 5…) should always correspond to non-encodable substrings. (To achieve that, the first substring in the returned array may sometimes be an empty string, as String.prototype.split does in the [ '', '--', 'foo', '-', 'bar' ] example above, to preserve the meaning of odd and even indices.)
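
The even/odd contract described above can be illustrated with a minimal sketch. It is not an iconv-lite API: the function name splitByEncodability and the canEncode predicate are hypothetical, and the ASCII-only predicate is just a stand-in so that the example runs without any dependency.

```javascript
// Hypothetical helper, NOT part of iconv-lite.
// canEncode is a caller-supplied predicate (an assumption of this sketch)
// that reports whether a single character can be encoded.
function splitByEncodability(str, canEncode) {
  // Index 0 always holds an encodable run; it stays '' when the
  // string starts with a non-encodable character.
  const result = [''];
  let currentIsEncodable = true;
  for (const ch of str) { // iterates by code point, not by UTF-16 unit
    const ok = canEncode(ch);
    if (ok !== currentIsEncodable) {
      // The run type flipped, so start the next array element.
      result.push('');
      currentIsEncodable = ok;
    }
    result[result.length - 1] += ch;
  }
  return result;
}

// Stand-in predicate for the demo: pretend only ASCII is encodable.
const isAscii = (ch) => ch.codePointAt(0) < 0x80;

console.log(splitByEncodability('foo ѣ bar', isAscii)); // [ 'foo ', 'ѣ', ' bar' ]
console.log(splitByEncodability('ѣfoo', isAscii));      // [ '', 'ѣ', 'foo' ]
```

Unlike String.prototype.split, this sketch does not append a trailing empty string when the input ends with a non-encodable run; whether the real method should do so is left open here. A predicate for an actual encoding might be built by round-tripping each character through iconv-lite's encode and decode and comparing the result, but that is an assumption and not something this proposal specifies.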
