CSV escape character is documented wrong #4947

@gwenya

Description

Affected pages

https://www.php.net/manual/en/function.str-getcsv.php
https://www.php.net/manual/en/function.fgetcsv.php
https://www.php.net/manual/en/splfileobject.fgetcsv.php

Issue description

In str_getcsv, fgetcsv and SplFileObject::fgetcsv, the escape character is documented incorrectly. Specifically, these pages contain the following note:

Note: Usually an enclosure character is escaped inside a field by doubling it; however, the escape character can be used as an alternative. So for the default parameter values "" and " have the same meaning. Other than allowing to escape the enclosure character the escape character has no special meaning; it isn't even meant to escape itself.

However, these functions turn "" into " while \" remains \". Therefore "" and \" do not have the same meaning.

The Note should be changed to reflect this difference.

Steps to reproduce

Run the following PHP code and observe the output:

$a = '"foo""bar"';
$b = '"foo\"bar"';
$aCsv = str_getcsv($a);
$bCsv = str_getcsv($b);

var_dump($aCsv);
var_dump($bCsv);
var_dump($aCsv == $bCsv);

Output on PHP 8.4:

Deprecated: str_getcsv(): the $escape parameter must be provided as its default value will change in php shell code on line 3

Deprecated: str_getcsv(): the $escape parameter must be provided as its default value will change in php shell code on line 4
array(1) {
  [0]=>
  string(7) "foo"bar"
}
array(1) {
  [0]=>
  string(8) "foo\"bar"
}
bool(false)

Output on PHP 8.0:

array(1) {
  [0]=>
  string(7) "foo"bar"
}
array(1) {
  [0]=>
  string(8) "foo\"bar"
}
bool(false)

Output on PHP 7.4:

array(1) {
  [0]=>
  string(7) "foo"bar"
}
array(1) {
  [0]=>
  string(8) "foo\"bar"
}
bool(false)
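
The deprecation notices on PHP 8.4 come from omitting the $escape argument. As a minimal sketch, assuming the documented defaults for separator, enclosure and escape (`,`, `"` and `\`), the same difference can be reproduced with all three passed explicitly, which should avoid the notices:

```php
<?php
// Pass separator, enclosure and escape explicitly, using the documented defaults.
$doubled = str_getcsv('"foo""bar"', ',', '"', '\\');
$escaped = str_getcsv('"foo\"bar"', ',', '"', '\\');

var_dump($doubled[0]); // the doubled enclosure collapses to a single "
var_dump($escaped[0]); // the backslash stays in the parsed value
var_dump($doubled === $escaped);
```

The variable names are only illustrative; the inputs are the same two strings as in the reproduction above.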

Suggested fix

Because this behavior is so confusing, I think a warning rather than a note is appropriate. Here is my suggestion:

Warning: Inside an enclosure, the enclosure character can always be escaped by doubling it, resulting in a single enclosure character in the parsed result. The escape character works differently: if it is followed by an enclosure character, that enclosure character is not treated as one, but the escape character itself remains in the parsed result. So for the default parameters, "" inside an enclosure will be parsed into ", while \" inside an enclosure will be parsed into \".
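
The reproduction above only exercises str_getcsv. The following is a rough sketch of how the same check could be run against fgetcsv and SplFileObject::fgetcsv. The helper names `parseWithFgetcsv` and `parseWithSpl` are hypothetical, and the sketch assumes that an in-memory stream and an SplTempFileObject behave the same as a regular file for this purpose:

```php
<?php
// Hypothetical helper: parse a single CSV line with fgetcsv() from an in-memory stream.
function parseWithFgetcsv(string $line): array
{
    $handle = fopen('php://memory', 'w+');
    fwrite($handle, $line);
    rewind($handle);
    // Length 0 means no line length limit; the remaining arguments are the documented defaults.
    $row = fgetcsv($handle, 0, ',', '"', '\\');
    fclose($handle);
    return $row;
}

// Hypothetical helper: parse a single CSV line with SplFileObject::fgetcsv().
function parseWithSpl(string $line): array
{
    $file = new SplTempFileObject();
    $file->fwrite($line);
    $file->rewind();
    return $file->fgetcsv(',', '"', '\\');
}

foreach (['"foo""bar"', '"foo\"bar"'] as $input) {
    var_dump(parseWithFgetcsv($input), parseWithSpl($input));
}
```

If the proposed warning text is accurate for all three functions, both helpers should return the doubled enclosure as a single " and keep the backslash in the second input.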
