
Values between "a quoted and escaped quote" and "a quoted value, that starts with the delimiter" are skipped #508

Open
kasgilpofi opened this issue Jul 8, 2022 · 0 comments


Version of Univocity

<dependency>
	<groupId>com.univocity</groupId>
	<artifactId>univocity-parsers</artifactId>
	<version>2.9.1</version>
</dependency>

Problem

Parsing a valid (RFC 4180) CSV file that contains "a quoted and escaped quote" ("""") and "a quoted value that starts with the delimiter" (e.g. ";abc").

With the following settings:
- selectFields(...)
- setNormalizeLineEndingsWithinQuotes(false)

the values between "a quoted and escaped quote" and "a quoted value that starts with the delimiter" are skipped.

The problem does not occur with setNormalizeLineEndingsWithinQuotes(true).

The problem appears to be caused by

AbstractCharInputReader.skipQuotedString(char quote, char escape, char stop1, char stop2)

which does not seem to handle quoted and escaped quotes correctly.
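For comparison, here is a minimal sketch of what skipping a quoted value has to do when the quote escape is the quote character itself (as in RFC 4180). The class QuotedSkipSketch and the method skipQuotedValue are hypothetical names for illustration only; this is not univocity's skipQuotedString, and it is not a proposed patch. The point it shows is that a doubled quote ("") must be consumed as an escaped quote and must not end the value, and that a delimiter inside the quotes is just data.

```java
public class QuotedSkipSketch {

    /**
     * Hypothetical helper (NOT univocity's implementation): given the index of an
     * opening quote, returns the index just past the matching closing quote.
     * A doubled quote inside the value is an escaped quote, so scanning continues.
     */
    static int skipQuotedValue(String line, int openQuote, char quote) {
        int i = openQuote + 1;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (c == quote) {
                // Doubled quote: escaped quote, skip both characters and keep scanning.
                if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                    i += 2;
                    continue;
                }
                // A single quote closes the value.
                return i + 1;
            }
            // Any other character, including the delimiter, is part of the value.
            i++;
        }
        throw new IllegalArgumentException("Unterminated quoted value at index " + openQuote);
    }

    public static void main(String[] args) {
        String row1 = "1;\"\"\"\";100";    // 1;"""";100
        String row3 = "10;\";abc\";200";   // 10;";abc";200

        int end1 = skipQuotedValue(row1, 2, '"');
        System.out.println(end1 + " -> " + row1.substring(end1 + 1)); // 6 -> 100

        int end3 = skipQuotedValue(row3, 3, '"');
        System.out.println(end3 + " -> " + row3.substring(end3 + 1)); // 9 -> 200
    }
}
```

In both rows the returned index points at the delimiter that follows the quoted value, so a caller can resume at the next field (100 and 200 respectively).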

CSV-Data

A line that contains a single quote character (quoted, and escaped with another quote).
A subsequent quoted value starts with the delimiter.

For example:

1;"""";100
2;abc;101
10;";abc";200
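To make the expected values concrete, the sketch below splits the three sample lines with a minimal, hypothetical RFC 4180 splitter (Rfc4180SplitSketch.split is an illustrative name, not part of univocity) and keeps only columns A and C, mirroring selectFields("A", "C"). It assumes ';' as delimiter and '"' as both quote and quote escape, and it deliberately ignores line breaks inside quotes, which the sample does not contain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Rfc4180SplitSketch {

    // Hypothetical single-line splitter for illustration only:
    // handles quoted fields and doubled quotes, not line breaks inside quotes.
    static List<String> split(String line, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                        current.append(quote); // "" inside quotes -> one literal quote
                        i++;
                    } else {
                        inQuotes = false;      // closing quote
                    }
                } else {
                    current.append(c);         // a delimiter here is plain data
                }
            } else if (c == quote) {
                inQuotes = true;               // opening quote
            } else if (c == delimiter) {
                fields.add(current.toString()); // end of an unquoted or closed field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        String[] lines = {"1;\"\"\"\";100", "2;abc;101", "10;\";abc\";200"};
        for (String line : lines) {
            List<String> f = split(line, ';', '"');
            // Keep columns A and C (indices 0 and 2), like selectFields("A", "C").
            System.out.println(Arrays.asList(f.get(0), f.get(2)) + " (B = " + f.get(1) + ")");
        }
    }
}
```

Running it prints [1, 100] (B = "), [2, 101] (B = abc) and [10, 200] (B = ;abc), i.e. the same selected values as the expected output of the parser example.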

Example

import com.univocity.parsers.common.Context;
import com.univocity.parsers.common.processor.core.Processor;
import com.univocity.parsers.csv.Csv;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;
import com.univocity.parsers.csv.UnescapedQuoteHandling;

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CsvParserTesting {

    public static void main(String[] args) {
        try {
            CsvParserSettings settings = Csv.parseRfc4180();

            settings.getFormat().setDelimiter(";");
            settings.getFormat().setLineSeparator("\n");
            settings.getFormat().setQuote('"');
            settings.getFormat().setQuoteEscape('"');
            settings.getFormat().setComment('#');

            settings.setMaxColumns(300);
            settings.setMaxCharsPerColumn(-1);
            settings.setEmptyValue("");
            settings.setNullValue("");
            settings.setIgnoreTrailingWhitespaces(true);
            settings.setIgnoreLeadingWhitespaces(true);
            settings.setReadInputOnSeparateThread(false);
            settings.setSkipEmptyLines(true);
            settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_DELIMITER);
            settings.setErrorContentLength(1000);
            settings.setHeaders("A", "B", "C");
            settings.selectFields("A", "C");

            settings.setNormalizeLineEndingsWithinQuotes(false);


            settings.setProcessor(new Processor<Context>() {
                @Override
                public void processStarted(Context context) {
                    System.out.println("processStarted");
                }

                @Override
                public void rowProcessed(String[] row, Context context) {
                    System.out.println(Arrays.toString(row));
                }

                @Override
                public void processEnded(Context context) {
                    System.out.println("processEnded");
                }
            });

            CsvParser csvParser = new CsvParser(settings);

            String text = "";

            text += "1;\"\"\"\";100";
            text += "\n2;abc;101";
            text += "\n10;\";abc\";200";
            ByteArrayInputStream is = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));

            csvParser.parse(is);

        } catch (Throwable th) {
            th.printStackTrace();
        }
    }
}

Expected output

processStarted
[1, 100]
[2, 101]
[10, 200]
processEnded

Actual output

processStarted
[1, abc"]
processEnded