Catch UnequalLengths #134

Closed
HarHarLinks opened this issue Jul 3, 2022 · 9 comments · Fixed by #176
Labels
bug Something isn't working

Comments

@HarHarLinks
Contributor

HarHarLinks commented Jul 3, 2022

I've come across this csv:

$ curl -s https://opendata.dwd.de/weather/local_forecasts/swsmos/swsmos_LATEST_opendata.csv.bz2 | bzcat | head
ID;Lat;Lon;YYYYMMDDHHmm;TL;TLSTA;RRL1c;RRS1c;RR6;WWL6;WWS3;RRS3c;R650;RC;TS;TD
202207031100
A006;54.88920;8.90870;202207031200;22.8;0.8;0.0;0.0;0.0;6.0;0.0;0.0;0.0;1;45.27;18.8
A006;54.88920;8.90870;202207031300;23.3;2.1;0.0;0.0;0.0;4.0;0.0;0.0;0.0;1;44.33;18.0
A006;54.88920;8.90870;202207031400;23.1;3.0;0.0;0.0;0.0;8.0;0.0;0.0;0.0;1;42.62;17.5
[...]

tv panics on it because the second line is just an ISO-ish date string with no other fields:

$ curl -s https://opendata.dwd.de/weather/local_forecasts/swsmos/swsmos_LATEST_opendata.csv.bz2 | bzcat | head | tidy-viewer -s ';'
thread 'main' panicked at 'a csv record: Error(UnequalLengths { pos: Some(Position { byte: 79, line: 2, record: 1 }), expected_len: 16, len: 1 })', src/main.rs:354:20

While the csv is clearly at fault, I expect this isn't all that unusual. I would like tv to be able to

  • at a minimum, have an option to just ignore (skip) faulty lines and continue
  • better, note the error in the line and leave it unformatted or similar, potentially highlighting it in some way, e.g. making the line red with a ⚠ symbol.

For now, I've added | awk 'NR != 2' into my pipe to skip the 2nd line explicitly.
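The two options above (skip faulty lines, or keep them and mark them) could be sketched roughly as follows. This is only an illustration under stated assumptions: `Row` and `parse_lenient` are hypothetical names, not part of tv, and the splitting is deliberately naive (no quoting or escaping), so it is not a substitute for a real CSV parser.

```rust
/// One input line after a lenient parse (hypothetical type, not part of tv).
#[derive(Debug, PartialEq)]
enum Row {
    /// Field count matches the header.
    Ok(Vec<String>),
    /// Field count differs; the raw line is kept so it can be shown unformatted.
    Malformed { line_no: usize, raw: String },
}

/// Split each line on `delim` and compare its field count to the header's.
/// With `skip_bad` set, mismatching lines are dropped; otherwise they are
/// returned as `Row::Malformed` with their 1-based line number.
fn parse_lenient(input: &str, delim: char, skip_bad: bool) -> Vec<Row> {
    let mut lines = input.lines().enumerate();
    let header = match lines.next() {
        Some((_, h)) => h,
        None => return Vec::new(),
    };
    let expected = header.split(delim).count();
    let mut rows = vec![Row::Ok(header.split(delim).map(String::from).collect())];
    for (idx, line) in lines {
        let fields: Vec<String> = line.split(delim).map(String::from).collect();
        if fields.len() == expected {
            rows.push(Row::Ok(fields));
        } else if !skip_bad {
            rows.push(Row::Malformed { line_no: idx + 1, raw: line.to_string() });
        }
    }
    rows
}
```

A printer could then render `Row::Malformed` differently (for example in red with a ⚠ prefix) while formatting `Row::Ok` rows as usual.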

@alexhallam
Owner

Thanks for the great issue. I am able to reproduce this with:

curl -s https://opendata.dwd.de/weather/local_forecasts/swsmos/swsmos_LATEST_opendata.csv.bz2 | bzcat | head | awk 'NR != 2'| tidy-viewer -s ';'

Ill-formatted CSVs have been on the to-do list for a while. It is probably time to tackle the problem.

@alexhallam
Owner

Just giving a little update.

This is where I will start the error handling

tv/src/main.rs

Line 367 in e8beee0

.map(|x| x.expect("a csv record"))

There will likely be additional formatting needed for broken lines, but that is the start of it.
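As a rough sketch of what replacing `.map(|x| x.expect("a csv record"))` could look like: instead of panicking on the first `Err`, each record is matched and errors are collected separately. The function name `split_records` and the message format are assumptions for illustration; it is generic over any iterator of `Result`s and does not claim to mirror the actual fix.

```rust
/// Separate good rows from error messages instead of panicking on the
/// first `Err` (hypothetical helper, not taken from tv's source).
fn split_records<T, E: std::fmt::Display>(
    records: impl Iterator<Item = Result<T, E>>,
) -> (Vec<T>, Vec<String>) {
    let mut rows = Vec::new();
    let mut errors = Vec::new();
    for (idx, record) in records.enumerate() {
        match record {
            Ok(row) => rows.push(row),
            // `idx` is zero-based, counting from the first item yielded.
            Err(e) => errors.push(format!("record {}: {}", idx, e)),
        }
    }
    (rows, errors)
}
```

The collected messages could be printed to stderr after the table, so well-formed rows still get displayed.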

RepEx:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=35ea71ae858972a0f7a2bcb9283cb54f

@alexhallam
Owner

Also wanted to keep #79 in the loop for this fix.

@alexhallam
Owner

alexhallam commented Jul 10, 2022

Also, I could possibly resurrect #124

@alexhallam
Owner

And maybe #91

@alexhallam
Owner

and #137

alexhallam added the bug label Jul 21, 2022
@alexhallam
Owner

Here I put the error in as a vector. Maybe I can handle strings that look like errors in the printout.

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=f7bb8d643148b903f99260d40c244f05
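One way to read the "error as a vector" idea, sketched under assumptions: a failed record becomes a one-cell row carrying the error text, and the printer later detects such rows by a marker. The names `ERROR_MARKER`, `record_or_error_row`, and `looks_like_error` are hypothetical and are not taken from the linked playground.

```rust
/// Marker prepended to rows that stand in for a parse error (hypothetical convention).
const ERROR_MARKER: &str = "⚠";

/// Turn a failed record into a one-cell row holding the error text,
/// so every item in the stream is still a `Vec<String>`.
fn record_or_error_row<E: std::fmt::Display>(record: Result<Vec<String>, E>) -> Vec<String> {
    record.unwrap_or_else(|e| vec![format!("{} {}", ERROR_MARKER, e)])
}

/// Check whether a row was produced from an error, so the printer can style it differently.
fn looks_like_error(row: &[String]) -> bool {
    row.len() == 1 && row[0].starts_with(ERROR_MARKER)
}
```

A drawback of this design is that a genuine single-column row starting with the marker would be misclassified; a dedicated enum variant avoids that at the cost of changing the row type.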
