Consider a readr-style "problems" report on parsing failures #221
Comments
I'd say this is all working as documented. This is how googlesheets4, readxl, and readr have always worked: there's a guess_max argument that caps how many rows are consulted when guessing column types. (It's possible that round tripping empty strings should be easier? But that's a separate matter.)

library(googlesheets4)
library(googledrive)
library(tidyverse)
# hidden auth chunk here

tbl <- tibble(i = 1:2000, str = c(rep("", 1999), "blah"))
ss <- gs4_create(sheets = tbl)
#> ✓ Creating new Sheet: 'compulsory-fanworms'
# saw some weird errors, so let's slow things down
Sys.sleep(2)
dat <- read_sheet(ss, "tbl")
#> ✓ Reading from 'compulsory-fanworms'
#> ✓ Range ''tbl''
dat
#> # A tibble: 2,000 x 2
#> i str
#> <dbl> <lgl>
#> 1 1 NA
#> 2 2 NA
#> 3 3 NA
#> 4 4 NA
#> 5 5 NA
#> 6 6 NA
#> 7 7 NA
#> 8 8 NA
#> 9 9 NA
#> 10 10 NA
#> # … with 1,990 more rows
tail(dat)
#> # A tibble: 6 x 2
#> i str
#> <dbl> <lgl>
#> 1 1995 NA
#> 2 1996 NA
#> 3 1997 NA
#> 4 1998 NA
#> 5 1999 NA
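#> 6 2000 NA

The guess_max argument sets the maximum number of rows used to guess column types. Its default is 1000 (or n_max, if that is smaller). This is also how readxl and readr work.

dat <- read_sheet(ss, "tbl", guess_max = Inf)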
#> ✓ Reading from 'compulsory-fanworms'
#> ✓ Range ''tbl''
dat
#> # A tibble: 2,000 x 2
#> i str
#> <dbl> <chr>
#> 1 1 <NA>
#> 2 2 <NA>
#> 3 3 <NA>
#> 4 4 <NA>
#> 5 5 <NA>
#> 6 6 <NA>
#> 7 7 <NA>
#> 8 8 <NA>
#> 9 9 <NA>
#> 10 10 <NA>
#> # … with 1,990 more rows
tail(dat)
#> # A tibble: 6 x 2
#> i str
#> <dbl> <chr>
#> 1 1995 <NA>
#> 2 1996 <NA>
#> 3 1997 <NA>
#> 4 1998 <NA>
#> 5 1999 <NA>
#> 6 2000 blah
is.character(dat$str)
#> [1] TRUE

An absence of cell data is brought in as NA.

dat <- dat %>%
  replace_na(list(str = ""))
identical(tbl$str, dat$str)
#> [1] TRUE
drive_rm(ss)
#> File deleted:
#> • 'compulsory-fanworms' <id: 1Mu0JrRh0ZASocbL0SosFeCYvaC-4_ECCoRmkkixJfTU>

Created on 2021-07-11 by the reprex package (v2.0.0.9000)
That totally makes sense. The thing that made this difficult for me to puzzle out was the silence of this potentially destructive behavior. I wasn't aware of guess_max. Maybe there's no easy way to tell when this occurs, but something like, "First non-blank value in col …
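For what it's worth, the kind of check being asked for can be sketched in base R. This is only an illustration: `late_first_value()` is a hypothetical helper, not part of googlesheets4, readxl, or readr, and it assumes the column has already been read as character (for example with guess_max = Inf).

```r
# Hypothetical helper (not a googlesheets4 function): returns the row of the
# first non-blank value in `x` if it lies beyond the first `guess_max` rows,
# i.e. beyond the rows that type guessing would have looked at.
# Returns NA_integer_ when there is nothing to report.
late_first_value <- function(x, guess_max = 1000) {
  nonblank <- which(!is.na(x) & x != "")
  if (length(nonblank) == 0 || nonblank[1] <= guess_max) {
    return(NA_integer_)
  }
  nonblank[1]
}

# Same shape as the `str` column in the reprex above
str_col <- c(rep("", 1999), "blah")
row <- late_first_value(str_col)
if (!is.na(row)) {
  message(
    "First non-blank value is at row ", row,
    ", beyond the rows used for type guessing"
  )
}
```

A real implementation would presumably live inside the reader and report per column, the way readr's problems report does; the point here is only that the condition is cheap to detect.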
In readr this would show up in the "problems" report. I'll reopen this.

readr::read_csv("x,y\na,\nc,d", guess_max = 1)
#> Warning: 1 parsing failure.
#> row col expected actual file
#> 2 y 1/0/T/F/TRUE/FALSE d literal data
#> # A tibble: 2 x 2
#> x y
#> <chr> <lgl>
#> 1 a NA
#> 2 c NA

Created on 2021-07-12 by the reprex package (v2.0.0.9000)
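One way to sidestep guessing entirely is to declare the column types up front. A minimal sketch on the same literal data, assuming readr is installed; `csv_dat` is just an illustrative name, and the compact string "cc" requests two character columns:

```r
# Declare both columns as character so no type guessing takes place.
# The string contains newlines, so readr treats it as literal data.
csv_dat <- readr::read_csv("x,y\na,\nc,d", col_types = "cc")

# The empty cell in row 1 is read as NA; the "d" in row 2 is kept.
csv_dat$y
```

read_sheet() also has a col_types argument that accepts a similar compact string, so the Sheets example above could presumably be handled the same way.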
Just as an FYI for all of these packages, I think it's more common to do
This will presumably get bundled up with the Great Col Spec Project that is coming soon for me (googlesheets4 & readxl) #51.
On read, a column is assumed to be logical after some number (maybe 1,000?) of rows of missing data. If a character value appears near the bottom, it is silently read in as NA, with no error or warning.