Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cope with missing columns converting from json to df #3

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

DASpringate
Copy link

Hi,

Thanks for writing this package, it is really useful!

I was using it and found that in some cases where not all of the fields are present in every row of the data, the converter chokes as not all of the rows are the same length. e.g. try this:

url <- "http://data.undp.org/resource/wxub-qc5k.json"
read.socrata(url)

To deal with this I added a function content_to_df, which is called from getContentAsDataFrame and looks like:

content_to_df <- function(con){
    lengths <- sapply(con, length)
    if(all(lengths == length(con[[1]]))){
        data.frame(t(sapply(con, unlist)), stringsAsFactors = FALSE)
    } else {
        all_cols <- names(con[[which(sapply(con, length) == max(sapply(con, length)))[1]]])
        con <- lapply(con, function(x){
            r <- c(x, sapply(all_cols[!all_cols %in% names(x)], function(xx) NA, simplify = FALSE))
            r[all_cols]
        })
        data.frame(t(sapply(con, unlist)), stringsAsFactors = FALSE)
    }
}

If the rows are all the same length, it converts as before, but if they are not, it assumes that the first longest length row has all of the correct column names in it and adds empty fields to any rows that do not have all of them in, then it sorts the columns back to the original order and returns a dataframe from this as before.

I added a test for this condition to the test file as well.

I also changed the json parser to jsonlite (I think you have already done this in the CRAN version).

If you can bring these changes into the CRAN version as well I would be really grateful as I want to call it in in a package I am developing now.

Cheers,

David

@DASpringate
Copy link
Author

Updated again, I left the imports out from the last one!

Also, I found a bug in the code for getResponse():

In some cases, if there was a non-200 response, there was not a response$headers$content-length element in the response, resulting in an error. I changed this by checking the length of the response$content slot instead.

I also added an option to not produce an error if there is a non-200 response. The default is exactly as it is now so it doesn't affect anything in the package as it stands, but I find it useful to have the option in my package https://github.com/rOpenHealth/rUN_development_reports/blob/master/R/rUNdp.R#L95

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants