Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #72 by adding an optional argument na.convert #360

Merged
merged 6 commits into from
Jun 29, 2022

Conversation

deschen1
Copy link
Contributor

No description provided.

Copy link
Collaborator

@JanMarvin JanMarvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @deschen1 , thanks for the PR. Could you please create a test for this, one that uses the option and one that doesn't. And what about the other change I added in my branch, do we need this one too?

@deschen1
Copy link
Contributor Author

Re the test - if you are referring to testthat, that's also something I'm not really familiar with. However, the main problem is that it can't be automated through testthat.

It appears that the behaviour of changing blanks to NAs only happens after the Excel file has been manually opened, saved and closed in Excel. As long as I just save and load an Excel file with openxlsx and don't interact with it (in the sense of not saving it in Excel), blanks are not converted. ONce I save in Excel, they are converted. However, I created a small code chunk that at least does the file creation and mentions when you need to save the file manually. Hope this helps.

#Assuming we are in the package programming environment
devtools::load_all()

test_data <- data.frame(x = c("1", "", "b"),
                        y = c(1, 2, 3))

wb <- createWorkbook()
addWorksheet(wb, "load test")
writeData(wb, "load test", test_data)
saveWorkbook(wb, "test.xlsx", overwrite = TRUE)

#
# Now go to the file, open in Excel, save it and close it, then go back to R
# 

wb_new <- loadWorkbook("test.xlsx", na.convert = TRUE)
saveWorkbook(wb_new, "test2.xlsx", overwrite = TRUE) # test2.xlsx now has NAs

wb_new <- loadWorkbook("test.xlsx", na.convert = FALSE)
saveWorkbook(wb_new, "test3.xlsx", overwrite = TRUE) # test3.xlsx doesn't have NAs

@deschen1
Copy link
Contributor Author

Re the change you did to the read_workbook.cpp change. I didn't touch this, because:

a) I wanted to see if it already works without the fix (which it does)
b) I'm not familiar with C++, so didn't want to mess around there. Especially since I'm not sure if the respective code part should also get an optional argument for changing the behaviour there.

If you say this can just be changed as suggested by you, I'm happy to add it to the pull request. Otherwise I'd feel more comfortable if someone else would do the change.

@JanMarvin
Copy link
Collaborator

Re the test - if you are referring to testthat, that's also something I'm not really familiar with. However, the main problem is that it can't be automated through testthat.

It appears that the behaviour of changing blanks to NAs only happens after the Excel file has been manually opened, saved and closed in Excel. As long as I just save and load an Excel file with openxlsx and don't interact with it (in the sense of not saving it in Excel), blanks are not converted. ONce I save in Excel, they are converted. However, I created a small code chunk that at least does the file creation and mentions when you need to save the file manually. Hope this helps.

#Assuming we are in the package programming environment
devtools::load_all()

test_data <- data.frame(x = c("1", "", "b"),
                        y = c(1, 2, 3))

wb <- createWorkbook()
addWorksheet(wb, "load test")
writeData(wb, "load test", test_data)
saveWorkbook(wb, "test.xlsx", overwrite = TRUE)

#
# Now go to the file, open in Excel, save it and close it, then go back to R
# 

wb_new <- loadWorkbook("test.xlsx", na.convert = TRUE)
saveWorkbook(wb_new, "test2.xlsx", overwrite = TRUE) # test2.xlsx now has NAs

wb_new <- loadWorkbook("test.xlsx", na.convert = FALSE)
saveWorkbook(wb_new, "test3.xlsx", overwrite = TRUE) # test3.xlsx doesn't have NAs

Alright. You could simply use the files from #72. Your change should create any measurable effect. Either a certain value is missing or NA. I can add one, but for that you have to wait until next week.

@JanMarvin
Copy link
Collaborator

Re the change you did to the read_workbook.cpp change. I didn't touch this, because:

a) I wanted to see if it already works without the fix (which it does) b) I'm not familiar with C++, so didn't want to mess around there. Especially since I'm not sure if the respective code part should also get an optional argument for changing the behaviour there.

If you say this can just be changed as suggested by you, I'm happy to add it to the pull request. Otherwise I'd feel more comfortable if someone else would do the change.

No worries, we've all started with a small change. We can skip the c++ part in this PR, I do not recall what I was looking for at the time. It's been a long time since that initial draft.

@deschen1
Copy link
Contributor Author

@JanMarvin Sorry for nudging you, but wanted to check if you had a chance to look into the commit and are fine with updating the package?

@JanMarvin
Copy link
Collaborator

High, thanks for the reminder. I'd still like to see a test for this.

@deschen1
Copy link
Contributor Author

Fair enough. I just remember from the discussion above that you mentioned you want to add one. Just want to prevent misunderstanding (in case you are waiting for me to add one), because I though you'd add the test. 😬 😅

@JanMarvin
Copy link
Collaborator

Ah well, if you have the time it would be a massive help

@deschen1
Copy link
Contributor Author

OK, you still see me confused, because my initial statement stays true, i.e. I can't really add a reproducible example in a sense that it can be run automatically through test_that (beyond the fact that I'm really not sure how to set this up).

The problem is that the NAs only appear after manually (!) opening and saving the Excel file. As long as I don't do anything with the Excel file (after creating it with openxlsx) I don't get any NAs added. Once I open and save the Excel and then do loadWorkbook/saveWorkbook, I do get NAs.

That's why I think the only helpful contribution I can make is this one (where a human interaction for opening/saving is required): #360 (comment)

@JanMarvin
Copy link
Collaborator

okay, I'll look at it. we can make tests with some external xlsx file. writing tests is not exactly hard, but still takes time and time is short :)

@deschen1
Copy link
Contributor Author

Thanks. Much appreciated!

@JanMarvin
Copy link
Collaborator

I've added the test. Most likely the other part of my fix was for the output to R.

@JanMarvin JanMarvin changed the base branch from master to development June 29, 2022 18:43
@JanMarvin JanMarvin merged commit 547640d into ycphs:development Jun 29, 2022
@JanMarvin
Copy link
Collaborator

Thanks @deschen1 , I've merged this PR with the development branch!
It should be in the next release, whenever that will be :)

@deschen1
Copy link
Contributor Author

Thanks @JanMarvin . Just curious what's your release policy, e.g. do you have a minimum number of important changes you want to see in a new version or would such a standalone fix also qualify for a new release? You are probably guessing right the intention of this question that I of course would be very happy to see an updated version soon. 😬

@JanMarvin
Copy link
Collaborator

There is no release policy. Someone should test the development branch (e.g., #362) and poke @ycphs to craft a release.

@deschen1
Copy link
Contributor Author

deschen1 commented Jul 6, 2022

@JanMarvin I had a look at the development branch the the RMD check returns with an error:

Error: 'openxlsxFontSizeLookupTable' is not an exported object from 'namespace:openxlsx'. So that might be something you want to look at before updating the master.

Also noticing that the abovementioned issue 362 is not mentioned in the news file (which is what I used to check what's new in the dev branch).

@deschen1 deschen1 deleted the na_handling_loadWorkbook branch July 6, 2022 08:48
@JanMarvin
Copy link
Collaborator

Thanks @deschen1 , every bit of testing helps. We all have a lot to do and user feedback is somewhat scarce for pre-release sources.

@JanMarvin
Copy link
Collaborator

FYI: fixed the openxlsxFontSizeLookupTable namespace error. In #363 .If CI gives us a greenlight, I'll merge it to development. Would be nice if you could have another look (e.g. test dates) afterwards.

@deschen1
Copy link
Contributor Author

deschen1 commented Jul 7, 2022

Sure, can do once it's merged.

@AbhaySThakur
Copy link

Hi guys, any idea when this one gets released, I am facing the exact issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants