-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
324 GitHub discussions with graph ql api #332
base: master
Are you sure you want to change the base?
Conversation
Added get functions for mbox, jira, and github.
Functions that require something from a list (ie. conf[["version_control"]][["branch"]][4]) now have an index parameter. (ie. conf[["version_control"]][["branch"]][branch_index]) Also finished the rest of the get methods for all notebooks.
I normalized the comments by having it follow a format. I added if wrappers to catch null values.
I changed the documentation of the JIRA functions to return a list of project keys and storage paths (HADOOP case). I also changed the documentation of the mbox function to return a list of storage paths.
Confirming setup for committing and PR reflection.
Confirming setup for committing and PR reflection.
In each notebook, the working directory is the vignettes/ directory, however, the getwd() command states that the working directory is the kaiaulu/ folder itself. This is because each R code chunk considers its directory to be the working directory. To test the functionality of the get functions, I edited the config file to reflect the relative directory of vignettes/, thus parent directory ../ file path changes were added to each function.
The source("../R/config.R") reference was previously used to import the get() functions from config.R, however I wasn't building the project correctly. Now, this incorrect reference can be removed.
Due to a minor misunderstanding on my part, I added in relative paths in the config.R file from the notebook's directory, this was due to the IDE RStudio changing the working directory to the current running notebook directory rather than staying at the project directory.
- Function titles, descriptions, and parameter documentation were reviewed and redefined. - Removed inefficient hardcoding of tools.yml and duplicated "config" path code. - Appropriate warning messages were added to each function in the case of the return value possessing the NULL value. - Notebook blamed_line_types_showcase.Rmd was updated to reflect follow the redefined R/config.R get functions.
- config.R was modified to possess descriptive get function titles and descriptions as well as a separation of getter functions into groups for clarity. - Some appropriate notebooks (every notebook in vignettes/ that doesn't have an "_" prefix) were decoupled to use the get() functions from config.R in lieu of hard-coded paths to the configuration files.
- Updated mailing_list for conf/ files - Updated issue_tracker for conf/ files
…/sailuh/kaiaulu into 230-abstract-config-file-parsing
- Refactored most of the notebooks to replace the hard coded paths with the getter functions in R/config.R - Updated a few of the configuration files (.yml extension) in conf/ fixing some syntax and indentation errors - Added another getter function in R/config.r called get_github_issue_event_path - Edited DESCRIPTION to incorporate myself as a contributor - Edited NEWS.md to describe refactoring getter function feature.
- config_file parameter description was incorrect, a reference to how it is obtained was added to its description. - renamed get_parsed() to parse_config() and changed the appropriate getter calls in all of the notebooks. - reverted RoxygenNote version in DESCRIPTION
- config.R functions are organized to follow the same formatting as the conf files - All conf files issue_trackers have been updated.
- get_mbox_mailing_list() and get_mbox_archive_type() were removed as the keys that they are grabbing are scheduled for removal.
- added two new bugzilla getter functions get_bugzilla_issue_comment_path and get_bugzilla_issue_path
- Implemented two pipermail mailing list getters get_pipermail_domain and get_pipermail_path
- Implemented two more getter functions: get_mbox_input_file() and get_pipermail_input_file()
- Added Scitools understand section to tools - Updated mailing_list
- Implemented 4 Getter functions: get_understand_code_language(), get_understand_keep_dependencies_type(), get_understand_project_path(), and get_understand_output_path(). - Changed get_code_language() and get_keep_dependencies_type() to get_depends_code_language() and get_depends_keep_dependencies_type(), respectively as well as updated the appropriate notebooks that used these getters.
- Updated NEWS.md - Minor change to mailing_list
- The project configuration sections of each notebook was incorrectly using the project directory (kaiaulu/) as their working directory rather than the directory that they reside in (/vignettes/) as their working directory. - Uncommented the output_filepath line for pattern4 tool in junit5.yml configuration file. - Uncommented the mod_mbox mailing list section for mailing_list in openssl.yml configuration file.
For github.R: - github_api_discussions - github_parse_discussions - github_parse_discussion_comments For config.R: - get_github_discussions_path For conf/kaiaulu.yml: - Added new discussion field in github issue_tracker for a save filepath for discussions JSON file For github_api_showcase.Rmd: - Demonstrating usage for the newly added functions.
Changed the function to paginate as needed to download all available entries.
- Added refresher function for discussions - Non-fixed warning, refresher function could cause error if JSON file is improperly named.
- Added `create_file_directory` function - Added `create_file_path` helper function
- Added @export to functions - Added some more detail to function description - Added discussion filepath under github issue tracker
- Moved the Download Discussions function and explanations from `vignettes/github_api_showcase.Rmd` to the `vignettes/download_github_comments.Rmd` notebook
…' into 324-github-discussions-with-GraphQL-API
- Updated filepaths to unify formatting - Added discussion save filepath to github issue tracker
- Updated `vignettes/download_github_comments.Rmd` notebook - Added documentation for refresh function - Set create_file_directory function verbose param to FALSE as default
@crepesAlot You mentioned a few times something was not working on this PR but I don't remember ever going in detail with you. Do you want to share what is it that you found was not working to see if I can help? |
@carlosparadis Yes, I'm having difficulties getting the refresher to work properly.
The error I get is
The problem seems to stem from the way I check the folder the downloaded JSON files should be in.
However, if I delete the JSON files in the
I believe I need to somehow make the function explicitly check for JSON files, rather than if the folder is empty.
However this also causes some problems as it adds an extra slash to the folder_path like so:
I know this isn't caused by an incorrect
Explicitly checking for the JSON files solves the issue of the refresh not running the regular downloader if the folder is empty, but throws the same error if the folder is not empty, (thus necessitating it to check if the downloaded files are up to date) |
@crepesAlot have you checked how the JIRA downloader handles this? It also downloads json files. You may even be able to reuse the entire code logic from there. |
@carlosparadis I found out the solution was simple all along. I simply needed to remove the all.files=TRUE from the list.files. |
Just so I am clear, when you say it does not update: Are you trying to edit the existing file? You don't need to edit the existing file, it is ok the new changes are added as a new file, provided the timestamp of the file is correct. The files do not need to be precisely per month. I do not believe JIRA nor GitHub obey that either. Dao's is per month only because mbox files are downloaded per month only. |
@carlosparadis No, I am testing with a repository that the downloader creates 2 new JSON files, then deleting 1 of the JSON files with the more recent discussions. I then try to run the refresher to see if it will pick up that there are more recent discussion than the remaining JSON file in the folder. I strongly believe this is because I structured the query in the refresh incorrectly.
I had thought that the query would work by providing it the date at the "after" field, but that doesn't seem to work. |
@crepesAlot You should not have delete file logic! This will overly complicate things. Just have it so the new file starts of where the last file began, which information can be obtained from the file name. Trying to edit an existing JSON to add more data is the road to madness! I'm still trying to understand the cursor part. Are you saying we can't query discussion starting from a created at date? So long that is possible, then you can just create the new file. Please remove the delete file logic, it will be hard to maintain with file deletion, which is why GitHub, JIRA and Bugzilla just add new files. |
@carlosparadis The function doesn't delete or modify any JSON files. The downloader acts exactly as you say, it downloads 1 page, saves it as a JSON, then goes to page 2 and saves that as a new, separate JSON file. The act of manually deleting a file (me going into my Finder and deleting it) was my attempt to forcibly try to see if it could check if the dates of the JSON file were not the latest discussion, then download the discussions from where the remaining JSON left off. Which is what didn't work. We cannot directly form a query using a createdAt date. |
- Reconfigured how `create_file_directory` function obtains paths from config file
No description provided.