324 GitHub discussions with GraphQL API #332

Open
wants to merge 44 commits into base: master
Conversation

crepesAlot
Collaborator

No description provided.

anthonyjlau and others added 30 commits March 22, 2024 17:47
Added get functions for mbox, JIRA, and GitHub.
Functions that require an element from a list (e.g. conf[["version_control"]][["branch"]][4]) now take an index parameter (e.g. conf[["version_control"]][["branch"]][branch_index]). Also finished the rest of the get methods for all notebooks.
I normalized the comments by having them follow a consistent format. I added if wrappers to catch NULL values.
I changed the documentation of the JIRA functions to return a list of project keys and storage paths (HADOOP case).

I also changed the documentation of the mbox function to return a list of storage paths.
Confirming setup for committing and PR reflection.
Confirming setup for committing and PR reflection.
In each notebook, the working directory is the vignettes/ directory; however, getwd() reports the working directory as the kaiaulu/ folder itself, because each R code chunk treats its own directory as the working directory. To test the functionality of the get functions, I edited the config file to reflect the relative directory of vignettes/, so parent-directory (../) file path changes were added to each function.
The source("../R/config.R") reference was previously used to import the get() functions from config.R; however, I wasn't building the project correctly. Now this incorrect reference can be removed.
Due to a minor misunderstanding on my part, I added relative paths in the config.R file from the notebook's directory. This was because the RStudio IDE changes the working directory to the currently running notebook's directory rather than staying at the project directory.
- Function titles, descriptions, and parameter documentation were reviewed and redefined.
- Removed inefficient hardcoding of tools.yml and duplicated "config" path code.
- Appropriate warning messages were added to each function for the case where the return value is NULL.
- Notebook blamed_line_types_showcase.Rmd was updated to follow the redefined R/config.R get functions.
- config.R was given descriptive get function titles and descriptions, and its getter functions were separated into groups for clarity.
- Some appropriate notebooks (every notebook in vignettes/ that doesn't have an "_" prefix) were decoupled to use the get() functions from config.R in lieu of hard-coded paths to the configuration files.
- Updated mailing_list for conf/ files
- Updated issue_tracker for conf/ files
- Refactored most of the notebooks to replace the hard coded paths with the getter functions in R/config.R
- Updated a few of the configuration files (.yml extension) in conf/ fixing some syntax and indentation errors
- Added another getter function in R/config.R called get_github_issue_event_path
- Edited DESCRIPTION to incorporate myself as a contributor
- Edited NEWS.md to describe refactoring getter function feature.
- The config_file parameter description was incorrect; a reference to how it is obtained was added to its description.
- renamed get_parsed() to parse_config() and changed the appropriate getter calls in all of the notebooks.
- reverted RoxygenNote version in DESCRIPTION
- config.R functions are organized to follow the same formatting as the conf files
- All conf files issue_trackers have been updated.
- get_mbox_mailing_list() and get_mbox_archive_type() were removed as the keys that they are grabbing are scheduled for removal.
- added two new bugzilla getter functions get_bugzilla_issue_comment_path and get_bugzilla_issue_path
- Implemented two pipermail mailing list getters get_pipermail_domain and get_pipermail_path
- Implemented two more getter functions: get_mbox_input_file() and get_pipermail_input_file()
- Added Scitools understand section to tools
- Updated mailing_list
- Implemented 4 Getter functions: get_understand_code_language(), get_understand_keep_dependencies_type(), get_understand_project_path(), and get_understand_output_path().
- Changed get_code_language() and get_keep_dependencies_type() to get_depends_code_language() and get_depends_keep_dependencies_type(), respectively, and updated the appropriate notebooks that used these getters.
beydlern and others added 12 commits October 10, 2024 11:20
- Updated NEWS.md
- Minor change to mailing_list
- The project configuration sections of each notebook were incorrectly using the project directory (kaiaulu/) as their working directory rather than the directory they reside in (vignettes/).
- Uncommented the output_filepath line for pattern4 tool in junit5.yml configuration file.
- Uncommented the mod_mbox mailing list section for mailing_list in openssl.yml configuration file.
For github.R:
- github_api_discussions
- github_parse_discussions
- github_parse_discussion_comments
For config.R:
- get_github_discussions_path
For conf/kaiaulu.yml:
- Added new discussion field in github issue_tracker for a save filepath for discussions JSON file
For github_api_showcase.Rmd:
- Demonstrating usage for the newly added functions.
Changed the function to paginate as needed to download all available entries.
- Added refresher function for discussions
- Known issue (not yet fixed): the refresher function could error if a JSON file is improperly named.
- Added `create_file_directory` function
- Added `create_file_path` helper function
- Added @export to functions
- Added some more detail to function description
- Added discussion filepath under github issue tracker
- Moved the Download Discussions function and explanations from `vignettes/github_api_showcase.Rmd` to the `vignettes/download_github_comments.Rmd` notebook
…' into 324-github-discussions-with-GraphQL-API
- Updated filepaths to unify formatting
- Added discussion save filepath to github issue tracker
- Updated `vignettes/download_github_comments.Rmd` notebook
- Added documentation for refresh function
- Set create_file_directory function verbose param to FALSE as default
@carlosparadis
Member

@crepesAlot You mentioned a few times that something was not working on this PR, but I don't remember ever going into detail with you. Do you want to share what you found was not working, to see if I can help?

@crepesAlot
Collaborator Author

crepesAlot commented Dec 8, 2024

@carlosparadis Yes, I'm having difficulties getting the refresher to work properly.
The main problem is this line:

  # Read the JSON file
  json_data <- fromJSON(file.path(save_folder_path, latest_discussion), simplifyVector = FALSE)

The error I get is

Error: lexical error: invalid char in json text.
                                       NA
                     (right here) ------^

The problem seems to stem from the way I check the folder the downloaded JSON files should be in.

  # If there are no json files, download all discussions
  if (length(contents) == 0 ) {
    # Run the regular downloader
    discussions <- github_api_discussions(token, owner, repo, save_folder_path)
    return (discussions)
  }

However, if I delete the JSON files in the save_folder_path (thus making it empty) it does not run the regular downloader.
running:

contents <- list.files(save_folder_path, all.files = TRUE)
print(contents)
print(length(contents)==0)
[1] "."         ".."        ".DS_Store"
[1] FALSE

I believe I need to somehow make the function explicitly check for JSON files, rather than if the folder is empty.
My attempts at this were:

json_contents <- list.files(save_folder_path, pattern= "\\.json$", full.names = TRUE, recursive= TRUE)

However this also causes some problems as it adds an extra slash to the folder_path like so:

"../../rawdata/deep-live-cam/github/hacksider_deep-live-cam/discussion//hacksider_Deep-Live-Cam_1714152212_1732289412.json"

I know this isn't caused by an incorrect save_folder_path, because printing the save_folder_path returns

"../../rawdata/deep-live-cam/github/hacksider_deep-live-cam/discussion/"

Explicitly checking for the JSON files solves the issue of the refresh not running the regular downloader when the folder is empty, but it throws the same error when the folder is not empty (thus necessitating a check that the downloaded files are up to date).
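Both symptoms above can be handled together. A minimal sketch (list_json_files is illustrative, not this PR's code; the only assumption is that save_folder_path may or may not end in a trailing slash): list.files() with its default all.files = FALSE already skips ".", "..", and hidden files such as .DS_Store, the pattern keeps only JSON files, and trimming the trailing slash avoids the doubled "//" that full.names = TRUE produces when the folder path already ends in "/".

```r
# Sketch: list only .json files in the save folder. The default
# all.files = FALSE already excludes ".", "..", and hidden files such as
# ".DS_Store"; trimming a trailing "/" prevents full.names = TRUE from
# emitting "folder//file.json".
list_json_files <- function(save_folder_path) {
  folder <- sub("/+$", "", save_folder_path)
  list.files(folder, pattern = "\\.json$", full.names = TRUE)
}
```

With this, `length(list_json_files(save_folder_path)) == 0` is a reliable "no downloads yet" test even when the folder contains a stray .DS_Store.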

@carlosparadis
Member

@crepesAlot have you checked how the JIRA downloader handles this? It also downloads json files. You may even be able to reuse the entire code logic from there.

@crepesAlot
Collaborator Author

@carlosparadis I found out the solution was simple all along: I just needed to remove all.files = TRUE from the list.files() call.
Unfortunately, I found some new errors that I need to fix as well.
When I delete one of the JSON files (to simulate the folder being out of date), it doesn't properly update and download the most recent discussions.
I tried to follow the JIRA and GitHub refresh functions as closely as I could, but because GraphQL operates off of a query, the current problem with the code is the way I am forming that query, which is something I cannot copy from the other refresh functions.

@carlosparadis
Member

Just so I am clear, when you say it does not update: are you trying to edit the existing file? You don't need to edit the existing file; it is okay for the new changes to be added as a new file, provided the timestamp of the file is correct.

The files do not need to be precisely per month. I do not believe JIRA or GitHub obeys that either. Dao's is per month only because mbox files are downloaded per month.

@crepesAlot
Collaborator Author

@carlosparadis No, I am testing with a repository for which the downloader creates 2 new JSON files, then deleting the JSON file with the more recent discussions. I then run the refresher to see if it will pick up that there are more recent discussions than in the remaining JSON file.
But the refresh function doesn't detect this and reports that there are no new discussions to download.

I strongly believe this is because I structured the query in the refresh incorrectly.
I am having some difficulties fixing this due to some confusion about how GraphQL retrieves the discussions from GitHub.

discussions(
  after: String,
  before: String,
  first: Int,
  last: Int,
  categoryId: ID = null,
  answered: Boolean = null,
  orderBy: DiscussionOrder = {field: UPDATED_AT, direction: DESC}
) : Discussion

I had thought the query would work by providing the date in the "after" field, but that doesn't seem to work.
The queries instead rely on a cursor, a type of string retrieved from a previous query.

@carlosparadis
Member

@crepesAlot You should not have delete-file logic! This will overly complicate things. Just have it so the new file starts off where the last file ended, which information can be obtained from the file name. Trying to edit an existing JSON to add more data is the road to madness!

I'm still trying to understand the cursor part. Are you saying we can't query discussions starting from a createdAt date? So long as that is possible, you can just create the new file. Please remove the delete-file logic; it will be hard to maintain with file deletion, which is why GitHub, JIRA, and Bugzilla just add new files.

@crepesAlot
Collaborator Author

@carlosparadis The function doesn't delete or modify any JSON files. The downloader acts exactly as you say, it downloads 1 page, saves it as a JSON, then goes to page 2 and saves that as a new, separate JSON file.

The act of manually deleting a file (me going into Finder and deleting it) was my attempt to force a check of whether the dates of the JSON files were behind the latest discussion, and then download the discussions from where the remaining JSON left off, which is what didn't work.
I wasn't sure of another way to test it without posting random things in this repo's discussions, but I was able to find that the query in the refresh function doesn't work the way I thought it did regardless.

We cannot directly form a query using a createdAt date.
I am currently trying to see if there's still a way to implement a refresher function via the way the query orders the discussions and saves the cursor.
From my understanding, the cursor acts as a sort of page number for the query (since the number of discussions a query can get per page can vary from 1 to 100). But it is currently up in the air whether I can get it working.
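The cursor mechanics described above can be sketched as query construction in R (a hedged illustration; build_discussions_query and its parameters are hypothetical, not this PR's actual code). In GraphQL connection pagination, the after argument takes an opaque cursor string from the previous page's pageInfo.endCursor, not a date, so a refresher has to order by a timestamp field and stop paging once it reaches entries already saved to disk:

```r
# Sketch: build a discussions query that orders by CREATED_AT descending
# and can resume from a cursor. "after" takes an opaque cursor string
# (the previous page's pageInfo.endCursor), not a date.
build_discussions_query <- function(owner, repo, after_cursor = NULL) {
  after_part <- if (is.null(after_cursor)) "" else sprintf(', after: "%s"', after_cursor)
  sprintf('
query {
  repository(owner: "%s", name: "%s") {
    discussions(first: 100%s, orderBy: {field: CREATED_AT, direction: DESC}) {
      pageInfo { endCursor hasNextPage }
      nodes { number title createdAt updatedAt }
    }
  }
}', owner, repo, after_part)
}
# A caller would loop: POST the query, save the page as its own JSON file,
# then, while pageInfo.hasNextPage is TRUE and the nodes are newer than the
# latest createdAt already on disk, re-issue with after = pageInfo.endCursor.
```

Ordering newest-first lets the loop terminate as soon as it pages past the newest timestamp already saved, without any delete-file logic.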

- Reconfigured how `create_file_directory` function obtains paths from config file