310 Parallelize Refresh Pipeline #328
Open: daomcgill wants to merge 99 commits into master from 310-parallelization-of-the-refresh-pipeline
… helix config in accordance with new save file structure
I have created the parse_mbox_latest_date and refresh_mbox functions. The latter deletes the mbox file for the latest downloaded year and month (identified by parse_mbox_latest_date), then redownloads that file along with every later file up to the current year. The naming convention of the downloaded files is also changed to what we have agreed on. Note that download_mod_mbox REMAINS UNCHANGED, since I'm only using download_mod_mbox_per_month.
…ted refresh_pipermail, updated news
Found out that the pipermail downloader function already downloads the files by month and year, so all I really needed to do was change it to save the files as mbox files (change the extension from .txt to .mbox). Created the refresher for pipermail. There was no need for a separate latest-date parser for pipermail, since the files are mbox files anyway.
…to ensure it does not download files past current year and month
Added checks in the aforementioned functions so that the refreshers won't download "mail from the future".
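A rough sketch of the kind of check described here, written in Python only for illustration (the package itself is R, and this is not its code). The function name `months_up_to_now` and the `today` parameter are hypothetical; the only idea taken from the commit is that the requested range is cut off at the current year and month.

```python
from datetime import date


def months_up_to_now(start_ym, end_ym, today=None):
    """Return YYYYMM strings from start_ym to end_ym, never past the current month."""
    today = today or date.today()
    current_ym = f"{today.year:04d}{today.month:02d}"
    # Zero-padded YYYYMM strings sort chronologically, so min() clamps the range.
    last_ym = min(end_ym, current_ym)

    year, month = int(start_ym[:4]), int(start_ym[4:])
    months = []
    while f"{year:04d}{month:02d}" <= last_ym:
        months.append(f"{year:04d}{month:02d}")
        # Roll over to January of the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months
```

A start month that is itself in the future simply yields an empty list, so nothing is requested from the server.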
…ountered Done as requested by Carlos
- Remove archive_url and archive_type parameters from download_pipermail().
- Add start_year_month and end_year_month parameters for date filtering.
- Remove convert_pipermail_to_mbox() function, as download_pipermail() now handles file conversion automatically.
- Change file naming convention to 'kaiaulu_YYYYMM.mbox'.
- Attempted to download and decompress files directly without saving .gz to disk, but could not establish a valid connection.
Signed-off-by: Dao McGill <[email protected]>
…mail()
- Modified helix.yml to use [[“mailing_list”]][[“pipermail”]][[“project_key_1”]]
- Added project_key_2 to helix.yml
- Created /vignettes/download_mail.Rmd to document information about pipermail downloader
- Made function calls explicit for external libraries
- ISSUE: Build -> Check is not passing. Seems to be having issues with utags_path, even though I changed the path to the one for universal-ctags in tools.yml
…process_gz_to_mbox_in_folder()
- download_pipermail: Attempts to download the .txt file first. If unavailable, falls back to .gz. If using the .gz file, unzips it and writes the output as .mbox
- Added log messages
- download_pipermail: Added timeout parameter to deal with the case where the server takes too long to respond
- Added refresh_pipermail function
- Updated vignettes/download_mail.Rmd to include refresh_pipermail
- Added process_gz_to_mbox_in_folder function
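For readers unfamiliar with the .gz fallback, the folder-level conversion could look roughly like the sketch below. It is Python for illustration only, not the R implementation of process_gz_to_mbox_in_folder. The name `gz_to_mbox_in_folder`, the output naming (everything after the first dot replaced by `.mbox`), and leaving the .gz file in place are all assumptions.

```python
import gzip
import shutil
from pathlib import Path


def gz_to_mbox_in_folder(folder):
    """Decompress every *.gz file in `folder` into a sibling .mbox file."""
    written = []
    for gz_path in sorted(Path(folder).glob("*.gz")):
        # Assumed naming: "202401.txt.gz" -> "202401.mbox".
        stem = gz_path.name.split(".")[0]
        mbox_path = gz_path.with_name(stem + ".mbox")
        # Stream the decompressed bytes so large archives are not held in memory.
        with gzip.open(gz_path, "rb") as src, open(mbox_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(mbox_path.name)
    return written
```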
…il refresher.
- Replaced paste0 with stringi::stri_c
- Removed create directory if does not exist
- Added more verbose descriptions/comments
- Added dividers within functions
- Added verbose parameter
- Added else block for refresher
- Added call to process_gz_to_mbox_in_folder at end of refresher
- parse_mbox: stri_replace_last was not working, changed it to stringi::stri_replace_last_regex
- Tested parse_mbox. Perceval was not returning any output. I will look further into why this is happening.
…il refresher.
- Replaced paste0 with stringi::stri_c
- Removed create directory if does not exist
- Added more verbose descriptions/comments
- Added dividers within functions
- Added verbose parameter
- Added else block for refresher
- Added call to process_gz_to_mbox_in_folder at end of refresher
- parse_mbox: stri_replace_last was not working, changed it to stringi::stri_replace_last_regex
- Tested parse_mbox. Perceval was not returning any output. I will look further into why this is happening.
Signed-off-by: Dao McGill <[email protected]>
…uh/kaiaulu into 284-mbox-download-refresher
Updated parameters for download_mod_mbox to use Apache Pony Mail links, as Apache lists now redirect there
- Modified downloads to use YYYYMM instead of YYYY
- Removed the option for downloading by year for clearer functionality.
- Updated vignettes/download_mail.Rmd
Signed-off-by: Dao McGill <[email protected]>
- Created `refresh_mod_mbox` function to automatically refresh mailing list archives downloaded using Mod Mbox.
- The function checks for the latest downloaded file, deletes it, and redownloads the archive from that month to the current date.
- Added documentation for `refresh_mod_mbox` to the notebook.
Signed-off-by: Dao McGill <[email protected]>
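The refresh steps listed in this commit can be sketched as follows. This is an illustrative Python outline, not the R code of `refresh_mod_mbox`. The `kaiaulu_YYYYMM.mbox` file pattern is assumed from the naming convention mentioned in an earlier commit, and `refresh_folder` and the `download_month` callback are hypothetical names standing in for the real downloader.

```python
import re
from datetime import date
from pathlib import Path

# Assumed file naming; the real convention may differ per downloader.
FILE_PATTERN = re.compile(r"^kaiaulu_(\d{6})\.mbox$")


def refresh_folder(folder, download_month, today=None):
    """Delete the latest monthly file, then redownload from that month to now."""
    folder = Path(folder)
    today = today or date.today()
    current_ym = f"{today.year:04d}{today.month:02d}"

    found = sorted(
        m.group(1) for p in folder.iterdir() if (m := FILE_PATTERN.match(p.name))
    )
    if not found:
        return []  # nothing downloaded yet, so there is nothing to refresh

    latest = found[-1]
    # The latest month may have been incomplete when it was first downloaded.
    (folder / f"kaiaulu_{latest}.mbox").unlink()

    refreshed = []
    year, month = int(latest[:4]), int(latest[4:])
    while (ym := f"{year:04d}{month:02d}") <= current_ym:
        download_month(ym, folder / f"kaiaulu_{ym}.mbox")  # hypothetical callback
        refreshed.append(ym)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return refreshed
```

Deleting only the newest file keeps older, already complete months untouched while still picking up mail that arrived after the last download.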
- Updated vignettes/download_mail.Rmd to working version
- Fixed errors in helix.yml
- Minor edits in mail.R
Signed-off-by: Dao McGill <[email protected]>
- Check works locally
- Commit all changed files
- Renamed to match the convention set by issue #230
Signed-off-by: Dao McGill <[email protected]>
- Reverted name change of save_folder_mail
- Removed previous documentation file for mail (download_mod_mbox.Rmd)
- Updates to download_mail.Rmd
This reverts commit f0027dc.
- parse_mbox_latest_date() now uses new naming convention for files
- Added to download_mail.Rmd
- Fixed documentation for download_pipermail()
Signed-off-by: Dao McGill <[email protected]>
- added parse_mbox_latest_date
- Update pkgdown.yml
- Set eval to False for notebook
- Added warning for failed downloads
- Added check for missing months in the date range within save_folder_path
- Changed mbox_path in parsers to mbox_file_path
- Use gt package to view tables
- Made changes so Knit works for download_mail.Rmd
- Updated exec/mailinglist.R to use new functions
- To do: Use getter functions once they are merged
Signed-off-by: Dao McGill <[email protected]>
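One way the missing-month check mentioned above might work is sketched here in Python, purely for illustration; it is not the package's R code. The function name `missing_months` and the assumption that each file name ends in `YYYYMM.mbox` are hypothetical.

```python
import re
from pathlib import Path


def missing_months(folder, start_ym, end_ym):
    """List the YYYYMM values in [start_ym, end_ym] with no .mbox file in `folder`."""
    # Assumed naming: each file name ends in "<YYYYMM>.mbox".
    present = {
        m.group(1)
        for p in Path(folder).glob("*.mbox")
        if (m := re.search(r"(\d{6})\.mbox$", p.name))
    }

    missing = []
    year, month = int(start_ym[:4]), int(start_ym[4:])
    while (ym := f"{year:04d}{month:02d}") <= end_ym:
        if ym not in present:
            missing.append(ym)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return missing
```

The returned list could feed a warning, so a failed download shows up as a visible gap rather than being silently skipped by later parsing.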
R/example.R contained an unused parameter, triggering warnings on build. Signed-off-by: Carlos Paradis <[email protected]>
Actions is failing due to being unable to install XML. Some new error yet again on Actions. Trying to make the version requirement less strict to see if it is able to install. Signed-off-by: Carlos Paradis <[email protected]>
The story is a bit too dry and assumes much of the user. The file format stored is not brief. Modified it a bit to add an example on how it can be revised. Signed-off-by: Carlos Paradis <[email protected]>
In case the XML compile error is tied to r-lib/actions#559, revert to 4.1 to see if it solves the problem. Signed-off-by: Carlos Paradis <[email protected]>
Signed-off-by: Dao McGill <[email protected]>
Signed-off-by: Dao McGill <[email protected]>
Signed-off-by: Dao McGill <[email protected]>
Signed-off-by: Carlos Paradis <[email protected]>
Signed-off-by: Carlos Paradis <[email protected]>
Signed-off-by: Carlos Paradis <[email protected]>
Notebooks should now be functional in master, so there is no longer a need to keep them disabled in pkgdown. Signed-off-by: Carlos Paradis <[email protected]>
Signed-off-by: Carlos Paradis <[email protected]>
Move to the internal section of pkgdown so it is not displayed in the docs to the user. Signed-off-by: Carlos Paradis <[email protected]>
The unit tests rely on a separate copy of thrift.yml. That file was not updated, so the unit tests were throwing errors for not finding the field. Moreover, the unit tests did not have the new config path, so the parse_mbox() calls were failing. Signed-off-by: Carlos Paradis <[email protected]>
Signed-off-by: Carlos Paradis <[email protected]>
Signed-off-by: Carlos Paradis <[email protected]>
The flag caused errors in which Perceval was unable to parse JSON files. Signed-off-by: Carlos Paradis <[email protected]>
Signed-off-by: Carlos Paradis <[email protected]>
Simplified some of the notebook language, shortened function titles, and removed some of the sub-header pound symbols, as they were creating too many sections in the code blocks. Added parser tables after the downloaders and removed their eval so that example tables of what can be downloaded are shown in the generated notebook. Commit passes check and tests, and the downloaders, refreshers, and parsers work. Signed-off-by: Carlos Paradis <[email protected]>
- Make start_year_month optional
- Determine start_year_month from existing files if they exist
- Return error if no existing files, and no date specified
Signed-off-by: Dao McGill <[email protected]>
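The optional start date described in this commit might resolve along these lines. The sketch is Python for illustration only and makes two assumptions the commit message does not settle: an explicitly passed start_year_month wins over existing files, and the latest existing file supplies the default. The name `resolve_start_year_month` is hypothetical.

```python
import re


def resolve_start_year_month(existing_files, start_year_month=None):
    """Pick the first month to download, falling back to files already on disk."""
    # Assumption: an explicit argument takes precedence over existing files.
    if start_year_month is not None:
        return start_year_month

    stamps = [
        m.group(1)
        for name in existing_files
        if (m := re.search(r"(\d{6})\.mbox$", name))
    ]
    if not stamps:
        raise ValueError("no existing files found and no start_year_month given")
    # Assumption: resume from the latest month already downloaded.
    return max(stamps)
```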
Signed-off-by: Dao McGill <[email protected]>
- Takes file path for mbox file to parse
- No longer need to pass project_conf
Signed-off-by: Dao McGill <[email protected]>
…f-the-refresh-pipeline
Signed-off-by: Dao McGill <[email protected]>
Signed-off-by: Dao McGill <[email protected]>
carlosparadis requested changes on Dec 9, 2024:
Seems there are still merge conflicts.
Signed-off-by: Dao McGill <[email protected]>
@carlosparadis reverted changes and resolved merge conflicts
Signed-off-by: Dao McGill <[email protected]>
Signed-off-by: Dao McGill <[email protected]>
…f-the-refresh-pipeline
No description provided.