313 Syntax Extraction #320

Open: wants to merge 27 commits into base: master

Conversation

daomcgill
Collaborator

  • Documented current syntax extraction functions
  • Overview on syntax extraction and XPath
  • Placeholder for syntax.yml config file

Signed-off-by: Dao McGill <[email protected]>
@daomcgill changed the title from "i #313 Added Notebook for Syntax Extraction" to "313 Syntax Extraction" on Oct 15, 2024
@daomcgill daomcgill mentioned this pull request Oct 15, 2024
8 tasks
@carlosparadis (Member) left a comment

These paths should use a double `../../`, because when we run the code block it assumes it is located in vignettes. I believe this also needs to be fixed in the mail PR.

@carlosparadis (Member) left a comment

I have not done the full review yet, but I hope this helps. Is my understanding correct that the documentation extractor is still pending?

We can do another full pass on the notebook at that point after you try it out.

Resolved review threads (all outdated): conf/syntax.yml (1), vignettes/syntax_extractor.Rmd (8), vignettes/text_gof_showcase.Rmd (1)

- Added new functions
- New configuration file
- Updated documentation

Signed-off-by: Dao McGill <[email protected]>
@daomcgill daomcgill linked an issue Oct 18, 2024 that may be closed by this pull request
8 tasks
daomcgill and others added 15 commits October 17, 2024 16:26
- Remove unused settings
- Change ../ to ../../
- Update notebook to reflect changes

Signed-off-by: Dao McGill <[email protected]>
- Added parameter for excluding licenses in class and file-level comment extraction
- Implemented function extraction for function names with optional parameters
- Implemented variable extraction with optional types
- Added examples for removing empty comments and/or comment delimiters

Signed-off-by: Dao McGill <[email protected]>
- Added function for imports
- Reformatted new query functions
- Added Notebook Example for Joined Queries

Signed-off-by: Dao McGill <[email protected]>
- Fix for issue with namespaces in certain queries
- TO DO: Package function currently missing filepath

Signed-off-by: Dao McGill
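
The namespace fix mentioned in this commit reflects a common XPath pitfall: when the parsed source is XML with a default namespace, unqualified element names match nothing. The sketch below is only an illustration in Python's standard library, not the code from this PR. The XML fragment, the namespace URI, and the `function_names` helper are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written fragment in the style of an XML source-code
# representation. The namespace URI is an assumption for illustration.
SAMPLE_XML = """<unit xmlns="http://www.srcML.org/srcML/src" filename="Example.java">
  <class>class <name>Example</name> <block>{
    <function><type><name>int</name></type> <name>add</name><parameter_list>()</parameter_list> <block>{}</block></function>
    <function><type><name>void</name></type> <name>reset</name><parameter_list>()</parameter_list> <block>{}</block></function>
  }</block></class>
</unit>"""

# Prefix-to-URI mapping used to qualify element names in the queries.
NS = {"src": "http://www.srcML.org/srcML/src"}


def function_names(xml_text):
    """Return the name of every <function> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # "src:name" matches only the direct child <name> of each function,
    # so the <name> nested inside <type> is not picked up.
    return [
        fn.findtext("src:name", namespaces=NS)
        for fn in root.iterfind(".//src:function", NS)
    ]


# An unqualified query silently returns nothing on namespaced XML.
unqualified = ET.fromstring(SAMPLE_XML).findall(".//function")

print(function_names(SAMPLE_XML))
print(len(unqualified))
```

Because the unqualified query fails silently rather than raising an error, an empty result table is usually the first symptom of a missing namespace prefix.
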
- Now displays filenames correctly

Signed-off-by: Dao McGill <[email protected]>
- TO DO: Cheatsheet for this work thread

Signed-off-by: Dao McGill <[email protected]>
This commit performs a major refactoring of how Kaiaulu interfaces with config files, and of the suggested folder organization for storing rawdata and analysis.

The configuration files are generalized to account for anomalous cases in project analysis. For instance, long-lived projects may have multiple repositories, issue trackers, mailing lists, etc. The new configuration file template can account for this information.

Moreover, changes to the config template cascaded into changes to all notebooks, because access to the config was hardcoded to the file organization. A new set of get_ functions should make this the last commit in which a template change cascades into the notebooks. All actively maintained notebooks (those under vignettes/ not prefixed by an underscore) have been updated to use the get_ functions. Future template changes will therefore only affect the get_ functions in R/config.R.
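
The getter indirection described in this paragraph can be sketched in a few lines. This is a language-neutral illustration written in Python, not the contents of R/config.R. The key names, the `get_git_repo_path` getter, and the `notebook_step` function are hypothetical.

```python
def get_git_repo_path(config):
    """Hypothetical getter: the only code that knows the template layout."""
    return config["version_control"]["log"]


def notebook_step(config):
    """Stands in for notebook code: it never indexes the config directly."""
    return f"parsing git log at {get_git_repo_path(config)}"


# Hypothetical parsed config; the key names are illustrative only.
config = {"version_control": {"log": "rawdata/projectX/git_repo"}}
print(notebook_step(config))
```

If the template later moves or renames the key, only the getter body changes and every caller keeps working unmodified.
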

The folder organization of the filepaths has also been modified. Previously, the default filepaths in the versioned config files suggested organizing data by source first: rawdata/git_repo/projectX ; rawdata/jira/projectY. This was impractical for sharing data manually, because the user had to zip several folders individually. The new organization is by project first: rawdata/projectX/git_repo ; rawdata/projectX/jira. Users now only need to zip projectX, which contains all the data to be shared.
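
For anyone holding data in the old layout, the change amounts to swapping the second and third path components. The sketch below illustrates that mapping in Python. The `to_project_first` helper is hypothetical and is not part of this commit.

```python
from pathlib import PurePosixPath


def to_project_first(old_path):
    """Map rawdata/<source>/<project>/... to rawdata/<project>/<source>/..."""
    parts = PurePosixPath(old_path).parts
    if len(parts) < 3 or parts[0] != "rawdata":
        raise ValueError(f"not an old-style rawdata path: {old_path}")
    root, source, project, *rest = parts
    # Swap the source and project components; keep any trailing components.
    return str(PurePosixPath(root, project, source, *rest))


for old in ["rawdata/git_repo/projectX", "rawdata/jira/projectY"]:
    print(old, "->", to_project_first(old))
```

The helper only computes the new path as a string; moving files on disk is left to the user.
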

A minor typo in graph.R was also fixed: merge function calls now pass `sort=` instead of `sorted=`.
@carlosparadis
Member

Working on this PR now, please hold on doing commits.

Signed-off-by: Carlos Paradis <[email protected]>
@carlosparadis (Member) left a comment

Hardcoded paths; these need to be replaced by the get() functions.

@carlosparadis
Member

@daomcgill I suggest you fix this PR first with the get() changes and the other notes, assuming this is the one that came before the exec PR. Then you can merge this branch locally into the exec PR to avoid having to do it twice (just be careful not to merge the one that came after into the PR that came before, or it will mix everything up).

@daomcgill
Collaborator Author

@carlosparadis okay, I will work on this now

@daomcgill
Collaborator Author

@carlosparadis fixed hardcoded paths and reverted workflow. Going to move to 314 now

@carlosparadis
Member

@daomcgill could you confirm that the notebook displays the parsed tables, so users can see what each function outputs, as I did with parse_mbox on your mail notebook?

@daomcgill
Collaborator Author

@carlosparadis every query that I run displays a table. Should I allow every query to be evaluated then? I think that also means I have to change it so the annotation is evaluated?

- remove print statement
- gt displays head(10)

Signed-off-by: Dao McGill <[email protected]>
@daomcgill
Collaborator Author

@carlosparadis I think I answered my own question. Notebook now displays tables.

@carlosparadis
Member

@daomcgill you sure did! It is very much in line with the changes I made on mail and that you see on itm0. There is no better way to showcase your work in the docs than having a table shown at the intermediate steps. Please send me on drive the files needed to run the code (I imagine just the raw data suffices; the rest I can extract from your functions). Thank you!

Signed-off-by: Dao McGill <[email protected]>
@daomcgill
Collaborator Author

@carlosparadis I just used the cloned maven repository but, yes, I am adding it to the drive right now (folder is just named maven).

I think I have made it through all the requested changes, in both branches. I have also updated the fasttext notebook, and it appears to be running with the new config setup.

@carlosparadis
Member

Thank you!!

- Added back filters using get()

Signed-off-by: Dao McGill <[email protected]>
Successfully merging this pull request may close these issues.

Expanding the Syntax Extractor
3 participants