Add features based on file paths in the title and description #4270

Open
wants to merge 41 commits into
base: master
9adba3f
Added file path feature extraction
benjaminmah Jun 20, 2024
4a43181
Improved regex for splitting filepaths
benjaminmah Jun 20, 2024
bdbf73d
Moved `/` and `.` check into regex
benjaminmah Jun 25, 2024
9ce9d76
Moved regex initialization to the constructor
benjaminmah Jun 25, 2024
802c9bd
Compiled regex using `re.compile` and move to constructor
benjaminmah Jul 2, 2024
165e68a
Renamed `ExtractFilePaths` to `FilePaths`
benjaminmah Jul 18, 2024
49601eb
Removed temporary list creation in `FilePaths` feature
benjaminmah Jul 18, 2024
9166307
Fixed `FilePaths` feature to accurately extract file paths and avoid …
benjaminmah Jul 18, 2024
8250977
Revised version to extract only file paths with valid file extensions
benjaminmah Jul 19, 2024
d1eb019
Initialized and compiled regex in compiler
benjaminmah Jul 22, 2024
f0f1118
Made code more Pythonic
benjaminmah Jul 24, 2024
fad7dae
Added 2 tests for `FilePaths` feature
benjaminmah Jul 24, 2024
fdd6123
Restructured `file_paths.json`
benjaminmah Jul 29, 2024
82a038a
Replaced hard-coding programming language extensions with `pygment.le…
benjaminmah Sep 4, 2024
bfd6334
Fixed tests to reflect more file extensions
benjaminmah Sep 4, 2024
5f4ec72
Added `publicsuffix2` to generate list of tlds
benjaminmah Sep 4, 2024
1f3921b
Replaced all addition strings with f-strings
benjaminmah Oct 18, 2024
0cf2482
Removed fixture from file path test
benjaminmah Oct 18, 2024
deadc18
Fixed test errors
benjaminmah Oct 18, 2024
a3f0ede
Added custom delimiter
benjaminmah Oct 18, 2024
a96a1e2
Fixed json input
benjaminmah Oct 18, 2024
176079c
Deleted fixture for file paths
benjaminmah Oct 18, 2024
19a289c
Pre-compile regex
benjaminmah Oct 18, 2024
4fd1f04
Removed comment
benjaminmah Oct 18, 2024
0d8d9dd
Changed default value of `inline_data` to `None`
benjaminmah Oct 21, 2024
c28ad97
Removed inline data boolean
benjaminmah Oct 21, 2024
24a5375
Removed `readlines()`
benjaminmah Oct 21, 2024
2bcfb18
Converted results into a list
benjaminmah Oct 21, 2024
9bed4a1
Moved FilePaths test to function
benjaminmah Oct 21, 2024
e76c0d0
Fixed indentation
benjaminmah Oct 21, 2024
1c00a10
Fixed assertion
benjaminmah Oct 21, 2024
f2a9d39
Changed `valid_extensions` to a local variable instead of an attribute
benjaminmah Oct 23, 2024
b677ccb
Converted `non_file_path_keywords` from attribute to local variable
benjaminmah Oct 23, 2024
70f72f5
Added comment explaining sorting `valid_extensions`
benjaminmah Oct 23, 2024
836d42d
Removed deletion of URLs from string
benjaminmah Oct 23, 2024
d6f8002
Removed sorting (test)
benjaminmah Oct 25, 2024
e11be5b
Removed sorting comment
benjaminmah Oct 25, 2024
38432cf
Simplified updating valid extensions set with lexers
benjaminmah Oct 25, 2024
9373c0b
Fixed ValueError
benjaminmah Oct 25, 2024
2022eb4
Fixed ValueError
benjaminmah Oct 25, 2024
64e5c3b
Removed tracking
benjaminmah Oct 28, 2024
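
Several commits above describe the same idea: compile a regex once in the constructor, then keep only candidates whose extension is in a set of valid extensions. The following is a minimal sketch of that idea and not the PR's actual `FilePaths` implementation. The class name `FilePathsSketch`, the pattern, and the hard-coded `VALID_EXTENSIONS` set are all assumptions made for illustration; the commit messages suggest the PR builds its extension list from Pygments lexers instead.

```python
import re

# Hypothetical, hard-coded extension set used only for this sketch.
VALID_EXTENSIONS = {"cpp", "cc", "h", "js", "py", "so"}


class FilePathsSketch:
    """Illustrative stand-in; not the FilePaths class from the PR."""

    def __init__(self) -> None:
        # Compile once so every call reuses the same pattern object.
        # Group 1 captures the text after the last dot of the candidate.
        self.candidate_re = re.compile(r"[\w./-]*\w\.(\w+)")

    def __call__(self, text: str) -> list[str]:
        paths: list[str] = []
        for match in self.candidate_re.finditer(text):
            extension = match.group(1).lower()
            candidate = match.group(0)
            # Keep a candidate only if its extension is known and it is new.
            if extension in VALID_EXTENSIONS and candidate not in paths:
                paths.append(candidate)
        return paths


extract = FilePathsSketch()
sample = "layout/html/base/src/nsFrame.cpp:3879 and ../libs/treehydra.js:12 with gcc 4.3.0"
print(extract(sample))
```

In this sketch the version string `4.3.0` is dropped because `0` is not a listed extension, which is the kind of false positive an extension filter is meant to avoid.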
Fixed json input
benjaminmah committed Oct 28, 2024
commit a96a1e29a9c0e74c436ba04119710d4b016b80dd
28 changes: 19 additions & 9 deletions tests/test_bug_features.py
@@ -49,10 +49,7 @@ def _read(
         feature_extractor = feature_extractor_class()

         if use_inline_data:
-            results = (
-                feature_extractor(json.loads(line.strip()))
-                for line in inline_data.split("###")
-            )
+            results = (feature_extractor(item) for item in inline_data)
         else:
             path = get_fixture_path(os.path.join("bug_features", path))
             with open(path, "r") as f:
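
The hunk above stops parsing `inline_data` with `json.loads` and iterates over it directly. The PR does not state why, but one plausible reason is that a regular (non-raw) Python string turns `\n` into a real newline before `json.loads` runs, and strict JSON rejects unescaped control characters inside strings. A minimal sketch of that failure and of the dict-based alternative, using a shortened, made-up record rather than the test's real data:

```python
import json

# In a regular Python string literal, "\n" becomes a real newline character,
# so the JSON text below contains a raw control character inside a string.
as_python_string = '{"text": "Fix for\nnsFrame.cpp"}'

try:
    json.loads(as_python_string)
    parsed_ok = True
except json.JSONDecodeError:
    # The default strict mode does not allow unescaped control characters.
    parsed_ok = False

# Supplying records as Python dicts skips JSON parsing entirely.
inline_data = [{"text": "Fix for\nnsFrame.cpp"}]
texts = [item["text"] for item in inline_data]

print(parsed_ok, texts)
```

With a list of dicts, the test data can also drop the `###` delimiter, since each record is already a separate list element.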
@@ -205,11 +202,24 @@ def test_BugTypes(read) -> None:


def test_FilePaths(read):
inline_data = """{"summary": "<nsFrame.cpp> cleanup", "comments": [{"text": "Fix for\n{{ <http://tinderbox.mozilla.org/SeaMonkey/warn1082809200.7591.html>\nanthonyd (2 warnings)\n1.\tlayout/html/base/src/nsFrame.cpp:3879 (See build log excerpt)\n\t`nsIFrame*thisBlock' might be used uninitialized in this function\n2.\tlayout/html/base/src/nsFrame.cpp:3908 (See build log excerpt)\n\t`nsIFrame*thisBlock' might be used uninitialized in this function\n}} (NB: lines should be lines - 2, due to checkin \"in progress\")\nwill be included."}]}
###
{"summary": "Spidermonkey regression causes treehydra trunk to fail 6 tests", "comments": [{"text": "Today I'm trying to get callgraph stuff hooked into dxr, and I'm unable to get a working treehydra. I've updated tried updating just dehydra, then I updated gcc w/plugins using the new stuff in the patch queue, and it doesn't matter. Running make check_treehydra fails like this:\n\nTest Failure: \n Test command: /var/www/html/dxr/tools/gcc-dehydra/installed/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.3.0/cc1plus -quiet -fplugin=../gcc_treehydra.so -o /dev/null -fplugin-arg=test_locks_bad3.js locks_bad3.cc\n Failure msg: Expected 'locks_bad3.cc:10: error: precondition not met' in error output; not found. stderr:../libs/treehydra.js:12: JS Exception: No case_val in this lazy object\n:0: #0: Error(\"No case_val in this lazy object\")\n../libs/treehydra.js:12: #1: unhandledLazyProperty(\"case_val\")\n../libs/unstable/esp.js:481: #2: ()\n./esp_lock.js:41: #3: process_tree([object GCCNode])\n\nTest Failure: \n Test command: /var/www/html/dxr/tools/gcc-dehydra/installed/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.3.0/cc1plus -quiet -fplugin=../gcc_treehydra.so -o /dev/null -fplugin-arg=test_locks_good.js locks_good.cc\n Failure msg: Expected no error output, got error output :../libs/treehydra.js:12: JS Exception: No case_val in this lazy object\n:0: #0: Error(\"No case_val in this lazy object\")\n../libs/treehydra.js:12: #1: unhandledLazyProperty(\"case_val\")\n../libs/unstable/esp.js:481: #2: ()\n./esp_lock.js:41: #3: process_tree([object GCCNode])\n\nTest Failure: \n Test command: /var/www/html/dxr/tools/gcc-dehydra/installed/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.3.0/cc1plus -quiet -fplugin=../gcc_treehydra.so -o /dev/null -fplugin-arg=test_locks_good2.js locks_good2.cc\n Failure msg: Expected no error output, got error output :../libs/treehydra.js:12: JS Exception: No case_val in this lazy object\n:0: #0: Error(\"No case_val in this lazy 
object\")\n../libs/treehydra.js:12: #1: unhandledLazyProperty(\"case_val\")\n../libs/unstable/esp.js:481: #2: ()\n./esp_lock.js:41: #3: process_tree([object GCCNode])\n\nTest Failure: \n Test command: /var/www/html/dxr/tools/gcc-dehydra/installed/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.3.0/cc1plus -quiet -fplugin=../gcc_treehydra.so -o /dev/null -fplugin-arg=test_locks_bad4.js locks_bad4.cc\n Failure msg: Expected 'locks_bad4.cc:13: error: precondition not met' in error output; not found. stderr:../libs/treehydra.js:12: JS Exception: No case_val in this lazy object\n:0: #0: Error(\"No case_val in this lazy object\")\n../libs/treehydra.js:12: #1: unhandledLazyProperty(\"case_val\")\n../libs/unstable/esp.js:481: #2: ()\n./esp_lock.js:41: #3: process_tree([object GCCNode])\n\nTest Failure: \n Test command: /var/www/html/dxr/tools/gcc-dehydra/installed/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.3.0/cc1plus -quiet -fplugin=../gcc_treehydra.so -o /dev/null -fplugin-arg=test_locks_bad2.js locks_bad2.cc\n Failure msg: Expected 'locks_bad2.cc:12: error: precondition not met' in error output; not found. stderr:../libs/treehydra.js:12: JS Exception: No case_val in this lazy object\n:0: #0: Error(\"No case_val in this lazy object\")\n../libs/treehydra.js:12: #1: unhandledLazyProperty(\"case_val\")\n../libs/unstable/esp.js:481: #2: ()\n./esp_lock.js:41: #3: process_tree([object GCCNode])\n\nTest Failure: \n Test command: /var/www/html/dxr/tools/gcc-dehydra/installed/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.3.0/cc1plus -quiet -fplugin=../gcc_treehydra.so -o /dev/null -fplugin-arg=test_locks_bad1.js locks_bad1.cc\n Failure msg: Expected 'locks_bad1.cc:11: error: precondition not met' in error output; not found. 
stderr:../libs/treehydra.js:12: JS Exception: No case_val in this lazy object\n:0: #0: Error(\"No case_val in this lazy object\")\n../libs/treehydra.js:12: #1: unhandledLazyProperty(\"case_val\")\n../libs/unstable/esp.js:481: #2: ()\n./esp_lock.js:41: #3: process_tree([object GCCNode])\n\n\nUnit Test Suite Summary:\n 32 passed\n 6 failed\n 0 error(s)\nmake[1]: *** [check_treehydra] Error 1\nmake[1]: Leaving directory `/var/www/html/dxr/tools/gcc-dehydra/dehydra/test'\nmake: *** [check] Error 2"}]}
"""

inline_data = [
{
"summary": "<nsFrame.cpp> cleanup",
"comments": [
{
"text": "Fix for\n{{ <http://tinderbox.mozilla.org/SeaMonkey/warn1082809200.7591.html>\nanthonyd (2 warnings)\n1.\tlayout/html/base/src/nsFrame.cpp:3879 (See build log excerpt)\n\t`nsIFrame*thisBlock' might be used uninitialized in this function\n2.\tlayout/html/base/src/nsFrame.cpp:3908 (See build log excerpt)\n\t`nsIFrame*thisBlock' might be used uninitialized in this function\n}} (NB: lines should be lines - 2, due to checkin \"in progress\")\nwill be included."
}
],
},
{
"summary": "Spidermonkey regression causes treehydra trunk to fail 6 tests",
"comments": [
{
"text": 'Today I\'m trying to get callgraph stuff hooked into dxr, and I\'m unable to get a working treehydra. I\'ve updated tried updating just dehydra, then I updated gcc w/plugins using the new stuff in the patch queue, and it doesn\'t matter. Running make check_treehydra fails like this:\n\nTest Failure: \n Test command: /var/www/html/dxr/tools/gcc-dehydra/installed/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.3.0/cc1plus -quiet -fplugin=../gcc_treehydra.so -o /dev/null -fplugin-arg=test_locks_bad3.js locks_bad3.cc\n Failure msg: Expected \'locks_bad3.cc:10: error: precondition not met\' in error output; not found. stderr:../libs/treehydra.js:12: JS Exception: No case_val in this lazy object\n:0: #0: Error("No case_val in this lazy object")\n../libs/treehydra.js:12: #1: unhandledLazyProperty("case_val")\n../libs/unstable/esp.js:481: #2: ()\n./esp_lock.js:41: #3: process_tree([object GCCNode])\n\nTest Failure: \n Test command: /var/www/html/dxr/tools/gcc-dehydra/installed/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.3.0/cc1plus -quiet -fplugin=../gcc_treehydra.so -o /dev/null -fplugin-arg=test_locks_good.js locks_good.cc\n Failure msg: Expected no error output, got error output :../libs/treehydra.js:12: JS Exception: No case_val in this lazy object\n:0: #0: Error("No case_val in this lazy object")\n../libs/treehydra.js:12: #1: unhandledLazyProperty("case_val")\n../libs/unstable/esp.js:481: #2: ()\n./esp_lock.js:41: #3: process_tree([object GCCNode])\n\nTest Failure: \n Test command: /var/www/html/dxr/tools/gcc-dehydra/installed/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.3.0/cc1plus -quiet -fplugin=../gcc_treehydra.so -o /dev/null -fplugin-arg=test_locks_good2.js locks_good2.cc\n Failure msg: Expected no error output, got error output :../libs/treehydra.js:12: JS Exception: No case_val in this lazy object\n:0: #0: Error("No case_val in this lazy object")\n../libs/treehydra.js:12: #1: unhandledLazyProperty("case_val")\n../libs/unstable/esp.js:481: #2: 
()\n./esp_lock.js:41: #3: process_tree([object GCCNode])\n\nTest Failure: \n Test command: /var/www/html/dxr/tools/gcc-dehydra/installed/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.3.0/cc1plus -quiet -fplugin=../gcc_treehydra.so -o /dev/null -fplugin-arg=test_locks_bad4.js locks_bad4.cc\n Failure msg: Expected \'locks_bad4.cc:13: error: precondition not met\' in error output; not found. stderr:../libs/treehydra.js:12: JS Exception: No case_val in this lazy object\n:0: #0: Error("No case_val in this lazy object")\n../libs/treehydra.js:12: #1: unhandledLazyProperty("case_val")\n../libs/unstable/esp.js:481: #2: ()\n./esp_lock.js:41: #3: process_tree([object GCCNode])\n\nTest Failure: \n Test command: /var/www/html/dxr/tools/gcc-dehydra/installed/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.3.0/cc1plus -quiet -fplugin=../gcc_treehydra.so -o /dev/null -fplugin-arg=test_locks_bad2.js locks_bad2.cc\n Failure msg: Expected \'locks_bad2.cc:12: error: precondition not met\' in error output; not found. stderr:../libs/treehydra.js:12: JS Exception: No case_val in this lazy object\n:0: #0: Error("No case_val in this lazy object")\n../libs/treehydra.js:12: #1: unhandledLazyProperty("case_val")\n../libs/unstable/esp.js:481: #2: ()\n./esp_lock.js:41: #3: process_tree([object GCCNode])\n\nTest Failure: \n Test command: /var/www/html/dxr/tools/gcc-dehydra/installed/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.3.0/cc1plus -quiet -fplugin=../gcc_treehydra.so -o /dev/null -fplugin-arg=test_locks_bad1.js locks_bad1.cc\n Failure msg: Expected \'locks_bad1.cc:11: error: precondition not met\' in error output; not found. 
stderr:../libs/treehydra.js:12: JS Exception: No case_val in this lazy object\n:0: #0: Error("No case_val in this lazy object")\n../libs/treehydra.js:12: #1: unhandledLazyProperty("case_val")\n../libs/unstable/esp.js:481: #2: ()\n./esp_lock.js:41: #3: process_tree([object GCCNode])\n\n\nUnit Test Suite Summary:\n 32 passed\n 6 failed\n 0 error(s)\nmake[1]: *** [check_treehydra] Error 1\nmake[1]: Leaving directory `/var/www/html/dxr/tools/gcc-dehydra/dehydra/test\'\nmake: *** [check] Error 2'
}
],
},
]
expected_results = [
[
"nsFrame.cpp",