This repository was archived by the owner on Sep 30, 2024. It is now read-only.
Commit aa76466
SCIP Tree-sitter CLI evaluation logic + workspace indexing mode (#57894)
* Support indexing entire workspace
* Separate evaluate and index subcommands into different modules
* Add --evaluate argument to index command, use serde for json
* Punish candidates with high ambiguity
Candidate ambiguity is a measure of how detailed the candidate SCIP is in
comparison to ground truth.
When a candidate symbol has high ambiguity, it means that it occurs in a
lot of places where ground truth SCIP uses different symbols.
Method overloads in Java demonstrate this.
If you have 20 overloads of the same method (but with different
parameters), scip-java actually produces 20 different symbols (e.g.
"NodeRenderer#render(+19)").
Our current methods just produce a single symbol "NodeRenderer#render()"
for all those occurrences.
This commit penalises such occurrences by the logarithm of ambiguity.
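The sketch below shows how ambiguity and a logarithmic penalty could be computed. It makes three assumptions. Occurrences are keyed by a (file, line, column) location. Ambiguity is the number of distinct ground-truth symbols found at the locations where a candidate symbol occurs. The penalty divides the raw weight by `1 + ln(ambiguity)`. The names `ambiguity`, `penalised_weight` and `Location` are hypothetical, and so is the exact divisor. The commit only states that the penalty uses the logarithm of ambiguity.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical occurrence key: (file path, start line, start column).
type Location = (String, u32, u32);

/// Ambiguity of `candidate`: the number of distinct ground-truth symbols that
/// occur at the locations where the candidate index emitted `candidate`.
fn ambiguity(
    candidate: &str,
    candidate_occs: &HashMap<Location, String>,
    truth_occs: &HashMap<Location, String>,
) -> usize {
    let distinct: HashSet<&str> = candidate_occs
        .iter()
        .filter(|(_, sym)| sym.as_str() == candidate)
        .filter_map(|(loc, _)| truth_occs.get(loc).map(String::as_str))
        .collect();
    distinct.len()
}

/// Hypothetical penalty: divide a raw weight by 1 + ln(ambiguity), so an
/// unambiguous candidate (ambiguity <= 1) keeps its weight unchanged.
fn penalised_weight(raw: f64, ambiguity: usize) -> f64 {
    let a = ambiguity.max(1) as f64;
    raw / (1.0 + a.ln())
}
```

In the Java overload case, a single candidate `NodeRenderer#render()` covering 20 distinct ground-truth overloads would have ambiguity 20. With this divisor its weight would shrink by a factor of about 4. The logarithm keeps the penalty gentle for small ambiguity and growing slowly for large ambiguity.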
* Introduce normalised weighting of candidates
After computing the weights (using the same Jaccard measure) of individual
pairs of (candidate, ground truth) symbols, we collect all the ground
truth symbols that can be assigned to a given candidate, and normalise
the weight of each pair by dividing it by the sum of all those weights.
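The normalisation step can be sketched as follows. The sketch assumes each symbol is represented by its set of occurrence locations, and that the Jaccard measure is |A ∩ B| / |A ∪ B| over those sets. The function names, the `Occurrences` alias and the choice to drop zero-overlap pairs are illustrative assumptions, not the CLI's actual code.

```rust
use std::collections::{BTreeMap, HashSet};

/// Occurrence locations, simplified here to (line, column) pairs.
type Occurrences = HashSet<(u32, u32)>;

/// Jaccard similarity |A ∩ B| / |A ∪ B| of two occurrence sets.
fn jaccard(a: &Occurrences, b: &Occurrences) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0;
    }
    a.intersection(b).count() as f64 / union as f64
}

/// For one candidate symbol, weigh every overlapping ground-truth symbol by
/// Jaccard similarity, then divide each weight by their sum so they add up to 1.
fn normalised_weights(
    candidate: &Occurrences,
    truth: &BTreeMap<String, Occurrences>,
) -> BTreeMap<String, f64> {
    let raw: BTreeMap<String, f64> = truth
        .iter()
        .map(|(sym, occs)| (sym.clone(), jaccard(candidate, occs)))
        .filter(|(_, w)| *w > 0.0)
        .collect();
    let total: f64 = raw.values().sum();
    raw.into_iter().map(|(sym, w)| (sym, w / total)).collect()
}
```

In this sketch, a candidate whose four occurrences split evenly between two ground-truth overloads gets 0.5 for each overload, rather than a single winner. If nothing overlaps, the result is simply empty.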
The idea behind this is to reassert the fact that mapping of symbols is
fuzzy, and therefore we shouldn't be selecting just 1 symbol; instead we spread
the fuzziness over all the occurrences and normalise the weights so they add up to
one.
Note that for a single alternative the weight will be 1, but that's not
a problem: even if some occurrences were missed, they will be
counted as false negatives, heavily discounting the effect of
this spurious 1.0 TP (true positive).
* bzl: Remove library target from Rust crate (#58221)
* fix: Stop sanitizing path unnecessarily
* cleanup: Remove incorrect dep on CLI in highlighter binary
* bzl: Re-add library target for cargo compat
* build: Add comment for build targets
* config: Hoist walkdir to workspace-level dep
---------
Co-authored-by: Varun Gandhi <varun.gandhi@sourcegraph.com>

1 parent d97ac6a
File tree: 17 files changed, +1664 / -296 lines changed
- docker-images/syntax-highlighter
- crates
- scip-syntax
- scip-treesitter-cli
- src
- tests
- snapshots
- scip-treesitter-languages/src
- scip-treesitter/src