
Commit 3334b9a

Merge pull request #398 from JohT/feature/anomaly-detection
Anomaly Detection
2 parents 8cfcbe4 + 6f8b227 commit 3334b9a

56 files changed: +11504 -28 lines changed


.gitignore

Lines changed: 7 additions & 1 deletion
@@ -93,5 +93,11 @@ coverage/
 .ipynb_checkpoints
 *.nbconvert*

+# Python
+__pycache__/
+
 # Python environments
-.conda
+.conda
+
+# Optuna (and other) Database data
+*.db

COMMANDS.md

Lines changed: 9 additions & 0 deletions
@@ -8,6 +8,7 @@
 - [Examples](#examples)
 - [Start an analysis with CSV reports only](#start-an-analysis-with-csv-reports-only)
 - [Start an analysis with Jupyter reports only](#start-an-analysis-with-jupyter-reports-only)
+- [Start an analysis with Python reports only](#start-an-analysis-with-python-reports-only)
 - [Start an analysis with PDF generation](#start-an-analysis-with-pdf-generation)
 - [Start an analysis without importing git log data](#start-an-analysis-without-importing-git-log-data)
 - [Only run setup and explore the Graph manually](#only-run-setup-and-explore-the-graph-manually)
@@ -102,6 +103,14 @@ If only the Jupyter reports are needed e.g. when the CSV reports had already bee
 ./../../scripts/analysis/analyze.sh --report Jupyter
 ```

+#### Start an analysis with Python reports only
+
+If you only need Python reports, e.g. to skip the Chromium Browser dependency, this can be done with:
+
+```shell
+./../../scripts/analysis/analyze.sh --report Python
+```
+
 #### Start an analysis with PDF generation

 Note: Generating a PDF from a Jupyter notebook using [nbconvert](https://nbconvert.readthedocs.io) takes some time and might even fail due to a timeout error.

GETTING_STARTED.md

Lines changed: 8 additions & 2 deletions
@@ -84,16 +84,22 @@ Use these optional command line options as needed:
 ./../../scripts/analysis/analyze.sh --report Csv
 ```

-- Jupyter notebook reports when Python and Conda are installed:
+- Jupyter notebook reports when Python and Conda are installed (and Chromium Browser for PDF generation):

 ```shell
 ./../../scripts/analysis/analyze.sh --report Jupyter
 ```

+- Python reports when Python and Conda are installed (without Chromium Browser for PDF generation):
+
+```shell
+./../../scripts/analysis/analyze.sh --report Python
+```
+
 - Graph visualizations when Node.js and npm are installed:

 ```shell
-./../../scripts/analysis/analyze.sh --report Jupyter
+./../../scripts/analysis/analyze.sh --report Visualization
 ```

 - All reports with Python, Conda, Node.js and npm installed:
- All reports with Python, Conda, Node.js and npm installed:

README.md

Lines changed: 6 additions & 0 deletions
@@ -17,6 +17,7 @@ Contained within this repository is a comprehensive and automated code graph ana
 - Easily integrable into your [continuous integration pipeline](./INTEGRATION.md)
 - More than 130 CSV reports for dependencies, metrics, cycles, annotations, algorithms and many more
 - Jupyter notebook reports for dependencies, metrics, visibility and many more
+- Anomaly detection powered by unsupervised machine learning and explainable AI
 - Graph structure visualization
 - Automated reference document generation
 - Runtime and library independent automation using [shell scripts](./scripts/SCRIPTS.md)
@@ -27,6 +28,7 @@ Contained within this repository is a comprehensive and automated code graph ana

 ### :newspaper: News

+- August 2025: Anomaly detection powered by unsupervised machine learning and explainable AI
 - May 2025: Migrated to [Neo4j 2025.x](https://neo4j.com/docs/upgrade-migration-guide/current/version-2025/upgrade) and Java 21.

 ### :notebook: Jupyter Notebook Reports
@@ -148,6 +150,10 @@ The [Code Structure Analysis Pipeline](./.github/workflows/internal-java-code-an
 - [Neo4j Python Driver](https://neo4j.com/docs/api/python-driver)
 - [openTSNE](https://github.com/pavlin-policar/openTSNE)
 - [wordcloud](https://github.com/amueller/word_cloud)
+- [umap](https://umap-learn.readthedocs.io)
+- [scikit-learn](https://scikit-learn.org)
+- [optuna](https://optuna.org)
+- [SHAP](https://github.com/shap/shap)
 - [Graph Visualization](./graph-visualization/README.md) uses [node.js](https://nodejs.org/de) and the dependencies listed in [package.json](./graph-visualization/package.json).
 - [HPCC-Systems (High Performance Computing Cluster) Web-Assembly (JavaScript)](https://github.com/hpcc-systems/hpcc-js-wasm) containing a wrapper for GraphViz to visualize graph structures.
 - [GraphViz](https://gitlab.com/graphviz/graphviz) for CLI Graph Visualization
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+//Community Detection Leiden Statistics
+
+CALL gds.leiden.stats(
+$dependencies_projection + '-cleaned', {
+gamma: toFloat($dependencies_leiden_gamma),
+theta: toFloat($dependencies_leiden_theta),
+maxLevels: toInteger($dependencies_leiden_max_levels),
+tolerance: 0.0000001,
+consecutiveIds: true,
+relationshipWeightProperty: $dependencies_projection_weight_property
+})
+YIELD nodeCount
+,communityCount
+,ranLevels
+,modularity
+,modularities
+,communityDistribution
+RETURN nodeCount
+,communityCount
+,ranLevels
+,modularity
+,modularities
+,communityDistribution.min
+,communityDistribution.mean
+,communityDistribution.max
+,communityDistribution.p50
+,communityDistribution.p75
+,communityDistribution.p90
+,communityDistribution.p95
+,communityDistribution.p99
+,communityDistribution.p999
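
The `modularity` value returned above and the `gamma` resolution parameter can be illustrated with a small, self-contained Python sketch. It uses the textbook definition of weighted modularity for an undirected graph and is not taken from this commit; the function name `modularity` and its arguments are hypothetical, and the Graph Data Science library may compute its value differently in detail.

```python
from collections import defaultdict


def modularity(edges, community_of, gamma=1.0):
    """Textbook weighted modularity of a partition of an undirected graph.

    edges: iterable of (source, target, weight), each undirected edge listed once.
    community_of: dict mapping every node to its community id.
    gamma: resolution parameter; higher values favour smaller communities.
    """
    total_weight = 0.0
    internal_weight = defaultdict(float)  # edge weight inside each community
    degree_sum = defaultdict(float)       # summed weighted degree per community
    for source, target, weight in edges:
        total_weight += weight
        degree_sum[community_of[source]] += weight
        degree_sum[community_of[target]] += weight
        if community_of[source] == community_of[target]:
            internal_weight[community_of[source]] += weight
    if total_weight == 0:
        return 0.0
    return sum(
        internal_weight[community] / total_weight
        - gamma * (degree_sum[community] / (2 * total_weight)) ** 2
        for community in degree_sum
    )


# Two disconnected triangles, each forming its own community (illustrative data).
triangles = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
             ("x", "y", 1.0), ("y", "z", 1.0), ("x", "z", 1.0)]
partition = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1, "z": 1}
print(modularity(triangles, partition))  # prints 0.5
```

Putting all six nodes into a single community yields a modularity of 0, which is why a higher value indicates a partition with denser-than-expected communities.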
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+//Community Detection Leiden Write property communityLeidenId
+
+CALL gds.leiden.write(
+$dependencies_projection + '-cleaned', {
+gamma: toFloat($dependencies_leiden_gamma),
+theta: toFloat($dependencies_leiden_theta),
+maxLevels: toInteger($dependencies_leiden_max_levels),
+tolerance: 0.0000001,
+consecutiveIds: true,
+relationshipWeightProperty: $dependencies_projection_weight_property,
+writeProperty: $dependencies_projection_write_property
+})
+YIELD preProcessingMillis
+,computeMillis
+,writeMillis
+,postProcessingMillis
+,nodePropertiesWritten
+,communityCount
+,ranLevels
+,modularity
+,modularities
+,communityDistribution
+RETURN preProcessingMillis
+,computeMillis
+,writeMillis
+,postProcessingMillis
+,nodePropertiesWritten
+,communityCount
+,ranLevels
+,modularity
+,communityDistribution.min
+,communityDistribution.mean
+,communityDistribution.max
+,communityDistribution.p50
+,communityDistribution.p75
+,communityDistribution.p90
+,communityDistribution.p95
+,communityDistribution.p99
+,communityDistribution.p999
+,modularities
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+// Creates a smaller projection by sampling the original graph using "Common Neighbour Aware Random Walk"
+
+CALL gds.graph.sample.cnarw(
+$dependencies_projection + '-sampled-cleaned',
+$dependencies_projection,
+{
+samplingRatio: toFloat($dependencies_projection_sampling_ratio)
+}
+)
+YIELD graphName, fromGraphName, nodeCount, relationshipCount, startNodeCount, projectMillis
+RETURN graphName, fromGraphName, nodeCount, relationshipCount, startNodeCount, projectMillis
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+// Writes batch data back into the database for code units when working with a dependencies projection. Variables: dependencies_projection_rows, dependencies_projection_node
+
+UNWIND $dependencies_projection_rows AS row
+MATCH (codeUnit)
+WHERE elementId(codeUnit) = row.nodeId
+AND $dependencies_projection_node IN labels(codeUnit)
+SET codeUnit += row.properties
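
The batch write-back query above expects `dependencies_projection_rows` to be a list of maps with a `nodeId` and a `properties` entry. A minimal Python sketch of how such rows could be prepared in batches follows. The helper names `to_batches` and `query_parameters`, the batch size, the element id strings, and the `anomalyScore` property are all assumptions for illustration and are not taken from this commit; only the two parameter names come from the query.

```python
def to_batches(properties_by_node_id, batch_size=1000):
    """Yield lists of {"nodeId": ..., "properties": {...}} rows of limited size.

    properties_by_node_id: dict mapping a node element id to the properties
    that should be written back to that node (hypothetical input shape).
    """
    rows = [
        {"nodeId": node_id, "properties": properties}
        for node_id, properties in properties_by_node_id.items()
    ]
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def query_parameters(batch, node_label):
    """Build the parameter map that the write-back query refers to."""
    return {
        "dependencies_projection_rows": batch,
        "dependencies_projection_node": node_label,
    }


# Illustrative data: element ids and the property name are made up.
scores = {f"4:example:{index}": {"anomalyScore": index / 10} for index in range(5)}
for batch in to_batches(scores, batch_size=2):
    parameters = query_parameters(batch, "Package")
    print(len(parameters["dependencies_projection_rows"]), "rows prepared")
```

Each parameter map would then be passed to the query through whichever database client is in use; splitting into batches keeps a single transaction from growing too large.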
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+// Explore non null node property counts for the selected node label. Variables: projection_node_label
+
+MATCH (selectedNode)
+WHERE $projection_node_label IN labels(selectedNode)
+WITH *, keys(selectedNode) AS nodeProperties
+UNWIND nodeProperties AS nodeProperty
+WITH *
+ORDER BY nodeProperty
+WHERE selectedNode[nodeProperty] IS NOT NULL
+WITH nodeProperty, count(*) AS nonNullCount
+// RETURN nodeProperty, nonNullCount
+// ORDER BY nodeProperty, nonNullCount
+RETURN nonNullCount
+,count(DISTINCT nodeProperty) AS propertyCount
+,collect(nodeProperty) AS properties
+ORDER BY nonNullCount DESC
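
The two-step aggregation in the query above (count non-null occurrences per property, then group properties that share the same count) can be mirrored on plain Python dictionaries. This is only a sketch to make the result shape easier to read; the function name and the sample nodes are hypothetical and not part of this commit.

```python
from collections import defaultdict


def group_properties_by_non_null_count(nodes):
    """Return (nonNullCount, propertyCount, properties) tuples, highest count first.

    nodes: iterable of dicts, one per node, mapping property names to values.
    """
    non_null_counts = defaultdict(int)
    for node in nodes:
        for name, value in node.items():
            if value is not None:
                non_null_counts[name] += 1

    # Group property names (alphabetically) by how often they are set.
    properties_by_count = defaultdict(list)
    for name in sorted(non_null_counts):
        properties_by_count[non_null_counts[name]].append(name)

    return [
        (count, len(names), names)
        for count, names in sorted(properties_by_count.items(), reverse=True)
    ]


# Illustrative nodes with made-up property values.
sample_nodes = [
    {"name": "a", "fqn": "org.example.a", "centralityPageRank": 0.5},
    {"name": "b", "fqn": "org.example.b"},
    {"name": "c", "fqn": None},
]
for row in group_properties_by_non_null_count(sample_nodes):
    print(row)
```

Properties that appear in the first row are set on the most nodes, which helps to spot candidates that are available for (nearly) every node with the selected label.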

cypher/Node_Embeddings/Node_Embeddings_1d_Fast_Random_Projection_Stream.cypher

Lines changed: 4 additions & 3 deletions
@@ -16,8 +16,9 @@ OPTIONAL MATCH (projectRoot:Directory)<-[:HAS_ROOT]-(proj:TS:Project)-[:CONTAINS
 WITH *, last(split(projectRoot.absoluteFileName, '/')) AS projectName
 RETURN DISTINCT
 coalesce(codeUnit.fqn, codeUnit.globalFqn, codeUnit.fileName, codeUnit.signature, codeUnit.name) AS codeUnitName
-,codeUnit.name AS shortCodeUnitName
-,coalesce(artifactName, projectName) AS projectName
-,coalesce(codeUnit.communityLeidenId, 0) AS communityId
+,codeUnit.name AS shortCodeUnitName
+,elementId(codeUnit) AS nodeElementId
+,coalesce(artifactName, projectName) AS projectName
+,coalesce(codeUnit.communityLeidenId, 0) AS communityId
 ,coalesce(codeUnit.centralityPageRank, 0.01) AS centrality
 ,embedding
