"""
Script Objective:
-----------------
This script automates populating `data.json` by scraping security advisory
information from the official Django project website and from CVE databases.
Target Data Structure:
----------------------
The output should match the schema defined in `data.json`:
- cve_id (e.g., CVE-2024-XXXX)
- cve_description (from NVD/MITRE)
- django_description (from Django security release)
- cwe (id, name)
- fixed_in (version number)
- affected_versions (list of objects with version and patches)
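
The field list above can be sketched as dataclasses. This is only an illustration, not the actual `data.json` schema: the class names are hypothetical, `fixed_in` is assumed to be a string, and `patches` is assumed to be a list of commit hashes or URLs. All example values are placeholders.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Cwe:
    id: str    # e.g. "CWE-79"
    name: str  # e.g. "Cross-site Scripting"


@dataclass
class AffectedVersion:
    version: str
    # Assumption: each patch is stored as a commit hash or URL string.
    # The list[str] annotation needs Python 3.9 or newer.
    patches: list[str] = field(default_factory=list)


@dataclass
class Advisory:
    cve_id: str
    cve_description: str
    django_description: str
    cwe: Cwe
    fixed_in: str  # assumption: version kept as a string such as "4.2.11"
    affected_versions: list[AffectedVersion] = field(default_factory=list)


# Placeholder record; none of these values describe a real advisory.
example = Advisory(
    cve_id="CVE-2024-XXXX",
    cve_description="placeholder description from the CVE record",
    django_description="placeholder description from the Django release",
    cwe=Cwe(id="CWE-79", name="Cross-site Scripting"),
    fixed_in="4.2.11",
    affected_versions=[AffectedVersion(version="4.2")],
)

# asdict() converts nested dataclasses into plain dicts and lists.
record = asdict(example)
print(json.dumps(record, indent=2))
```

Keeping the schema in one place like this makes it easier to check that every scraped record carries all six fields before it is written out.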
Data Sources:
-------------
1. **Django Security Releases**:
https://docs.djangoproject.com/en/dev/releases/security/
- This is the main archive of Django security releases.
- It lists every security release. Follow the link to each release note (e.g., "Django 4.2.11 released") to find details such as CVE IDs, descriptions, and patch links.
2. **CVE Details (cve.org)**:
https://www.cve.org
- This is the official source for CVE Records.
- Use this to fetch the canonical "Description" and "CWE Category" for each CVE ID found in the Django docs.
- You can inspect the CVE content by visiting `https://www.cve.org/CVERecord?id=CVE-YYYY-XXXX`.
- **To find the CWE ID**: Look for the "CWE" or "Weakness Enumeration" section on the CVE record page. It usually lists the CWE ID (e.g., CWE-79) and its name (e.g., Cross-site Scripting).
3. **GitHub / Django Source**:
- Patch links in the Django blog often point to GitHub commits.
- We need to extract the commit hash from these URLs.
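
A minimal sketch of the two URL manipulations described above: pulling the commit hash out of a GitHub commit link and building the cve.org record URL for a CVE ID. The function names are hypothetical, the commit URL in the usage lines is a made-up placeholder, and the sketch assumes patch links follow the `github.com/django/django/commit/<hash>` shape (optionally with a suffix such as `.patch`).

```python
import re
from urllib.parse import urlparse

# A commit path under django/django followed by a 7 to 40 character hex hash.
# The hash must be followed by ".", "/", or the end of the path.
COMMIT_PATH = re.compile("^/django/django/commit/([0-9a-f]{7,40})(?:[./]|$)")


def extract_commit_hash(url):
    # Return the commit hash from a Django GitHub commit URL, or None.
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        return None
    match = COMMIT_PATH.match(parsed.path)
    return match.group(1) if match else None


def cve_record_url(cve_id):
    # Build the record page URL in the format given above.
    return "https://www.cve.org/CVERecord?id=" + cve_id


# Placeholder hash, not a real Django commit.
fake_hash = "0123456789abcdef0123456789abcdef01234567"
commit_url = "https://github.com/django/django/commit/" + fake_hash
print(extract_commit_hash(commit_url))
print(cve_record_url("CVE-2024-XXXX"))
```

Returning `None` for anything that is not a recognisable commit link lets the caller skip unrelated links without raising.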
Recommended Steps for Implementation:
-------------------------------------
1. **Fetch Security Archive**:
- Request `https://docs.djangoproject.com/en/dev/releases/security/`.
- Parse the HTML to find links to individual security release posts.
2. **Parse Release Post**:
- For each release post, extract the CVE IDs mentioned.
- Extract the description of the vulnerability.
- Identify the versions mentioned (e.g., "Django 4.2.x", "Fixed in 5.0.1").
- Find the patch links (look for "Apply this patch" or links to `github.com/django/django/commit/...`).
- Map each patch to the version it applies to where possible (the mapping is often given in sections such as "Table of contents" or "Affected supported versions").
3. **Enrich with CVE Data**:
- For each extracted CVE ID, query the cve.org website or API.
- Get the official `cve_description` and `cwe` information.
4. **Format and Save**:
- Construct the JSON objects.
- Save to `data.json`.
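
The steps above could be sketched with the standard library alone, as follows. This is a rough outline rather than a finished scraper: the function names, the regular expressions, and the sample HTML are all assumptions, and the real release pages may need more careful parsing. The sample CVE ID and commit hash are invented placeholders.

```python
import json
import re
import urllib.request
from html.parser import HTMLParser

CVE_ID = re.compile("CVE-[0-9]{4}-[0-9]{4,}")
# Assumption: versions appear in the text as "Django X.Y" or "Django X.Y.Z".
VERSION = re.compile("Django ([0-9]+[.][0-9]+(?:[.][0-9]+)?)")


class LinkCollector(HTMLParser):
    # Collects the href of every <a> tag fed to the parser.
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch_html(url):
    # Step 1: download a page (not called here, so the sketch runs offline).
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_release_post(html):
    # Step 2: pull CVE IDs, version numbers, and patch links from one post.
    collector = LinkCollector()
    collector.feed(html)
    return {
        "cve_ids": sorted(set(CVE_ID.findall(html))),
        "versions": sorted(set(VERSION.findall(html))),
        "patch_links": [
            link for link in collector.links
            if "github.com/django/django/commit/" in link
        ],
    }


def save_records(records, path="data.json"):
    # Step 4: write the collected records as JSON.
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2, ensure_ascii=False)


# Hypothetical HTML standing in for a release post.
SAMPLE = (
    '<p>CVE-2099-0001: placeholder issue. CVE-2099-0001 is fixed in Django 4.2.11.</p>'
    '<a href="https://github.com/django/django/commit/0123456789abcdef0123456789abcdef01234567">patch</a>'
    '<a href="https://example.com/other">other</a>'
)
parsed = parse_release_post(SAMPLE)
print(parsed)
```

Step 3 (enrichment from cve.org) is left out because the sketch makes no assumption about the shape of the cve.org response. Once that data is merged in, `save_records(records)` would write the result to `data.json`.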
"""