URLs - Filter by Parameters

urlF.py is a Python script that removes duplicate URLs by comparing both the base URL (including the path) and the query parameters. It reads a list of URLs from an input file, filters out parameter-level duplicates, and writes the unique URLs to an output file. Unlike standard tools that only sort alphabetically or remove exact string duplicates, it preserves the structure of the URLs and handles complex query parameters.
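
A minimal sketch of the idea (an illustration, not the script's actual source): each URL is reduced to a key built from its base and its query parameters, and only the first URL per key is kept. Judging from the example later in this README, duplicates appear to be detected by the parameter names rather than their values.

from urllib.parse import urlparse, parse_qs

# Illustration only: compute a deduplication key for one URL.
url = "https://example.com/page?fileGuid=DPg868kv89HJtQ8q&extraParam=xyz"
p = urlparse(url)
key = (p.scheme, p.netloc, p.path, tuple(sorted(parse_qs(p.query))))
print(key)  # ('https', 'example.com', '/page', ('extraParam', 'fileGuid'))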


πŸš€ Installation

You can install urlF.py from GitHub or from PyPI.

Option 1: Install from GitHub

Step 1: Clone the Repository

git clone https://github.com/Boopath1/urlF.py

or, for a faster shallow clone:

git clone --depth 1 https://github.com/Boopath1/urlF.py

Step 2: Install the required dependencies

pip3 install -r requirements.txt  # or pip install -r requirements.txt

Option 2: Install from PyPI

Step 1: Install via pip

pip install urlf  # Standard installation

Alternative: if pip refuses with an "externally-managed-environment" error

pip install urlf --break-system-packages  # bypasses the system-package guard (PEP 668)

Usage

If you installed from PyPI, run it as a module:

python3 -m urlf <input_file> <output_file>
  • <input_file>: path to the input file containing the list of URLs.
  • <output_file>: path to the output file where the unique URLs will be written.

If you cloned the repository, run the script directly:

python3 urlF.py duplicate-params.txt filtered_urls.txt

urlF.py is the main script file: it reads URLs from the input file, removes duplicates based on query parameters, and writes the results to the output file.

Example

The input file duplicate-params.txt might look like this:

https://example.com/page?fileGuid=DPg868kv89HJtQ8q
https://example.com/page?fileGuid=DPg868kv89HJtQ8q&anotherParam=123
https://example.com/page?anotherParam=123
https://example.com/page?fileGuid=aAqwe868kv89HJtQ8q
https://example.com/page?fileGuid=DPg868kv89HJtQ8q&extraParam=xyz
https://example.com/page?extraParam=xyz
https://example.com/page?extraParam=xyz_Aqw
https://example.com/page?fileGuid=DifferentGuid

The output file filtered_urls.txt will contain:

https://example.com/page?fileGuid=DPg868kv89HJtQ8q
https://example.com/page?fileGuid=DPg868kv89HJtQ8q&anotherParam=123
https://example.com/page?anotherParam=123
https://example.com/page?fileGuid=DPg868kv89HJtQ8q&extraParam=xyz
https://example.com/page?extraParam=xyz
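
The three dropped URLs each repeat a path-plus-parameter-name combination that appeared earlier (fileGuid alone, and extraParam alone). Here is a minimal sketch that reproduces this output, assuming the key is the base URL plus the sorted parameter names; this is not the script's actual source, just one way to get the same result:

from urllib.parse import urlparse, parse_qs

def filter_urls(input_file, output_file):
    seen = set()
    with open(input_file) as fin, open(output_file, "w") as fout:
        for line in fin:
            url = line.strip()
            if not url:
                continue
            p = urlparse(url)
            # Key on base URL plus parameter names; values are ignored.
            key = (p.scheme, p.netloc, p.path, tuple(sorted(parse_qs(p.query))))
            if key not in seen:
                seen.add(key)
                fout.write(url + "\n")

filter_urls("duplicate-params.txt", "filtered_urls.txt")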

πŸ“Š Comparison with Other Tools

Tool        Functionality                                                    Limitation
sort        Orders URLs alphabetically                                       Does not filter based on query parameters
urldedupe   Removes exact duplicate URLs                                     Cannot analyze query-parameter uniqueness
uro         Normalizes and deduplicates URLs                                 Does not focus on parameter-based filtering
urlF.py     Filters URLs by base URL (including path) and query parameters   None of the above; provides query-based filtering and cleanup
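
To make the difference concrete with the example input above: every line in duplicate-params.txt is textually unique, so a plain sort -u keeps all eight, while urlF.py keeps only the five parameter-distinct ones.

sort -u duplicate-params.txt | wc -l   # 8: every line survives plain deduplication
python3 urlF.py duplicate-params.txt filtered_urls.txt
wc -l filtered_urls.txt                # 5: parameter-name duplicates removed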

Sample POC

In the screenshot below, the elapsed time is shown on the right side; you can verify that this script finishes in very little time compared to other tools.

[Screenshot: timing comparison across tools]

πŸ”Ή Why Run This After paramspider?

  • When running paramspider, you’ll often end up with many duplicate parameters.
  • Instead of scanning the same parameter multiple times, run urlF.py on the results to filter them efficiently (see the pipeline sketch below).

[Screenshots: paramspider output before and after filtering; the raw output contains almost 2K URLs 😱]
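
A typical pipeline might look like this (paramspider's exact flags and output path vary between versions, so treat them as placeholders):

paramspider -d example.com                            # collect parameterized URLs
python3 urlF.py results/example.com.txt unique-params.txt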

πŸ’‘ Contributing

Contributions are welcome! If you have suggestions or feature improvements, feel free to:

  • Fork the repository and create a pull request.
  • Open an issue if you encounter any bugs.

🎯 Final Thoughts

  • After enumerating URLs with tools like waybackurls, gau, katana, and others, run urlF.py to keep only the unique URLs and parameter combinations (see the example pipeline below).
  • This ensures efficient filtering, reduces redundant requests, and enables better-targeted testing.
  • Optimized for security researchers and penetration testers who want to streamline URL analysis.
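
For instance, a combined workflow might look like this (tool flags are illustrative and may differ between versions):

echo example.com | waybackurls > urls.txt     # historical URLs from the Wayback Machine
gau example.com >> urls.txt                   # URLs from other archives
katana -u https://example.com >> urls.txt     # freshly crawled URLs
sort -u urls.txt -o urls.txt                  # drop exact duplicates first
python3 urlF.py urls.txt unique.txt           # then drop parameter-level duplicates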

Happy Hacking! 🎯 πŸš€
