3 changes: 3 additions & 0 deletions ARCHITECTURE.md
@@ -103,6 +103,7 @@ Each provider module generates a specific type of fake data:
| company.rs | company, job, catch_phrase | Business data |
| network.rs | url, domain_name, ipv4, ipv6, mac_address | Network identifiers |
| finance.rs | credit_card, iban | Financial identifiers with valid checksums |
| packages.rs | commit_sha, semver, calver, spdx_license, git_username, pypi/npm/cargo/gem/maven package names, version constraints, maven_coordinate, pypi_requirement | Package-registry data for PyPI, npm, Maven, Cargo, RubyGems |
| records.rs | records | Structured data from schema DSL (Rust-only, not yet exposed to Python) |

All providers follow the same pattern:
@@ -125,6 +126,8 @@ Static data organized by locale, embedded at compile time as `&'static [&str]`:
- `countries.rs`: ~200 countries
- `companies.rs`: Company name components
- `tlds.rs`: ~20 top-level domains
- `spdx_licenses.rs`: 50 common SPDX license identifiers
- `packages.rs`: package-name keywords, modifiers, Maven/npm scope components, pre-release tags, Maven qualifiers

Each data file includes tests for uniqueness and non-empty values.

40 changes: 39 additions & 1 deletion CHANGELOG.md
@@ -7,8 +7,32 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

## [0.4.0] - 2026-04-17

### Added

- **Package Registry Providers**: Cross-ecosystem fake data for seeding PyPI,
  npm, Maven, Cargo, and RubyGems test databases. 22 method pairs
  (44 Python-visible methods).
  - Cross-ecosystem primitives: `commit_sha()` / `short_commit_sha()`,
    `semver()` / `semver_prerelease()`, `calver()`, `spdx_license()` (50
    common IDs), `git_username()` (enforces GitHub's rules: alphanumerics
    and single hyphens, no leading/trailing hyphen, no consecutive hyphens,
    ≤ 39 chars).
  - Ecosystem-specific versions: `pypi_version()` (PEP 440 — includes
    pre/post/dev releases), `maven_version()` (with qualifiers like
    `-SNAPSHOT`, `.RELEASE`, `.Final`, `-RC1`).
  - Version constraints: `pypi_version_specifier()` (PEP 440),
    `npm_version_range()`, `cargo_version_req()`, `maven_version_range()`,
    `gem_version_requirement()`.
  - Package identity: `pypi_package_name()` (PEP 503 normalized),
    `npm_package_name()` (plain or `@scope/pkg`), `cargo_package_name()`,
    `gem_name()`, `maven_group_id()` (reverse domain),
    `maven_artifact_id()`, `maven_coordinate()` (GAV form
    `group:artifact:version`).
  - Full requirement line: `pypi_requirement()` (e.g.,
    `requests>=2.0.0,<3.0.0`).
  - All batch methods support parallel generation via `set_parallel()`.
- **Parallel Generation**: Opt-in multi-threaded batch generation via Rayon
  - `set_parallel(enabled, num_threads=None)`: Enable/disable parallel mode
  - `get_parallel()` / `get_num_threads()`: Query current parallel settings
@@ -18,6 +42,19 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  - ~3.3x speedup at 100K+ items (names: 83ms -> 25ms for 1M items)
  - `unique=True` always uses sequential path (requires shared state)
  - Criterion benchmarks for parallel vs sequential comparison
- **Streaming file writer**: `records_to_file(path, n, schema, ...)` generates
  records in chunks and writes each chunk to disk, keeping peak memory bounded
  by `chunk_size` regardless of `n`. Supports CSV, NDJSON, SQL, and Parquet
  with auto-detection from the file extension. Includes an optional progress
  callback and an `estimate_memory()` utility.
- **Serialized output formats** for `records()` — serialised directly in Rust,
  avoiding the cost of materialising Python objects before serialising:
  - `records_csv()` — RFC 4180 CSV with header row
  - `records_json()` — JSON array with proper scalar types
  - `records_ndjson()` — newline-delimited JSON
  - `records_parquet()` — Parquet bytes via the Arrow path
  - `records_sql()` — ANSI SQL `INSERT`s, batched at 1000 rows,
    with identifier quoting
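
Reviewer note: the streaming-writer entry above describes generating records in chunks so that peak memory depends on `chunk_size` rather than `n`. The following is a minimal standard-library sketch of that general pattern, not forgery's Rust implementation. The names `write_rows_in_chunks` and `make_row` are hypothetical and exist only for illustration.

```python
import csv
from typing import Callable, Sequence


def write_rows_in_chunks(
    path: str,
    n: int,
    header: Sequence[str],
    make_row: Callable[[int], Sequence[object]],  # hypothetical row generator
    chunk_size: int = 10_000,
) -> int:
    """Write n rows to a CSV file, holding at most chunk_size rows in memory."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    written = 0
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        while written < n:
            size = min(chunk_size, n - written)
            # Build one chunk, write it out, then drop it before the next one.
            chunk = [make_row(written + i) for i in range(size)]
            writer.writerows(chunk)
            written += size
    return written
```

Because only one chunk is alive at a time, a call such as `write_rows_in_chunks("out.csv", 1_000_000, ["id"], lambda i: (i,))` never holds more than `chunk_size` rows in memory, whatever the value of `n`.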

## [0.3.0] - 2026-03-17

@@ -160,6 +197,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- SonarCloud integration for code quality
- CodeQL static analysis

[Unreleased]: https://github.com/williajm/forgery/compare/v0.3.0...HEAD
[Unreleased]: https://github.com/williajm/forgery/compare/v0.4.0...HEAD
[0.4.0]: https://github.com/williajm/forgery/compare/v0.3.0...v0.4.0
[0.3.0]: https://github.com/williajm/forgery/compare/v0.2.0...v0.3.0
[0.1.0]: https://github.com/williajm/forgery/releases/tag/v0.1.0
2 changes: 1 addition & 1 deletion Cargo.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "forgery"
version = "0.3.0"
version = "0.4.0"
edition = "2021"
description = "Fake data at the speed of Rust"
license = "MIT"
66 changes: 66 additions & 0 deletions README.md
@@ -412,6 +412,72 @@ License plate formats by locale:
| `it_IT` | AB 123 CD | `"FG 482 HJ"` |
| `ja_JP` | 300 12-34 | `"500 38-47"` |

### Package Registry Data

For seeding test databases of package registries (PyPI, npm, Maven, Cargo, RubyGems).
Cross-ecosystem primitives share one API; ecosystem-specific shapes have their own
methods.

**Cross-ecosystem primitives**

| Batch | Single | Description |
|-------|--------|-------------|
| `commit_shas(n)` | `commit_sha()` | 40-hex-char git commit SHA |
| `short_commit_shas(n)` | `short_commit_sha()` | 7-hex-char short SHA |
| `semvers(n)` | `semver()` | SemVer `MAJOR.MINOR.PATCH` |
| `semver_prereleases(n)` | `semver_prerelease()` | Pre-release (e.g. `1.2.3-alpha.1+build.5`) |
| `calvers(n)` | `calver()` | CalVer in mixed schemes (`YYYY.MM.DD`, `YY.MM`, ...) |
| `spdx_licenses(n)` | `spdx_license()` | SPDX identifier (50 common IDs) |
| `git_usernames(n)` | `git_username()` | GitHub/GitLab/Bitbucket-compatible username |

**Ecosystem-specific versions** (where SemVer alone doesn't cover the format)

| Batch | Single | Description |
|-------|--------|-------------|
| `pypi_versions(n)` | `pypi_version()` | PEP 440 (pre/post/dev releases) |
| `maven_versions(n)` | `maven_version()` | Maven version with qualifiers (`-SNAPSHOT`, `.RELEASE`, ...) |

**Version constraints**

| Batch | Single | Description |
|-------|--------|-------------|
| `pypi_version_specifiers(n)` | `pypi_version_specifier()` | PEP 440 (e.g. `>=1.2,<2.0`, `~=1.0`) |
| `npm_version_ranges(n)` | `npm_version_range()` | npm (e.g. `^1.2.3`, `~1.2.3`, `1.x`) |
| `cargo_version_reqs(n)` | `cargo_version_req()` | Cargo (e.g. `^1.0`, `~1.2`) |
| `maven_version_ranges(n)` | `maven_version_range()` | Maven (e.g. `[1.0,2.0)`) |
| `gem_version_requirements(n)` | `gem_version_requirement()` | RubyGems (e.g. `~> 1.2`) |

**Package identity**

| Batch | Single | Description |
|-------|--------|-------------|
| `pypi_package_names(n)` | `pypi_package_name()` | PEP 503 normalised |
| `npm_package_names(n)` | `npm_package_name()` | Plain or `@scope/pkg` (~30% scoped) |
| `cargo_package_names(n)` | `cargo_package_name()` | Rust-ident flavour |
| `gem_names(n)` | `gem_name()` | RubyGems gem name |
| `maven_group_ids(n)` | `maven_group_id()` | Reverse domain (e.g. `com.example.tools`) |
| `maven_artifact_ids(n)` | `maven_artifact_id()` | Lowercase with hyphens |
| `maven_coordinates(n)` | `maven_coordinate()` | GAV (`group:artifact:version`) |

**Full requirement lines**

| Batch | Single | Description |
|-------|--------|-------------|
| `pypi_requirements(n)` | `pypi_requirement()` | e.g. `requests>=2.0.0,<3.0.0` |

```python
from forgery import Faker

fake = Faker()
fake.seed(42)
fake.pypi_requirement() # 'requests>=2.0.0,<3.0.0'
fake.maven_coordinate() # 'com.example.tools:widget-core:1.2.3-SNAPSHOT'
fake.npm_package_name() # '@types/fast-parser'
fake.spdx_license() # 'Apache-2.0'
fake.git_username() # 'tiny-logger42'
fake.commit_sha() # 'a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2'
```
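
Reviewer note: when seeding a test database it can help to sanity-check generated values against the formats named in this PR. The sketch below uses only the standard library and does not call forgery. The helper names are hypothetical. The username rule follows the wording of the changelog entry, the name normalization follows PEP 503, and the coordinate helper assumes the three-part GAV form only.

```python
import re

# Username rule as worded in the changelog: alphanumerics and single hyphens,
# no leading/trailing hyphen, no consecutive hyphens, at most 39 characters.
_USERNAME_RE = re.compile(r"[A-Za-z0-9](?:-?[A-Za-z0-9])*")


def is_valid_git_username(name: str) -> bool:
    """Return True if name satisfies the username rule above."""
    return len(name) <= 39 and _USERNAME_RE.fullmatch(name) is not None


def normalize_pypi_name(name: str) -> str:
    """PEP 503 normalization: collapse runs of '-', '_', '.' to '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def split_maven_coordinate(coord: str) -> tuple[str, str, str]:
    """Split a three-part group:artifact:version coordinate (GAV form only)."""
    group, artifact, version = coord.split(":")
    return group, artifact, version


print(is_valid_git_username("tiny-logger42"))
print(normalize_pypi_name("Fast_Parser.Core"))
print(split_maven_coordinate("com.example.tools:widget-core:1.2.3-SNAPSHOT"))
```

A test suite could run checks like these over a batch, for example asserting that every name in a generated list is unchanged by `normalize_pypi_name`.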

### Profile

| Batch | Single | Description |
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "maturin"

[project]
name = "forgery"
version = "0.3.0"
version = "0.4.0"
description = "Fake data at the speed of Rust"
readme = "README.md"
license = { text = "MIT" }