Releases: williajm/forgery

v0.4.0

17 Apr 23:11
695d45a

Highlights

Package Registry Providers

Cross-ecosystem fake data for seeding PyPI, npm, Maven, Cargo, and RubyGems test databases. 22 method pairs (44 Python-visible methods).

  • Cross-ecosystem primitives: commit_sha() / short_commit_sha(), semver() / semver_prerelease(), calver(), spdx_license() (50 common IDs), git_username() (GitHub/GitLab/Bitbucket rules).
  • Ecosystem-specific versions: pypi_version() (PEP 440 with pre/post/dev releases), maven_version() (with qualifiers like -SNAPSHOT, .RELEASE, .Final, -RC1).
  • Version constraints: pypi_version_specifier() (PEP 440), npm_version_range(), cargo_version_req(), maven_version_range(), gem_version_requirement().
  • Package identity: pypi_package_name() (PEP 503 normalised), npm_package_name() (plain or @scope/pkg), cargo_package_name(), gem_name(), maven_group_id(), maven_artifact_id(), maven_coordinate() (GAV).
  • Full requirement line: pypi_requirement() (e.g. requests>=2.0.0,<3.0.0).

Nine of the batch methods accept unique=True for no-duplicate output — see the README's Package Registry Data section for full details.
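When seeding a test database it can help to sanity-check generated values against the formats named above. The sketch below uses only the standard library and does not call forgery. The normalisation expression is the one given in PEP 503, while the SemVer pattern is a simplified assumption that covers only MAJOR.MINOR.PATCH. The helper names are hypothetical.

```python
import re


def normalize_pypi_name(name: str) -> str:
    """Apply the PEP 503 rule: collapse runs of '-', '_' and '.' to '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_normalized_pypi_name(name: str) -> bool:
    """True if the name is already in PEP 503 normalised form."""
    return name == normalize_pypi_name(name)


# Simplified pattern (an assumption): numeric MAJOR.MINOR.PATCH without leading
# zeros. Pre-release and build metadata are deliberately not handled here.
SEMVER_CORE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def is_core_semver(version: str) -> bool:
    """True if the string is a plain MAJOR.MINOR.PATCH version."""
    return SEMVER_CORE.fullmatch(version) is not None


if __name__ == "__main__":
    print(normalize_pypi_name("Friendly_.Bard"))  # friendly-bard
    print(is_core_semver("1.2.3"))                # True
    print(is_core_semver("01.2.3"))               # False
```

Checks like these could run over a batch of generated names or versions before they are inserted into a test registry.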

Parallel Generation

Opt-in multi-threaded batch generation via Rayon, enabled with set_parallel(enabled, num_threads=None). Roughly 3.3× speedup at 100K+ items. Output is deterministic for the same seed and thread count.
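Why determinism would depend on the thread count as well as the seed can be illustrated with a common chunk-seeding scheme. This is a pure-Python sketch under assumptions, not forgery's Rust implementation: the function name and the seed-derivation formula are hypothetical, and the chunks run sequentially here although each is independent and could run on its own thread.

```python
import random


def generate_chunked(seed: int, n: int, num_chunks: int) -> list[int]:
    """Generate n values, giving each chunk its own RNG derived from (seed, chunk index)."""
    out: list[int] = []
    base, extra = divmod(n, num_chunks)
    for chunk_index in range(num_chunks):
        # Spread the remainder over the first `extra` chunks.
        size = base + (1 if chunk_index < extra else 0)
        # Hypothetical derivation: any fixed function of seed and index would do.
        rng = random.Random(seed * 1_000_003 + chunk_index)
        out.extend(rng.getrandbits(32) for _ in range(size))
    return out


if __name__ == "__main__":
    a = generate_chunked(seed=42, n=100, num_chunks=4)
    b = generate_chunked(seed=42, n=100, num_chunks=4)
    c = generate_chunked(seed=42, n=100, num_chunks=2)
    print(a == b)  # same seed and chunk count reproduce the same output
    print(a == c)  # a different chunk count changes where each RNG stream is used
```

Under a scheme like this, reproducing a dataset exactly means pinning both the seed and the number of workers.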

Streaming file writer

records_to_file(path, n, schema, ...) writes records in chunks, keeping peak memory bounded by chunk_size regardless of n. The format is auto-detected from the file extension: CSV, NDJSON, SQL, or Parquet. Includes estimate_memory() and an optional progress callback.
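The chunked-writing idea can be sketched in plain Python: only one chunk of records is held in memory at a time, so peak memory tracks the chunk size and not n. This is an illustration of the technique, not forgery's implementation. The function name, the make_record factory, and the on_progress parameter are hypothetical and do not reflect the signature of records_to_file.

```python
import json
import os
import tempfile
from typing import Callable, Optional


def write_ndjson_chunked(
    path: str,
    n: int,
    make_record: Callable[[int], dict],
    chunk_size: int = 1000,
    on_progress: Optional[Callable[[int], None]] = None,
) -> int:
    """Write n records as NDJSON, materialising at most chunk_size records at once."""
    written = 0
    with open(path, "w", encoding="utf-8") as fh:
        while written < n:
            size = min(chunk_size, n - written)
            # Build one chunk, write it out, then let it be garbage-collected.
            chunk = [make_record(written + i) for i in range(size)]
            fh.write("".join(json.dumps(record) + "\n" for record in chunk))
            written += size
            if on_progress is not None:
                on_progress(written)
    return written


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "records.ndjson")
    total = write_ndjson_chunked(
        target, 2500, lambda i: {"id": i}, chunk_size=1000, on_progress=print
    )
    print(total)
```

A smaller chunk size lowers peak memory at the cost of more write calls.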

Serialised output formats

records_csv(), records_json(), records_ndjson(), records_parquet(), records_sql() — serialised directly in Rust, skipping Python object materialisation.

Install

pip install forgery==0.4.0

See the CHANGELOG and README for full API details.