Feature/update python versions #5

Draft: wants to merge 17 commits into base: 1.0.1-SNAPSHOT
8 changes: 4 additions & 4 deletions README.md
@@ -1,7 +1,7 @@
# PyMASq

<p align="center">
<img src="./assets/images/masq_logo_light.svg" width="150px"/>
<img src="./assets/images/masq_logo_light.svg" width="150px" alt="MASq Logo"/>
</p>

## Python-based Mitigation Application and Assessment (MASq)
@@ -32,9 +32,9 @@ cd pymasq
### Installing into a Conda Environment

```sh
conda create -n masq python=3.8 -y
conda create -n masq python=3.10 -y
conda activate masq
pip install .
pip install -e .
```
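
Because this change raises the interpreter in the Conda environment to 3.10 (matching `python_requires = >= 3.10` in `setup.cfg`), a quick check inside the activated environment can confirm the right interpreter is in use before installing. The following is a minimal sketch; `MIN_PYTHON` and `meets_minimum` are illustrative names and are not part of pymasq.

```python
import sys

# Minimum interpreter version, taken from python_requires in setup.cfg.
MIN_PYTHON = (3, 10)


def meets_minimum(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True when the (major, minor) pair is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not meets_minimum():
        raise SystemExit(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    print("Interpreter version is compatible")
```

Running it with the `masq` environment active should print the confirmation message; under an older interpreter it exits with an error instead.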

To generate the docs
@@ -44,7 +44,7 @@ python -m pip install -r ./doc-requirements.txt
```

<p align="center">
<img src="./assets/images/Lincoln_Lab_icon.png" width="150px"/>
<img src="./assets/images/Lincoln_Lab_icon.png" width="150px" alt="MIT Lincoln Lab Logo"/>
</p>

## Distribution Statement
18 changes: 9 additions & 9 deletions docs/source/conf.py
@@ -2,7 +2,7 @@
import sys
import sphinx_rtd_theme

sys.path.insert(0, os.path.abspath(os.path.join('..','..')))
sys.path.insert(0, os.path.abspath(os.path.join("..", "..")))
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
@@ -22,12 +22,12 @@

# -- Project information -----------------------------------------------------

project = 'pymasq'
copyright = '2022, MITLL'
author = 'MITLL'
project = "pymasq"
copyright = "2022, MITLL"
author = "MITLL"

# The full version, including alpha/beta/rc tags
release = '1.0'
release = "1.1"


# -- General configuration ---------------------------------------------------
@@ -36,14 +36,14 @@
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.napoleon', # NumPy & Google style docstring support
"sphinx.ext.napoleon", # NumPy & Google style docstring support
"sphinx_rtd_theme",
]

napoleon_google_docstring = False

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
templates_path = ["_templates"]

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
@@ -56,9 +56,9 @@
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'sphinx_rtd_theme'
html_theme = "sphinx_rtd_theme"

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
html_static_path = ["_static"]
38 changes: 18 additions & 20 deletions setup.cfg
@@ -11,27 +11,25 @@ author = Cuyler OBrien, Jaime Pena, Evan Young, Brian Levine, Eric Wybenga
author_email = [email protected], [email protected], [email protected]

[options]
python_requires = >= 3.8
python_requires = >= 3.10
packages = find:
package_dir =
= src
install_requires =
boruta>=0.3
bpemb>=0.3.3
matplotlib>=3.4.2
numpy>=1.19.3
pandas>=1.1.3
bpemb~=0.3
matplotlib~=3.5
numpy~=1.26
pandas~=1.4
plotly>=4.11.0
scikit-learn>=0.23
scipy>=1.5.4
statsmodels>=0.12
SALib>=1.4.5
tensorflow>=2.4.0
tpot[dask]>=0.11
SALib~=1.4
scikit-learn~=1.1
scipy~=1.8
statsmodels~=0.13
tensorflow~=2.9
tpot[dask]~=0.11
tests_require =
pytest>=3.8
hypothesis>=4.53.2
beartype>=0.5.1
beartype>=0.5.1
pytest~=7.4
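
The switch from `>=` to `~=` changes what pip is allowed to install: under PEP 440, `~=1.26` is a "compatible release" clause, equivalent to `>=1.26, ==1.*`, so a new major version is no longer accepted. The sketch below illustrates that rule for plain numeric versions only (no pre-release, post-release, or epoch handling); `is_compatible` is a hypothetical helper written for this illustration, not a packaging API.

```python
def _parse(version: str) -> tuple:
    """Split a plain numeric version such as '1.26.4' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def _pad(release: tuple, width: int) -> tuple:
    """Right-pad with zeros so that 1.26 and 1.26.0 compare as equal."""
    return release + (0,) * (width - len(release))


def is_compatible(candidate: str, spec: str) -> bool:
    """Approximate `candidate ~= spec` for numeric release segments only."""
    base = _parse(spec)
    if len(base) < 2:
        raise ValueError("~= requires at least two release segments")
    cand = _parse(candidate)
    width = max(len(base), len(cand))
    cand_padded = _pad(cand, width)
    # Everything except the last segment of the spec must match exactly.
    prefix = base[:-1]
    return cand_padded >= _pad(base, width) and cand_padded[: len(prefix)] == prefix


print(is_compatible("1.26.4", "1.26"))  # numpy 1.26.4 against numpy~=1.26
print(is_compatible("2.0.0", "1.26"))   # numpy 2.0.0 against numpy~=1.26
```

With the earlier `numpy>=1.19.3` constraint, a 2.x release would have satisfied the requirement; with `numpy~=1.26` it does not.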

[options.packages.find]
where = src
@@ -43,7 +41,7 @@ python_files=test_*.py
testpaths=tests

[tox:tox]
envlist = py38, py39, coverage, bandit, owasp-depcheck
envlist = py3{10,11}, coverage, bandit, owasp-depcheck
toxworkdir = build/tox

[testenv]
@@ -54,24 +52,24 @@ commands = pytest tests --junitxml={toxworkdir}/xunit-tests-{envname}.xml -o jun

[testenv:coverage]
usedevelop = true
basepython = python3.8
basepython = python3.10
deps = {[testenv]deps}
coverage
pytest-cov
commands = pytest --cov-report xml:{toxworkdir}/xunit-coverage.xml --cov-config=setup.cfg --cov=pymasq tests -o junit_suite_name=pytest-{envname}

[testenv:localcoverage]
usedevelop = true
basepython = python3.8
basepython = python3.10
deps = {[testenv]deps}
coverage
pytest-cov
commands = pytest --cov-report term-missing --cov-config=setup.cfg --cov=pymasq tests

[testenv:bandit]
basepython = python3.8
basepython = python3.10
deps = bandit
commands = bandit -f json -o {toxworkdir}/security-bandit.json -r {envsitepackagesdir}/pymasq

[testenv:owasp-depcheck]
basepython = python3.8
basepython = python3.10
2 changes: 1 addition & 1 deletion src/pymasq/__init__.py
@@ -1,6 +1,6 @@
from os import path

__version__ = "0.6.5"
__version__ = "0.6.6"


try:
6 changes: 5 additions & 1 deletion src/pymasq/config.py
@@ -1,6 +1,7 @@
from pathlib import Path
from typing import Tuple
from pymasq import ROOT_DIR

import numpy as np

# Directory where all embeddings and models will be cached
CACHE_LOCATION: Path = Path("~/.cache/pymasq").expanduser()
@@ -27,6 +28,7 @@
CLASSIFIER_MODELS: Tuple[str] = ("logreg", "rfclass", "tpotclass")

DEFAULT_LOGISITIC_REGRESSION_SOLVER: str = "saga"
DEFAULT_MODEL_ITERATIONS: int = 1000

# Byte Pair Encoding default language and dimensionality for vectors
BPE_LANG: str = "en"
@@ -40,3 +42,5 @@

# Default number of parallel processors, set to -1 for all processors
DEFAULT_N_JOBS: int = -1

rg = np.random.default_rng(DEFAULT_SEED)
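
The new module-level `rg` is a NumPy `Generator` seeded once at import time. The sketch below, which uses a hypothetical `SEED` value in place of the real `DEFAULT_SEED`, illustrates two consequences: generators built from the same seed produce identical streams, and a shared generator advances its state on every draw, so results depend on how many draws came before.

```python
import numpy as np

SEED = 1234  # hypothetical value; the real DEFAULT_SEED lives in pymasq.config

# Two generators built from the same seed start from the same state.
rng_a = np.random.default_rng(SEED)
rng_b = np.random.default_rng(SEED)

first_a = rng_a.integers(0, 100, size=5)
first_b = rng_b.integers(0, 100, size=5)
print(np.array_equal(first_a, first_b))  # identical seeds give identical draws

# A second draw from rng_a continues the stream rather than restarting it,
# so code sharing one module-level generator is sensitive to call order.
second_a = rng_a.integers(0, 100, size=5)
print(first_a, second_a)
```

Since `data_generator.py` in this PR builds its own `rg` from the same `DEFAULT_SEED`, the two module-level generators would start from identical streams. Passing a generator in as an argument is one way to make that sharing explicit, though whether that fits this project is a design decision for the maintainers.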
8 changes: 7 additions & 1 deletion src/pymasq/datasets/__init__.py
@@ -2,7 +2,13 @@
The :mod:`pymasq.datasets` module includes utilities to load tabular datasets.
"""

from ._base import load_data, load_census, load_loan, load_prestige, load_bank_attrition_rates
from ._base import (
load_data,
load_census,
load_loan,
load_prestige,
load_bank_attrition_rates,
)
from .data_generator import gen_geom_seq, gen_bin_df, gen_num_df
from .utils import rand_cat_change

5 changes: 3 additions & 2 deletions src/pymasq/datasets/_base.py
@@ -77,11 +77,12 @@ def load_loan():
"""
return load_data("loan.csv")


def load_bank_attrition_rates():
"""Load and return the Bank Attrition Rates dataset.

A manager at the bank is disturbed with more and more customers leaving their credit card services.
They would really appreciate if one could predict for them who is gonna get churned so
A manager at the bank is disturbed with more and more customers leaving their credit card services.
They would really appreciate if one could predict for them who is gonna get churned so
they can proactively go to the customer to provide them better services and turn customers' decisions in the opposite direction.

============== ==============
20 changes: 11 additions & 9 deletions src/pymasq/datasets/data_generator.py
@@ -7,7 +7,9 @@
from .utils import rand_cat_change

from pymasq import BEARTYPE
from pymasq.config import DEFAULT_SEED

rg = np.random.default_rng(DEFAULT_SEED)

@BEARTYPE
def gen_geom_seq(start: float = 0.5, n: int = 6, rate: float = 2.0) -> List[float]:
@@ -117,12 +119,12 @@ def gen_num_df(n: int = 1000, seed: int = 1234) -> pd.DataFrame:


@BEARTYPE
def _l_div_sensitive_gen(l: int, n: int) -> List:
def _l_div_sensitive_gen(l_div: int, n: int) -> List[int]:
"""
Generates the sensitive variable for generate_l_diverse_table for each equivalence class
Parameters
----------
l : int
l_div : int
The specified diversity that the equivalence class needs to be
n : int
The size of the equivalence class (i.e. the length of the list returned)
@@ -132,17 +134,17 @@ def _l_div_sensitive_gen(l: int, n: int) -> List:
List of integer values for the sensitive column
"""

unique_entries = np.random.choice(range(n), l)
unique_entries = rg.choice(range(n), l_div)
while len(unique_entries) != len(set(unique_entries)):
unique_entries = np.random.choice(range(n), l)
unique_entries = rg.choice(range(n), l_div)

non_unique = np.random.choice(unique_entries, n - l)
non_unique = rg.choice(unique_entries, n - l_div)
return list(unique_entries) + list(non_unique)
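
The retry loop above redraws until all `l_div` values happen to be distinct, and it cannot terminate when `l_div > n`, because sampling with replacement from `range(n)` can never yield more than `n` distinct values. As a possible alternative, `Generator.choice` accepts `replace=False`, which draws distinct values in a single call. The function below is a sketch of that idea, not the code in this PR; `sample_sensitive_values` and its explicit `rng` parameter are assumptions made for illustration.

```python
import numpy as np


def sample_sensitive_values(l_div: int, n: int, rng: np.random.Generator) -> list:
    """Return n integers containing exactly l_div distinct values."""
    if not 0 < l_div <= n:
        raise ValueError("l_div must satisfy 0 < l_div <= n")
    # replace=False guarantees distinct values, so no retry loop is needed.
    unique_entries = rng.choice(n, size=l_div, replace=False)
    # Fill the remaining slots by resampling (with replacement) from the unique set.
    non_unique = rng.choice(unique_entries, size=n - l_div)
    return [int(v) for v in unique_entries] + [int(v) for v in non_unique]


values = sample_sensitive_values(3, 8, np.random.default_rng(0))
print(len(values), len(set(values)))
```

The explicit range check also turns the non-terminating case into an immediate `ValueError`.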


@BEARTYPE
def generate_l_diverse_table(
l: Union[int, List[int]],
l_div: Union[int, List[int]],
num_col: int = 5,
num_q_blocks: int = 5,
q_block_sizes: Union[int, List[int]] = 5,
@@ -151,7 +153,7 @@ def generate_l_diverse_table(
Used for testing l-diversity. Creates a data set that is l-diverse for given l.
Parameters
----------
l : Union[int, List[int]]
l_div : Union[int, List[int]]
The specified diversity that the data set needs to be TODO: need to expand this to allow float l parameters for entropy
num_col : int, optional
The number of columns (in addition to the sensitive column) the data set should have
@@ -178,10 +180,10 @@ def generate_l_diverse_table(
if isinstance(q_block_sizes, int)
else q_block_sizes
)
l = [l] * num_q_blocks if not isinstance(l, list) else l
l_div: List[int] = [l_div] * num_q_blocks if not isinstance(l_div, list) else l_div

for n in range(num_q_blocks):
senn = _l_div_sensitive_gen(l[n], q_block_sizes[n])
senn = _l_div_sensitive_gen(l_div[n], q_block_sizes[n])
col_names["sensitive"] += senn
for cn in col_names:
if cn != "sensitive":
26 changes: 13 additions & 13 deletions src/pymasq/errors/__init__.py
@@ -1,34 +1,34 @@

"""
Expose public exceptions & warnings
"""


class InputError(Exception):
""" Exception raised for errors in the input value. """
"""Exception raised for errors in the input value."""


class DataTypeError(Exception):
""" Exception raised for errors in the data type. """
"""Exception raised for errors in the data type."""


class SumNotEqualToOneError(ValueError):
""" Exception for sum of values not equal to 1. """
"""Exception for sum of values not equal to 1."""


class NotInRangeError(ValueError):
""" Exception for values not in specified interval. """
"""Exception for values not in specified interval."""


class LessThanZeroError(ValueError):
""" Exceptions for values < 0. """
"""Exceptions for values < 0."""


class LessThanOrEqualToZeroError(ValueError):
""" Exceptions for values <= 0. """
"""Exceptions for values <= 0."""


class NoMutationAvailableError(ValueError):
""" Exception when all mutations have been discarded and not replaced """
"""Exception when all mutations have been discarded and not replaced"""


__all__ = [
@@ -38,5 +38,5 @@ class NoMutationAvailableError(ValueError):
"NotInRangeError",
"LessThanZeroError",
"LessThanOrEqualToZeroError",
"NoMutationAvailableError"
]
"NoMutationAvailableError",
]