122 changes: 61 additions & 61 deletions README.md
@@ -10,38 +10,76 @@ neural network (GNN) that predicts an association to a target molecule, e.g., a
DeepFPlearn<sup>+</sup> is an extension of deepFPlearn[[2]](#2), which uses binary fingerprints to represent the
molecule's structure computationally.

## Setting up Python environment
## Installation

The DFPL package requires a particular Python environment to work properly.
It consists of a recent Python interpreter and packages for data science and neural networks.
The exact dependencies can be found in the
[`requirements.txt`](requirements.txt) (which is used when installing the package with pip)
and [`environment.yml`](environment.yml) (for installation with conda).

You have several ways to provide the correct environment to run code from the DFPL package.

1. Use the automatically built docker/Singularity containers
2. Build your own container [following the steps here](container/README.md)
3. Setup a python virtual environment
4. Set up a conda environment install the requirements via conda and the DFPL package via pip
1. Use Bioconda to install the package
2. Set up a Python virtual environment
3. Use the automatically built Docker container
4. Use the automatically built Singularity container

In the following, you find details for option 1., 3., and 4.
### Bioconda

The package is also available on Bioconda. You can find the Bioconda recipe and installation details
[here](http://bioconda.github.io/recipes/deepfplearn/README.html).

First create an environment with Python 3.8:

```shell
conda create -n dfpl python=3.8
conda activate dfpl
```

Then install the package:

```shell
conda install -c bioconda deepfplearn
```
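
To verify the installation, you can call the `dfpl` entry point (a quick sanity check, assuming the Bioconda package
exposes the same `dfpl` command used throughout this README):

```shell
# Prints usage information and the available subcommands if the installation succeeded
dfpl --help
```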

### Set up DFPL in a Python virtual environment

From within the `deepFPlearn` directory, call:

```shell
virtualenv -p python3 ENV_PATH
. ENV_PATH/bin/activate
pip install ./
```

Replace `ENV_PATH` with the directory where the Python virtual environment should be created.
If your system only has Python 3 installed, `-p python3` may be omitted.

In order to use the environment, it needs to be activated with `. ENV_PATH/bin/activate`.
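
For example, a minimal sketch assuming the environment should live under `~/venvs/dfpl` (a hypothetical path):

```shell
# Create the environment, activate it, and install DFPL from the repository root
virtualenv -p python3 ~/venvs/dfpl
. ~/venvs/dfpl/bin/activate
pip install ./
```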

### Docker container

You need docker installed on you machine.
You need Docker installed on your machine. If you don't have it installed yet, you can find the installation
instructions [here](https://docs.docker.com/engine/install/).

In order to run DFPL, pull the image using the following command:

In order to run DFPL use the following command line
```shell
docker pull quay.io/biocontainers/deepfplearn:TAG
```
Then run the container, mounting the directory containing the data you want to process:

```shell
docker run --gpus GPU_REQUEST registry.hzdr.de/department-computational-biology/deepfplearn/deepfplearn:TAG dfpl DFPL_ARGS
docker run -v /path/to/local/data:/data quay.io/biocontainers/deepfplearn:TAG dfpl DFPL_ARGS
```
To run with GPU support, add the `--gpus` flag:

```shell
docker run --gpus GPU_REQUEST quay.io/biocontainers/deepfplearn:TAG dfpl DFPL_ARGS
```

where you replace

- `TAG` by the version you want to use or `latest` if you want to use latest available version)
- You can see available tags
here https://gitlab.hzdr.de/department-computational-biology/deepfplearn/container_registry/5827.
- `TAG` by the version you want to use
- You can see available tags in [biocontainers](https://biocontainers.pro/tools/deepfplearn).
In general, a container should be available for each released version of DFPL.
- `GPU_REQUEST` by the GPUs you want to use, or `all` if all GPUs should be used (remove `--gpus GPU_REQUEST` if only
  the CPU should be used)
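
For example, a fully substituted call could look like the following sketch; the local data path, the `latest` tag, and
the `train -f` arguments are illustrative assumptions, not guaranteed defaults:

```shell
# Mount ./data into the container, use all GPUs, and train with a JSON config from the mounted directory
docker run -v "$PWD/data:/data" --gpus all \
  quay.io/biocontainers/deepfplearn:latest \
  dfpl train -f /data/train.json
```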
@@ -50,21 +88,22 @@ where you replace
In order to get an interactive bash shell in the container, use:

```shell
docker run -it --gpus GPU_REQUEST registry.hzdr.de/department-computational-biology/deepfplearn/deepfplearn:TAG bash
docker run -it --gpus GPU_REQUEST quay.io/biocontainers/deepfplearn:TAG bash
```


### Singularity container

You need Singularity installed on your machine. You can download a container with
You need Singularity installed on your machine. You can find the installation instructions
[here](https://apptainer.org/user-docs/master/quick_start.html).

```shell
singularity pull dfpl.TAG.sif docker://registry.hzdr.de/department-computational-biology/deepfplearn/deepfplearn:TAG
singularity pull dfpl.TAG.sif docker://quay.io/biocontainers/deepfplearn:TAG
```

- replace `TAG` by the version you want to use or `latest` if you want to use latest available version)
- replace `TAG` by the version you want to use
- You can see available tags
here https://gitlab.hzdr.de/department-computational-biology/deepfplearn/container_registry/5827.
In general a container should be available for each released version of DFPL.
[here](https://biocontainers.pro/tools/deepfplearn).
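
For example, to fetch a specific release (a sketch; `1.0` stands in for a real tag from the registry):

```shell
# Pull the container for the chosen version and store it as a local .sif file
singularity pull dfpl.1.0.sif docker://quay.io/biocontainers/deepfplearn:1.0
```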

This stores the container as a file `dfpl.TAG.sif`, which can be run as follows:

```shell script
singularity run --nv dfpl.TAG.sif dfpl DFPL_ARGS
```

or you can run a shell script inside the container (see [run-all-cases.sh](scripts/run-all-cases.sh) for an
example)

```shell script
singularity run --nv dfpl.sif ". ./example/run-multiple-cases.sh"
```

It's also possible to open an interactive shell inside the container:

```shell script
singularity shell --nv dfpl.TAG.sif
```

**Note:** The Singularity container is intended to be used on HPC clusters where your ability to install software might
be limited.
For local testing or development, setting up the conda environment is preferable.

### Set up DFPL in a python virtual environment

From within the `deepFPlearn` directory call

```
virtualenv -p python3 ENV_PATH
. ENV_PATH/bin/activate
pip install ./
```

replace `ENV_PATH` by the directory where the python virtual environment should be created.
If your system has only python3 installed `-p python3` may be removed.

In order to use the environment it needs to be activated with `. ENV_PATH/bin/activate`.

### Set up DFPL in a conda environment

To use this tool in a conda environment:

1. Create the conda env from scratch

From within the `deepFPlearn` directory, you can create the conda environment with the provided yaml file that
contains all information and necessary packages
For local testing or development, setting up the Bioconda environment is preferable.

```shell
conda env create -f environment.yml
```

2. Activate the `dfpl_env` environment with

```shell
conda activate dfpl_env
```

3. Install the local `dfpl` package by calling

```shell
pip install --no-deps ./
```

## Prepare data

115 changes: 24 additions & 91 deletions dfpl/__main__.py
@@ -17,108 +17,45 @@
from dfpl import vae as vae
from dfpl.utils import createArgsFromJson, createDirectory, makePathAbsolute

project_directory = pathlib.Path(".").parent.parent.absolute()
test_train_opts = options.Options(
inputFile=f"{project_directory}/input_datasets/S_dataset.pkl",
outputDir=f"{project_directory}/output_data/console_test",
ecWeightsFile=f"{project_directory}/output_data/case_00/AE_S/ae_S.encoder.hdf5",
ecModelDir=f"{project_directory}/output_data/case_00/AE_S/saved_model",
type="smiles",
fpType="topological",
epochs=100,
batchSize=1024,
fpSize=2048,
encFPSize=256,
enableMultiLabel=False,
testSize=0.2,
kFolds=2,
verbose=2,
trainAC=False,
trainFNN=True,
compressFeatures=True,
activationFunction="selu",
lossFunction="bce",
optimizer="Adam",
fnnType="FNN",
)

test_pred_opts = options.Options(
inputFile=f"{project_directory}/input_datasets/S_dataset.pkl",
outputDir=f"{project_directory}/output_data/console_test",
outputFile=f"{project_directory}/output_data/console_test/S_dataset.predictions_ER.csv",
ecModelDir=f"{project_directory}/output_data/case_00/AE_S/saved_model",
fnnModelDir=f"{project_directory}/output_data/console_test/ER_saved_model",
type="smiles",
fpType="topological",
)


def traindmpnn(opts: options.GnnOptions):
def traindmpnn(opts: options.GnnOptions) -> None:
"""
Train a D-MPNN model using the given options.
Args:
- opts: options.GnnOptions instance containing the details of the training
Returns:
- None
"""
os.environ["CUDA_VISIBLE_DEVICES"] = f"{opts.gpu}"
ignore_elements = ["py/object"]
# Load options from a JSON file and replace the relevant attributes in `opts`
arguments = createArgsFromJson(
opts.configFile, ignore_elements, return_json_object=False
)
arguments = createArgsFromJson(jsonFile=opts.configFile)
opts = cp.args.TrainArgs().parse_args(arguments)
logging.info("Training DMPNN...")
# Train the model and get the mean and standard deviation of AUC score from cross-validation
mean_score, std_score = cp.train.cross_validate(
args=opts, train_func=cp.train.run_training
)
logging.info(f"Results: {mean_score:.5f} +/- {std_score:.5f}")


def predictdmpnn(opts: options.GnnOptions, json_arg_path: str) -> None:
def predictdmpnn(opts: options.GnnOptions) -> None:
"""
Predict the values using a trained D-MPNN model with the given options.
Args:
- opts: options.GnnOptions instance containing the details of the prediction
- JSON_ARG_PATH: path to a JSON file containing additional arguments for prediction
Returns:
- None
"""
ignore_elements = [
"py/object",
"checkpoint_paths",
"save_dir",
"saving_name",
]
# Load options and additional arguments from a JSON file
arguments, data = createArgsFromJson(
json_arg_path, ignore_elements, return_json_object=True
)
arguments.append("--preds_path")
arguments.append("")
save_dir = data.get("save_dir")
name = data.get("saving_name")
# Replace relevant attributes in `opts` with loaded options
arguments = createArgsFromJson(jsonFile=opts.configFile)
opts = cp.args.PredictArgs().parse_args(arguments)
opts.preds_path = save_dir + "/" + name
df = pd.read_csv(opts.test_path)
smiles = []
for index, rows in df.iterrows():
my_list = [rows.smiles]
smiles.append(my_list)
# Make predictions and return the result
cp.train.make_predictions(args=opts, smiles=smiles)

cp.train.make_predictions(args=opts)


def train(opts: options.Options):
"""
Run the main training procedure
:param opts: Options defining the details of the training
"""

os.environ["CUDA_VISIBLE_DEVICES"] = f"{opts.gpu}"

# import data from file and create DataFrame
if "tsv" in opts.inputFile:
df = fp.importDataFile(
@@ -128,7 +65,7 @@ def train(opts: options.Options):
df = fp.importDataFile(
opts.inputFile, import_function=fp.importSmilesCSV, fp_size=opts.fpSize
)
# initialize encoders to None
# initialize (auto)encoders to None
encoder = None
autoencoder = None
if opts.trainAC:
@@ -142,26 +79,28 @@
# if feature compression is enabled
if opts.compressFeatures:
if not opts.trainAC:
if opts.aeType == "deterministic":
(autoencoder, encoder) = ac.define_ac_model(opts=options.Options())
elif opts.aeType == "variational":
if opts.aeType == "variational":
(autoencoder, encoder) = vae.define_vae_model(opts=options.Options())
elif opts.ecWeightsFile == "":
else:
(autoencoder, encoder) = ac.define_ac_model(opts=options.Options())

if opts.ecWeightsFile == "":
encoder = load_model(opts.ecModelDir)
else:
autoencoder.load_weights(
os.path.join(opts.ecModelDir, opts.ecWeightsFile)
)
# compress the fingerprints using the autoencoder
df = ac.compress_fingerprints(df, encoder)
# ac.visualize_fingerprints(
# df,
# before_col="fp",
# after_col="fpcompressed",
# train_indices=train_indices,
# test_indices=test_indices,
# save_as=f"UMAP_{opts.aeSplitType}.png",
# )
if opts.visualizeLatent:
ac.visualize_fingerprints(
df,
before_col="fp",
after_col="fpcompressed",
train_indices=train_indices,
test_indices=test_indices,
save_as=f"UMAP_{opts.aeSplitType}.png",
)
# train single label models if requested
if opts.trainFNN and not opts.enableMultiLabel:
sl.train_single_label_models(df=df, opts=opts)
@@ -257,7 +196,7 @@ def main():
raise ValueError("Input directory is not a directory")
elif prog_args.method == "traingnn":
traingnn_opts = options.GnnOptions.fromCmdArgs(prog_args)

createLogger("traingnn.log")
traindmpnn(traingnn_opts)

elif prog_args.method == "predictgnn":
@@ -267,12 +206,8 @@
test_path=makePathAbsolute(predictgnn_opts.test_path),
preds_path=makePathAbsolute(predictgnn_opts.preds_path),
)

logging.info(
f"The following arguments are received or filled with default values:\n{prog_args}"
)

predictdmpnn(fixed_opts, prog_args.configFile)
createLogger("predictgnn.log")
predictdmpnn(fixed_opts)

elif prog_args.method == "train":
train_opts = options.Options.fromCmdArgs(prog_args)
@@ -298,8 +233,6 @@ def main():
),
ecModelDir=makePathAbsolute(predict_opts.ecModelDir),
fnnModelDir=makePathAbsolute(predict_opts.fnnModelDir),
trainAC=False,
trainFNN=False,
)
createDirectory(fixed_opts.outputDir)
createLogger(path.join(fixed_opts.outputDir, "predict.log"))