
For netDx developers only: GeneMANIA documentation

Shraddha Pai edited this page May 3, 2019 · 1 revision

This page is of relevance to netDx developers, and not to a user. You do not need this information if you are only using netDx to build custom predictors.

This page was copied from a cached version of http://pages.genemania.org/tools. netDx uses only two GeneMANIA tools, ProfileToNetworkDriver and QueryRunner. The other tools are documented here for completeness but are not required by netDx. The GeneMANIA JAR is packaged as part of the netDx R software package.

Query Runner

Runs one or more predictions and writes the results to disk. Each prediction needs to be provided in the form of a query file. One prediction report is generated for each query file.

Usage (32-bit JVM):

java -Xmx1800M -jar GeneMANIA.jar QueryRunner options query-file-1 [ query-file-2 ... ]

Usage (64-bit JVM):

java -d64 -Xmx3G -jar GeneMANIA.jar QueryRunner options query-file-1 [ query-file-2 ... ]

Options:

Name	Description
--data directory	Path to a GeneMANIA data set (e.g. /Users/username/genemania_plugin/gmdata-2013-10-15).
--in input-format	Optional. The format of the query files, which can be one of:
	flat (default): Tab-delimited (see Example Query below).
	xml: Not yet supported.
--out output-format	Optional. The format of the output files, which can be one of:
	genes (default): List of result genes ordered by score; one per line.
	flat: Tab-delimited report containing details of prediction results and query parameters.
	xml: XML-formatted report containing details of prediction results and query parameters.
	scores: List of result genes with scores ordered by score for the entire genome (ignores related genes limit); one per line.
--scoring-method method	Optional. The method used to compute the gene scores, which can be one of:
	discriminant (default): GeneMANIA’s classic scoring method.
	z: Z-scores.
--ids id-types	Optional. A comma-separated list of identifier types, in descending order of preference, which may be one or more of the following:
	Ensembl Gene Name
	Entrez Gene Name
	Ensembl Gene ID
	RefSeq mRNA ID
	TAIR ID
	Uniprot ID
	Uniprot AC
	RefSeq Protein ID
	Ensembl Protein ID
	Entrez Gene ID
	If the most preferred identifier is not available for a given gene, the next most preferred identifier is selected. The list above reflects the default order of preference.
--results directory	Optional. Path to where the prediction result files will be created (one per input query file). Defaults to the current working directory.
--threads number	Optional. The maximum number of parallel predictions. Ideally this should be set to the number of processing cores. Defaults to 1.
--verbose	Optional. Makes QueryRunner print more details about what’s happening.
--list-networks organism-name	Optional. Lists the available networks for the given organism. You may need to put quotes around the organism name if invoked from a shell.
--list-genes organism-name	Optional. Lists the genes that are recognized for the given organism. You may need to put quotes around the organism name if invoked from a shell. Each line in the output contains a gene and all its synonyms, if any.
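
The options above can be assembled into a command line programmatically. The following is a minimal Python sketch, not part of GeneMANIA or netDx: the function name `build_query_runner_cmd` and its parameters are hypothetical, the flags are assumed to be spelled with a double hyphen (as this page does for --list-networks), and the -d64 flag is left out on the assumption that a 64-bit JVM is the default. It only builds the argument list and does not run anything.

```python
from typing import List, Optional, Sequence


def build_query_runner_cmd(
    jar: str,
    data_dir: str,
    query_files: Sequence[str],
    out_format: str = "genes",
    results_dir: Optional[str] = None,
    threads: int = 1,
    max_heap: str = "3G",
) -> List[str]:
    """Assemble a QueryRunner argument list (hypothetical helper; nothing is executed)."""
    # Output formats listed in the options table above.
    allowed = {"genes", "flat", "xml", "scores"}
    if out_format not in allowed:
        raise ValueError(f"unknown output format: {out_format}")
    if not query_files:
        raise ValueError("at least one query file is required")

    cmd = [
        "java", f"-Xmx{max_heap}", "-jar", jar, "QueryRunner",
        "--data", data_dir,
        "--out", out_format,
        "--threads", str(threads),
    ]
    if results_dir is not None:
        cmd += ["--results", results_dir]
    # Query files come last, after all options.
    return cmd + list(query_files)
```

The resulting list could be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a single string avoids shell quoting problems with values that contain spaces, such as organism names.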

Example Query (Flat), file yeast-example.query:

S. Cerevisiae
CDC27	APC11	APC4	XRS2	RAD54	APC2	RAD52	RAD10	MRE11	APC5
coexp	pi	gi
150
bp

Flat Query File Format:

organism-name 
query-gene-1 [ \t query-gene-2 ... ]
networks 
related-gene-limit
[ combining-method ]
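
A flat query file in the format above can be generated from code. This is a minimal Python sketch with a hypothetical helper name (`format_flat_query`). It assumes, based on the yeast example, that the networks line is tab-separated like the gene line, and that the combining method line is simply omitted when not given.

```python
from typing import Optional, Sequence


def format_flat_query(
    organism: str,
    genes: Sequence[str],
    networks: Sequence[str],
    related_gene_limit: int,
    combining_method: Optional[str] = None,
) -> str:
    """Return the text of a flat query file (hypothetical helper)."""
    if not genes:
        raise ValueError("a query needs at least one gene")
    lines = [
        organism,
        "\t".join(genes),         # query genes, tab-separated
        "\t".join(networks),      # assumption: tab-separated, as in the example
        str(related_gene_limit),
    ]
    if combining_method is not None:
        lines.append(combining_method)  # optional fifth line
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file such as yeast-example.query gives one query file per prediction.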

--- Tools not used by netDx ---

Installation

First, you need to download the GeneMANIA JAR file. If you already installed the plugin through Cytoscape, you can find it in one of the following places:

Unix/Mac: ~/.cytoscape/Cytoscape Version/plugins/GeneMANIA-Version/
Windows: My Documents\.cytoscape\Cytoscape Version\plugins\GeneMANIA-Version\

Second, you need a data set. If you’ve used the Cytoscape plugin to perform predictions, you’ll already have one installed in one of these locations:

Unix/Mac: ~/genemania_plugin/gmdata-id/
Windows: My Documents\genemania_plugin\gmdata-id\

Otherwise, you can use Data Admin to install one of the available data sets from genemania.org.

Available Tools

Data Import & Management

Data Admin	Downloads and manages GeneMANIA data sets from genemania.org.
Gene Sanitizer	Prints out the mappings between the given gene list and GeneMANIA’s preferred identifiers.
Id Importer	Creates a new data set from a set of identifiers and aliases. The identifiers correspond to node labels.
Network Importer	Imports network/profile data from a file into a GeneMANIA data set.

Prediction

Query Runner	Runs one or more predictions and writes the results to disk. Each prediction needs to be provided in the form of a query file. One prediction report is generated for each query file.

Validation

Cross Validator	Performs k-fold cross validation on the prediction algorithm for a given set of pre-classified genes. Cross Validator reports on the following evaluation measures: area under the ROC curve (AUC-ROC), area under the precision-recall curve (AUC-PR), and precision at fixed recall.
Network Assessor	Assesses the value of a set of networks by performing k-fold cross validation against a baseline network set, as well as the networks to assess. The percentage error of each validation measure is computed for each query in the validation set and reported.
Validation Set Maker	Produces sets of genes based on Gene Ontology (GO) annotations for use in cross validation. One gene set is created for each GO category in the ontology. More specific annotations are propagated up to all genes associated with any of the parent annotations.

Data Admin

Downloads and manages GeneMANIA data sets from genemania.org. Each data set consists of multiple organisms which are identified by their data-id. Organisms can be installed and removed individually as needed.

Commands:

list: Lists available data sets.

Usage:

list

Example:

$ java -jar GeneMania.jar DataAdmin list
Data Set ID	Total Size	Database Version
2013-10-15	9351.08 MB	15 October 2013
2013-10-15-core	2059.38 MB	15 October 2013
2013-10-15-open_license	9324.49 MB	15 October 2013
2012-08-02	5994.14 MB	19 July 2012
2012-08-02-core	1764.09 MB	19 July 2012
2012-08-02-open_license	5963.38 MB	19 July 2012
...

install: Installs the infrastructure for the given data set ID without actually installing any organism data.

Usage:

install data-set-id

…where data-set-id is one of the IDs given by the list command above.

Example:

$ java -jar GeneMania.jar DataAdmin install 2013-10-15-core

This example will download the data set into the directory gmdata-2013-10-15-core in the current directory.

list-data: Lists the data available for download for a particular data set.

Usage:

list-data path/to/data/set

…where path/to/data/set is the path to a data set downloaded by the Cytoscape plugin, or the install command above.

Example:

$ java -jar GeneMania.jar DataAdmin list-data gmdata-2013-10-15-core
Data ID	Description	Status
1	A. thaliana Arabidopsis (424 MB)	
2	C. elegans Worm (141 MB)	
3	D. melanogaster Fly (237 MB)	
4	H. sapiens Human (413 MB)	
5	M. musculus Mouse (412 MB)	
6	S. cerevisiae Baker's yeast (148 MB)	
7	R. norvegicus Rat (154 MB)	
8	D. rerio Zebrafish (126 MB)	

install-data: Downloads and installs data with the given ID from genemania.org.

Usage:

install-data path/to/data/set data-id [data-id ...]

…where path/to/data/set is the path to a data set downloaded by the Cytoscape plugin, or the install command above; and data-id is one of the IDs given by the list-data command, or all, which is an alias for all available data for the given data set.

Example: Installing yeast data

$ java -jar GeneMania.jar DataAdmin install-data gmdata-2013-10-15-core 6

Example: Installing all data for 2013-10-15-core

$ java -jar GeneMania.jar DataAdmin install-data gmdata-2013-10-15-core all
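
The install and install-data steps are typically run back to back. The sketch below, in Python, builds both command lines from a data set ID. The helper name `data_admin_install_cmds` is hypothetical, and the directory name `gmdata-<data-set-id>` is inferred from the install example above rather than from a documented rule. Nothing is executed or downloaded.

```python
from typing import List, Sequence, Union


def data_admin_install_cmds(
    jar: str,
    data_set_id: str,
    data_ids: Sequence[Union[int, str]],
) -> List[List[str]]:
    """Build the 'install' and 'install-data' commands (hypothetical helper)."""
    if not data_ids:
        raise ValueError("give at least one data id, or 'all'")
    # Assumption: install creates gmdata-<data-set-id> in the current directory,
    # as in the example on this page.
    data_dir = f"gmdata-{data_set_id}"
    base = ["java", "-jar", jar, "DataAdmin"]
    return [
        base + ["install", data_set_id],
        base + ["install-data", data_dir] + [str(i) for i in data_ids],
    ]
```

For example, `data_admin_install_cmds("GeneMania.jar", "2013-10-15-core", [6])` yields the two commands needed to set up the core data set with yeast data.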

uninstall-data: Deletes previously installed data from a data set.

Usage:

uninstall-data path/to/data/set data-id [data-id ...]

…where path/to/data/set is the path to a data set downloaded by the Cytoscape plugin, or the install command above; and data-id is one of the IDs given by the list-data command.

Example: Uninstalling human data

$ java -jar GeneMania.jar DataAdmin uninstall-data gmdata-2013-10-15-core 4

Gene Sanitizer

Prints out the mappings between the given gene list and GeneMANIA’s preferred identifiers. This tool is useful for checking which of your genes are recognized by GeneMANIA. The output is a tab-delimited text file containing one mapping per line. The first item is GeneMANIA’s preferred identifier, or nothing, if the identifier that follows isn’t recognized.

Usage:

java -Xmx900M -jar GeneMANIA.jar GeneSanitizer options gene-list-file

Example Gene List:

YMR043W
YPR113W
YCL067C
YIL015W
YNOT?
YCR084C
YFL026W
YHR084W
YGL008C
YNL145W

Example Output:

YMR043W	MCM1
YPR113W	PIS1
YCL067C	HMLALPHA2
YIL015W	BAR1
YNOT?	
YCR084C	TUP1
YFL026W	STE2
YHR084W	STE12
YGL008C	PMA1
YNL145W	MFA2
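
To check programmatically which genes were not recognized, the tab-delimited output can be split into mapped and unmapped entries. This Python sketch uses a hypothetical function name (`split_sanitizer_output`). Because the description and the example output on this page differ on which column holds the preferred identifier, the sketch makes no assumption about column order: any line with an empty field on either side is treated as unrecognized.

```python
from typing import List, Tuple


def split_sanitizer_output(text: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split sanitizer output into (mapped pairs, unrecognized ids). Hypothetical helper."""
    mapped: List[Tuple[str, str]] = []
    unrecognized: List[str] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        left, _, right = line.partition("\t")
        left, right = left.strip(), right.strip()
        if left and right:
            mapped.append((left, right))
        else:
            # One side is empty: the identifier had no mapping.
            unrecognized.append(left or right)
    return mapped, unrecognized
```

With the example output above, the only unrecognized entry would be YNOT?.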

Options:

Name	Description
--data directory	Path to a GeneMANIA data set (e.g. /Users/username/genemania_plugin/gmdata-2013-10-15).
--organism name	The name or taxonomy id of an organism whose genes should be considered.

Id Importer

Creates a new data set from a set of identifiers and aliases. The identifiers correspond to node labels. Although the resulting data set is generally treated like an organism, where the given ids denote its genome, it does not have to be an organism. The identifiers can be anything, as long as they’re unique within the data set.

Usage:

java -Xmx900M -jar GeneMANIA.jar IdImporter options

Options:

Name	Description
--data directory	Path to a GeneMANIA data set (e.g. /Users/username/genemania_plugin/gmdata-2013-10-15).
--filename file-name	The path to a file that contains a complete set of identifiers that will serve as the basis of a new data set. Each line in the file should follow this format:
	primary-id ( \t alias-1 ... )
--name entity-name	The name of the resulting entity (e.g. organism).
--alias entity-name	Optional. An alias for the resulting entity (e.g. shorter, informal name).
--taxid number	Optional. The taxonomy id of the resulting entity, if applicable.
--description description	Optional. A description of the resulting entity.
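
The identifier file passed via the filename option has one primary ID per line, optionally followed by tab-separated aliases. A minimal Python sketch for producing such a file is shown below. The helper name `format_id_file` is hypothetical, and the duplicate check covers only primary IDs, since this page does not say whether aliases must also be unique.

```python
from typing import Iterable, List, Sequence, Tuple


def format_id_file(records: Iterable[Tuple[str, Sequence[str]]]) -> str:
    """Format (primary-id, aliases) records for Id Importer. Hypothetical helper."""
    seen = set()
    lines: List[str] = []
    for primary, aliases in records:
        if primary in seen:
            # Identifiers must be unique within the data set.
            raise ValueError(f"duplicate primary id: {primary}")
        seen.add(primary)
        # primary-id ( \t alias-1 ... )
        lines.append("\t".join([primary, *aliases]))
    return "\n".join(lines) + "\n"
```

Since the identifiers do not have to be genes, the same helper would work for any node labels, for example patient IDs.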

Network Importer

Imports network/profile data from a file into a GeneMANIA data set.

Usage (32-bit JVM):

java -Xmx1800M -jar GeneMANIA.jar NetworkImporter options

Usage (64-bit JVM):

java -d64 -Xmx3G -jar GeneMANIA.jar NetworkImporter options

Options:

Name	Description
--data directory	Path to a GeneMANIA data set (e.g. /Users/username/genemania_plugin/gmdata-2013-10-15).
--organism name	The name or taxonomy id of an organism whose genes should be considered.
--filename path	Path to a file containing either interaction or profile data. Supported types of data include:
	unweighted networks: GENE1 \t GENE2
	weighted networks: GENE1 \t GENE2 \t SCORE
	expression profiles: GENE \t EXPR1 ( \t EXPR2 ... )
	SOFT-formatted expression profiles (e.g. from GEO)
--name network-name	The name of the new network.
--description description	Optional. A description of the new network.
--group network-type	Optional. The network group to which the new network will be added. If this group does not exist, it will be created. Defaults to other.
--group-description description	Optional. A short description for a network group being created. Only applicable when the group specified by --group does not already exist.
--color RRGGBB	Optional. The colour of the network group being created. Only applicable when the group specified by --group does not already exist. Defaults to 000000 (i.e. black).
--verbose	Optional. Makes NetworkImporter print more details about what’s happening.
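
As an illustration of the unweighted and weighted network formats, the Python sketch below turns a list of edges into file text. The function name `format_network` is hypothetical, and using `None` as the score to mark an unweighted edge is a convention of this sketch only. Mixing weighted and unweighted edges in one file is rejected here as a precaution, since this page does not describe such files.

```python
from typing import Iterable, List, Optional, Tuple

Edge = Tuple[str, str, Optional[float]]


def format_network(edges: Iterable[Edge]) -> str:
    """Format edges as GENE1 \\t GENE2 [ \\t SCORE ] lines. Hypothetical helper."""
    edges = list(edges)
    weighted = [score is not None for _, _, score in edges]
    if any(weighted) and not all(weighted):
        raise ValueError("mixing weighted and unweighted edges is not supported here")
    lines: List[str] = []
    for gene1, gene2, score in edges:
        fields = [gene1, gene2]
        if score is not None:
            fields.append(str(score))  # third column only for weighted networks
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"
```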

Cross Validator

Performs k-fold cross validation on the prediction algorithm for a given set of pre-classified genes. Cross Validator reports on the following evaluation measures: area under the ROC curve (AUC-ROC), area under the precision-recall curve (AUC-PR), and precision at fixed recall.

Usage (32-bit JVM):

java -Xmx1800M -jar GeneMANIA.jar CrossValidator options

Usage (64-bit JVM):

java -d64 -Xmx3G -jar GeneMANIA.jar CrossValidator options

Options:

Name	Description
--data directory	Path to a GeneMANIA data set (e.g. /Users/username/genemania_plugin/gmdata-2013-10-15).
--organism name	The name or taxonomy id of an organism whose genes should be considered.
--query file-name	Perform validation against the gene sets listed in the given file. It must follow the query file format described below.
--networks network-list	A comma-separated list of network types and/or network names. To get a full listing of network names, use the option --list-networks with Query Runner.
--exclude-networks network-list	Optional. A comma-separated list of network types and/or network names to exclude from the --networks list.
--folds number	Optional. The number of folds to use during cross validation. Defaults to 5.
--min number	Optional. The minimum number of positive genes for a query. Queries with fewer genes will be skipped. Defaults to 10.
--max number	Optional. The maximum number of positive genes for a query. Queries with more genes will be skipped. Defaults to 300.
--use-go-cache	Optional. Perform validation against bundled Gene Ontology gene sets. In this case, the query file should contain one GO id per line (e.g. GO:0005786). These gene sets have been pre-filtered so that the smallest has 10 genes and the largest has 300.
--outfile file-name	Optional. The file where the validation results should be saved. If not provided, the results are sent to standard output (usually the console).
--auto-negatives	Optional. Forces all non-positive genes to be labeled as negative examples during prediction. Otherwise, negative examples must be explicitly listed in the query file.
--method weighting-method	Optional. The weighting method to use when combining the individual networks. Defaults to automatic.
--seed number	Optional. A value used to initialize the pseudo random number generator used for shuffling each gene set during validation. Setting the seed to a constant value will make the validation results deterministic. Defaults to something pseudo-random.
--threads number	Optional. The maximum number of parallel predictions. Ideally this should be set to the number of processing cores. Defaults to 1.
--verbose	Optional. Makes CrossValidator print more details about what’s happening.

Query File Format:

Multiple gene sets may be used during cross validation. Each gene set should be on its own line using the format below:

GENE_SET_ID \t + \t gene_symbol1 [ \t gene_symbol2 ... ] [ \t - \t neg_gene_symbol1 [ \t neg_gene_symbol2 ... ] ]

…where GENE_SET_ID is the name of your gene set, gene_symbol is a positive gene example, and neg_gene_symbol is a negative gene example (i.e. definitely not a member of the gene set).

If --use-go-cache is also specified, the query file should contain one GO id per line (e.g. GO:0005786).
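
One line of the query file format above can be built as follows. This is a minimal Python sketch with a hypothetical helper name (`format_gene_set_line`). It assumes the negative block is omitted entirely when no negative examples are given, which matches the optional brackets in the format.

```python
from typing import List, Sequence


def format_gene_set_line(
    gene_set_id: str,
    positives: Sequence[str],
    negatives: Sequence[str] = (),
) -> str:
    """Format one gene set line for a validation query file. Hypothetical helper."""
    if not positives:
        raise ValueError("a gene set needs at least one positive gene")
    fields: List[str] = [gene_set_id, "+", *positives]
    if negatives:
        # Optional block of explicit negative examples.
        fields += ["-", *negatives]
    return "\t".join(fields)
```

Joining several such lines with newlines gives a complete query file with one gene set per line.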

Example Query File (S. Cerevisiae):

This query file only lists positive examples of genes. Use the option --auto-negatives to automatically label all other genes in each set as negative examples.

GO:0005786 + SCR1 SRP54 SEC65 SRP14 SRP68 SRP21 SRP72
GO:0022626 + RPS21A RPS21B HEF3 RPS8B RDN18-2 RDN18-1 RPL9A RPL9B RPS11B RPS11A RPS29A RPS29B RPS14A RPL1A RPL1B YGR054W RPS19B RPS19A RPS6B RDN5-1 RDN5-2 RDN5-3 RDN5-4 RDN5-5 RDN5-6 RPL24B RPL8B RPL8A RPL24A RPS22A RPS12 RPS22B RPL18A FES1 RPL10 RPS8A RPL41A RPL42A ASC1 RPS18A RPS18B SQT1 RPL14A RPL31A RPL31B RPL14B RPS2 RPL37B RPL16B RPL16A RPL37A RPS17A RPS17B RPS27B RPL27B RPL27A RPL5 RPL3 RPL7B RPL7A NMD3 RPL41B RPL11B RPL11A RPP2A TIF5 RPP2B RPL20B RPL20A RPS16B RPL17A RPL17B RPS16A RPL26A RPL26B RPS7A RPL6A RPL6B RPS28B RPS28A RDN25-1 TEF1 SIS1 RRP14 RPS31 REI1 RDN25-2 JJJ1 RPL42B RPL35A RPL35B RPL18B RPS5 RPS3 RPS25A RPS25B RPS15 RPL13A RPL13B RDN58-2 RDN58-1 RPS9B RPL22A RPL22B RPS9A RPL36A RPS4A RPS4B RPL36B RPS30B RPS20 RPS30A RPS26A NAT1 RPS26B RPL19B NAT5 RPL19A GCN1 GCN2 RPS7B RPS6A RPL4B RPL4A ARX1 RPL21A RPL21B RPS13 RPP1A RPP1B RPS23B RPL23B RPL23A RPS23A RPL40A RPL40B RPS14B ARD1 MAP1 NIP7 RPS10A RPL29 RPL28 RPL25 GCN20 RPL15B RPL15A RPS10B RPS0A RPS0B RLI1 RPL34B RPL34A RPL43A RPL43B RPS24B RPS24A FUN12 RPS27A RPL2A RPL2B PAT1 RPL38 RPL39 STM1 RPL32 RPP0 RPL30 RPS1B RPS1A RPL33B RPL12B RPL12A RPL33A

Network Assessor

Assesses the value of a set of networks by performing k-fold cross validation against a baseline network set, as well as the networks to assess. The percentage error of each validation measure is computed for each query in the validation set and reported.

Usage (32-bit JVM):

java -Xmx1800M -jar GeneMANIA.jar NetworkAssessor options

Usage (64-bit JVM):

java -d64 -Xmx3G -jar GeneMANIA.jar NetworkAssessor options

Options:

Name	Description
--data directory	Path to a GeneMANIA data set (e.g. /Users/username/genemania_plugin/gmdata-2013-10-15).
--organism name	The name or taxonomy id of an organism whose genes should be considered.
--query file-name	Perform validation against the gene sets listed in the given file. It must follow the Cross Validator query file format.
--baseline network-list	A comma-separated list of network types and/or network names to use as a baseline for comparison. To get a full listing of network names, use the option --list-networks with Query Runner.
--exclude-baseline network-list	Optional. A comma-separated list of network types and/or network names to exclude from the --baseline list.
--networks network-list	A comma-separated list of network types and/or network names representing the networks to assess. To get a full listing of network names, use the option --list-networks with Query Runner.
--exclude-networks network-list	Optional. A comma-separated list of network types and/or network names to exclude from the --networks list.
--folds number	Optional. The number of folds to use during cross validation. Defaults to 5.
--min number	Optional. The minimum number of positive genes for a query. Queries with fewer genes will be skipped. Defaults to 10.
--max number	Optional. The maximum number of positive genes for a query. Queries with more genes will be skipped. Defaults to 300.
--use-go-cache	Optional. Perform validation against bundled Gene Ontology gene sets. In this case, the query file should contain one GO id per line (e.g. GO:0005786). These gene sets have been pre-filtered so that the smallest has 10 genes and the largest has 300.
--outfile file-name	Optional. The file where the validation results should be saved. If not provided, the results are sent to standard output (usually the console).
--auto-negatives	Optional. Forces all non-positive genes to be labeled as negative examples during prediction. Otherwise, negative examples must be explicitly listed in the query file.
--method weighting-method	Optional. The weighting method to use when combining the individual networks. Defaults to automatic.
--seed number	Optional. A value used to initialize the pseudo random number generator used for shuffling each gene set during validation. Setting the seed to a constant value will make the validation results deterministic. Defaults to something pseudo-random.
--threads number	Optional. The maximum number of parallel predictions. Ideally this should be set to the number of processing cores. Defaults to 1.
--verbose	Optional. Makes NetworkAssessor print more details about what’s happening.

Query File Format:

Network Assessor uses the same query file format as Cross Validator.

Validation Set Maker

Produces sets of genes based on Gene Ontology (GO) annotations for use in cross validation. One gene set is created for each GO category in the ontology. More specific annotations are propagated up to all genes associated with any of the parent annotations.

Usage (32-bit JVM):

java -Xmx900M -jar GeneMANIA.jar ValidationSetMaker options

Usage (64-bit JVM):

java -d64 -Xmx3G -jar GeneMANIA.jar ValidationSetMaker options

Options:

Name	Description
--organism name	The name or taxonomy id of an organism whose genes should be considered.
--query filename	The file where the resulting validation set should be saved.
--db JDBC-connection-string	Optional. A JDBC connection string for a GO MySQL database. No other database backends are currently supported. Defaults to EBI’s MySQL instance (i.e. jdbc:mysql://mysql.ebi.ac.uk:4085/go_latest?user=go_select&password=amigo)
--branch GO-branch	Optional. One of bp, mf, cc, or all, which selects GO categories from the biological process, molecular function, cellular component, or all branches, respectively. Defaults to all.

Common Options

Organisms:

Name	Taxonomy Id
A. Thaliana	3702
C. Elegans	6239
D. Melanogaster	7227
H. Sapiens	9606
M. Musculus	10090
S. Cerevisiae	4932
R. Norvegicus	10116

Networks

Networks may be specified by type or by name. To get a full listing of network names, use the option --list-networks.

Available Network Types:

coexp	Co-expression
coloc	Co-localization
gi	Genetic interactions
path	Pathway interactions
pi	Physical interactions
predict	Predicted
spd	Shared protein domains
other	Networks that don’t belong to any of the above types.
default	The default set of networks used by the Cytoscape plugin and genemania.org.
all	Shorthand for specifying all available networks
preferred	Shorthand for coexp,pi,gi. Typically used for cross validation.

Weighting Methods

automatic	Default. The networks are weighted such that the query genes interact as much as possible. Note: This option corresponds to the query gene-based combining method on the web site. If you want the same behaviour as the web site’s automatic combining method, use automatic_relevance.
automatic_relevance	A weighting method is chosen based on your query. This is the same behaviour as the “Automatically selected weighting method” option on the web site.
average	All networks are weighted equally.
average_category	Networks are weighted such that each type of network has the same overall weight.

For Organisms With GO Annotations:

bp	Networks are weighted in an attempt to reproduce Gene Ontology Biological Process co-annotation patterns.
mf	Networks are weighted in an attempt to reproduce Gene Ontology Molecular Function co-annotation patterns.
cc	Networks are weighted in an attempt to reproduce Gene Ontology Cellular Component co-annotation patterns.