


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">


<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    
    <title>2. Module pyGCluster &mdash; pyGCluster 0.18.4 documentation</title>
    
    <link rel="stylesheet" href="_static/default.css" type="text/css" />
    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
    
    <script type="text/javascript">
      var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    '',
        VERSION:     '0.18.4',
        COLLAPSE_INDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true
      };
    </script>
    <script type="text/javascript" src="_static/jquery.js"></script>
    <script type="text/javascript" src="_static/underscore.js"></script>
    <script type="text/javascript" src="_static/doctools.js"></script>
    <link rel="top" title="pyGCluster 0.18.4 documentation" href="index.html" />
    <link rel="next" title="3. Usage" href="usage.html" />
    <link rel="prev" title="1. Introduction" href="intro.html" /> 
  </head>
  <body>
    <div class="related">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="genindex.html" title="General Index"
             accesskey="I">index</a></li>
        <li class="right" >
          <a href="py-modindex.html" title="Python Module Index"
             >modules</a> |</li>
        <li class="right" >
          <a href="usage.html" title="3. Usage"
             accesskey="N">next</a> |</li>
        <li class="right" >
          <a href="intro.html" title="1. Introduction"
             accesskey="P">previous</a> |</li>
        <li><a href="index.html">pyGCluster 0.18.4 documentation</a> &raquo;</li> 
      </ul>
    </div>  

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body">
            
  <div class="section" id="module-pyGCluster">
<span id="module-pygcluster"></span><h1>2. Module pyGCluster<a class="headerlink" href="#module-pyGCluster" title="Permalink to this headline">¶</a></h1>
<p>pyGCluster is a clustering algorithm that uses noise injection for subsequent cluster validation.
By requiring identical cluster identity, it assesses the reproducibility of a large number of clusters
obtained with agglomerative hierarchical clustering (AHC).
Furthermore, a multitude of different distance-linkage combinations (DLCs) are evaluated.
Finally, associations of highly reproducible clusters, called communities, are created.
The results can be represented graphically as node maps and expression maps.</p>
<dl class="docutils">
<dt>The pyGCluster module contains the main class <a class="reference internal" href="#pyGCluster.Cluster" title="pyGCluster.Cluster"><tt class="xref py py-class docutils literal"><span class="pre">pyGCluster.Cluster</span></tt></a> and some functions</dt>
<dd><div class="first last line-block">
<div class="line"><a class="reference internal" href="#pyGCluster.create_default_alphabet" title="pyGCluster.create_default_alphabet"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.create_default_alphabet()</span></tt></a></div>
<div class="line"><a class="reference internal" href="#pyGCluster.resampling_multiprocess" title="pyGCluster.resampling_multiprocess"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.resampling_multiprocess()</span></tt></a></div>
<div class="line"><a class="reference internal" href="#pyGCluster.seekAndDestry" title="pyGCluster.seekAndDestry"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.seekAndDestry()</span></tt></a></div>
<div class="line"><a class="reference internal" href="#pyGCluster.yield_noisejected_dataset" title="pyGCluster.yield_noisejected_dataset"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.yield_noisejected_dataset()</span></tt></a></div>
</div>
</dd>
</dl>
<dl class="class">
<dt id="pyGCluster.Cluster">
<em class="property">class </em><tt class="descclassname">pyGCluster.</tt><tt class="descname">Cluster</tt><big>(</big><em>data=None</em>, <em>working_directory=None</em>, <em>verbosity_level=1</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster" title="Permalink to this definition">¶</a></dt>
<dd><p>The pyGCluster class</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>working_directory</strong> (<em>string</em>) &#8211; directory in which all results are written (requires write-permission!).</li>
<li><strong>verbosity_level</strong> (<em>int</em>) &#8211; either 0, 1 or 2.</li>
<li><strong>data</strong> (<em>dict</em>) &#8211; Dictionary containing the data which is to be clustered.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>In order to work with the default noise-injection function as well as plot
expression maps correctly, the data-dict <strong>has</strong> to have the following
structure.</p>
<p>Example:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="n">data</span> <span class="o">=</span> <span class="p">{</span>
<span class="gp">... </span>           <span class="n">Identifier1</span> <span class="p">:</span> <span class="p">{</span>
<span class="gp">... </span>                           <span class="n">condition1</span> <span class="p">:</span>  <span class="p">(</span> <span class="n">mean11</span><span class="p">,</span> <span class="n">sd11</span> <span class="p">),</span>
<span class="gp">... </span>                           <span class="n">condition2</span> <span class="p">:</span>  <span class="p">(</span> <span class="n">mean12</span><span class="p">,</span> <span class="n">sd12</span> <span class="p">),</span>
<span class="gp">... </span>                           <span class="n">condition3</span> <span class="p">:</span>  <span class="p">(</span> <span class="n">mean13</span><span class="p">,</span> <span class="n">sd13</span> <span class="p">),</span>
<span class="gp">... </span>            <span class="p">},</span>
<span class="gp">... </span>           <span class="n">Identifier2</span> <span class="p">:</span> <span class="p">{</span>
<span class="gp">... </span>                           <span class="n">condition1</span> <span class="p">:</span>  <span class="p">(</span> <span class="n">mean21</span><span class="p">,</span> <span class="n">sd21</span> <span class="p">),</span>
<span class="gp">... </span>                           <span class="n">condition2</span> <span class="p">:</span>  <span class="p">(</span> <span class="n">mean22</span><span class="p">,</span> <span class="n">sd22</span> <span class="p">),</span>
<span class="gp">... </span>                           <span class="n">condition3</span> <span class="p">:</span>  <span class="p">(</span> <span class="n">mean23</span><span class="p">,</span> <span class="n">sd23</span> <span class="p">),</span>
<span class="gp">... </span>            <span class="p">},</span>
<span class="gp">... </span><span class="p">}</span>
<span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">pyGCluster</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">ClusterClass</span> <span class="o">=</span> <span class="n">pyGCluster</span><span class="o">.</span><span class="n">Cluster</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">data</span><span class="p">,</span> <span class="n">verbosity_level</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">working_directory</span><span class="o">=...</span><span class="p">)</span>
</pre></div>
</div>
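<p>As a concrete illustration of the structure above, the following sketch builds a small data dictionary and checks its shape. The identifiers, condition names and numbers are hypothetical, and the helper <tt class="docutils literal"><span class="pre">has_expected_structure</span></tt> is not part of pyGCluster; it only mirrors the requirement that each identifier maps every condition to a (mean, standard deviation) pair.</p>

```python
# Hypothetical identifiers, conditions and values, for illustration only.
data = {
    "gene_A": {
        "control": (0.42, 0.10),    # (mean, standard deviation)
        "stress_1h": (1.37, 0.25),
        "stress_4h": (2.05, 0.31),
    },
    "gene_B": {
        "control": (-0.15, 0.08),
        "stress_1h": (-0.90, 0.12),
        "stress_4h": (-1.60, 0.20),
    },
}


def has_expected_structure(nested):
    """Return True if every identifier maps conditions to 2-tuples and
    all identifiers define the same set of conditions.

    This helper is an illustrative assumption, not a pyGCluster function.
    """
    condition_sets = []
    for conditions in nested.values():
        for value in conditions.values():
            if len(value) != 2:
                return False
        condition_sets.append(frozenset(conditions))
    # Exactly one distinct set of conditions across all identifiers.
    return len(set(condition_sets)) == 1


print(has_expected_structure(data))
```

<p>A dictionary that passes such a check can be handed to the constructor shown above via the <tt class="docutils literal"><span class="pre">data</span></tt> parameter.</p>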
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">If any condition is missing for an identifier in the data dictionary,
that entry is discarded, i.e. not imported into the Cluster class.
This is because pyGCluster does not implement any missing value estimation.
One possible solution is to replace missing values by a mean value and a standard
deviation that are representative of the complete data range in the given condition.</p>
</div>
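<p>The replacement strategy mentioned in the note could be sketched as a pre-processing step like the one below. It is an assumption about one reasonable reading of "representative": a missing pair is filled with the mean of all observed means in that condition and the population standard deviation of those means. The function name and the example values are hypothetical and not part of pyGCluster.</p>

```python
import statistics


def fill_missing_conditions(nested):
    """Return a copy in which every identifier defines every condition.

    Assumption: a missing (mean, sd) pair is replaced by the mean of the
    observed means in that condition and by the population standard
    deviation of those observed means.
    """
    all_conditions = sorted({c for conds in nested.values() for c in conds})
    fallback = {}
    for condition in all_conditions:
        means = [conds[condition][0]
                 for conds in nested.values() if condition in conds]
        fallback[condition] = (statistics.mean(means), statistics.pstdev(means))
    return {
        identifier: {c: conds.get(c, fallback[c]) for c in all_conditions}
        for identifier, conds in nested.items()
    }


# Hypothetical input: "id2" lacks condition "c2".
incomplete = {
    "id1": {"c1": (1.0, 0.1), "c2": (2.0, 0.2)},
    "id2": {"c1": (3.0, 0.3)},
    "id3": {"c1": (5.0, 0.2), "c2": (4.0, 0.1)},
}
filled = fill_missing_conditions(incomplete)
print(filled["id2"]["c2"])
```

<p>Whether such a substitution is appropriate depends on the data set; entries filled this way carry no measured information for the affected condition.</p>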
<p>The Cluster class inherits from the regular Python dictionary object.
Hence, its attributes can be accessed as dictionary keys.</p>
<p>A selection of the most important attributes / keys:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="c"># general</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">ClusterClass</span><span class="p">[</span> <span class="s">&#39;Working directory&#39;</span> <span class="p">]</span>
<span class="gp">... </span>    <span class="c"># this is the directory where all pyGCluster results</span>
<span class="gp">... </span>    <span class="c"># (pickle objects, expression maps, node map, ...) are saved into.</span>
<span class="go">/Users/Shared/moClusterDirectory</span>
<span class="gp">&gt;&gt;&gt; </span><span class="c"># original data can be accessed via</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">ClusterClass</span><span class="p">[</span> <span class="s">&#39;Data&#39;</span> <span class="p">]</span>
<span class="gp">... </span>    <span class="c"># this collections.OrderedDict contains the data that has been</span>
<span class="gp">... </span>    <span class="c"># or will be clustered (see also below).</span>
<span class="gp">... </span><span class="n">plenty</span> <span class="n">of</span> <span class="n">data</span> <span class="p">;)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">ClusterClass</span><span class="p">[</span> <span class="s">&#39;Conditions&#39;</span> <span class="p">]</span>
<span class="gp">... </span>    <span class="c"># sorted list of all conditions that are defined in the &quot;Data&quot;-dictionary</span>
<span class="go">[ &#39;condition1&#39;, &#39;condition2&#39;, &#39;condition3&#39; ]</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">ClusterClass</span><span class="p">[</span> <span class="s">&#39;Identifiers&#39;</span> <span class="p">]</span>
<span class="gp">... </span>    <span class="c"># sorted tuple of all identifiers, i.e. ClusterClass[ &#39;Data&#39; ].keys()</span>
<span class="go">( &#39;Identifier1&#39;, &#39;Identifier2&#39; , ... &#39;IdentifierN&#39; )</span>
<span class="gp">&gt;&gt;&gt; </span><span class="c"># re-sampling parameters</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">ClusterClass</span><span class="p">[</span> <span class="s">&#39;Iterations&#39;</span> <span class="p">]</span>
<span class="gp">... </span>    <span class="c"># the number of datasets that were clustered.</span>
<span class="go">1000000</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">ClusterClass</span><span class="p">[</span> <span class="s">&#39;Cluster 2 clusterID&#39;</span> <span class="p">]</span>
<span class="gp">... </span>    <span class="c"># dictionary with clusters as keys, and their respective row index</span>
<span class="gp">... </span>    <span class="c"># in the &quot;Cluster count&quot;-matrix (= clusterID) as values.</span>
<span class="go">{ ... }</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">ClusterClass</span><span class="p">[</span> <span class="s">&#39;Cluster counts&#39;</span> <span class="p">]</span>
<span class="gp">... </span>    <span class="c"># numpy.uint32 matrix holding the counts for each</span>
<span class="gp">... </span>    <span class="c"># distance-linkage combination of the clusters.</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">ClusterClass</span><span class="p">[</span> <span class="s">&#39;Distance-linkage combinations&#39;</span> <span class="p">]</span>
<span class="gp">... </span>    <span class="c"># sorted list containing the distance-linkage combinations</span>
<span class="gp">... </span>    <span class="c"># that were evaluated in the re-sampling routine.</span>
<span class="gp">&gt;&gt;&gt; </span><span class="c"># Communities</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">ClusterClass</span><span class="p">[</span> <span class="s">&#39;Communities&#39;</span> <span class="p">]</span>
<span class="gp">... </span>    <span class="c"># see function pyGCluster.Cluster.build_nodemap for further information.</span>
<span class="gp">&gt;&gt;&gt; </span><span class="c"># Visualization</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">ClusterClass</span><span class="p">[</span> <span class="s">&#39;Additional labels&#39;</span> <span class="p">]</span>
<span class="gp">... </span>    <span class="c"># dictionary with an identifier of the &quot;Data&quot;-dict as key,</span>
<span class="gp">... </span>    <span class="c"># and a list of additional information (e.g. annotation, GO terms) as value.</span>
<span class="go">{</span>
<span class="go">    &#39;Identifier1&#39; :</span>
<span class="go">                [&#39;Photosynthesis related&#39; , &#39;zeroFactor: 12.31&#39; ],</span>
<span class="go">    &#39;Identifier2&#39; : [ ... ] ,</span>
<span class="go">     ...</span>
<span class="go">}</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">ClusterClass</span><span class="p">[</span> <span class="s">&#39;for IO skip clusters bigger than&#39;</span> <span class="p">]</span>
<span class="gp">... </span>    <span class="c"># Default = 100. Since some clusters are really large</span>
<span class="gp">... </span>    <span class="c"># (with sizes close to the root (the cluster holding all objects)),</span>
<span class="gp">... </span>    <span class="c"># clusters with more objects than this value</span>
<span class="gp">... </span>    <span class="c"># are not plotted as expression maps or expression profile plots.</span>
</pre></div>
</div>
<p>pyGCluster offers the possibility to save the analysis (e.g. after re-sampling)
via <a class="reference internal" href="#pyGCluster.Cluster.save" title="pyGCluster.Cluster.save"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.save()</span></tt></a> and to continue it later
via <a class="reference internal" href="#pyGCluster.Cluster.load" title="pyGCluster.Cluster.load"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.load()</span></tt></a>.</p>
<p>Typically, users start the multiprocessing clustering routine with multiple
distance-linkage combinations via the <a class="reference internal" href="#pyGCluster.Cluster.do_it_all" title="pyGCluster.Cluster.do_it_all"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.do_it_all()</span></tt></a>
function. This function updates the pyGCluster class with all user
parameters before it calls <a class="reference internal" href="#pyGCluster.Cluster.resample" title="pyGCluster.Cluster.resample"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.resample()</span></tt></a>.
The main advantage of calling <a class="reference internal" href="#pyGCluster.Cluster.do_it_all" title="pyGCluster.Cluster.do_it_all"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.do_it_all()</span></tt></a> is
that all general plotting functions are called afterwards as well. These are:</p>
<blockquote>
<div><div class="line-block">
<div class="line"><a class="reference internal" href="#pyGCluster.Cluster.plot_clusterfreqs" title="pyGCluster.Cluster.plot_clusterfreqs"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.plot_clusterfreqs()</span></tt></a></div>
<div class="line"><a class="reference internal" href="#pyGCluster.Cluster.build_nodemap" title="pyGCluster.Cluster.build_nodemap"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.build_nodemap()</span></tt></a></div>
<div class="line"><a class="reference internal" href="#pyGCluster.Cluster.write_dot" title="pyGCluster.Cluster.write_dot"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.write_dot()</span></tt></a></div>
<div class="line"><a class="reference internal" href="#pyGCluster.Cluster.draw_community_expression_maps" title="pyGCluster.Cluster.draw_community_expression_maps"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.draw_community_expression_maps()</span></tt></a></div>
</div>
</div></blockquote>
<p>Alternatively, one can manually update the parameters (by setting the key-value
pairs in pyGCluster) and then invoke <a class="reference internal" href="#pyGCluster.Cluster.resample" title="pyGCluster.Cluster.resample"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.resample()</span></tt></a>
with the appropriate parameters. This is useful if certain memory-intensive
distance-linkage combinations are to be clustered on a specific computer.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">The Cluster class can be initialized empty and filled using <a class="reference internal" href="#pyGCluster.Cluster.load" title="pyGCluster.Cluster.load"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.load()</span></tt></a>.</p>
</div>
<dl class="method">
<dt id="pyGCluster.Cluster.build_nodemap">
<tt class="descname">build_nodemap</tt><big>(</big><em>min_cluster_size=4</em>, <em>top_X_clusters=0</em>, <em>threshold_4_the_lowest_max_freq=0.01</em>, <em>starting_min_overlap=0.1</em>, <em>increasing_min_overlap=0.05</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.build_nodemap"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.build_nodemap" title="Permalink to this definition">¶</a></dt>
<dd><p>Constructs communities from a set of the most frequent clusters.
This set is obtained via <tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster._get_most_frequent_clusters()</span></tt>, to which the first three parameters are passed.
These clusters are then subjected to AHC with complete linkage.
The distance matrix is calculated via <a class="reference internal" href="#pyGCluster.Cluster.calculate_distance_matrix" title="pyGCluster.Cluster.calculate_distance_matrix"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.calculate_distance_matrix()</span></tt></a>.
The combination of complete linkage and the distance matrix ensures that all clusters in a community exhibit at least the &#8220;starting_min_overlap&#8221; to each other.
From the resulting cluster tree, a &#8220;first draft&#8221; of communities is obtained.
These &#8220;first&#8221; communities are then themselves considered as clusters, and subjected to AHC again, until the community assignment of clusters remains constant.
In this way, clusters are inserted into a target community even though they initially did not overlap with every cluster inside it,
provided they do overlap once the clusters in the target community are combined into a single cluster.
This reduces the degree of stringency; the clusters fit into a community in a broader sense.
For further information on the community construction, see the publication of pyGCluster.</p>
<dl class="docutils">
<dt>Internal structure of communities:</dt>
<dd><div class="first last highlight-python"><div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="n">name</span> <span class="o">=</span> <span class="p">(</span> <span class="n">cluster</span><span class="p">,</span> <span class="n">level</span> <span class="p">)</span>
<span class="gp">... </span>        <span class="c"># internal name of the community.</span>
<span class="gp">... </span>        <span class="c"># The first element in the tuple (&quot;cluster&quot;) contains the indices</span>
<span class="gp">... </span>        <span class="c"># of the objects that comprise a community.</span>
<span class="gp">... </span>        <span class="c"># The second element gives the level,</span>
<span class="gp">... </span>        <span class="c"># or iteration when the community was formed.</span>
<span class="gp">&gt;&gt;&gt; </span><span class="bp">self</span><span class="p">[</span> <span class="s">&#39;Communities&#39;</span> <span class="p">][</span> <span class="n">name</span> <span class="p">][</span> <span class="s">&#39;children&#39;</span> <span class="p">]</span>
<span class="gp">... </span>        <span class="c"># list containing the clusters that build the community.</span>
<span class="gp">&gt;&gt;&gt; </span><span class="bp">self</span><span class="p">[</span> <span class="s">&#39;Communities&#39;</span> <span class="p">][</span> <span class="n">name</span> <span class="p">][</span> <span class="s">&#39;# of nodes merged into community&#39;</span> <span class="p">]</span>
<span class="gp">... </span>        <span class="c"># the number of clusters that build the community.</span>
<span class="gp">&gt;&gt;&gt; </span><span class="bp">self</span><span class="p">[</span> <span class="s">&#39;Communities&#39;</span> <span class="p">][</span> <span class="n">name</span> <span class="p">][</span> <span class="s">&#39;index 2 obCoFreq dict&#39;</span> <span class="p">]</span>
<span class="gp">... </span>        <span class="c"># an OrderedDict in which each index is assigned its obCoFreq.</span>
<span class="gp">... </span>        <span class="c"># Negative indices correspond to &quot;placeholders&quot;,</span>
<span class="gp">... </span>        <span class="c"># which are required for the insertion of black lines into expression maps.</span>
<span class="gp">... </span>        <span class="c"># Black lines in expression maps separate the individual clusters</span>
<span class="gp">... </span>        <span class="c"># that form a community, sorted by when</span>
<span class="gp">... </span>        <span class="c"># they were inserted into the community.</span>
<span class="gp">&gt;&gt;&gt; </span><span class="bp">self</span><span class="p">[</span> <span class="s">&#39;Communities&#39;</span> <span class="p">][</span> <span class="n">name</span> <span class="p">][</span> <span class="s">&#39;highest obCoFreq&#39;</span> <span class="p">]</span>
<span class="gp">... </span>        <span class="c"># the highest obCoFreq encountered in a community.</span>
<span class="gp">&gt;&gt;&gt; </span><span class="bp">self</span><span class="p">[</span> <span class="s">&#39;Communities&#39;</span> <span class="p">][</span> <span class="n">name</span> <span class="p">][</span> <span class="s">&#39;cluster ID&#39;</span> <span class="p">]</span>
<span class="gp">... </span>        <span class="c"># the ID of the cluster containing the object with the highest obCoFreq.</span>
</pre></div>
</div>
</dd>
</dl>
<p>Of the following parameters, the first three are passed to <tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster._get_most_frequent_clusters()</span></tt>:</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>min_cluster_size</strong> (<em>int</em>) &#8211; clusters smaller than this threshold are not considered for the community construction.</li>
<li><strong>top_X_clusters</strong> (<em>int</em>) &#8211; form communities from the top X clusters sorted by their maximum frequency.</li>
<li><strong>threshold_4_the_lowest_max_freq</strong> (<em>float</em>) &#8211; [0, 1[ form communities from clusters whose maximum frequency is at least this value.</li>
<li><strong>starting_min_overlap</strong> (<em>float</em>) &#8211; ]0, 1[ minimum required relative overlap between clusters so that they are assigned the same community. The relative overlap is defined as the size of the overlap between two clusters, divided by the size of the larger cluster.</li>
<li><strong>increasing_min_overlap</strong> (<em>float</em>) &#8211; defines the increase of the required overlap between communities</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">none</p>
</td>
</tr>
</tbody>
</table>
</dd></dl>

<dl class="method">
<dt id="pyGCluster.Cluster.calculate_distance_matrix">
<tt class="descname">calculate_distance_matrix</tt><big>(</big><em>clusters</em>, <em>min_overlap=0.25</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.calculate_distance_matrix"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.calculate_distance_matrix" title="Permalink to this definition">¶</a></dt>
<dd><dl class="docutils">
<dt>Calculates the specifically developed distance matrix for the AHC of clusters:</dt>
<dd><ol class="first last arabic simple">
<li>Clusters that do <em>not</em> share the minimum overlap are attributed a distance of &#8220;self[ &#8216;Root size&#8217; ]&#8221; (i.e. len( self[ &#8216;Data&#8217; ] ) ).</li>
<li>Clusters are attributed a distance of &#8220;self[ &#8216;Root size&#8217; ] - 1&#8221; to the root cluster.</li>
<li>Clusters sharing the minimum overlap are attributed a distance of &#8220;size of the larger of the two clusters minus size of the overlap&#8221;.</li>
</ol>
</dd>
</dl>
<p>The overlap between a pair of clusters is relative, i.e. defined as the size of the overlap divided by the size of the larger of the two clusters.</p>
<p>The resulting condensed distance matrix is not returned, but rather stored in self[ &#8216;Nodemap - condensed distance matrix&#8217; ].</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>clusters</strong> (<em>list of clusters. Clusters are represented as tuples consisting of their object&#8217;s indices.</em>) &#8211; the most frequent clusters whose &#8220;distance&#8221; is to be determined.</li>
<li><strong>min_overlap</strong> (<em>float</em>) &#8211; ]0, 1[ threshold value to determine if the distance between two clusters is calculated according to (1) or (3).</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">none</p>
</td>
</tr>
</tbody>
</table>
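<p>The three distance rules and the relative overlap described above can be sketched in plain Python as follows. This is a minimal illustration, not the pyGCluster implementation: the function names are hypothetical, the root cluster is assumed to be recognisable by its size being equal to the root size, rule (2) is assumed to take precedence over the other two, and the overlap threshold is assumed to be inclusive.</p>

```python
from itertools import combinations


def relative_overlap(a, b):
    """Size of the overlap divided by the size of the larger cluster."""
    shared = len(set(a).intersection(b))
    return shared / max(len(a), len(b))


def cluster_distance(a, b, root_size, min_overlap=0.25):
    """Distance between two clusters given as tuples of object indices."""
    # Rule (2): any cluster paired with the root cluster (assumed to be
    # the cluster that contains all objects).
    if root_size in (len(a), len(b)):
        return root_size - 1
    # Rule (3): the minimum overlap is shared (threshold assumed inclusive).
    if relative_overlap(a, b) >= min_overlap:
        return max(len(a), len(b)) - len(set(a).intersection(b))
    # Rule (1): the minimum overlap is not shared.
    return root_size


def condensed_distances(clusters, root_size, min_overlap=0.25):
    """Pairwise distances in the order (0,1), (0,2), ..., (n-2,n-1)."""
    return [cluster_distance(a, b, root_size, min_overlap)
            for a, b in combinations(clusters, 2)]


# Hypothetical clusters over six objects; the last one is the root.
example_clusters = [(0, 1, 2, 3), (2, 3, 4), (5,), (0, 1, 2, 3, 4, 5)]
example_distances = condensed_distances(example_clusters, root_size=6)
print(example_distances)
```

<p>With these rules, clusters that overlap strongly end up close together, while unrelated clusters are pushed as far apart as the root size, so that complete linkage cannot join them below that height.</p>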
</dd></dl>

<dl class="method">
<dt id="pyGCluster.Cluster.check4convergence">
<tt class="descname">check4convergence</tt><big>(</big><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.check4convergence"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.check4convergence" title="Permalink to this definition">¶</a></dt>
<dd><p>Checks whether the re-sampling routine may be terminated because the number of most frequent clusters remains almost constant.
This is done by examining the number of most frequent clusters as a function of the number of iterations.
Convergence is declared once the median normalized slope in a given window of iterations is equal to or below &#8220;iter_tol&#8221;.
For further information see Supplementary Material of the corresponding publication.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body">boolean</td>
</tr>
</tbody>
</table>
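<p>The convergence criterion can be illustrated with the sketch below. It is not the pyGCluster implementation: the exact normalization of the slope is defined in the publication, so dividing the absolute slope by the later cluster count is an assumption made here, as are the function name and the checkpoint representation. The default values mirror the <tt class="docutils literal"><span class="pre">iter_window</span></tt> and <tt class="docutils literal"><span class="pre">iter_tol</span></tt> defaults of <tt class="docutils literal"><span class="pre">do_it_all()</span></tt>.</p>

```python
import statistics


def looks_converged(checkpoints, iter_window=50000, iter_tol=1e-7):
    """Decide convergence from (iteration, n_most_frequent_clusters) pairs.

    checkpoints must be sorted by iteration. The normalization used here
    (absolute slope divided by the later count) is an assumption.
    """
    if not checkpoints:
        return False
    window_start = checkpoints[-1][0] - iter_window
    # Require that a full window of iterations has been observed.
    if checkpoints[0][0] > window_start:
        return False
    slopes = []
    for (it0, n0), (it1, n1) in zip(checkpoints, checkpoints[1:]):
        if it0 >= window_start and n1 > 0:
            slopes.append(abs(n1 - n0) / (it1 - it0) / n1)
    if not slopes:
        return False
    return iter_tol >= statistics.median(slopes)


# Hypothetical histories sampled every 5000 iterations.
flat_history = [(i * 5000, 100) for i in range(21)]
growing_history = [(i * 5000, 100 + 10 * i) for i in range(21)]
print(looks_converged(flat_history), looks_converged(growing_history))
```

<p>Using the median rather than the mean makes the decision robust against a single checkpoint at which the number of clusters jumps.</p>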
</dd></dl>

<dl class="method">
<dt id="pyGCluster.Cluster.check_if_data_is_log2_transformed">
<tt class="descname">check_if_data_is_log2_transformed</tt><big>(</big><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.check_if_data_is_log2_transformed"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.check_if_data_is_log2_transformed" title="Permalink to this definition">¶</a></dt>
<dd><p>Checks whether any value of the data tuples (i.e. any mean) is below zero.
A value below zero indicates that the input data were log2 transformed.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body">boolean</td>
</tr>
</tbody>
</table>
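<p>The check described above amounts to a single scan over all means. The following sketch assumes the nested data structure shown for the Cluster class; the function name and the example values are hypothetical.</p>

```python
def has_negative_mean(nested):
    """Return True if any mean in the nested data dictionary is below zero."""
    return any(
        0 > mean
        for conditions in nested.values()
        for mean, _sd in conditions.values()
    )


# Hypothetical examples: ratios on a linear scale and on a log2 scale.
linear_ratios = {"id1": {"c1": (0.5, 0.1), "c2": (2.0, 0.3)}}
log2_ratios = {"id1": {"c1": (-1.0, 0.2), "c2": (1.0, 0.2)}}
print(has_negative_mean(linear_ratios), has_negative_mean(log2_ratios))
```

<p>Note that this is a heuristic: log2 transformed data whose means all happen to be positive would not be recognised.</p>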
</dd></dl>

<dl class="method">
<dt id="pyGCluster.Cluster.convergence_plot">
<tt class="descname">convergence_plot</tt><big>(</big><em>filename='convergence_plot.pdf'</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.convergence_plot"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.convergence_plot" title="Permalink to this definition">¶</a></dt>
<dd><p>Creates a two-page PDF file containing the full picture of the convergence plot, as well as a zoom of it.
The convergence plot illustrates the development of the number of most frequent clusters vs. the number of iterations.
The dotted line in this plot represents the normalized slope, which is used for internal convergence determination.</p>
<p>If rpy2 cannot be imported, a CSV file is created instead.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>filename</strong> (<em>string</em>) &#8211; the filename of the PDF (or CSV) file.</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body">none</td>
</tr>
</tbody>
</table>
</dd></dl>

<dl class="method">
<dt id="pyGCluster.Cluster.create_rainbow_colors">
<tt class="descname">create_rainbow_colors</tt><big>(</big><em>n_colors=10</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.create_rainbow_colors"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.create_rainbow_colors" title="Permalink to this definition">¶</a></dt>
<dd><p>Returns a list of rainbow colors. Colors are expressed as hexcodes of RGB values.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>n_colors</strong> (<em>int</em>) &#8211; number of rainbow colors.</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body">list</td>
</tr>
</tbody>
</table>
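<p>One way to produce such a list of hex codes is to step evenly through the hue circle. The sketch below uses the standard library module <tt class="docutils literal"><span class="pre">colorsys</span></tt> and is an assumption about a possible approach, not the method used by pyGCluster; the function name is hypothetical.</p>

```python
import colorsys


def rainbow_hex_colors(n_colors=10):
    """Return n_colors hex codes with evenly spaced hues at full
    saturation and brightness."""
    colors = []
    for i in range(n_colors):
        red, green, blue = colorsys.hsv_to_rgb(i / n_colors, 1.0, 1.0)
        colors.append("#{:02x}{:02x}{:02x}".format(
            round(red * 255), round(green * 255), round(blue * 255)))
    return colors


palette = rainbow_hex_colors(10)
print(palette)
```

<p>Because the hue is divided by <tt class="docutils literal"><span class="pre">n_colors</span></tt>, the last color stops short of wrapping around to the first one, so all returned colors are distinct for small palettes.</p>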
</dd></dl>

<dl class="method">
<dt id="pyGCluster.Cluster.delete_resampling_results">
<tt class="descname">delete_resampling_results</tt><big>(</big><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.delete_resampling_results"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.delete_resampling_results" title="Permalink to this definition">¶</a></dt>
<dd><p>Resets all variables holding any result of the re-sampling process.
This includes the convergence determination as well as the community structure.
Does not delete the data that is intended to be clustered.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body">None</td>
</tr>
</tbody>
</table>
</dd></dl>

<dl class="method">
<dt id="pyGCluster.Cluster.do_it_all">
<tt class="descname">do_it_all</tt><big>(</big><em>working_directory=None</em>, <em>distances=None</em>, <em>linkages=None</em>, <em>function_2_generate_noise_injected_datasets=None</em>, <em>min_cluster_size=4</em>, <em>alphabet=None</em>, <em>force_plotting=False</em>, <em>min_cluster_freq_2_retain=0.001</em>, <em>pickle_filename='pyGCluster_resampled.pkl'</em>, <em>cpus_2_use=None</em>, <em>iter_max=250000</em>, <em>iter_tol=1e-07</em>, <em>iter_step=5000</em>, <em>iter_top_P=0.001</em>, <em>iter_window=50000</em>, <em>iter_till_the_end=False</em>, <em>top_X_clusters=0</em>, <em>threshold_4_the_lowest_max_freq=0.01</em>, <em>starting_min_overlap=0.1</em>, <em>increasing_min_overlap=0.05</em>, <em>color_gradient='1337'</em>, <em>box_style='classic'</em>, <em>min_value_4_expression_map=None</em>, <em>max_value_4_expression_map=None</em>, <em>additional_labels=None</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.do_it_all"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.do_it_all" title="Permalink to this definition">¶</a></dt>
<dd><p>Invokes all necessary functions that constitute the main functionality of pyGCluster.
This is agglomerative hierarchical clustering (AHC) with noise injection and a variety of distance-linkage combinations (DLCs),
in order to identify highly reproducible clusters,
followed by a meta-clustering of highly reproducible clusters into so-called &#8216;communities&#8217;.</p>
<p>The functions that are called are:</p>
<blockquote>
<div><ul class="simple">
<li><a class="reference internal" href="#pyGCluster.Cluster.resample" title="pyGCluster.Cluster.resample"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.resample()</span></tt></a></li>
<li><a class="reference internal" href="#pyGCluster.Cluster.build_nodemap" title="pyGCluster.Cluster.build_nodemap"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.build_nodemap()</span></tt></a></li>
<li><a class="reference internal" href="#pyGCluster.Cluster.write_dot" title="pyGCluster.Cluster.write_dot"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.write_dot()</span></tt></a></li>
<li><a class="reference internal" href="#pyGCluster.Cluster.draw_community_expression_maps" title="pyGCluster.Cluster.draw_community_expression_maps"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.draw_community_expression_maps()</span></tt></a></li>
<li><a class="reference internal" href="#pyGCluster.Cluster.draw_expression_profiles" title="pyGCluster.Cluster.draw_expression_profiles"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.draw_expression_profiles()</span></tt></a></li>
</ul>
</div></blockquote>
<p>For a complete list of possible distance matrix calculations,
see <a class="reference external" href="http://docs.scipy.org/doc/scipy/reference/spatial.distance.html">http://docs.scipy.org/doc/scipy/reference/spatial.distance.html</a>;
for the available linkage methods,
see <a class="reference external" href="http://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html">http://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html</a>.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">If memory is a concern (e.g. for a large dataset, &gt; 5000 objects), cpus_2_use should be kept low.</p>
</div>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>distances</strong> (<em>list</em>) &#8211; list of distance metrics, given as strings, e.g. [ &#8216;correlation&#8217;, &#8216;euclidean&#8217; ]</li>
<li><strong>linkages</strong> (<em>list</em>) &#8211; list of linkage methods, given as strings, e.g. [ &#8216;average&#8217;, &#8216;complete&#8217;, &#8216;ward&#8217; ]</li>
<li><strong>function_2_generate_noise_injected_datasets</strong> (<em>function</em>) &#8211; function to generate noise-injected datasets. If None (default), Gaussian distributions are used.</li>
<li><strong>min_cluster_size</strong> (<em>int</em>) &#8211; minimum size of a cluster, so that it is included in the assessment of cluster reproducibilities.</li>
<li><strong>alphabet</strong> (<em>string</em>) &#8211; alphabet used to convert decimal indices to characters to save memory. Defaults to string.printable, without &#8216;,&#8217;.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">If alphabet contains &#8216;,&#8217;, this character is removed from alphabet, because the indices comprising a cluster are saved comma-separated.</p>
</div>
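<p>The memory saving from the alphabet can be illustrated with a short sketch. It assumes that each decimal index is rewritten as a number in base len(alphabet) and that the encoded indices of a cluster are joined with commas; <tt class="docutils literal"><span class="pre">encode_index</span></tt> and <tt class="docutils literal"><span class="pre">encode_cluster</span></tt> are hypothetical names, and pyGCluster&#8217;s internal encoding may differ in detail.</p>
<div class="highlight-python"><div class="highlight"><pre>import string


def encode_index(index, alphabet):
    """Write a non-negative integer in base len(alphabet)."""
    alphabet = alphabet.replace(',', '')  # ',' is reserved as the separator
    base = len(alphabet)
    if index == 0:
        return alphabet[0]
    chars = []
    while index:
        index, remainder = divmod(index, base)
        chars.append(alphabet[remainder])
    return ''.join(reversed(chars))


def encode_cluster(indices, alphabet=string.printable):
    """Join the encoded indices of one cluster with commas (assumed layout)."""
    return ','.join(encode_index(i, alphabet) for i in indices)


# With the default alphabet, the index 12345 needs 3 characters instead of 5.
print(len(encode_index(12345, string.printable)))
</pre></div>
</div>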
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>force_plotting</strong> (<em>boolean</em>) &#8211; if True, the convergence plot is created after every iter_step iterations (otherwise only when convergence is detected).</li>
<li><strong>min_cluster_freq_2_retain</strong> (<em>float</em>) &#8211; ]0, 1[ minimum frequency a cluster must exhibit (only the maximum of its DLC frequencies matters here) to be stored in pyGCluster once all iterations are finished.</li>
<li><strong>cpus_2_use</strong> (<em>int</em>) &#8211; number of threads that are invoked in the re-sampling routine.</li>
<li><strong>iter_max</strong> (<em>int</em>) &#8211; maximum number of re-sampling iterations.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>Convergence determination:</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>iter_tol</strong> (<em>float</em>) &#8211; ]0, 1e-3[ value for the threshold of the median of normalized slopes, in order to declare convergence.</li>
<li><strong>iter_step</strong> (<em>int</em>) &#8211; number of iterations each multiprocess performs and simultaneously the interval in which to check for convergence.</li>
<li><strong>iter_top_P</strong> (<em>float</em>) &#8211; ]0, 1[ for the convergence estimation, only the most frequent clusters are examined. This is the threshold for the minimum frequency of a cluster to be included.</li>
<li><strong>iter_window</strong> (<em>int</em>) &#8211; size of the sliding window in iterations. The median is obtained from the normalized slopes inside this window; <em>should be a multiple of iter_step</em>.</li>
<li><strong>iter_till_the_end</strong> (<em>boolean</em>) &#8211; if set to True, the convergence determination is switched off; hence, re-sampling is performed until iter_max is reached.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>Output/Plotting:</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>pickle_filename</strong> (<em>string</em>) &#8211; Filename of the output pickle object</li>
<li><strong>top_X_clusters</strong> (<em>int</em>) &#8211; Plot of the top X clusters in the sorted list (by freq) of clusters having a maximum cluster frequency of at least threshold_4_the_lowest_max_freq (clusterfreq-plot is still sorted by size).</li>
<li><strong>threshold_4_the_lowest_max_freq</strong> (<em>float</em>) &#8211; ]0, 1[ Clusters must have a maximum frequency of at least threshold_4_the_lowest_max_freq to appear in the plot.</li>
<li><strong>min_value_4_expression_map</strong> (<em>float</em>) &#8211; lower bound for color coding of values in the expression map. Remember that log2-values are expected, i.e. this value should be &lt; 0!</li>
<li><strong>max_value_4_expression_map</strong> (<em>float</em>) &#8211; upper bound for color coding of values in the expression map.</li>
<li><strong>color_gradient</strong> (<em>string</em>) &#8211; name of the color gradient used for plotting the expression map. Currently supported are default, Daniel, barplot, 1337, BrBG, PiYG, PRGn, PuOr, RdBu, RdGy, RdYlBu, RdYlGn and Spectral</li>
<li><strong>expression_map_filename</strong> (<em>string</em>) &#8211; file name for expression map. .svg will be added if required.</li>
<li><strong>legend_filename</strong> (<em>string</em>) &#8211; file name for legend. .svg will be added if required.</li>
<li><strong>box_style</strong> (<em>string</em>) &#8211; the way the relative standard deviation is visualized in the expression map. Currently supported are &#8216;modern&#8217;, &#8216;fusion&#8217; or &#8216;classic&#8217;.</li>
<li><strong>starting_min_overlap</strong> (<em>float</em>) &#8211; ]0, 1[ minimum required relative overlap between clusters so that they are assigned the same community. The relative overlap is defined as the size of the overlap between two clusters, divided by the size of the larger cluster.</li>
<li><strong>increasing_min_overlap</strong> (<em>float</em>) &#8211; defines the increase of the required overlap between communities</li>
<li><strong>additional_labels</strong> (<em>dict</em>) &#8211; dictionary, where additional labels can be defined which will be added in the expression map plots to the gene/protein names</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">None</p>
</td>
</tr>
</tbody>
</table>
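<p>The relative overlap described for starting_min_overlap can be written out as a small function. This is a sketch of the stated definition only (size of the overlap divided by the size of the larger cluster); <tt class="docutils literal"><span class="pre">relative_overlap</span></tt> is a hypothetical name, and clusters are assumed to be given as tuples of object indices.</p>
<div class="highlight-python"><div class="highlight"><pre>def relative_overlap(cluster_a, cluster_b):
    """Size of the overlap divided by the size of the larger cluster."""
    set_a, set_b = set(cluster_a), set(cluster_b)
    shared = len(set_a.intersection(set_b))
    larger = max(len(set_a), len(set_b))
    return shared / float(larger)


# Two of the five objects of the larger cluster are shared: 2 / 5 = 0.4
print(relative_overlap((1, 2, 3, 4), (3, 4, 5, 6, 7)))
</pre></div>
</div>
<p>With the default starting_min_overlap of 0.1, a pair of clusters with this overlap of 0.4 would meet the minimum requirement.</p>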
<p>For more information on each parameter, please refer to <a class="reference internal" href="#pyGCluster.Cluster.resample" title="pyGCluster.Cluster.resample"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.resample()</span></tt></a>,
and the subsequent functions:
<a class="reference internal" href="#pyGCluster.Cluster.build_nodemap" title="pyGCluster.Cluster.build_nodemap"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.build_nodemap()</span></tt></a>,
<a class="reference internal" href="#pyGCluster.Cluster.write_dot" title="pyGCluster.Cluster.write_dot"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.write_dot()</span></tt></a>,
<a class="reference internal" href="#pyGCluster.Cluster.draw_community_expression_maps" title="pyGCluster.Cluster.draw_community_expression_maps"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.draw_community_expression_maps()</span></tt></a>,
<a class="reference internal" href="#pyGCluster.Cluster.draw_expression_profiles" title="pyGCluster.Cluster.draw_expression_profiles"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.draw_expression_profiles()</span></tt></a>.</p>
</dd></dl>

<dl class="method">
<dt id="pyGCluster.Cluster.draw_community_expression_maps">
<tt class="descname">draw_community_expression_maps</tt><big>(</big><em>min_value_4_expression_map=None</em>, <em>max_value_4_expression_map=None</em>, <em>color_gradient='1337'</em>, <em>box_style='classic'</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.draw_community_expression_maps"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.draw_community_expression_maps" title="Permalink to this definition">¶</a></dt>
<dd><p>Plots the expression map for each community showing its object composition.</p>
<p>The following parameters are passed to <a class="reference internal" href="#pyGCluster.Cluster.draw_expression_map" title="pyGCluster.Cluster.draw_expression_map"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.draw_expression_map()</span></tt></a>:</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>min_value_4_expression_map</strong> (<em>float</em>) &#8211; lower bound for color coding of values in the expression map. Remember that log2-values are expected, i.e. this value should be &lt; 0!</li>
<li><strong>max_value_4_expression_map</strong> (<em>float</em>) &#8211; upper bound for color coding of values in the expression map.</li>
<li><strong>color_gradient</strong> (<em>string</em>) &#8211; name of the color gradient used for plotting the expression map. Currently supported are default, Daniel, barplot, 1337, BrBG, PiYG, PRGn, PuOr, RdBu, RdGy, RdYlBu, RdYlGn and Spectral</li>
<li><strong>box_style</strong> (<em>string</em>) &#8211; name of box style used in SVG. Currently supported are classic, modern, fusion.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">none</p>
</td>
</tr>
</tbody>
</table>
</dd></dl>

<dl class="method">
<dt id="pyGCluster.Cluster.draw_expression_map">
<tt class="descname">draw_expression_map</tt><big>(</big><em>identifiers=None</em>, <em>data=None</em>, <em>conditions=None</em>, <em>additional_labels=None</em>, <em>min_value_4_expression_map=None</em>, <em>max_value_4_expression_map=None</em>, <em>expression_map_filename=None</em>, <em>legend_filename=None</em>, <em>color_gradient=None</em>, <em>box_style='classic'</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.draw_expression_map"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.draw_expression_map" title="Permalink to this definition">¶</a></dt>
<dd><p>Draws an expression map as SVG.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>min_value_4_expression_map</strong> (<em>float</em>) &#8211; lower bound for color coding of values in the expression map. Remember that log2-values are expected, i.e. this value should be &lt; 0!</li>
<li><strong>max_value_4_expression_map</strong> (<em>float</em>) &#8211; upper bound for color coding of values in the expression map.</li>
<li><strong>color_gradient</strong> (<em>string</em>) &#8211; name of the color gradient used for plotting the expression map. Currently supported are default, Daniel, barplot, 1337, BrBG, PiYG, PRGn, PuOr, RdBu, RdGy, RdYlBu, RdYlGn and Spectral</li>
<li><strong>expression_map_filename</strong> (<em>string</em>) &#8211; file name for expression map. .svg will be added if required.</li>
<li><strong>legend_filename</strong> (<em>string</em>) &#8211; file name for legend. .svg will be added if required.</li>
<li><strong>box_style</strong> (<em>string</em>) &#8211; the way the relative standard deviation is visualized in the expression map. Currently supported are &#8216;modern&#8217;, &#8216;fusion&#8217; or &#8216;classic&#8217;.</li>
<li><strong>additional_labels</strong> (<em>dict</em>) &#8211; dictionary, where additional labels can be defined which will be added in the expression map plots to the gene/protein names</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">none</p>
</td>
</tr>
</tbody>
</table>
<dl class="docutils">
<dt>Data has to be a nested dict in the following format:</dt>
<dd><div class="first last highlight-python"><div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span> <span class="n">data</span> <span class="o">=</span>   <span class="p">{</span>
<span class="gp">... </span>        <span class="n">fastaID1</span> <span class="p">:</span> <span class="p">{</span>
<span class="gp">... </span>                <span class="n">cond1</span> <span class="p">:</span> <span class="p">(</span> <span class="n">mean</span><span class="p">,</span> <span class="n">sd</span> <span class="p">)</span> <span class="p">,</span> <span class="n">cond2</span> <span class="p">:</span> <span class="p">(</span> <span class="n">mean</span><span class="p">,</span> <span class="n">sd</span> <span class="p">),</span> <span class="o">...</span>
<span class="gp">... </span>        <span class="p">},</span>
<span class="gp">... </span>        <span class="n">fastaID2</span> <span class="p">:</span> <span class="p">{</span>
<span class="gp">... </span>                <span class="n">cond1</span> <span class="p">:</span> <span class="p">(</span> <span class="n">mean</span><span class="p">,</span> <span class="n">sd</span> <span class="p">)</span> <span class="p">,</span> <span class="n">cond2</span> <span class="p">:</span> <span class="p">(</span> <span class="n">mean</span><span class="p">,</span> <span class="n">sd</span> <span class="p">),</span> <span class="o">...</span>
<span class="gp">... </span>        <span class="p">}</span>
<span class="gp">... </span> <span class="p">}</span>
</pre></div>
</div>
</dd>
<dt>The parameters identifiers, data and conditions are optional; if they are not given, the values are taken from</dt>
<dd><div class="first last line-block">
<div class="line">self[ &#8216;Data&#8217; ]</div>
<div class="line">self[ &#8216;Identifiers&#8217; ]</div>
<div class="line">self[ &#8216;Conditions&#8217; ]</div>
</div>
</dd>
</dl>
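<p>As a concrete instance of this format, the following sketch builds the nested dict from replicate measurements. The identifiers, conditions and values are made up for illustration, <tt class="docutils literal"><span class="pre">mean_and_sd</span></tt> is a hypothetical helper, and the use of the sample standard deviation is an assumption.</p>
<div class="highlight-python"><div class="highlight"><pre>import math


def mean_and_sd(values):
    """Return (mean, sample standard deviation) of a list of numbers."""
    n = len(values)
    mean = sum(values) / float(n)
    variance = sum((v - mean) ** 2 for v in values) / float(n - 1)
    return mean, math.sqrt(variance)


# Hypothetical log2 ratios: identifier, then condition, then replicate values.
replicates = {
    'fastaID1': {'cond1': [1.0, 2.0, 3.0], 'cond2': [-1.0, -1.0, -1.0]},
    'fastaID2': {'cond1': [0.5, 1.5], 'cond2': [2.0, 4.0]},
}

data = {}
for identifier, per_condition in replicates.items():
    data[identifier] = {}
    for condition, values in per_condition.items():
        data[identifier][condition] = mean_and_sd(values)

print(data['fastaID1']['cond1'])
</pre></div>
</div>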
</dd></dl>

<dl class="method">
<dt id="pyGCluster.Cluster.draw_expression_map_for_cluster">
<tt class="descname">draw_expression_map_for_cluster</tt><big>(</big><em>clusterID=None</em>, <em>cluster=None</em>, <em>filename=None</em>, <em>min_value_4_expression_map=None</em>, <em>max_value_4_expression_map=None</em>, <em>color_gradient='default'</em>, <em>box_style='classic'</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.draw_expression_map_for_cluster"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.draw_expression_map_for_cluster" title="Permalink to this definition">¶</a></dt>
<dd><p>Plots an expression map for a given cluster.
Either the parameter &#8220;clusterID&#8221; or &#8220;cluster&#8221; can be defined.
This function is useful for plotting a user-defined cluster, e.g. a knowledge-based cluster (TCA cluster, glycolysis cluster ...). In this case, the parameter &#8220;cluster&#8221; should be defined.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>clusterID</strong> (<em>int</em>) &#8211; ID of a cluster (those are obtained e.g. from the plot of cluster frequencies or the node map)</li>
<li><strong>cluster</strong> (<em>tuple</em>) &#8211; tuple containing the indices of the objects describing a cluster.</li>
<li><strong>filename</strong> (<em>string</em>) &#8211; name of the SVG file for the expression map.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>The following parameters are passed to <a class="reference internal" href="#pyGCluster.Cluster.draw_expression_map" title="pyGCluster.Cluster.draw_expression_map"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.draw_expression_map()</span></tt></a>:</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>min_value_4_expression_map</strong> (<em>float</em>) &#8211; lower bound for color coding of values in the expression map. Remember that log2-values are expected, i.e. this value should be &lt; 0!</li>
<li><strong>max_value_4_expression_map</strong> (<em>float</em>) &#8211; upper bound for color coding of values in the expression map.</li>
<li><strong>color_gradient</strong> (<em>string</em>) &#8211; name of the color gradient used for plotting the expression map. Currently supported are default, Daniel, barplot, 1337, BrBG, PiYG, PRGn, PuOr, RdBu, RdGy, RdYlBu, RdYlGn and Spectral</li>
<li><strong>box_style</strong> (<em>string</em>) &#8211; name of box style used in SVG. Currently supported are classic, modern, fusion.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">none</p>
</td>
</tr>
</tbody>
</table>
</dd></dl>

<dl class="method">
<dt id="pyGCluster.Cluster.draw_expression_map_for_community_cluster">
<tt class="descname">draw_expression_map_for_community_cluster</tt><big>(</big><em>name</em>, <em>min_value_4_expression_map=None</em>, <em>max_value_4_expression_map=None</em>, <em>color_gradient='1337'</em>, <em>sub_folder=None</em>, <em>min_obcofreq_2_plot=None</em>, <em>box_style='classic'</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.draw_expression_map_for_community_cluster"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.draw_expression_map_for_community_cluster" title="Permalink to this definition">¶</a></dt>
<dd><p>Plots the expression map for a given &#8220;community cluster&#8221;:
Any cluster in the community node map is internally represented as a tuple with two elements:
&#8220;cluster&#8221; and &#8220;level&#8221;. Those objects are stored as keys in self[ &#8216;Communities&#8217; ],
from where they may be extracted and fed into this function.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>name</strong> (<em>tuple</em>) &#8211; &#8220;community cluster&#8221;, best obtained from self[ &#8216;Communities&#8217; ].keys()</li>
<li><strong>min_obcofreq_2_plot</strong> (<em>float</em>) &#8211; minimum obCoFreq of a cluster&#8217;s object to be shown in the expression map.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>The following parameters are passed to <a class="reference internal" href="#pyGCluster.Cluster.draw_expression_map" title="pyGCluster.Cluster.draw_expression_map"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.draw_expression_map()</span></tt></a>:</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>min_value_4_expression_map</strong> (<em>float</em>) &#8211; lower bound for color coding of values in the expression map. Remember that log2-values are expected, i.e. this value should be &lt; 0!</li>
<li><strong>max_value_4_expression_map</strong> (<em>float</em>) &#8211; upper bound for color coding of values in the expression map.</li>
<li><strong>color_gradient</strong> (<em>string</em>) &#8211; name of the color gradient used for plotting the expression map. Currently supported are default, Daniel, barplot, 1337, BrBG, PiYG, PRGn, PuOr, RdBu, RdGy, RdYlBu, RdYlGn and Spectral</li>
<li><strong>box_style</strong> (<em>string</em>) &#8211; name of box style used in SVG. Currently supported are classic, modern, fusion.</li>
<li><strong>sub_folder</strong> (<em>string</em>) &#8211; if specified, the expression map is saved in this folder, rather than in pyGCluster&#8217;s working directory.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">none</p>
</td>
</tr>
</tbody>
</table>
</dd></dl>

<dl class="method">
<dt id="pyGCluster.Cluster.draw_expression_profiles">
<tt class="descname">draw_expression_profiles</tt><big>(</big><em>min_value_4_expression_map=None</em>, <em>max_value_4_expression_map=None</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.draw_expression_profiles"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.draw_expression_profiles" title="Permalink to this definition">¶</a></dt>
<dd><p>Draws an expression profile plot (SVG) for each community, illustrating the main &#8220;expression pattern&#8221; of a community.
Each line in this plot represents an object. The &#8220;grey cloud&#8221; illustrates the range of the standard deviation of the mean values.
The plot file names are prefixed with &#8220;exProf&#8221;, followed by the community name as it is shown in the node map.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>min_value_4_expression_map</strong> (<em>int</em>) &#8211; minimum of the y-axis (since data should be log2 values, this value should typically be &lt; 0).</li>
<li><strong>max_value_4_expression_map</strong> (<em>int</em>) &#8211; maximum for the y-axis.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">none</p>
</td>
</tr>
</tbody>
</table>
</dd></dl>

<dl class="method">
<dt id="pyGCluster.Cluster.frequencies">
<tt class="descname">frequencies</tt><big>(</big><em>identifier=None</em>, <em>clusterID=None</em>, <em>cluster=None</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.frequencies"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.frequencies" title="Permalink to this definition">¶</a></dt>
<dd><p>Returns a tuple with (i) the cFreq and (ii) a collections.defaultdict containing the DLC:frequency pairs for either
an identifier, e.g. &#8220;JGI4|Chlre4|123456&#8221;
or clusterID
or cluster.
Returns &#8216;None&#8217; if the identifier is not part of the data set, or clusterID or cluster was not found during iterations.</p>
<p>Example:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="n">cFreq</span><span class="p">,</span> <span class="n">dlc_freq_dict</span> <span class="o">=</span> <span class="n">cluster</span><span class="o">.</span><span class="n">frequencies</span><span class="p">(</span> <span class="n">identifier</span> <span class="o">=</span> <span class="s">&#39;JGI4|Chlre4|123456&#39;</span> <span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">dlc_freq_dict</span>
<span class="gp">... </span><span class="n">defaultdict</span><span class="p">(</span><span class="o">&lt;</span><span class="nb">type</span> <span class="s">&#39;float&#39;</span><span class="o">&gt;</span><span class="p">,</span>
<span class="gp">... </span><span class="p">{</span><span class="s">&#39;average-correlation&#39;</span><span class="p">:</span> <span class="mf">0.0</span><span class="p">,</span> <span class="s">&#39;complete-correlation&#39;</span><span class="p">:</span> <span class="mf">0.0</span><span class="p">,</span>
<span class="gp">... </span><span class="s">&#39;centroid-euclidean&#39;</span><span class="p">:</span> <span class="mf">0.0015</span><span class="p">,</span> <span class="s">&#39;median-euclidean&#39;</span><span class="p">:</span> <span class="mf">0.0064666666666666666</span><span class="p">,</span>
<span class="gp">... </span><span class="s">&#39;ward-euclidean&#39;</span><span class="p">:</span> <span class="mf">0.0041333333333333335</span><span class="p">,</span> <span class="s">&#39;weighted-correlation&#39;</span><span class="p">:</span> <span class="mf">0.0</span><span class="p">,</span>
<span class="gp">... </span><span class="s">&#39;complete-euclidean&#39;</span><span class="p">:</span> <span class="mf">0.0014</span><span class="p">,</span> <span class="s">&#39;weighted-euclidean&#39;</span><span class="p">:</span> <span class="mf">0.0066333333333333331</span><span class="p">,</span>
<span class="gp">... </span><span class="s">&#39;average-euclidean&#39;</span><span class="p">:</span> <span class="mf">0.0020333333333333332</span><span class="p">})</span>
</pre></div>
</div>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>identifier</strong> (<em>string</em>) &#8211; search frequencies by identifier input</li>
<li><strong>clusterID</strong> (<em>int</em>) &#8211; search frequencies by cluster ID input</li>
<li><strong>cluster</strong> (<em>tuple</em>) &#8211; search frequencies by cluster (tuple of ints) input</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">tuple</p>
</td>
</tr>
</tbody>
</table>
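<p>Once the DLC:frequency dict has been obtained, it can be processed like any other dict. The sketch below rebuilds part of the example output by hand (so that it runs without a clustering session) and ranks the distance-linkage combinations by frequency; the variable names <tt class="docutils literal"><span class="pre">best_dlc</span></tt> and <tt class="docutils literal"><span class="pre">ranked</span></tt> are illustrative only.</p>
<div class="highlight-python"><div class="highlight"><pre>from collections import defaultdict

# Subset of the example output shown above, rebuilt by hand.
dlc_freq_dict = defaultdict(float, {
    'average-correlation': 0.0,
    'centroid-euclidean': 0.0015,
    'median-euclidean': 0.0064666666666666666,
    'weighted-euclidean': 0.0066333333333333331,
})

# DLC that found the cluster most often.
best_dlc = max(dlc_freq_dict, key=dlc_freq_dict.get)

# All DLCs sorted from most to least frequent.
ranked = sorted(dlc_freq_dict.items(), key=lambda item: item[1], reverse=True)

print(best_dlc)
</pre></div>
</div>
<p>Because None is returned for unknown input, it may be safer to check the result of frequencies() before unpacking it into two names.</p>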
</dd></dl>

<dl class="method">
<dt id="pyGCluster.Cluster.info">
<tt class="descname">info</tt><big>(</big><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.info"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.info" title="Permalink to this definition">¶</a></dt>
<dd><p>Prints some information about the clustering via pyGCluster:</p>
<blockquote>
<div><ul class="simple">
<li>number of genes/proteins clustered</li>
<li>number of conditions defined</li>
<li>number of distance-linkage combinations</li>
<li>number of iterations performed</li>
</ul>
</div></blockquote>
<p>as well as some information about the communities, the legend for the shapes of nodes in the node map and the way the functions were called.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body">none</td>
</tr>
</tbody>
</table>
</dd></dl>

<dl class="method">
<dt id="pyGCluster.Cluster.load">
<tt class="descname">load</tt><big>(</big><em>filename</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.load"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.load" title="Permalink to this definition">¶</a></dt>
<dd><p>Fills a pyGCluster.Cluster object with the session saved as &#8220;filename&#8221;.
If &#8220;filename&#8221; is not a complete path, e.g. &#8220;example.pkl&#8221; (instead of &#8220;/home/user/Desktop/example.pkl&#8221;), the directory given by self[ &#8216;Working directory&#8217; ] is used.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<dl class="last docutils">
<dt>Loading of pyGCluster has to be performed as a 2-step-procedure:</dt>
<dd><div class="first last highlight-python"><div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="n">LoadedClustering</span> <span class="o">=</span> <span class="n">pyGCluster</span><span class="o">.</span><span class="n">Cluster</span><span class="p">()</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">LoadedClustering</span><span class="o">.</span><span class="n">load</span><span class="p">(</span> <span class="s">&quot;/home/user/Desktop/example.pkl&quot;</span> <span class="p">)</span>
</pre></div>
</div>
</dd>
</dl>
</div>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>filename</strong> (<em>string</em>) &#8211; may be either a simple file name (&#8220;example.pkl&#8221;) or a complete path (e.g. &#8220;/home/user/Desktop/example.pkl&#8221;).</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body">none</td>
</tr>
</tbody>
</table>
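<p>The file name handling described above can be illustrated with a minimal sketch. The helper <tt>resolve_session_path</tt> is hypothetical and not part of pyGCluster; it only assumes that a name without a directory component falls back to the working directory.</p>

```python
import os


def resolve_session_path(filename, working_directory):
    """Return the path a session file name would resolve to.

    Hypothetical helper (not part of pyGCluster): a bare file name is
    placed in the working directory, while a name that already carries
    a directory component is used unchanged.
    """
    if os.path.dirname(filename):
        # The name already carries a directory, e.g. "/tmp/other/example.pkl".
        return filename
    # Bare name such as "example.pkl": fall back to the working directory.
    return os.path.join(working_directory, filename)


print(resolve_session_path("example.pkl", "/home/user/Desktop"))
print(resolve_session_path("/tmp/other/example.pkl", "/home/user/Desktop"))
```

<p>The same rule is described for <tt>save</tt>, so a session saved under a bare file name should be loadable with the same bare name as long as the working directory is unchanged.</p>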
</dd></dl>

<dl class="method">
<dt id="pyGCluster.Cluster.median">
<tt class="descname">median</tt><big>(</big><em>_list</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.median"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.median" title="Permalink to this definition">¶</a></dt>
<dd><p>Returns the median from a list of numeric values.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>_list</strong> (<em>list</em>) &#8211; list of numeric values.</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body">int / float</td>
</tr>
</tbody>
</table>
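<p>For reference, a median over a plain list can be computed as in the following sketch. The function <tt>median_sketch</tt> is an illustrative stand-in and not necessarily how pyGCluster implements <tt>median</tt>; in particular, the handling of empty lists is an assumption.</p>

```python
def median_sketch(values):
    """Return the median of a non-empty list of numbers (illustrative only)."""
    if not values:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2:
        # Odd length: the middle element is the median.
        return ordered[middle]
    # Even length: average the two central elements.
    return (ordered[middle - 1] + ordered[middle]) / 2.0


print(median_sketch([3, 1, 2]))
print(median_sketch([4, 1, 3, 2]))
```

<p>An odd-length list of integers yields an int, while an even-length list yields a float, which matches the documented return type of int / float.</p>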
</dd></dl>

<dl class="method">
<dt id="pyGCluster.Cluster.plot_clusterfreqs">
<tt class="descname">plot_clusterfreqs</tt><big>(</big><em>min_cluster_size=4</em>, <em>top_X_clusters=0</em>, <em>threshold_4_the_lowest_max_freq=0.01</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.plot_clusterfreqs"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.plot_clusterfreqs" title="Permalink to this definition">¶</a></dt>
<dd><p>Plots the frequencies of each cluster as an expression map:
which cluster was found by which distance-linkage combination, and with what frequency?
The plot&#8217;s filename is prefixed by &#8216;clusterFreqsMap&#8217;, followed by the values of the parameters,
e.g. &#8216;clusterFreqsMap_minSize4_top0clusters_top10promille.svg&#8217;.
Clusters are sorted by size.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>min_cluster_size</strong> (<em>int</em>) &#8211; only clusters with a size equal to or greater than min_cluster_size appear in the plot of the cluster frequencies.</li>
<li><strong>threshold_4_the_lowest_max_freq</strong> (<em>float</em>) &#8211; ]0, 1[ Clusters must have a maximum frequency of at least threshold_4_the_lowest_max_freq to appear in the plot.</li>
<li><strong>top_X_clusters</strong> (<em>int</em>) &#8211; plot only the top X clusters in the list (sorted by frequency) of clusters having a maximum cluster frequency of at least threshold_4_the_lowest_max_freq (the cluster frequency plot itself is still sorted by size).</li>
</ul>
</td>
</tr>
</tbody>
</table>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">If top_X_clusters is set to 0 (the default), this filter is switched off.</p>
</div>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body">None</td>
</tr>
</tbody>
</table>
</dd></dl>

<dl class="method">
<dt id="pyGCluster.Cluster.plot_mean_distributions">
<tt class="descname">plot_mean_distributions</tt><big>(</big><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.plot_mean_distributions"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.plot_mean_distributions" title="Permalink to this definition">¶</a></dt>
<dd><p>Creates a density plot of mean values for each condition via rpy2.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body">none</td>
</tr>
</tbody>
</table>
</dd></dl>

<dl class="method">
<dt id="pyGCluster.Cluster.plot_nodetree">
<tt class="descname">plot_nodetree</tt><big>(</big><em>tree_filename='tree.dot'</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.plot_nodetree"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.plot_nodetree" title="Permalink to this definition">¶</a></dt>
<dd><dl class="docutils">
<dt>Plots the dendrogram for the clustering of the most_frequent_clusters.</dt>
<dd><ul class="first last">
<li><p class="first">node label = nodeID internally used for self[&#8216;Nodemap&#8217;] (not the same as clusterID!).</p>
</li>
<li><p class="first">node border color is white if the node is a close2root-cluster (i.e. larger than self[ &#8216;for IO skip clusters bigger than&#8217; ] ).</p>
</li>
<li><p class="first">edge label = distance between parent and children.</p>
</li>
<li><dl class="first docutils">
<dt>edge - color codes:</dt>
<dd><ul class="first last simple">
<li>black   = default; highlights child which is not a most_frequent_cluster but was created during formation of the dendrogram.</li>
<li>green   = children are connected with the root.</li>
<li>red     = highlights child which is a most_frequent_cluster.</li>
<li>yellow  = most_frequent_cluster is directly connected with the root.</li>
</ul>
</dd>
</dl>
</li>
</ul>
</dd>
</dl>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>tree_filename</strong> (<em>string</em>) &#8211; name of the Graphviz DOT file containing the dendrogram of the AHC of most frequent clusters. Best given with the &#8220;.dot&#8221; extension.</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body">none</td>
</tr>
</tbody>
</table>
</dd></dl>

<dl class="method">
<dt id="pyGCluster.Cluster.resample">
<tt class="descname">resample</tt><big>(</big><em>distances</em>, <em>linkages</em>, <em>function_2_generate_noise_injected_datasets=None</em>, <em>min_cluster_size=4</em>, <em>alphabet=None</em>, <em>force_plotting=False</em>, <em>min_cluster_freq_2_retain=0.001</em>, <em>pickle_filename='pyGCluster_resampled.pkl'</em>, <em>cpus_2_use=None</em>, <em>iter_tol=1e-07</em>, <em>iter_step=5000</em>, <em>iter_max=250000</em>, <em>iter_top_P=0.001</em>, <em>iter_window=50000</em>, <em>iter_till_the_end=False</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.resample"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.resample" title="Permalink to this definition">¶</a></dt>
<dd><p>Routine for the assessment of cluster reproducibility (re-sampling routine).
To this end, a large number of noise-injected datasets are created, which are subsequently clustered by AHC.
Those are created via <tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.function_2_generate_noise_injected_datasets()</span></tt> (default = usage of Gaussian distributions).
Each &#8216;simulated&#8217; dataset is then subjected to AHC x times, where x equals the number of distance-linkage combinations that result from all possible combinations of &#8220;distances&#8221; and &#8220;linkages&#8221;.
To speed up the re-sampling routine, it is distributed over multiple processes if cpus_2_use &gt; 1.</p>
<p>The re-sampling routine stops once either convergence (see below) is detected or iter_max iterations have been performed.
In the end, only clusters with a maximum frequency of at least min_cluster_freq_2_retain are stored; all others are discarded.</p>
<p>In order to visually inspect convergence, a convergence plot is created.
For more information about the convergence estimation, see Supplementary Material of pyGCluster&#8217;s publication.</p>
<p>For a complete list of possible distance matrix calculations,
see: <a class="reference external" href="http://docs.scipy.org/doc/scipy/reference/spatial.distance.html">http://docs.scipy.org/doc/scipy/reference/spatial.distance.html</a>.
For the available linkage methods,
see: <a class="reference external" href="http://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html">http://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html</a></p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">If memory is of concern (e.g. for a large dataset, &gt; 5000 objects), cpus_2_use should be kept low.</p>
</div>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>distances</strong> (<em>list</em>) &#8211; list of distance metrics, given as strings, e.g. [ &#8216;correlation&#8217;, &#8216;euclidean&#8217; ]</li>
<li><strong>linkages</strong> (<em>list</em>) &#8211; list of linkage methods, given as strings, e.g. [ &#8216;average&#8217;, &#8216;complete&#8217;, &#8216;ward&#8217; ]</li>
<li><strong>function_2_generate_noise_injected_datasets</strong> (<em>function</em>) &#8211; function to generate noise-injected datasets. If None (default), Gaussian distributions are used.</li>
<li><strong>min_cluster_size</strong> (<em>int</em>) &#8211; minimum size of a cluster, so that it is included in the assessment of cluster reproducibilities.</li>
<li><strong>alphabet</strong> (<em>string</em>) &#8211; alphabet used to convert decimal indices to characters to save memory. Defaults to string.printable, without &#8216;,&#8217;.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">If alphabet contains &#8216;,&#8217;, this character is removed from alphabet, because the indices comprising a cluster are saved comma-separated.</p>
</div>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>force_plotting</strong> (<em>boolean</em>) &#8211; if True, the convergence plot is created after every iter_step iterations (otherwise only when convergence is detected).</li>
<li><strong>min_cluster_freq_2_retain</strong> (<em>float</em>) &#8211; ]0, 1[ minimum frequency a cluster must exhibit (only the maximum of its DLC frequencies matters here) to be stored in pyGCluster once all iterations are finished.</li>
<li><strong>cpus_2_use</strong> (<em>int</em>) &#8211; number of processes that are started in the re-sampling routine.</li>
<li><strong>iter_max</strong> (<em>int</em>) &#8211; maximum number of re-sampling iterations.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>Convergence determination:</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>iter_tol</strong> (<em>float</em>) &#8211; ]0, 1e-3[ value for the threshold of the median of normalized slopes, in order to declare convergence.</li>
<li><strong>iter_step</strong> (<em>int</em>) &#8211; number of iterations each process performs and, at the same time, the interval at which convergence is checked.</li>
<li><strong>iter_top_P</strong> (<em>float</em>) &#8211; ]0, 1[ for the convergence estimation, only the most frequent clusters are examined. This is the threshold for the minimum frequency of a cluster to be included.</li>
<li><strong>iter_window</strong> (<em>int</em>) &#8211; size of the sliding window in iterations. The median is obtained from normalized slopes inside this window - <em>should be a multiple of iter_step</em></li>
<li><strong>iter_till_the_end</strong> (<em>boolean</em>) &#8211; if set to True, the convergence determination is switched off; hence, re-sampling is performed until iter_max is reached.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">None</p>
</td>
</tr>
</tbody>
</table>
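<p>The number x of distance-linkage combinations mentioned above follows directly from the two input lists. The following sketch only illustrates that enumeration; the helper <tt>build_dlc</tt> and the tuple representation are assumptions and do not reflect how pyGCluster stores combinations internally.</p>

```python
import itertools


def build_dlc(distances, linkages):
    """Enumerate every (distance, linkage) pairing (illustrative helper)."""
    return list(itertools.product(distances, linkages))


distances = ["correlation", "euclidean"]
linkages = ["average", "complete", "ward"]

dlc = build_dlc(distances, linkages)
# Each noise-injected dataset would be clustered once per combination.
print(len(dlc))
for distance, linkage in dlc:
    print(distance, linkage)
```

<p>With the two distances and three linkages from the parameter examples, each noise-injected dataset is clustered six times, so the run time grows with the product of both list lengths.</p>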
</dd></dl>

<dl class="method">
<dt id="pyGCluster.Cluster.save">
<tt class="descname">save</tt><big>(</big><em>filename='pyGCluster.pkl'</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.save"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.save" title="Permalink to this definition">¶</a></dt>
<dd><p>Saves the current pyGCluster.Cluster object as a pickle file.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>filename</strong> (<em>string</em>) &#8211; may be either a simple file name (&#8220;example.pkl&#8221;) or a complete path (e.g. &#8220;/home/user/Desktop/example.pkl&#8221;). In the former case, the pickle is stored in pyGCluster&#8217;s working directory.</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body">none</td>
</tr>
</tbody>
</table>
</dd></dl>

<dl class="method">
<dt id="pyGCluster.Cluster.write_dot">
<tt class="descname">write_dot</tt><big>(</big><em>filename</em>, <em>scaleByFreq=True</em>, <em>min_obcofreq_2_plot=None</em>, <em>n_legend_nodes=5</em>, <em>min_value_4_expression_map=None</em>, <em>max_value_4_expression_map=None</em>, <em>color_gradient='1337'</em>, <em>box_style='classic'</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.write_dot"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.write_dot" title="Permalink to this definition">¶</a></dt>
<dd><p>Writes a Graphviz DOT file representing the cluster composition of communities.
Herein, each node represents a cluster. Its name combines the cluster&#8217;s ID with the level / iteration at which it was inserted into the community:</p>
<blockquote>
<div><ul class="simple">
<li>The node&#8217;s size reflects the cluster&#8217;s cFreq.</li>
<li>The node&#8217;s shape illustrates by which distance metric the cluster was found (if the shape is a point, this illustrates that this cluster was not among the most_frequent_clusters, but only formed during AHC of clusters).</li>
<li>The node&#8217;s color shows the community membership; except for clusters which are larger than self[ &#8216;for IO skip clusters bigger than&#8217; ], those are highlighted in grey.</li>
<li>The node connecting all clusters is the root (the cluster holding all objects), which is highlighted in white.</li>
</ul>
</div></blockquote>
<p>The DOT file may be rendered with &#8220;Graphviz&#8221; or further processed with other appropriate programs such as &#8220;Gephi&#8221;.
If &#8220;Graphviz&#8221; is available, the DOT file is finally rendered with Graphviz&#8217;s dot algorithm.</p>
<p>In addition, an expression map for each cluster of the node map is created (via <a class="reference internal" href="#pyGCluster.Cluster.draw_expression_map_for_community_cluster" title="pyGCluster.Cluster.draw_expression_map_for_community_cluster"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.draw_expression_map_for_community_cluster()</span></tt></a>).</p>
<p>Those are saved in the sub-folder &#8220;communityClusters&#8221;.</p>
<p>This function also calls <a class="reference internal" href="#pyGCluster.Cluster.write_legend" title="pyGCluster.Cluster.write_legend"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.write_legend()</span></tt></a>,
which creates a TXT file containing the object composition of all clusters, as well as their frequencies.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>filename</strong> (<em>string</em>) &#8211; file name of the Graphviz DOT file representing the node map, best given with the extension &#8220;.dot&#8221;.</li>
<li><strong>scaleByFreq</strong> (<em>boolean</em>) &#8211; switch to either scale nodes (= clusters) by cFreq or apply a constant size to each node (the latter may be useful to put emphasis on the nodes&#8217; shapes).</li>
<li><strong>min_obcofreq_2_plot</strong> (<em>float</em>) &#8211; if defined, clusters with lower cFreq than this value are skipped, i.e. not plotted.</li>
<li><strong>n_legend_nodes</strong> (<em>int</em>) &#8211; number of nodes representing the legend for the node sizes. The node sizes themselves encode for the cFreq. &#8220;Legend nodes&#8221; are drawn as grey boxes.</li>
<li><strong>min_value_4_expression_map</strong> (<em>float</em>) &#8211; lower bound for color coding of values in the expression map. Remember that log2-values are expected, i.e. this value should be &lt; 0.</li>
<li><strong>max_value_4_expression_map</strong> (<em>float</em>) &#8211; upper bound for color coding of values in the expression map.</li>
<li><strong>color_gradient</strong> (<em>string</em>) &#8211; name of the color gradient used for plotting the expression map.</li>
<li><strong>box_style</strong> (<em>string</em>) &#8211; the way the relative standard deviation is visualized in the expression map. Currently supported are &#8216;modern&#8217;, &#8216;fusion&#8217; or &#8216;classic&#8217;.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">none</p>
</td>
</tr>
</tbody>
</table>
</dd></dl>

<dl class="method">
<dt id="pyGCluster.Cluster.write_legend">
<tt class="descname">write_legend</tt><big>(</big><em>filename='legend.txt'</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.write_legend"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.write_legend" title="Permalink to this definition">¶</a></dt>
<dd><p>Creates a legend for the community node map as a TXT file.
Herein, the object composition of each cluster of the node map as well as its frequencies are recorded.
Since this function is internally called by <a class="reference internal" href="#pyGCluster.Cluster.write_dot" title="pyGCluster.Cluster.write_dot"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.write_dot()</span></tt></a>, it is typically not necessary to call this function.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>filename</strong> (<em>string</em>) &#8211; name of the legend TXT file, best given with the extension &#8220;.txt&#8221;.</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body">none</td>
</tr>
</tbody>
</table>
</dd></dl>

</dd></dl>

<dl class="function">
<dt id="pyGCluster.create_default_alphabet">
<tt class="descclassname">pyGCluster.</tt><tt class="descname">create_default_alphabet</tt><big>(</big><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#create_default_alphabet"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.create_default_alphabet" title="Permalink to this definition">¶</a></dt>
<dd><p>Returns the default alphabet, which is used to save clusters in a less memory-intensive form:
instead of saving e.g. a cluster containing identifiers with indices of 1,20,30 as &#8220;1,20,30&#8221;, the indices are converted to a baseX system -&gt; &#8220;1,k,u&#8221;.</p>
<dl class="docutils">
<dt>The default alphabet that is returned is:</dt>
<dd><div class="first last highlight-python"><div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="n">string</span><span class="o">.</span><span class="n">printable</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span> <span class="s">&#39;,&#39;</span><span class="p">,</span> <span class="s">&#39;&#39;</span> <span class="p">)</span>
</pre></div>
</div>
</dd>
</dl>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body">string</td>
</tr>
</tbody>
</table>
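<p>The index conversion described above can be reproduced with a short sketch. The helpers <tt>encode_index</tt> and <tt>encode_cluster</tt> are hypothetical names used for illustration; they assume a plain positional base-X encoding in which the alphabet supplies the digits.</p>

```python
import string


def encode_index(index, alphabet):
    """Convert a non-negative integer to a base-len(alphabet) string."""
    base = len(alphabet)
    if index == 0:
        return alphabet[0]
    digits = []
    while index:
        index, remainder = divmod(index, base)
        digits.append(alphabet[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(digits))


def encode_cluster(indices, alphabet):
    """Join the encoded indices of one cluster with commas."""
    return ",".join(encode_index(i, alphabet) for i in indices)


# Same expression as the documented default alphabet.
alphabet = string.printable.replace(",", "")
print(encode_cluster([1, 20, 30], alphabet))
```

<p>With the default alphabet the indices 1, 20 and 30 each fit into a single character, giving the string 1,k,u from the description. The comma has to be excluded from the alphabet because it serves as the separator between indices.</p>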
</dd></dl>

<dl class="function">
<dt id="pyGCluster.resampling_multiprocess">
<tt class="descclassname">pyGCluster.</tt><tt class="descname">resampling_multiprocess</tt><big>(</big><em>DataQ=None</em>, <em>data=None</em>, <em>iterations=5000</em>, <em>alphabet=None</em>, <em>dlc=None</em>, <em>min_cluster_size=4</em>, <em>min_cluster_freq_2_retain=0.001</em>, <em>function_2_generate_noise_injected_datasets=None</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#resampling_multiprocess"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.resampling_multiprocess" title="Permalink to this definition">¶</a></dt>
<dd><p>This is the function that is called for each process that is spawned internally in pyGCluster during the re-sampling routine.
Agglomerative hierarchical clustering is performed for each distance-linkage combination (DLC) on each of the <em>iterations</em> datasets.
Clusters from each hierarchical tree are extracted, and their counts are saved in a temporary cluster-count matrix.
After <em>iterations</em> iterations, clusters are filtered according to min_cluster_freq_2_retain.
These clusters, together with their respective counts among all DLCs, are returned.
The return value is a list of tuples with two elements each: cluster (string) and counts (one-dimensional np.array).</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>DataQ</strong> (<em>multiprocessing.Queue()</em>) &#8211; data queue which is used to pipe the re-sampling results back to pyGCluster.</li>
<li><strong>data</strong> (<em>collections.OrderedDict()</em>) &#8211; dictionary ( OrderedDict! ) holding the data to be clustered -&gt; passed through to the noise-function.</li>
<li><strong>iterations</strong> (<em>int</em>) &#8211; the number of iterations this multiprocess is going to perform.</li>
<li><strong>alphabet</strong> (<em>string</em>) &#8211; in order to save memory, the indices describing a cluster are converted to a specific alphabet (rather than decimal system).</li>
<li><strong>dlc</strong> (<em>list</em>) &#8211; list of the distance-linkage combinations that are going to be evaluated.</li>
<li><strong>min_cluster_size</strong> (<em>int</em>) &#8211; minimum size of a cluster to be considered in the re-sampling routine (smaller clusters are discarded)</li>
<li><strong>min_cluster_freq_2_retain</strong> (<em>float</em>) &#8211; once all iterations are performed, clusters are filtered according to 50% (because typically forwarded from pyGCluster) of this threshold.</li>
<li><strong>function_2_generate_noise_injected_datasets</strong> (<em>function</em>) &#8211; function to generate re-sampled datasets.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">list</p>
</td>
</tr>
</tbody>
</table>
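<p>The final filtering step can be pictured with the sketch below. It is an interpretation of the parameter description, not the library code: the helper <tt>filter_clusters</tt> is hypothetical, plain lists stand in for the np.array counts, and the assumption is that a cluster is kept when the frequency under its best DLC reaches 50% of min_cluster_freq_2_retain.</p>

```python
def filter_clusters(cluster_counts, iterations, min_cluster_freq_2_retain):
    """Keep clusters whose best DLC frequency reaches half the threshold.

    cluster_counts: list of (cluster string, per-DLC counts) tuples.
    Illustrative only; the cutoff rule is an assumption.
    """
    cutoff = 0.5 * min_cluster_freq_2_retain
    retained = []
    for cluster, counts in cluster_counts:
        # Frequency = count / iterations; only the best DLC matters.
        if max(counts) / float(iterations) >= cutoff:
            retained.append((cluster, counts))
    return retained


results = [("1,k,u", [40, 0]), ("2,3,4,5", [1, 0])]
kept = filter_clusters(results, iterations=5000, min_cluster_freq_2_retain=0.001)
print(kept)
```

<p>In this example the first cluster reaches a frequency of 0.008 under one DLC and is kept, whereas the second reaches only 0.0002 and falls below the cutoff of 0.0005.</p>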
</dd></dl>

<dl class="function">
<dt id="pyGCluster.seekAndDestry">
<tt class="descclassname">pyGCluster.</tt><tt class="descname">seekAndDestry</tt><big>(</big><em>processes</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#seekAndDestry"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.seekAndDestry" title="Permalink to this definition">¶</a></dt>
<dd><p>Terminates all processes given in <em>processes</em>.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>processes</strong> (<em>list</em>) &#8211; list of multiprocessing.Process objects</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body">none</td>
</tr>
</tbody>
</table>
</dd></dl>

<dl class="function">
<dt id="pyGCluster.yield_noisejected_dataset">
<tt class="descclassname">pyGCluster.</tt><tt class="descname">yield_noisejected_dataset</tt><big>(</big><em>data</em>, <em>iterations</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#yield_noisejected_dataset"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.yield_noisejected_dataset" title="Permalink to this definition">¶</a></dt>
<dd><p>Generator yielding a re-sampled dataset with each iteration.
A re-sampled dataset is created by re-sampling each data point
from the normal distribution given by its associated mean and standard deviation value.
See the example in the Supplementary Material of pyGCluster&#8217;s publication for how to define a custom noise function (e.g. uniform noise).</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>data</strong> (<em>collections.OrderedDict()</em>) &#8211; dictionary ( OrderedDict! ) holding the data to be re-sampled.</li>
<li><strong>iterations</strong> (<em>int</em>) &#8211; the number of re-sampled datasets this generator will yield.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">generator</p>
</td>
</tr>
</tbody>
</table>
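<p>A generator of this kind can be sketched with the standard library alone. The function <tt>noise_injected_datasets</tt> below is a hypothetical stand-in: it assumes a layout of data[identifier][condition] = (mean, standard deviation), and the structure it yields is likewise an assumption rather than pyGCluster&#8217;s actual output format.</p>

```python
import random
from collections import OrderedDict


def noise_injected_datasets(data, iterations, rng=random):
    """Yield `iterations` datasets re-sampled from per-point normal distributions.

    Assumed layout: data[identifier][condition] = (mean, standard deviation).
    """
    for _ in range(iterations):
        resampled = OrderedDict()
        for identifier, conditions in data.items():
            # Draw one value per data point from N(mean, sd).
            resampled[identifier] = OrderedDict(
                (condition, rng.gauss(mean, sd))
                for condition, (mean, sd) in conditions.items()
            )
        yield resampled


data = OrderedDict()
data["gene1"] = OrderedDict([("cond1", (1.0, 0.1)), ("cond2", (-0.5, 0.0))])
datasets = list(noise_injected_datasets(data, iterations=3))
for dataset in datasets:
    print(dict(dataset["gene1"]))
```

<p>A data point with a standard deviation of 0 is returned unchanged in every re-sampled dataset, while points with larger standard deviations vary more between iterations. A custom noise function would keep the same generator shape and replace only the drawing step.</p>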
</dd></dl>

</div>


          </div>
        </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="footer">
        &copy; Copyright 2013, Daniel Jaeger, Johannes Barth, Anna Niehues and Christian Fufezan.
      Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 1.1.2.
    </div>
  </body>
</html>