<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>2. Module pyGCluster — pyGCluster 0.18.4 documentation</title>
<link rel="stylesheet" href="_static/default.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: '',
VERSION: '0.18.4',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true
};
</script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<link rel="top" title="pyGCluster 0.18.4 documentation" href="index.html" />
<link rel="next" title="3. Usage" href="usage.html" />
<link rel="prev" title="1. Introduction" href="intro.html" />
</head>
<body>
<div class="related">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="usage.html" title="3. Usage"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="intro.html" title="1. Introduction"
accesskey="P">previous</a> |</li>
<li><a href="index.html">pyGCluster 0.18.4 documentation</a> »</li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body">
<div class="section" id="module-pyGCluster">
<span id="module-pygcluster"></span><h1>2. Module pyGCluster<a class="headerlink" href="#module-pyGCluster" title="Permalink to this headline">¶</a></h1>
<p>pyGCluster is a clustering algorithm focusing on noise injection for subsequent cluster validation.
By requiring clusters to be identical (i.e. to contain exactly the same objects) in order to be counted, it assesses the reproducibility of a large number of clusters
obtained with agglomerative hierarchical clustering (AHC).
Furthermore, a multitude of different distance-linkage combinations (DLCs) is evaluated.
Finally, associations of highly reproducible clusters, called communities, are created.
The results can be represented graphically as node maps and expression maps.</p>
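<p>The noise-injection idea can be illustrated with a minimal sketch that draws one re-sampled dataset from the (mean, sd) tuples of the data dictionary. It assumes Gaussian noise, i.e. every value is drawn from a normal distribution with the stored mean and standard deviation; the helper name <tt class="docutils literal">noise_injected_dataset</tt> is hypothetical and not part of the pyGCluster API.</p>

```python
import random


def noise_injected_dataset(data, seed=None):
    """Return one noise-injected copy of a nested data dict (hypothetical helper).

    data maps identifier -> {condition: (mean, sd)}. Each value in the
    returned dict is a single draw from a normal distribution with that
    mean and standard deviation (Gaussian noise is an assumption here).
    """
    rng = random.Random(seed)
    return {
        identifier: {
            condition: rng.gauss(mean, sd)
            for condition, (mean, sd) in conditions.items()
        }
        for identifier, conditions in data.items()
    }


data = {
    'Identifier1': {'condition1': (1.0, 0.1), 'condition2': (2.0, 0.2)},
    'Identifier2': {'condition1': (0.5, 0.0), 'condition2': (3.0, 0.3)},
}
sample = noise_injected_dataset(data, seed=42)
print(sample['Identifier2']['condition1'])  # sd = 0, so this is exactly the mean: 0.5
```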
<dl class="docutils">
<dt>The pyGCluster module contains the main class <a class="reference internal" href="#pyGCluster.Cluster" title="pyGCluster.Cluster"><tt class="xref py py-class docutils literal"><span class="pre">pyGCluster.Cluster</span></tt></a> and the following functions:</dt>
<dd><div class="first last line-block">
<div class="line"><a class="reference internal" href="#pyGCluster.create_default_alphabet" title="pyGCluster.create_default_alphabet"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.create_default_alphabet()</span></tt></a></div>
<div class="line"><a class="reference internal" href="#pyGCluster.resampling_multiprocess" title="pyGCluster.resampling_multiprocess"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.resampling_multiprocess()</span></tt></a></div>
<div class="line"><a class="reference internal" href="#pyGCluster.seekAndDestry" title="pyGCluster.seekAndDestry"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.seekAndDestry()</span></tt></a></div>
<div class="line"><a class="reference internal" href="#pyGCluster.yield_noisejected_dataset" title="pyGCluster.yield_noisejected_dataset"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.yield_noisejected_dataset()</span></tt></a></div>
</div>
</dd>
</dl>
<dl class="class">
<dt id="pyGCluster.Cluster">
<em class="property">class </em><tt class="descclassname">pyGCluster.</tt><tt class="descname">Cluster</tt><big>(</big><em>data=None</em>, <em>working_directory=None</em>, <em>verbosity_level=1</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster" title="Permalink to this definition">¶</a></dt>
<dd><p>The pyGCluster class</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>working_directory</strong> (<em>string</em>) – directory in which all results are written (requires write-permission!).</li>
<li><strong>verbosity_level</strong> (<em>int</em>) – either 0, 1 or 2.</li>
<li><strong>data</strong> (<em>dict</em>) – Dictionary containing the data which is to be clustered.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>For the default noise-injection function to work and for expression maps
to be plotted correctly, the data dict <strong>has</strong> to have the following
structure.</p>
<p>Example:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">data</span> <span class="o">=</span> <span class="p">{</span>
<span class="gp">... </span> <span class="s">'Identifier1'</span> <span class="p">:</span> <span class="p">{</span>
<span class="gp">... </span> <span class="s">'condition1'</span> <span class="p">:</span> <span class="p">(</span> <span class="n">mean11</span><span class="p">,</span> <span class="n">sd11</span> <span class="p">),</span>
<span class="gp">... </span> <span class="s">'condition2'</span> <span class="p">:</span> <span class="p">(</span> <span class="n">mean12</span><span class="p">,</span> <span class="n">sd12</span> <span class="p">),</span>
<span class="gp">... </span> <span class="s">'condition3'</span> <span class="p">:</span> <span class="p">(</span> <span class="n">mean13</span><span class="p">,</span> <span class="n">sd13</span> <span class="p">),</span>
<span class="gp">... </span> <span class="p">},</span>
<span class="gp">... </span> <span class="s">'Identifier2'</span> <span class="p">:</span> <span class="p">{</span>
<span class="gp">... </span> <span class="s">'condition1'</span> <span class="p">:</span> <span class="p">(</span> <span class="n">mean21</span><span class="p">,</span> <span class="n">sd21</span> <span class="p">),</span>
<span class="gp">... </span> <span class="s">'condition2'</span> <span class="p">:</span> <span class="p">(</span> <span class="n">mean22</span><span class="p">,</span> <span class="n">sd22</span> <span class="p">),</span>
<span class="gp">... </span> <span class="s">'condition3'</span> <span class="p">:</span> <span class="p">(</span> <span class="n">mean23</span><span class="p">,</span> <span class="n">sd23</span> <span class="p">),</span>
<span class="gp">... </span> <span class="p">},</span>
<span class="gp">... </span><span class="p">}</span>
<span class="gp">>>> </span><span class="kn">import</span> <span class="nn">pyGCluster</span>
<span class="gp">>>> </span><span class="n">ClusterClass</span> <span class="o">=</span> <span class="n">pyGCluster</span><span class="o">.</span><span class="n">Cluster</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">data</span><span class="p">,</span> <span class="n">verbosity_level</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">working_directory</span><span class="o">=...</span><span class="p">)</span>
</pre></div>
</div>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">If any condition is missing for an identifier in the nested data dict,
this identifier is discarded, i.e. not imported into the Cluster class,
because pyGCluster does not implement any missing-value estimation.
One possible workaround is to replace missing values by a mean and a standard
deviation that are representative of the complete data range in the given condition.</p>
</div>
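<p>The workaround suggested in the note can be sketched as follows. The helper <tt class="docutils literal">fill_missing_conditions</tt> is hypothetical, and using the median of the observed means and the median of the observed standard deviations as the “representative” values is an assumption; other choices may suit a given dataset better.</p>

```python
import statistics


def fill_missing_conditions(data):
    """Fill absent conditions with representative (mean, sd) tuples (hypothetical helper).

    'Representative' is assumed to mean the median of the observed means and
    the median of the observed standard deviations within that condition.
    """
    conditions = sorted({c for entry in data.values() for c in entry})
    representative = {}
    for condition in conditions:
        observed = [entry[condition] for entry in data.values() if condition in entry]
        means = [mean for mean, _ in observed]
        sds = [sd for _, sd in observed]
        representative[condition] = (statistics.median(means), statistics.median(sds))
    # Keep measured values; only absent conditions receive the representative tuple.
    return {
        identifier: {c: entry.get(c, representative[c]) for c in conditions}
        for identifier, entry in data.items()
    }


data = {
    'Identifier1': {'condition1': (1.0, 0.1), 'condition2': (2.0, 0.25)},
    'Identifier2': {'condition1': (3.0, 0.3)},
    'Identifier3': {'condition1': (2.0, 0.2), 'condition2': (4.0, 0.75)},
}
filled = fill_missing_conditions(data)
print(filled['Identifier2'])  # {'condition1': (3.0, 0.3), 'condition2': (3.0, 0.5)}
```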
<p>The pyGCluster class inherits from the built-in Python dictionary (<tt class="docutils literal">dict</tt>).
Hence, its attributes can be accessed as dictionary keys.</p>
<p>A selection of the most important attributes / keys is:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="c"># general</span>
<span class="gp">>>> </span><span class="n">ClusterClass</span><span class="p">[</span> <span class="s">'Working directory'</span> <span class="p">]</span>
<span class="gp">... </span> <span class="c"># this is the directory where all pyGCluster results</span>
<span class="gp">... </span> <span class="c"># (pickle objects, expression maps, node map, ...) are saved.</span>
<span class="go">/Users/Shared/moClusterDirectory</span>
<span class="gp">>>> </span><span class="c"># the original data can be accessed via</span>
<span class="gp">>>> </span><span class="n">ClusterClass</span><span class="p">[</span> <span class="s">'Data'</span> <span class="p">]</span>
<span class="gp">... </span> <span class="c"># this collections.OrderedDict contains the data that has been</span>
<span class="gp">... </span> <span class="c"># or will be clustered (see also below).</span>
<span class="gp">... </span><span class="n">plenty</span> <span class="n">of</span> <span class="n">data</span> <span class="p">;)</span>
<span class="gp">>>> </span><span class="n">ClusterClass</span><span class="p">[</span> <span class="s">'Conditions'</span> <span class="p">]</span>
<span class="gp">... </span> <span class="c"># sorted list of all conditions that are defined in the "Data"-dictionary</span>
<span class="go">[ 'condition1', 'condition2', 'condition3' ]</span>
<span class="gp">>>> </span><span class="n">ClusterClass</span><span class="p">[</span> <span class="s">'Identifiers'</span> <span class="p">]</span>
<span class="gp">... </span> <span class="c"># sorted tuple of all identifiers, i.e. ClusterClass[ 'Data' ].keys()</span>
<span class="go">( 'Identifier1', 'Identifier2' , ... 'IdentifierN' )</span>
<span class="gp">>>> </span><span class="c"># re-sampling parameters</span>
<span class="gp">>>> </span><span class="n">ClusterClass</span><span class="p">[</span> <span class="s">'Iterations'</span> <span class="p">]</span>
<span class="gp">... </span> <span class="c"># the number of datasets that were clustered.</span>
<span class="go">1000000</span>
<span class="gp">>>> </span><span class="n">ClusterClass</span><span class="p">[</span> <span class="s">'Cluster 2 clusterID'</span> <span class="p">]</span>
<span class="gp">... </span> <span class="c"># dictionary with clusters as keys, and their respective row index</span>
<span class="gp">... </span> <span class="c"># in the "Cluster counts"-matrix (= clusterID) as values.</span>
<span class="go">{ ... }</span>
<span class="gp">>>> </span><span class="n">ClusterClass</span><span class="p">[</span> <span class="s">'Cluster counts'</span> <span class="p">]</span>
<span class="gp">... </span> <span class="c"># numpy.uint32 matrix holding the counts for each</span>
<span class="gp">... </span> <span class="c"># distance-linkage combination of the clusters.</span>
<span class="gp">>>> </span><span class="n">ClusterClass</span><span class="p">[</span> <span class="s">'Distance-linkage combinations'</span> <span class="p">]</span>
<span class="gp">... </span> <span class="c"># sorted list containing the distance-linkage combinations</span>
<span class="gp">... </span> <span class="c"># that were evaluated in the re-sampling routine.</span>
<span class="gp">>>> </span><span class="c"># Communities</span>
<span class="gp">>>> </span><span class="n">ClusterClass</span><span class="p">[</span> <span class="s">'Communities'</span> <span class="p">]</span>
<span class="gp">... </span> <span class="c"># see function pyGCluster.Cluster.build_nodemap for further information.</span>
<span class="gp">>>> </span><span class="c"># Visualization</span>
<span class="gp">>>> </span><span class="n">ClusterClass</span><span class="p">[</span> <span class="s">'Additional labels'</span> <span class="p">]</span>
<span class="gp">... </span> <span class="c"># dictionary with an identifier of the "Data"-dict as key,</span>
<span class="gp">... </span> <span class="c"># and a list of additional information (e.g. annotation, GO terms) as value.</span>
<span class="go">{</span>
<span class="go"> 'Identifier1' :</span>
<span class="go"> ['Photosynthesis related' , 'zeroFactor: 12.31' ],</span>
<span class="go"> 'Identifier2' : [ ... ] ,</span>
<span class="go"> ...</span>
<span class="go">}</span>
<span class="gp">>>> </span><span class="n">ClusterClass</span><span class="p">[</span> <span class="s">'for IO skip clusters bigger than'</span> <span class="p">]</span>
<span class="gp">... </span> <span class="c"># Default = 100. Since some clusters are very large</span>
<span class="gp">... </span> <span class="c"># (close in size to the root, i.e. the cluster holding all objects),</span>
<span class="gp">... </span> <span class="c"># clusters with more objects than this value</span>
<span class="gp">... </span> <span class="c"># are not plotted as expression maps or expression profile plots.</span>
</pre></div>
</div>
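<p>The relationship between the <tt class="docutils literal">'Cluster 2 clusterID'</tt>, <tt class="docutils literal">'Cluster counts'</tt> and <tt class="docutils literal">'Iterations'</tt> keys can be sketched in plain Python. This is not pyGCluster's implementation: the helper <tt class="docutils literal">tally_clusters</tt> and the DLC labels are hypothetical, a list of lists stands in for the numpy.uint32 matrix, and a cluster's frequency for a DLC is assumed to be its count divided by the number of iterations.</p>

```python
def tally_clusters(runs, dlcs):
    """Count how often each cluster reappears, per distance-linkage combination.

    runs: one dict per iteration, mapping a DLC label to the clusters found in
    that noise-injected dataset; clusters are tuples of object indices.
    Returns (cluster_to_id, counts): counts[cluster_id] holds one count per DLC.
    """
    cluster_to_id = {}
    counts = []
    for iteration in runs:
        for column, dlc in enumerate(dlcs):
            for cluster in iteration[dlc]:
                key = tuple(sorted(cluster))  # identical membership -> identical key
                if key not in cluster_to_id:
                    cluster_to_id[key] = len(counts)  # next free row index
                    counts.append([0] * len(dlcs))
                counts[cluster_to_id[key]][column] += 1
    return cluster_to_id, counts


dlcs = ['euclidean-average', 'correlation-complete']  # hypothetical DLC labels
runs = [
    {'euclidean-average': [(0, 1), (2, 3)], 'correlation-complete': [(0, 1)]},
    {'euclidean-average': [(1, 0)], 'correlation-complete': [(0, 1), (2,)]},
]
cluster_to_id, counts = tally_clusters(runs, dlcs)
iterations = len(runs)
row = counts[cluster_to_id[(0, 1)]]
frequencies = [count / iterations for count in row]
print(frequencies)  # [1.0, 1.0]
```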
<p>pyGCluster offers the possibility to save the analysis (e.g. after re-sampling)
via <a class="reference internal" href="#pyGCluster.Cluster.save" title="pyGCluster.Cluster.save"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.save()</span></tt></a> and to continue it later
via <a class="reference internal" href="#pyGCluster.Cluster.load" title="pyGCluster.Cluster.load"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.load()</span></tt></a>.</p>
<p>Initializes the pyGCluster.Cluster class.</p>
<p>Typically, users start the multiprocessing clustering routine with multiple
distance-linkage combinations via the <a class="reference internal" href="#pyGCluster.Cluster.do_it_all" title="pyGCluster.Cluster.do_it_all"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.do_it_all()</span></tt></a>
function. This function updates the pyGCluster object with all user
parameters before it calls <a class="reference internal" href="#pyGCluster.Cluster.resample" title="pyGCluster.Cluster.resample"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.resample()</span></tt></a>.
The main advantage of calling <a class="reference internal" href="#pyGCluster.Cluster.do_it_all" title="pyGCluster.Cluster.do_it_all"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.do_it_all()</span></tt></a> is
that all general plotting functions are called afterwards as well; these are:</p>
<blockquote>
<div><div class="line-block">
<div class="line"><a class="reference internal" href="#pyGCluster.Cluster.plot_clusterfreqs" title="pyGCluster.Cluster.plot_clusterfreqs"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.plot_clusterfreqs()</span></tt></a></div>
<div class="line"><a class="reference internal" href="#pyGCluster.Cluster.build_nodemap" title="pyGCluster.Cluster.build_nodemap"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.build_nodemap()</span></tt></a></div>
<div class="line"><a class="reference internal" href="#pyGCluster.Cluster.write_dot" title="pyGCluster.Cluster.write_dot"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.write_dot()</span></tt></a></div>
<div class="line"><a class="reference internal" href="#pyGCluster.Cluster.draw_community_expression_maps" title="pyGCluster.Cluster.draw_community_expression_maps"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.draw_community_expression_maps()</span></tt></a></div>
</div>
</div></blockquote>
<p>Alternatively, one can manually update the parameters (by setting the corresponding key-value
pairs in pyGCluster) and then invoke <a class="reference internal" href="#pyGCluster.Cluster.resample" title="pyGCluster.Cluster.resample"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.resample()</span></tt></a>
with the appropriate parameters. This is useful if certain memory-intensive
distance-linkage combinations are to be clustered on a specific computer.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">The Cluster class can be initialized empty and filled later using <a class="reference internal" href="#pyGCluster.Cluster.load" title="pyGCluster.Cluster.load"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.load()</span></tt></a>.</p>
</div>
<dl class="method">
<dt id="pyGCluster.Cluster.build_nodemap">
<tt class="descname">build_nodemap</tt><big>(</big><em>min_cluster_size=4</em>, <em>top_X_clusters=0</em>, <em>threshold_4_the_lowest_max_freq=0.01</em>, <em>starting_min_overlap=0.1</em>, <em>increasing_min_overlap=0.05</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.build_nodemap"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.build_nodemap" title="Permalink to this definition">¶</a></dt>
<dd><p>Constructs communities from a set of most frequent clusters.
This set is obtained via <tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster._get_most_frequent_clusters()</span></tt>, to which the first three parameters are passed.
These clusters are then subjected to AHC with complete linkage.
The distance matrix is calculated via <a class="reference internal" href="#pyGCluster.Cluster.calculate_distance_matrix" title="pyGCluster.Cluster.calculate_distance_matrix"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.calculate_distance_matrix()</span></tt></a>.
The combination of complete linkage and this distance matrix ensures that all clusters in a community overlap each other by at least “starting_min_overlap”.
From the resulting cluster tree, a “first draft” of communities is obtained.
These first-draft communities are then themselves treated as clusters and subjected to AHC again, until the community assignment of the clusters remains constant.
In this way, a cluster can be inserted into a target community even if it did not initially overlap with every individual cluster inside that community,
as long as it overlaps with the cluster obtained by combining all clusters of the target community into one.
This reduces the degree of stringency; the clusters fit into a community in a broader sense.
For further information on the community construction, see the publication of pyGCluster.</p>
<dl class="docutils">
<dt>Internal structure of communities:</dt>
<dd><div class="first last highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">name</span> <span class="o">=</span> <span class="p">(</span> <span class="n">cluster</span><span class="p">,</span> <span class="n">level</span> <span class="p">)</span>
<span class="gp">... </span> <span class="c"># internal name of the community.</span>
<span class="gp">... </span> <span class="c"># The first element in the tuple ("cluster") contains the indices</span>
<span class="gp">... </span> <span class="c"># of the objects that comprise a community.</span>
<span class="gp">... </span> <span class="c"># The second element gives the level,</span>
<span class="gp">... </span> <span class="c"># or iteration when the community was formed.</span>
<span class="gp">>>> </span><span class="bp">self</span><span class="p">[</span> <span class="s">'Communities'</span> <span class="p">][</span> <span class="n">name</span> <span class="p">][</span> <span class="s">'children'</span> <span class="p">]</span>
<span class="gp">... </span> <span class="c"># list containing the clusters that build the community.</span>
<span class="gp">>>> </span><span class="bp">self</span><span class="p">[</span> <span class="s">'Communities'</span> <span class="p">][</span> <span class="n">name</span> <span class="p">][</span> <span class="s">'# of nodes merged into community'</span> <span class="p">]</span>
<span class="gp">... </span> <span class="c"># the number of clusters that build the community.</span>
<span class="gp">>>> </span><span class="bp">self</span><span class="p">[</span> <span class="s">'Communities'</span> <span class="p">][</span> <span class="n">name</span> <span class="p">][</span> <span class="s">'index 2 obCoFreq dict'</span> <span class="p">]</span>
<span class="gp">... </span> <span class="c"># an OrderedDict in which each index is assigned its obCoFreq.</span>
<span class="gp">... </span> <span class="c"># Negative indices correspond to "placeholders",</span>
<span class="gp">... </span> <span class="c"># which are required for the insertion of black lines into expression maps.</span>
<span class="gp">... </span> <span class="c"># Black lines in expression maps separate the individual clusters</span>
<span class="gp">... </span> <span class="c"># that form a community, sorted by when</span>
<span class="gp">... </span> <span class="c"># they were inserted into the community.</span>
<span class="gp">>>> </span><span class="bp">self</span><span class="p">[</span> <span class="s">'Communities'</span> <span class="p">][</span> <span class="n">name</span> <span class="p">][</span> <span class="s">'highest obCoFreq'</span> <span class="p">]</span>
<span class="gp">... </span> <span class="c"># the highest obCoFreq encountered in a community.</span>
<span class="gp">>>> </span><span class="bp">self</span><span class="p">[</span> <span class="s">'Communities'</span> <span class="p">][</span> <span class="n">name</span> <span class="p">][</span> <span class="s">'cluster ID'</span> <span class="p">]</span>
<span class="gp">... </span> <span class="c"># the ID of the cluster containing the object with the highest obCoFreq.</span>
</pre></div>
</div>
</dd>
</dl>
<p>Of the following parameters, the first three are passed to <tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster._get_most_frequent_clusters()</span></tt>:</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>min_cluster_size</strong> (<em>int</em>) – clusters smaller than this threshold are not considered for the community construction.</li>
<li><strong>top_X_clusters</strong> (<em>int</em>) – form communities from the top X clusters sorted by their maximum frequency.</li>
<li><strong>threshold_4_the_lowest_max_freq</strong> (<em>float</em>) – [0, 1[ form communities from clusters whose maximum frequency is at least this value.</li>
<li><strong>starting_min_overlap</strong> (<em>float</em>) – ]0, 1[ minimum required relative overlap between clusters for them to be assigned to the same community. The relative overlap is defined as the size of the overlap between two clusters, divided by the size of the larger cluster.</li>
<li><strong>increasing_min_overlap</strong> (<em>float</em>) – defines the increase of the required overlap between communities</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">none</p>
</td>
</tr>
</tbody>
</table>
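<p>The selection of the most frequent clusters controlled by the first three parameters can be sketched as below. The function is a hypothetical stand-in for the private <tt class="docutils literal">_get_most_frequent_clusters()</tt>; it assumes that a cluster's maximum frequency is its highest count over all DLCs divided by the number of iterations, and that <tt class="docutils literal">top_X_clusters=0</tt> means that the frequency threshold is used instead.</p>

```python
def select_most_frequent_clusters(cluster_to_id, counts, iterations,
                                  min_cluster_size=4, top_X_clusters=0,
                                  threshold_4_the_lowest_max_freq=0.01):
    """Select the clusters used for community construction (hypothetical sketch).

    cluster_to_id maps a cluster (tuple of object indices) to its row in
    counts; each row holds one count per distance-linkage combination.
    """
    candidates = []
    for cluster, cluster_id in cluster_to_id.items():
        if len(cluster) >= min_cluster_size:
            max_freq = max(counts[cluster_id]) / iterations
            candidates.append((max_freq, cluster))
    # Most frequent first; ties are broken by the cluster tuple for a stable order.
    candidates.sort(key=lambda item: (-item[0], item[1]))
    if top_X_clusters > 0:
        return [cluster for _, cluster in candidates[:top_X_clusters]]
    return [cluster for max_freq, cluster in candidates
            if max_freq >= threshold_4_the_lowest_max_freq]


cluster_to_id = {(0, 1, 2, 3): 0, (0, 1): 1, (4, 5, 6, 7, 8): 2}
counts = [[50, 10], [90, 80], [0, 0]]
print(select_most_frequent_clusters(cluster_to_id, counts, iterations=100))
# [(0, 1, 2, 3)]: (0, 1) is too small and (4, ..., 8) was never observed
```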
</dd></dl>
<dl class="method">
<dt id="pyGCluster.Cluster.calculate_distance_matrix">
<tt class="descname">calculate_distance_matrix</tt><big>(</big><em>clusters</em>, <em>min_overlap=0.25</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.calculate_distance_matrix"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.calculate_distance_matrix" title="Permalink to this definition">¶</a></dt>
<dd><dl class="docutils">
<dt>Calculates the specifically developed distance matrix for the AHC of clusters:</dt>
<dd><ol class="first last arabic simple">
<li>Clusters <em>not</em> sharing the minimum overlap are attributed a distance of “self[ ‘Root size’ ]” (i.e. len( self[ ‘Data’ ] ) ).</li>
<li>Clusters are attributed a distance of “self[ ‘Root size’ ] - 1” to the root cluster.</li>
<li>Clusters sharing the minimum overlap are attributed a distance of “size of the larger of the two clusters minus size of the overlap”.</li>
</ol>
</dd>
</dl>
<p>The overlap between a pair of clusters is relative, i.e. defined as the size of the overlap divided by the size of the larger of the two clusters.</p>
<p>The resulting condensed distance matrix is not returned, but rather stored in self[ ‘Nodemap - condensed distance matrix’ ].</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>clusters</strong> (<em>list of clusters. Clusters are represented as tuples consisting of their objects’ indices.</em>) – the most frequent clusters whose “distance” is to be determined.</li>
<li><strong>min_overlap</strong> (<em>float</em>) – ]0, 1[ threshold value to determine if the distance between two clusters is calculated according to (1) or (3).</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">none</p>
</td>
</tr>
</tbody>
</table>
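<p>The three distance rules can be sketched as follows. The function names are hypothetical, clusters are plain tuples of object indices, and the order in which the rules are checked (root cluster first) is an assumption, since the text does not say which rule applies when a pair involves the root cluster.</p>

```python
from itertools import combinations


def relative_overlap(a, b):
    """Overlap size divided by the size of the larger of the two clusters."""
    return len(set(a).intersection(b)) / max(len(a), len(b))


def condensed_cluster_distances(clusters, root_size, min_overlap=0.25):
    """Distances between clusters following the three numbered rules (sketch).

    root_size corresponds to len(self['Data']). Pairs are visited in the
    order itertools.combinations yields them, (0, 1), (0, 2), ..., (1, 2), ...,
    which is the row-major order of a condensed distance matrix.
    """
    distances = []
    for a, b in combinations(clusters, 2):
        if len(a) == root_size or len(b) == root_size:
            distances.append(root_size - 1)  # rule 2: distance to the root cluster
        elif relative_overlap(a, b) >= min_overlap:
            overlap = len(set(a).intersection(b))
            distances.append(max(len(a), len(b)) - overlap)  # rule 3
        else:
            distances.append(root_size)  # rule 1: minimum overlap not reached
    return distances


clusters = [(0, 1, 2, 3), (0, 1, 2), (4, 5), (0, 1, 2, 3, 4, 5)]
print(condensed_cluster_distances(clusters, root_size=6))  # [1, 6, 5, 6, 5, 5]
```

<p>A condensed list in this order is the input format accepted by <tt class="docutils literal">scipy.cluster.hierarchy.linkage</tt>, e.g. with <tt class="docutils literal">method='complete'</tt> for a complete-linkage AHC such as the one described for build_nodemap().</p>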
</dd></dl>
<dl class="method">
<dt id="pyGCluster.Cluster.check4convergence">
<tt class="descname">check4convergence</tt><big>(</big><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.check4convergence"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.check4convergence" title="Permalink to this definition">¶</a></dt>
<dd><p>Checks whether the re-sampling routine may be terminated because the number of most frequent clusters remains almost constant.
This is done by examining the number of most frequent clusters as a function of the number of iterations.
Convergence is declared once the median normalized slope in a given window of iterations is equal to or below “iter_tol”.
For further information, see the Supplementary Material of the corresponding publication.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body">boolean</td>
</tr>
</tbody>
</table>
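<p>A convergence test of this kind can be sketched as below. The exact definition of the normalized slope is left to the Supplementary Material mentioned above; this sketch assumes it is the absolute change in the number of most frequent clusters, divided by the later value and by the number of iterations between two checkpoints. The function name and the example numbers are hypothetical.</p>

```python
import statistics


def has_converged(iterations, n_most_frequent, window, iter_tol):
    """Convergence test on the 'number of most frequent clusters' curve (sketch).

    iterations[i] is the iteration count at checkpoint i and
    n_most_frequent[i] the number of most frequent clusters at that point.
    """
    slopes = []
    for i in range(1, len(iterations)):
        later = n_most_frequent[i]
        if later == 0:
            continue  # nothing frequent yet, so the normalized slope is undefined
        step = iterations[i] - iterations[i - 1]
        slopes.append(abs(later - n_most_frequent[i - 1]) / later / step)
    if window > len(slopes):
        return False  # not enough checkpoints to fill the window
    return iter_tol >= statistics.median(slopes[-window:])


iterations = [1000, 2000, 3000, 4000, 5000, 6000]
n_most_frequent = [10, 50, 80, 100, 100, 101]
print(has_converged(iterations, n_most_frequent, window=3, iter_tol=1e-4))  # True
print(has_converged(iterations, n_most_frequent, window=5, iter_tol=1e-4))  # False
```

<p>Using the median rather than the mean over the window makes the test less sensitive to a single checkpoint at which the count jumps.</p>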
</dd></dl>
<dl class="method">
<dt id="pyGCluster.Cluster.check_if_data_is_log2_transformed">
<tt class="descname">check_if_data_is_log2_transformed</tt><big>(</big><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.check_if_data_is_log2_transformed"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.check_if_data_is_log2_transformed" title="Permalink to this definition">¶</a></dt>
<dd><p>Simple check whether any value of the data_tuples (i.e. any mean) is below zero.
A value below zero indicates that the input data was log2-transformed.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body">boolean</td>
</tr>
</tbody>
</table>
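<p>The check amounts to a single pass over all means. The minimal sketch below assumes the nested data dict structure shown for the class (the function name is hypothetical). It also makes the check's limitation explicit: a negative mean implies transformed data, but log2-transformed data that happens to contain no negative values is not detected.</p>

```python
def looks_log2_transformed(data):
    """Return True if any mean in the nested data dict is below zero (sketch).

    data maps identifier -> {condition: (mean, sd)}. Untransformed intensities
    or ratios cannot be negative, so a negative mean points to log-transformed
    input; a False result does not prove the data is untransformed.
    """
    return any(
        0 > mean
        for conditions in data.values()
        for mean, _sd in conditions.values()
    )


raw = {'Identifier1': {'condition1': (120.0, 5.0), 'condition2': (80.0, 4.0)}}
logged = {'Identifier1': {'condition1': (0.58, 0.1), 'condition2': (-0.32, 0.1)}}
print(looks_log2_transformed(raw), looks_log2_transformed(logged))  # False True
```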
</dd></dl>
<dl class="method">
<dt id="pyGCluster.Cluster.convergence_plot">
<tt class="descname">convergence_plot</tt><big>(</big><em>filename='convergence_plot.pdf'</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.convergence_plot"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.convergence_plot" title="Permalink to this definition">¶</a></dt>
<dd><p>Creates a two-page PDF file containing the full convergence plot as well as a zoomed-in view of it.
The convergence plot shows how the number of most frequent clusters develops with the number of iterations.
The dotted line in this plot represents the normalized slope, which is used internally to determine convergence.</p>
<p>If rpy2 cannot be imported, a CSV file is created instead.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>filename</strong> (<em>string</em>) – the filename of the PDF (or CSV) file.</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body">none</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="method">
<dt id="pyGCluster.Cluster.create_rainbow_colors">
<tt class="descname">create_rainbow_colors</tt><big>(</big><em>n_colors=10</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.create_rainbow_colors"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.create_rainbow_colors" title="Permalink to this definition">¶</a></dt>
<dd><p>Returns a list of rainbow colors. Colors are expressed as RGB hex codes.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>n_colors</strong> (<em>int</em>) – number of rainbow colors.</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body">list</td>
</tr>
</tbody>
</table>
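<p>One way to produce such a list is to space hues evenly around the HSV color wheel, as sketched below with Python's standard <tt class="docutils literal"><span class="pre">colorsys</span></tt> module. This is an illustrative assumption, not pyGCluster's implementation, which may produce different colors; the function name is hypothetical.</p>

```python
import colorsys


def rainbow_hex_colors(n_colors=10):
    """Hypothetical sketch: n_colors evenly spaced hues at full saturation and
    value, returned as '#RRGGBB' strings."""
    colors = []
    for i in range(n_colors):
        # hue runs from 0 (red) towards 1 without wrapping back to red
        r, g, b = colorsys.hsv_to_rgb(i / n_colors, 1.0, 1.0)
        colors.append("#{:02X}{:02X}{:02X}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors
```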
</dd></dl>
<dl class="method">
<dt id="pyGCluster.Cluster.delete_resampling_results">
<tt class="descname">delete_resampling_results</tt><big>(</big><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.delete_resampling_results"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.delete_resampling_results" title="Permalink to this definition">¶</a></dt>
<dd><p>Resets all variables holding any result of the re-sampling process.
This includes the convergence determination as well as the community structure.
Does not delete the data that is intended to be clustered.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body">None</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="method">
<dt id="pyGCluster.Cluster.do_it_all">
<tt class="descname">do_it_all</tt><big>(</big><em>working_directory=None</em>, <em>distances=None</em>, <em>linkages=None</em>, <em>function_2_generate_noise_injected_datasets=None</em>, <em>min_cluster_size=4</em>, <em>alphabet=None</em>, <em>force_plotting=False</em>, <em>min_cluster_freq_2_retain=0.001</em>, <em>pickle_filename='pyGCluster_resampled.pkl'</em>, <em>cpus_2_use=None</em>, <em>iter_max=250000</em>, <em>iter_tol=1e-07</em>, <em>iter_step=5000</em>, <em>iter_top_P=0.001</em>, <em>iter_window=50000</em>, <em>iter_till_the_end=False</em>, <em>top_X_clusters=0</em>, <em>threshold_4_the_lowest_max_freq=0.01</em>, <em>starting_min_overlap=0.1</em>, <em>increasing_min_overlap=0.05</em>, <em>color_gradient='1337'</em>, <em>box_style='classic'</em>, <em>min_value_4_expression_map=None</em>, <em>max_value_4_expression_map=None</em>, <em>additional_labels=None</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.do_it_all"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.do_it_all" title="Permalink to this definition">¶</a></dt>
<dd><p>Invokes all functions that constitute the main functionality of pyGCluster,
namely agglomerative hierarchical clustering (AHC) of noise-injected data sets with a variety of distance-linkage combinations (DLCs),
in order to identify highly reproducible clusters,
followed by a meta-clustering of highly reproducible clusters into so-called ‘communities’.</p>
<p>The functions that are called are:</p>
<blockquote>
<div><ul class="simple">
<li><a class="reference internal" href="#pyGCluster.Cluster.resample" title="pyGCluster.Cluster.resample"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.resample()</span></tt></a></li>
<li><a class="reference internal" href="#pyGCluster.Cluster.build_nodemap" title="pyGCluster.Cluster.build_nodemap"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.build_nodemap()</span></tt></a></li>
<li><a class="reference internal" href="#pyGCluster.Cluster.write_dot" title="pyGCluster.Cluster.write_dot"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.write_dot()</span></tt></a></li>
<li><a class="reference internal" href="#pyGCluster.Cluster.draw_community_expression_maps" title="pyGCluster.Cluster.draw_community_expression_maps"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.draw_community_expression_maps()</span></tt></a></li>
<li><a class="reference internal" href="#pyGCluster.Cluster.draw_expression_profiles" title="pyGCluster.Cluster.draw_expression_profiles"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.draw_expression_profiles()</span></tt></a></li>
</ul>
</div></blockquote>
<p>For a complete list of possible
distance metrics,
see: <a class="reference external" href="http://docs.scipy.org/doc/scipy/reference/spatial.distance.html">http://docs.scipy.org/doc/scipy/reference/spatial.distance.html</a>;
for a complete list of linkage methods,
see: <a class="reference external" href="http://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html">http://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html</a></p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">If memory is of concern (e.g. for a large dataset, > 5000 objects), cpus_2_use should be kept low.</p>
</div>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>distances</strong> (<em>list</em>) – list of distance metrics, given as strings, e.g. [ ‘correlation’, ‘euclidean’ ]</li>
<li><strong>linkages</strong> (<em>list</em>) – list of linkage methods, given as strings, e.g. [ ‘average’, ‘complete’, ‘ward’ ]</li>
<li><strong>function_2_generate_noise_injected_datasets</strong> (<em>function</em>) – function to generate noise-injected datasets. If None (default), Gaussian distributions are used.</li>
<li><strong>min_cluster_size</strong> (<em>int</em>) – minimum size a cluster must have to be included in the assessment of cluster reproducibility.</li>
<li><strong>alphabet</strong> (<em>string</em>) – alphabet used to convert decimal indices to characters to save memory. Defaults to string.printable, without ‘,’.</li>
</ul>
</td>
</tr>
</tbody>
</table>
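<p>To illustrate the default noise injection, the sketch below draws each value from a normal distribution centred on the measured mean with the measured standard deviation, using the nested data layout documented for <tt class="docutils literal"><span class="pre">draw_expression_map()</span></tt>. The call interface that pyGCluster expects from function_2_generate_noise_injected_datasets is not specified here, so the function name and signature are hypothetical.</p>

```python
import random


def gaussian_noise_injected_dataset(data, rng=None):
    """Hypothetical sketch of Gaussian noise injection.

    data maps identifier -> {condition: (mean, sd)}; for every object and
    condition a new value is drawn from N(mean, sd)."""
    if rng is None:
        rng = random.Random()
    return {
        identifier: {
            condition: rng.gauss(mean, sd)
            for condition, (mean, sd) in conditions.items()
        }
        for identifier, conditions in data.items()
    }
```

<p>Passing a seeded <tt class="docutils literal"><span class="pre">random.Random</span></tt> instance makes a single noise-injected data set reproducible.</p>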
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">If alphabet contains ‘,’, this character is removed from alphabet, because the indices comprising a cluster are saved comma-seperated.</p>
</div>
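<p>A plausible reading of “converting decimal indices to characters” is writing each index in base len(alphabet), which shortens large indices to a few characters. The sketch below follows that <em>assumption</em>; the helper names are hypothetical and pyGCluster's actual encoding may differ. It also shows why ‘,’ cannot be a digit: it separates the encoded indices.</p>

```python
import string

# documented default: string.printable without ','
DEFAULT_ALPHABET = string.printable.replace(",", "")


def encode_index(index, alphabet=DEFAULT_ALPHABET):
    """Hypothetical: write a non-negative integer in base len(alphabet)."""
    alphabet = alphabet.replace(",", "")  # ',' separates indices, so it cannot be a digit
    base = len(alphabet)
    digits = []
    while True:
        index, remainder = divmod(index, base)
        digits.append(alphabet[remainder])
        if index == 0:
            break
    return "".join(reversed(digits))


def encode_cluster(indices, alphabet=DEFAULT_ALPHABET):
    """Hypothetical: comma-separated encoded indices of one cluster."""
    return ",".join(encode_index(i, alphabet) for i in indices)
```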
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>force_plotting</strong> (<em>boolean</em>) – if True, the convergence plot is created after every iter_step iterations (otherwise only once convergence is detected).</li>
<li><strong>min_cluster_freq_2_retain</strong> (<em>float</em>) – ]0, 1[ minimum frequency a cluster must exhibit to be retained in pyGCluster once all iterations are finished (only the maximum of its DLC frequencies is considered).</li>
<li><strong>cpus_2_use</strong> (<em>int</em>) – number of threads used in the re-sampling routine.</li>
<li><strong>iter_max</strong> (<em>int</em>) – maximum number of re-sampling iterations.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>Convergence determination:</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>iter_tol</strong> (<em>float</em>) – ]0, 1e-3[ threshold for the median of the normalized slopes; convergence is declared once the median is at or below this value.</li>
<li><strong>iter_step</strong> (<em>int</em>) – number of iterations each process performs; this is also the interval at which convergence is checked.</li>
<li><strong>iter_top_P</strong> (<em>float</em>) – ]0, 1[ convergence is estimated from the number of most frequent clusters; this is the minimum frequency a cluster must have to be counted among them.</li>
<li><strong>iter_window</strong> (<em>int</em>) – size of the sliding window in iterations; the median is computed from the normalized slopes inside this window. <em>Should be a multiple of iter_step.</em></li>
<li><strong>iter_till_the_end</strong> (<em>boolean</em>) – if set to True, the convergence determination is switched off; hence, re-sampling is performed until iter_max is reached.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>Output/Plotting:</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>pickle_filename</strong> (<em>string</em>) – Filename of the output pickle object</li>
<li><strong>top_X_clusters</strong> (<em>int</em>) – number of top clusters to plot, taken from the frequency-sorted list of clusters whose maximum cluster frequency is at least threshold_4_the_lowest_max_freq (the cluster frequency plot itself is still sorted by size).</li>
<li><strong>threshold_4_the_lowest_max_freq</strong> (<em>float</em>) – ]0, 1[ Clusters must have a maximum frequency of at least threshold_4_the_lowest_max_freq to appear in the plot.</li>
<li><strong>min_value_4_expression_map</strong> (<em>float</em>) – lower bound for color coding of values in the expression map. Remember that log2-values are expected, i.e. this value should be < 0!</li>
<li><strong>max_value_4_expression_map</strong> (<em>float</em>) – upper bound for color coding of values in the expression map.</li>
<li><strong>color_gradient</strong> (<em>string</em>) – name of the color gradient used for plotting the expression map. Currently supported are default, Daniel, barplot, 1337, BrBG, PiYG, PRGn, PuOr, RdBu, RdGy, RdYlBu, RdYlGn and Spectral</li>
<li><strong>expression_map_filename</strong> (<em>string</em>) – file name for expression map. .svg will be added if required.</li>
<li><strong>legend_filename</strong> (<em>string</em>) – file name for the legend. .svg will be added if required.</li>
<li><strong>box_style</strong> (<em>string</em>) – the way the relative standard deviation is visualized in the expression map. Currently supported are ‘modern’, ‘fusion’ or ‘classic’.</li>
<li><strong>starting_min_overlap</strong> (<em>float</em>) – ]0, 1[ minimum relative overlap required between clusters for them to be assigned to the same community. The relative overlap is defined as the size of the overlap between two clusters divided by the size of the larger cluster.</li>
<li><strong>increasing_min_overlap</strong> (<em>float</em>) – step by which the required overlap between communities is increased.</li>
<li><strong>additional_labels</strong> (<em>dict</em>) – dictionary defining additional labels that are added to the gene/protein names in the expression map plots.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">None</p>
</td>
</tr>
</tbody>
</table>
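<p>The relative overlap defined for starting_min_overlap can be computed directly from two clusters given as tuples of object indices. The helpers below are a hypothetical illustration of that definition; treating the threshold as inclusive is an assumption.</p>

```python
def relative_overlap(cluster_a, cluster_b):
    """Size of the intersection divided by the size of the larger cluster."""
    a, b = set(cluster_a), set(cluster_b)
    return len(a & b) / max(len(a), len(b))


def same_community(cluster_a, cluster_b, min_overlap):
    """Hypothetical check: is the overlap large enough to merge the clusters?"""
    return relative_overlap(cluster_a, cluster_b) >= min_overlap
```

<p>Dividing by the larger cluster keeps a small cluster that is fully contained in a much larger one from automatically counting as a perfect match.</p>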
<p>For more information on each parameter, please refer to <a class="reference internal" href="#pyGCluster.Cluster.resample" title="pyGCluster.Cluster.resample"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.resample()</span></tt></a>
and the subsequent functions:
<a class="reference internal" href="#pyGCluster.Cluster.build_nodemap" title="pyGCluster.Cluster.build_nodemap"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.build_nodemap()</span></tt></a>,
<a class="reference internal" href="#pyGCluster.Cluster.write_dot" title="pyGCluster.Cluster.write_dot"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.write_dot()</span></tt></a>,
<a class="reference internal" href="#pyGCluster.Cluster.draw_community_expression_maps" title="pyGCluster.Cluster.draw_community_expression_maps"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.draw_community_expression_maps()</span></tt></a>,
<a class="reference internal" href="#pyGCluster.Cluster.draw_expression_profiles" title="pyGCluster.Cluster.draw_expression_profiles"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.draw_expression_profiles()</span></tt></a>.</p>
</dd></dl>
<dl class="method">
<dt id="pyGCluster.Cluster.draw_community_expression_maps">
<tt class="descname">draw_community_expression_maps</tt><big>(</big><em>min_value_4_expression_map=None</em>, <em>max_value_4_expression_map=None</em>, <em>color_gradient='1337'</em>, <em>box_style='classic'</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.draw_community_expression_maps"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.draw_community_expression_maps" title="Permalink to this definition">¶</a></dt>
<dd><p>Plots an expression map for each community, showing its object composition.</p>
<p>The following parameters are passed to <a class="reference internal" href="#pyGCluster.Cluster.draw_expression_map" title="pyGCluster.Cluster.draw_expression_map"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.draw_expression_map()</span></tt></a>:</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>min_value_4_expression_map</strong> (<em>float</em>) – lower bound for color coding of values in the expression map. Remember that log2-values are expected, i.e. this value should be < 0!</li>
<li><strong>max_value_4_expression_map</strong> (<em>float</em>) – upper bound for color coding of values in the expression map.</li>
<li><strong>color_gradient</strong> (<em>string</em>) – name of the color gradient used for plotting the expression map. Currently supported are default, Daniel, barplot, 1337, BrBG, PiYG, PRGn, PuOr, RdBu, RdGy, RdYlBu, RdYlGn and Spectral</li>
<li><strong>box_style</strong> (<em>string</em>) – name of box style used in SVG. Currently supported are classic, modern, fusion.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">none</p>
</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="method">
<dt id="pyGCluster.Cluster.draw_expression_map">
<tt class="descname">draw_expression_map</tt><big>(</big><em>identifiers=None</em>, <em>data=None</em>, <em>conditions=None</em>, <em>additional_labels=None</em>, <em>min_value_4_expression_map=None</em>, <em>max_value_4_expression_map=None</em>, <em>expression_map_filename=None</em>, <em>legend_filename=None</em>, <em>color_gradient=None</em>, <em>box_style='classic'</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.draw_expression_map"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.draw_expression_map" title="Permalink to this definition">¶</a></dt>
<dd><p>Draws an expression map as SVG.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>min_value_4_expression_map</strong> (<em>float</em>) – lower bound for color coding of values in the expression map. Remember that log2-values are expected, i.e. this value should be < 0!</li>
<li><strong>max_value_4_expression_map</strong> (<em>float</em>) – upper bound for color coding of values in the expression map.</li>
<li><strong>color_gradient</strong> (<em>string</em>) – name of the color gradient used for plotting the expression map. Currently supported are default, Daniel, barplot, 1337, BrBG, PiYG, PRGn, PuOr, RdBu, RdGy, RdYlBu, RdYlGn and Spectral</li>
<li><strong>expression_map_filename</strong> (<em>string</em>) – file name for expression map. .svg will be added if required.</li>
<li><strong>legend_filename</strong> (<em>string</em>) – file name for the legend. .svg will be added if required.</li>
<li><strong>box_style</strong> (<em>string</em>) – the way the relative standard deviation is visualized in the expression map. Currently supported are ‘modern’, ‘fusion’ or ‘classic’.</li>
<li><strong>additional_labels</strong> (<em>dict</em>) – dictionary defining additional labels that are added to the gene/protein names in the expression map plots.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">none</p>
</td>
</tr>
</tbody>
</table>
<dl class="docutils">
<dt>Data has to be a nested dict in the following format:</dt>
<dd><div class="first last highlight-python"><div class="highlight"><pre><span class="gp">>>> </span> <span class="n">data</span> <span class="o">=</span> <span class="p">{</span>
<span class="gp">... </span> <span class="n">fastaID1</span> <span class="p">:</span> <span class="p">{</span>
<span class="gp">... </span> <span class="n">cond1</span> <span class="p">:</span> <span class="p">(</span> <span class="n">mean</span><span class="p">,</span> <span class="n">sd</span> <span class="p">)</span> <span class="p">,</span> <span class="n">cond2</span> <span class="p">:</span> <span class="p">(</span> <span class="n">mean</span><span class="p">,</span> <span class="n">sd</span> <span class="p">),</span> <span class="o">...</span>
<span class="gp">... </span> <span class="p">}</span>
<span class="gp">... </span> <span class="n">fastaID2</span> <span class="p">:</span> <span class="p">{</span>
<span class="gp">... </span> <span class="n">cond1</span> <span class="p">:</span> <span class="p">(</span> <span class="n">mean</span><span class="p">,</span> <span class="n">sd</span> <span class="p">)</span> <span class="p">,</span> <span class="n">cond2</span> <span class="p">:</span> <span class="p">(</span> <span class="n">mean</span><span class="p">,</span> <span class="n">sd</span> <span class="p">),</span> <span class="o">...</span>
<span class="gp">... </span> <span class="p">}</span>
<span class="gp">... </span> <span class="p">}</span>
</pre></div>
</div>
</dd>
<dt>identifiers, data and conditions are optional; if not given, they are extracted from</dt>
<dd><div class="first last line-block">
<div class="line">self[ ‘Data’ ]</div>
<div class="line">self[ ‘Identifiers’ ]</div>
<div class="line">self[ ‘Conditions’ ]</div>
</div>
</dd>
</dl>
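<p>The role of min_value_4_expression_map and max_value_4_expression_map can be illustrated with a simple diverging color scale: values are clipped to the two bounds, negative log2 values fade towards blue and positive ones towards red. This blue-white-red scale is a hypothetical stand-in for the named gradients (‘1337’, ‘RdBu’, ...) and assumes min_value is below 0 and max_value is above 0.</p>

```python
def value_to_hex(value, min_value, max_value):
    """Hypothetical blue-white-red color coding of a log2 value.

    Values outside [min_value, max_value] are clipped to the end colors;
    0 maps to white."""
    value = max(min_value, min(max_value, value))
    if value < 0:
        t = value / min_value  # 0 at zero, 1 at min_value (both negative)
        r = g = round(255 * (1 - t))
        b = 255
    else:
        t = value / max_value if max_value else 0.0
        r = 255
        g = b = round(255 * (1 - t))
    return "#{:02X}{:02X}{:02X}".format(r, g, b)
```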
</dd></dl>
<dl class="method">
<dt id="pyGCluster.Cluster.draw_expression_map_for_cluster">
<tt class="descname">draw_expression_map_for_cluster</tt><big>(</big><em>clusterID=None</em>, <em>cluster=None</em>, <em>filename=None</em>, <em>min_value_4_expression_map=None</em>, <em>max_value_4_expression_map=None</em>, <em>color_gradient='default'</em>, <em>box_style='classic'</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.draw_expression_map_for_cluster"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.draw_expression_map_for_cluster" title="Permalink to this definition">¶</a></dt>
<dd><p>Plots an expression map for a given cluster.
Specify either the parameter “clusterID” or the parameter “cluster”.
This function is useful for plotting a user-defined cluster, e.g. a knowledge-based cluster (TCA cluster, glycolysis cluster, ...); in this case, the parameter “cluster” should be defined.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>clusterID</strong> (<em>int</em>) – ID of a cluster (obtainable, e.g., from the cluster frequency plot or the node map).</li>
<li><strong>cluster</strong> (<em>tuple</em>) – tuple containing the indices of the objects that make up the cluster.</li>
<li><strong>filename</strong> (<em>string</em>) – name of the SVG file for the expression map.</li>
</ul>
</td>
</tr>
</tbody>
</table>
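<p>To plot a knowledge-based cluster, its members have to be translated into object indices. The sketch below <em>assumes</em> that an object's index is its position in self[ ‘Identifiers’ ]; the helper name is hypothetical. Members that are not part of the data set are skipped.</p>

```python
def cluster_from_identifiers(all_identifiers, members):
    """Hypothetical: sorted tuple of the indices of `members`, assuming an
    index is the position of the identifier in `all_identifiers`."""
    position = {identifier: index for index, identifier in enumerate(all_identifiers)}
    return tuple(sorted(position[m] for m in members if m in position))
```

<p>The resulting tuple would then be passed as the “cluster” parameter.</p>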
<p>The following parameters are passed to <a class="reference internal" href="#pyGCluster.Cluster.draw_expression_map" title="pyGCluster.Cluster.draw_expression_map"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.draw_expression_map()</span></tt></a>:</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>min_value_4_expression_map</strong> (<em>float</em>) – lower bound for color coding of values in the expression map. Remember that log2-values are expected, i.e. this value should be < 0!</li>
<li><strong>max_value_4_expression_map</strong> (<em>float</em>) – upper bound for color coding of values in the expression map.</li>
<li><strong>color_gradient</strong> (<em>string</em>) – name of the color gradient used for plotting the expression map. Currently supported are default, Daniel, barplot, 1337, BrBG, PiYG, PRGn, PuOr, RdBu, RdGy, RdYlBu, RdYlGn and Spectral</li>
<li><strong>box_style</strong> (<em>string</em>) – name of box style used in SVG. Currently supported are classic, modern, fusion.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">none</p>
</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="method">
<dt id="pyGCluster.Cluster.draw_expression_map_for_community_cluster">
<tt class="descname">draw_expression_map_for_community_cluster</tt><big>(</big><em>name</em>, <em>min_value_4_expression_map=None</em>, <em>max_value_4_expression_map=None</em>, <em>color_gradient='1337'</em>, <em>sub_folder=None</em>, <em>min_obcofreq_2_plot=None</em>, <em>box_style='classic'</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.draw_expression_map_for_community_cluster"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.draw_expression_map_for_community_cluster" title="Permalink to this definition">¶</a></dt>
<dd><p>Plots the expression map for a given “community cluster”.
Any cluster in the community node map is internally represented as a tuple with two elements:
“cluster” and “level”. These tuples are stored as keys in self[ ‘Communities’ ],
from which they can be extracted and passed to this function.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>name</strong> (<em>tuple</em>) – “community cluster”, best obtained from self[ ‘Communities’ ].keys()</li>
<li><strong>min_obcofreq_2_plot</strong> (<em>float</em>) – minimum obCoFreq an object of the cluster must have to be shown in the expression map.</li>
</ul>
</td>
</tr>
</tbody>
</table>
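<p>Since each key of self[ ‘Communities’ ] is a (cluster, level) tuple, the community clusters of one level can be selected by unpacking the keys. The sketch uses a plain dict with placeholder values in place of self[ ‘Communities’ ]; the helper name and the integer levels are assumptions.</p>

```python
def community_clusters_at_level(communities, level):
    """Hypothetical helper: keys of `communities` whose level equals `level`."""
    selected = []
    for name in communities:
        cluster, cluster_level = name  # each key is a (cluster, level) tuple
        if cluster_level == level:
            selected.append(name)
    return selected
```

<p>Each returned tuple can then be passed as “name” to this function.</p>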
<p>The following parameters are passed to <a class="reference internal" href="#pyGCluster.Cluster.draw_expression_map" title="pyGCluster.Cluster.draw_expression_map"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.draw_expression_map()</span></tt></a>:</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>min_value_4_expression_map</strong> (<em>float</em>) – lower bound for color coding of values in the expression map. Remember that log2-values are expected, i.e. this value should be < 0!</li>
<li><strong>max_value_4_expression_map</strong> (<em>float</em>) – upper bound for color coding of values in the expression map.</li>
<li><strong>color_gradient</strong> (<em>string</em>) – name of the color gradient used for plotting the expression map. Currently supported are default, Daniel, barplot, 1337, BrBG, PiYG, PRGn, PuOr, RdBu, RdGy, RdYlBu, RdYlGn and Spectral</li>
<li><strong>box_style</strong> (<em>string</em>) – name of box style used in SVG. Currently supported are classic, modern, fusion.</li>
<li><strong>sub_folder</strong> (<em>string</em>) – if specified, the expression map is saved in this folder, rather than in pyGCluster’s working directory.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">none</p>
</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="method">
<dt id="pyGCluster.Cluster.draw_expression_profiles">
<tt class="descname">draw_expression_profiles</tt><big>(</big><em>min_value_4_expression_map=None</em>, <em>max_value_4_expression_map=None</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.draw_expression_profiles"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.draw_expression_profiles" title="Permalink to this definition">¶</a></dt>
<dd><p>Draws an expression profile plot (SVG) for each community, illustrating the main “expression pattern” of a community.
Each line in this plot represents an object. The “grey cloud” illustrates the range of the standard deviation of the mean values.
The file names are prefixed with “exProf”, followed by the community name as shown in the node map.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>min_value_4_expression_map</strong> (<em>int</em>) – minimum of the y-axis (since data should be log2 values, this value should typically be < 0).</li>
<li><strong>max_value_4_expression_map</strong> (<em>int</em>) – maximum for the y-axis.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">none</p>
</td>
</tr>
</tbody>
</table>
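<p>One way to compute such a “grey cloud” is to take, per condition, the lowest mean minus SD and the highest mean plus SD over all objects of a community. This interpretation of “range of the standard deviation” is an assumption, and the helper name is hypothetical; the data layout is the nested dict documented for <tt class="docutils literal"><span class="pre">draw_expression_map()</span></tt>.</p>

```python
def sd_band(data, identifiers, conditions):
    """Hypothetical: per condition, (lowest mean - sd, highest mean + sd)
    over the given objects; data maps identifier -> {condition: (mean, sd)}."""
    band = []
    for condition in conditions:
        lows = [data[i][condition][0] - data[i][condition][1] for i in identifiers]
        highs = [data[i][condition][0] + data[i][condition][1] for i in identifiers]
        band.append((min(lows), max(highs)))
    return band
```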
</dd></dl>
<dl class="method">
<dt id="pyGCluster.Cluster.frequencies">
<tt class="descname">frequencies</tt><big>(</big><em>identifier=None</em>, <em>clusterID=None</em>, <em>cluster=None</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.frequencies"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.frequencies" title="Permalink to this definition">¶</a></dt>
<dd><p>Returns a tuple with (i) the cFreq and (ii) a collections.defaultdict containing the DLC:frequency pairs for either
an identifier (e.g. “JGI4|Chlre4|123456”),
a clusterID,
or a cluster.
Returns None if the identifier is not part of the data set, or if the clusterID or cluster was not found during the iterations.</p>
<p>Example:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">cFreq</span><span class="p">,</span> <span class="n">dlc_freq_dict</span> <span class="o">=</span> <span class="n">cluster</span><span class="o">.</span><span class="n">frequencies</span><span class="p">(</span> <span class="n">identifier</span> <span class="o">=</span> <span class="s">'JGI4|Chlre4|123456'</span> <span class="p">)</span>
<span class="gp">>>> </span><span class="n">dlc_freq_dict</span>
<span class="gp">... </span><span class="n">defaultdict</span><span class="p">(</span><span class="o"><</span><span class="nb">type</span> <span class="s">'float'</span><span class="o">></span><span class="p">,</span>
<span class="gp">... </span><span class="p">{</span><span class="s">'average-correlation'</span><span class="p">:</span> <span class="mf">0.0</span><span class="p">,</span> <span class="s">'complete-correlation'</span><span class="p">:</span> <span class="mf">0.0</span><span class="p">,</span>
<span class="gp">... </span><span class="s">'centroid-euclidean'</span><span class="p">:</span> <span class="mf">0.0015</span><span class="p">,</span> <span class="s">'median-euclidean'</span><span class="p">:</span> <span class="mf">0.0064666666666666666</span><span class="p">,</span>
<span class="gp">... </span><span class="s">'ward-euclidean'</span><span class="p">:</span> <span class="mf">0.0041333333333333335</span><span class="p">,</span> <span class="s">'weighted-correlation'</span><span class="p">:</span> <span class="mf">0.0</span><span class="p">,</span>
<span class="gp">... </span><span class="s">'complete-euclidean'</span><span class="p">:</span> <span class="mf">0.0014</span><span class="p">,</span> <span class="s">'weighted-euclidean'</span><span class="p">:</span> <span class="mf">0.0066333333333333331</span><span class="p">,</span>
<span class="gp">... </span><span class="s">'average-euclidean'</span><span class="p">:</span> <span class="mf">0.0020333333333333332</span><span class="p">})</span>
</pre></div>
</div>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>identifier</strong> (<em>string</em>) – search frequencies by identifier input</li>
<li><strong>clusterID</strong> (<em>int</em>) – search frequencies by cluster ID input</li>
<li><strong>cluster</strong> (<em>tuple</em>) – search frequencies by cluster (tuple of ints) input</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">tuple</p>
</td>
</tr>
</tbody>
</table>
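<p>The returned dictionary can be ranked with standard Python tools. The sketch below rebuilds the example output shown above and finds the DLCs with non-zero frequency, most frequent first. Because the result is a defaultdict, indexing a missing DLC such as ‘ward-correlation’ would silently insert it with 0.0; <tt class="docutils literal"><span class="pre">.get()</span></tt> avoids that side effect.</p>

```python
from collections import defaultdict

# Values copied from the example output of frequencies() above.
dlc_freq_dict = defaultdict(float, {
    'average-correlation': 0.0, 'complete-correlation': 0.0,
    'centroid-euclidean': 0.0015, 'median-euclidean': 0.0064666666666666666,
    'ward-euclidean': 0.0041333333333333335, 'weighted-correlation': 0.0,
    'complete-euclidean': 0.0014, 'weighted-euclidean': 0.0066333333333333331,
    'average-euclidean': 0.0020333333333333332,
})

# DLC with the highest frequency
best_dlc = max(dlc_freq_dict, key=dlc_freq_dict.get)

# DLCs with a non-zero frequency, most frequent first
ranked = sorted(
    (dlc for dlc, freq in dlc_freq_dict.items() if freq > 0),
    key=dlc_freq_dict.get,
    reverse=True,
)

# .get() does not insert a key, unlike dlc_freq_dict['ward-correlation']
missing = dlc_freq_dict.get('ward-correlation', 0.0)
```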
</dd></dl>
<dl class="method">
<dt id="pyGCluster.Cluster.info">
<tt class="descname">info</tt><big>(</big><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.info"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.info" title="Permalink to this definition">¶</a></dt>
<dd><p>Prints some information about the clustering via pyGCluster:</p>
<blockquote>
<div><ul class="simple">
<li>number of genes/proteins clustered</li>
<li>number of conditions defined</li>
<li>number of distance-linkage combinations</li>
<li>number of iterations performed</li>
</ul>
</div></blockquote>
<p>as well as some information about the communities, the legend for the shapes of nodes in the node map and the way the functions were called.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body">none</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="method">
<dt id="pyGCluster.Cluster.load">
<tt class="descname">load</tt><big>(</big><em>filename</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.load"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.load" title="Permalink to this definition">¶</a></dt>
<dd><p>Fills a pyGCluster.Cluster object with the session saved as “filename”.
If “filename” is not a complete path, e.g. “example.pkl” (instead of “/home/user/Desktop/example.pkl”), the directory given by self[ ‘Working directory’ ] is used.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<dl class="last docutils">
<dt>Loading a saved pyGCluster session is a two-step procedure (create an empty object, then fill it):</dt>
<dd><div class="first last highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">LoadedClustering</span> <span class="o">=</span> <span class="n">pyGCluster</span><span class="o">.</span><span class="n">Cluster</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">LoadedClustering</span><span class="o">.</span><span class="n">load</span><span class="p">(</span> <span class="s">"/home/user/Desktop/example.pkl"</span> <span class="p">)</span>
</pre></div>
</div>
</dd>
</dl>
</div>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>filename</strong> (<em>string</em>) – may be either a simple file name (“example.pkl”) or a complete path (e.g. “/home/user/Desktop/example.pkl”).</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body">none</td>
</tr>
</tbody>
</table>
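<p>The two-step pattern and the path rule described above can be illustrated with a small stand-alone sketch. It assumes, purely for illustration, that the session behaves like a Python <tt>dict</tt> holding a ‘Working directory’ key; the class <tt>Session</tt> is hypothetical and is not pyGCluster’s own implementation.</p>

```python
import os
import pickle
import tempfile


class Session(dict):
    """Hypothetical dict-based stand-in for pyGCluster.Cluster (assumption)."""

    def _resolve(self, filename):
        # a complete path is used as-is; a bare file name is
        # resolved against self['Working directory']
        if os.path.isabs(filename):
            return filename
        return os.path.join(self["Working directory"], filename)

    def save(self, filename="pyGCluster.pkl"):
        with open(self._resolve(filename), "wb") as handle:
            pickle.dump(dict(self), handle)

    def load(self, filename):
        with open(self._resolve(filename), "rb") as handle:
            # fill the already existing object in place (returns None)
            self.update(pickle.load(handle))


workdir = tempfile.mkdtemp()
original = Session({"Working directory": workdir, "Iterations": 5000})
original.save("example.pkl")  # bare name: stored in the working directory

# step 1: create an empty object; step 2: fill it from the pickle
loaded = Session()
loaded.load(os.path.join(workdir, "example.pkl"))
```

<p>Because the empty object does not yet know any working directory, the sketch loads via a complete path.</p>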
</dd></dl>
<dl class="method">
<dt id="pyGCluster.Cluster.median">
<tt class="descname">median</tt><big>(</big><em>_list</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.median"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.median" title="Permalink to this definition">¶</a></dt>
<dd><p>Returns the median from a list of numeric values.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>_list</strong> (<em>list</em>) – </td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body">int / float</td>
</tr>
</tbody>
</table>
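<p>A minimal sketch of a median, assuming the conventional definition (for an even number of values, the mean of the two middle values). This is not pyGCluster’s source, but it shows why the return type may be either int or float. For new code, the standard library’s <tt>statistics.median</tt> behaves the same way.</p>

```python
import statistics


def median(values):
    """Median of a list of numbers (sketch, not pyGCluster's own code)."""
    if not values:
        raise ValueError("median() of an empty list is undefined")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        # odd length: the middle element itself (keeps its type, e.g. int)
        return ordered[mid]
    # even length: mean of the two middle elements (a float)
    return (ordered[mid - 1] + ordered[mid]) / 2.0


print(median([3, 1, 2]), statistics.median([3, 1, 2]))
print(median([4, 1, 3, 2]), statistics.median([4, 1, 3, 2]))
```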
</dd></dl>
<dl class="method">
<dt id="pyGCluster.Cluster.plot_clusterfreqs">
<tt class="descname">plot_clusterfreqs</tt><big>(</big><em>min_cluster_size=4</em>, <em>top_X_clusters=0</em>, <em>threshold_4_the_lowest_max_freq=0.01</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.plot_clusterfreqs"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.plot_clusterfreqs" title="Permalink to this definition">¶</a></dt>
<dd><p>Plots the frequencies of each cluster as an expression map,
showing which cluster was found by which distance-linkage combination, and with what frequency.
The plot’s filename is prefixed by ‘clusterFreqsMap’, followed by the values of the parameters,
e.g. ‘clusterFreqsMap_minSize4_top0clusters_top10promille.svg’.
Clusters are sorted by size.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>min_cluster_size</strong> (<em>int</em>) – only clusters with a size equal to or greater than min_cluster_size appear in the plot of the cluster frequencies.</li>
<li><strong>threshold_4_the_lowest_max_freq</strong> (<em>float</em>) – ]0, 1[ clusters must have a maximum frequency of at least threshold_4_the_lowest_max_freq to appear in the plot.</li>
<li><strong>top_X_clusters</strong> (<em>int</em>) – restricts the plot to the top X clusters when the clusters with a maximum frequency of at least threshold_4_the_lowest_max_freq are ranked by frequency (the plot itself is still sorted by size).</li>
</ul>
</td>
</tr>
</tbody>
</table>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">if top_X_clusters is set to zero ( 0 ), this filter is switched off (switched off by default).</p>
</div>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body">None</td>
</tr>
</tbody>
</table>
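<p>The three filters and the file naming described above can be sketched as follows. The data layout <tt>{cluster_tuple: {dlc: frequency}}</tt> and both helper names are hypothetical, and ordering by size with the largest cluster first is an assumption (the direction of the size sort is not stated). The ‘promille’ part of the file name is threshold_4_the_lowest_max_freq × 1000, so 0.01 becomes ‘10promille’.</p>

```python
def select_clusters(cluster_freqs, min_cluster_size=4, top_X_clusters=0,
                    threshold_4_the_lowest_max_freq=0.01):
    """Hypothetical filter mirroring the documented parameters."""
    # size filter and maximum-frequency filter
    kept = [c for c, freqs in cluster_freqs.items()
            if len(c) >= min_cluster_size
            and max(freqs.values()) >= threshold_4_the_lowest_max_freq]
    # optional top-X filter, ranked by each cluster's maximum frequency
    if top_X_clusters > 0:
        kept.sort(key=lambda c: max(cluster_freqs[c].values()), reverse=True)
        kept = kept[:top_X_clusters]
    # the plot itself is ordered by cluster size (largest first here)
    return sorted(kept, key=len, reverse=True)


def plot_filename(min_cluster_size=4, top_X_clusters=0,
                  threshold_4_the_lowest_max_freq=0.01):
    promille = round(threshold_4_the_lowest_max_freq * 1000, 6)
    return "clusterFreqsMap_minSize{0}_top{1}clusters_top{2:g}promille.svg".format(
        min_cluster_size, top_X_clusters, promille)


cluster_freqs_example = {
    (0, 1, 2, 3): {"average-euclidean": 0.2, "ward-euclidean": 0.05},
    (0, 1, 2, 3, 4, 5): {"average-euclidean": 0.03},
    (4, 5, 6): {"average-euclidean": 0.5},          # too small
    (7, 8, 9, 10): {"average-euclidean": 0.001},    # below the threshold
}
print(select_clusters(cluster_freqs_example))
print(plot_filename())
```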
</dd></dl>
<dl class="method">
<dt id="pyGCluster.Cluster.plot_mean_distributions">
<tt class="descname">plot_mean_distributions</tt><big>(</big><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.plot_mean_distributions"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.plot_mean_distributions" title="Permalink to this definition">¶</a></dt>
<dd><p>Creates a density plot of mean values for each condition via rpy2.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body">none</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="method">
<dt id="pyGCluster.Cluster.plot_nodetree">
<tt class="descname">plot_nodetree</tt><big>(</big><em>tree_filename='tree.dot'</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.plot_nodetree"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.plot_nodetree" title="Permalink to this definition">¶</a></dt>
<dd><dl class="docutils">
<dt>Plots the dendrogram for the clustering of the most_frequent_clusters.</dt>
<dd><ul class="first last">
<li><p class="first">node label = nodeID internally used for self[‘Nodemap’] (not the same as clusterID!).</p>
</li>
<li><p class="first">node border color is white if the node is a close2root-cluster (i.e. larger than self[ ‘for IO skip clusters bigger than’ ] ).</p>
</li>
<li><p class="first">edge label = distance between parent and children.</p>
</li>
<li><dl class="first docutils">
<dt>edge - color codes:</dt>
<dd><ul class="first last simple">
<li>black = default; highlights a child which is not a most_frequent_cluster but was created during formation of the dendrogram.</li>
<li>green = the children are connected with the root.</li>
<li>red = highlights a child which is a most_frequent_cluster.</li>
<li>yellow = a most_frequent_cluster is directly connected with the root.</li>
</ul>
</dd>
</dl>
</li>
</ul>
</dd>
</dl>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>tree_filename</strong> (<em>string</em>) – name of the Graphviz DOT file containing the dendrogram of the AHC of most frequent clusters. Best given with ”.dot”-extension!</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body">none</td>
</tr>
</tbody>
</table>
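<p>One possible reading of the edge color codes listed above, expressed as a small DOT-writing sketch. Both helper names are hypothetical, the precedence of the rules (yellow over red over green) is an interpretation of the list, and the <tt>-&gt;</tt> edge syntax assumes the tree is written as a directed graph.</p>

```python
def edge_color(child_is_most_frequent, parent_is_root):
    """Hypothetical mapping of the documented edge color codes."""
    if child_is_most_frequent and parent_is_root:
        return "yellow"  # most_frequent_cluster directly connected with the root
    if child_is_most_frequent:
        return "red"     # child is a most_frequent_cluster
    if parent_is_root:
        return "green"   # child is connected with the root
    return "black"       # child only created while forming the dendrogram


def dot_edge(parent_id, child_id, distance, child_is_most_frequent, parent_is_root):
    # edge label = distance between parent and child
    return '"{0}" -> "{1}" [label="{2:.3f}", color={3}];'.format(
        parent_id, child_id, distance,
        edge_color(child_is_most_frequent, parent_is_root))


print(dot_edge(0, 7, 0.25, True, False))
```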
</dd></dl>
<dl class="method">
<dt id="pyGCluster.Cluster.resample">
<tt class="descname">resample</tt><big>(</big><em>distances</em>, <em>linkages</em>, <em>function_2_generate_noise_injected_datasets=None</em>, <em>min_cluster_size=4</em>, <em>alphabet=None</em>, <em>force_plotting=False</em>, <em>min_cluster_freq_2_retain=0.001</em>, <em>pickle_filename='pyGCluster_resampled.pkl'</em>, <em>cpus_2_use=None</em>, <em>iter_tol=1e-07</em>, <em>iter_step=5000</em>, <em>iter_max=250000</em>, <em>iter_top_P=0.001</em>, <em>iter_window=50000</em>, <em>iter_till_the_end=False</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.resample"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.resample" title="Permalink to this definition">¶</a></dt>
<dd><p>Routine for the assessment of cluster reproducibility (re-sampling routine).
To this end, a large number of noise-injected datasets is created, which are subsequently clustered by AHC.
These datasets are created via <tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.function_2_generate_noise_injected_datasets()</span></tt> (by default, Gaussian distributions are used).
Each ‘simulated’ dataset is then subjected to AHC x times, where x is the number of distance-linkage combinations, i.e. all possible pairs of one entry of “distances” and one entry of “linkages” (len(distances) × len(linkages)).
To speed up the re-sampling routine, it is distributed to multiple threads if cpus_2_use is greater than 1.</p>
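<p>A minimal sketch of how one Gaussian noise-injected dataset could be generated. It assumes the input data are stored as <tt>{identifier: {condition: (mean, sd)}}</tt> and that every value is redrawn from a normal distribution with that mean and standard deviation; the function name and the example identifiers are hypothetical.</p>

```python
import random


def gaussian_noise_injected_dataset(data, rng=random):
    """One noise-injected copy of `data` (sketch).

    Assumes data = {identifier: {condition: (mean, sd)}}; each value is
    replaced by a draw from the normal distribution N(mean, sd).
    """
    return {
        identifier: {
            condition: rng.gauss(mean, sd)
            for condition, (mean, sd) in conditions.items()
        }
        for identifier, conditions in data.items()
    }


data = {
    "protein_A": {"heat": (1.2, 0.3), "cold": (-0.8, 0.2)},
    "protein_B": {"heat": (0.1, 0.0), "cold": (2.0, 0.5)},
}
# a seeded generator makes the simulated dataset reproducible
noisy = gaussian_noise_injected_dataset(data, rng=random.Random(42))
print(noisy)
```

<p>Each such simulated dataset would then be clustered once per distance-linkage combination.</p>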
<p>The re-sampling routine stops once either convergence (see below) is detected or iter_max iterations have been performed.
Finally, only clusters with a maximum frequency of at least min_cluster_freq_2_retain are stored; all others are discarded.</p>
<p>To allow visual inspection of convergence, a convergence plot is created.
For more information about the convergence estimation, see the Supplementary Material of pyGCluster’s publication.</p>
<p>For a complete list of possible distance metrics,
see: <a class="reference external" href="http://docs.scipy.org/doc/scipy/reference/spatial.distance.html">http://docs.scipy.org/doc/scipy/reference/spatial.distance.html</a>;
for linkage methods,
see: <a class="reference external" href="http://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html">http://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html</a></p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">If memory is of concern (e.g. for a large dataset, > 5000 objects), cpus_2_use should be kept low.</p>
</div>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>distances</strong> (<em>list</em>) – list of distance metrics, given as strings, e.g. [ ‘correlation’, ‘euclidean’ ]</li>
<li><strong>linkages</strong> (<em>list</em>) – list of linkage methods, given as strings, e.g. [ ‘average’, ‘complete’, ‘ward’ ]</li>
<li><strong>function_2_generate_noise_injected_datasets</strong> (<em>function</em>) – function to generate noise-injected datasets. If None (default), Gaussian distributions are used.</li>
<li><strong>min_cluster_size</strong> (<em>int</em>) – minimum size a cluster must have to be included in the assessment of cluster reproducibility.</li>
<li><strong>alphabet</strong> (<em>string</em>) – alphabet used to convert decimal indices to characters to save memory. Defaults to string.printable, without ‘,’.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">If alphabet contains ‘,’, this character is removed from alphabet, because the indices comprising a cluster are saved comma-seperated.</p>
</div>
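<p>The exact encoding scheme is not described here. The following is a sketch under the assumption that each object index is written as a number in base len(alphabet), using the alphabet’s characters as digits, with the encoded indices of a cluster joined by ‘,’. All names are hypothetical.</p>

```python
import string

# string.printable without ',' because ',' separates the encoded indices
ALPHABET = string.printable.replace(",", "")


def encode_index(index, alphabet=ALPHABET):
    """Write a non-negative integer in base len(alphabet)."""
    base = len(alphabet)
    digits = []
    while True:
        index, remainder = divmod(index, base)
        digits.append(alphabet[remainder])
        if index == 0:
            break
    return "".join(reversed(digits))


def decode_index(code, alphabet=ALPHABET):
    base = len(alphabet)
    value = 0
    for char in code:
        value = value * base + alphabet.index(char)
    return value


def encode_cluster(indices, alphabet=ALPHABET):
    return ",".join(encode_index(i, alphabet) for i in indices)


def decode_cluster(text, alphabet=ALPHABET):
    return tuple(decode_index(code, alphabet) for code in text.split(","))


encoded = encode_cluster((0, 10, 12345))
print(repr(encoded), decode_cluster(encoded))
```

<p>With the 99 remaining characters, two characters suffice for every index up to 99² − 1 = 9800, which is where the memory saving comes from.</p>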
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>force_plotting</strong> (<em>boolean</em>) – if True, the convergence plot is created every iter_step iterations (otherwise only when convergence is detected).</li>
<li><strong>min_cluster_freq_2_retain</strong> (<em>float</em>) – ]0, 1[ minimum frequency a cluster has to exhibit (only the maximum of its dlc frequencies matters here) to be stored in pyGCluster once all iterations are finished.</li>
<li><strong>cpus_2_use</strong> (<em>int</em>) – number of threads that are spawned in the re-sampling routine.</li>
<li><strong>iter_max</strong> (<em>int</em>) – maximum number of re-sampling iterations.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>Convergence determination:</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>iter_tol</strong> (<em>float</em>) – ]0, 1e-3[ threshold for the median of the normalized slopes, used to declare convergence.</li>
<li><strong>iter_step</strong> (<em>int</em>) – number of iterations each process performs; this is also the interval at which convergence is checked.</li>
<li><strong>iter_top_P</strong> (<em>float</em>) – ]0, 1[ for the convergence estimation, only the most frequent clusters are examined; this is the minimum frequency a cluster must have to be included.</li>
<li><strong>iter_window</strong> (<em>int</em>) – size of the sliding window in iterations. The median is obtained from the normalized slopes inside this window; <em>should be a multiple of iter_step</em>.</li>
<li><strong>iter_till_the_end</strong> (<em>boolean</em>) – if set to True, the convergence determination is switched off; hence, re-sampling is performed until iter_max is reached.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">None</p>
</td>
</tr>
</tbody>
</table>
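<p>The published procedure is described in the Supplementary Material; the following is only a simplified sketch of how the convergence parameters could interact. It assumes a history of cluster frequencies recorded every iter_step iterations, computes for each frequent cluster the absolute slope across the last iter_window iterations normalized by its current frequency, and declares convergence when the median of these slopes falls below iter_tol. The function name and the data layout are hypothetical.</p>

```python
import statistics


def has_converged(freq_history, iter_step=5000, iter_window=50000,
                  iter_top_P=0.001, iter_tol=1e-07):
    """Simplified convergence check (sketch, not the published procedure).

    freq_history[k] maps cluster -> frequency after (k + 1) * iter_step iterations.
    """
    n_steps = iter_window // iter_step  # checkpoints spanned by the window
    if len(freq_history) <= n_steps:
        return False                    # window not filled yet
    latest, earlier = freq_history[-1], freq_history[-1 - n_steps]
    # only clusters with a frequency of at least iter_top_P are examined
    top = [c for c, f in latest.items() if f >= iter_top_P]
    if not top:
        return False
    # slope per iteration inside the window, normalized by the current frequency
    slopes = [abs(latest[c] - earlier.get(c, 0.0)) / iter_window / latest[c]
              for c in top]
    return statistics.median(slopes) < iter_tol


stable = [{"a": 0.5, "b": 0.01}] * 11
drifting = [{"a": 0.1 + 0.01 * k} for k in range(11)]
print(has_converged(stable), has_converged(drifting))
```

<p>The integer division by iter_step is one reason why iter_window should be a multiple of iter_step.</p>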
</dd></dl>
<dl class="method">
<dt id="pyGCluster.Cluster.save">
<tt class="descname">save</tt><big>(</big><em>filename='pyGCluster.pkl'</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.save"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.save" title="Permalink to this definition">¶</a></dt>
<dd><p>Saves the current pyGCluster.Cluster object as a pickle file.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>filename</strong> (<em>string</em>) – may be either a simple file name (“example.pkl”) or a complete path (e.g. “/home/user/Desktop/example.pkl”). In the former case, the pickle is stored in pyGCluster’s working directory.</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body">none</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="method">
<dt id="pyGCluster.Cluster.write_dot">
<tt class="descname">write_dot</tt><big>(</big><em>filename</em>, <em>scaleByFreq=True</em>, <em>min_obcofreq_2_plot=None</em>, <em>n_legend_nodes=5</em>, <em>min_value_4_expression_map=None</em>, <em>max_value_4_expression_map=None</em>, <em>color_gradient='1337'</em>, <em>box_style='classic'</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.write_dot"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.write_dot" title="Permalink to this definition">¶</a></dt>
<dd><p>Writes a Graphviz DOT file representing the cluster composition of communities.
Herein, each node represents a cluster. Its name combines the cluster’s ID with the level (iteration) at which it was inserted into the community:</p>
<blockquote>
<div><ul class="simple">
<li>The node’s size reflects the cluster’s cFreq.</li>
<li>The node’s shape illustrates by which distance metric the cluster was found (if the shape is a point, this illustrates that this cluster was not among the most_frequent_clusters, but only formed during AHC of clusters).</li>
<li>The node’s color shows the community membership; clusters larger than self[ ‘for IO skip clusters bigger than’ ] are the exception and are highlighted in grey.</li>
<li>The node connecting all clusters is the root (the cluster holding all objects), which is highlighted in white.</li>
</ul>
</div></blockquote>
<p>The DOT file may be rendered with “Graphviz” or further processed with other appropriate programs such as “Gephi”.
If “Graphviz” is available, the DOT file is rendered with Graphviz’s “dot” layout algorithm.</p>
<p>In addition, an expression map for each cluster of the node map is created (via <a class="reference internal" href="#pyGCluster.Cluster.draw_expression_map_for_community_cluster" title="pyGCluster.Cluster.draw_expression_map_for_community_cluster"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.draw_expression_map_for_community_cluster()</span></tt></a>).</p>
<p>Those are saved in the sub-folder “communityClusters”.</p>
<p>This function also calls <a class="reference internal" href="#pyGCluster.Cluster.write_legend" title="pyGCluster.Cluster.write_legend"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.write_legend()</span></tt></a>,
which creates a TXT file containing the object composition of all clusters, as well as their frequencies.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>filename</strong> (<em>string</em>) – file name of the Graphviz DOT file representing the node map, preferably with the extension “.dot”.</li>
<li><strong>scaleByFreq</strong> (<em>boolean</em>) – switch to either scale nodes (= clusters) by cFreq or apply a constant size to each node (the latter may be useful to put emphasis on the nodes’ shapes).</li>
<li><strong>min_obcofreq_2_plot</strong> (<em>float</em>) – if defined, clusters with lower cFreq than this value are skipped, i.e. not plotted.</li>
<li><strong>n_legend_nodes</strong> (<em>int</em>) – number of nodes representing the legend for the node sizes. The node sizes themselves encode the cFreq. “Legend nodes” are drawn as grey boxes.</li>
<li><strong>min_value_4_expression_map</strong> (<em>float</em>) – lower bound for color coding of values in the expression map. Remember that log2 values are expected, i.e. this value should be negative.</li>
<li><strong>max_value_4_expression_map</strong> (<em>float</em>) – upper bound for color coding of values in the expression map.</li>
<li><strong>color_gradient</strong> (<em>string</em>) – name of the color gradient used for plotting the expression map.</li>
<li><strong>box_style</strong> (<em>string</em>) – the way the relative standard deviation is visualized in the expression map. Currently supported are ‘modern’, ‘fusion’ or ‘classic’.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">none</p>
</td>
</tr>
</tbody>
</table>
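<p>A sketch of how node statements for such a node map could be written, following the rules listed above: nodes below min_obcofreq_2_plot are skipped, node width scales with cFreq when scaleByFreq is True (linearly in this sketch), and clusters larger than the size limit are filled grey. The input layout, the node names and the parameter <tt>max_cluster_size</tt> (standing in for self[ ‘for IO skip clusters bigger than’ ]) are hypothetical.</p>

```python
def dot_node_lines(nodes, scaleByFreq=True, min_obcofreq_2_plot=None,
                   max_cluster_size=None, base_size=1.0):
    """Sketch of DOT node statements for a node map (hypothetical inputs).

    nodes: list of dicts with keys 'name', 'cFreq', 'size' and 'color'.
    """
    lines = []
    for node in nodes:
        if min_obcofreq_2_plot is not None and node["cFreq"] < min_obcofreq_2_plot:
            continue  # skipped, i.e. not plotted
        # scale by cFreq, or use a constant size to emphasize node shapes
        width = base_size * node["cFreq"] if scaleByFreq else base_size
        too_big = max_cluster_size is not None and node["size"] > max_cluster_size
        color = "grey" if too_big else node["color"]
        lines.append('"{0}" [width={1:.3f}, style=filled, fillcolor="{2}"];'.format(
            node["name"], width, color))
    return lines


nodes = [
    {"name": "17_1", "cFreq": 0.8, "size": 12, "color": "#1f77b4"},
    {"name": "42_2", "cFreq": 0.05, "size": 6, "color": "#ff7f0e"},
    {"name": "3_1", "cFreq": 0.5, "size": 900, "color": "#2ca02c"},
]
for line in dot_node_lines(nodes, min_obcofreq_2_plot=0.1, max_cluster_size=100):
    print(line)
```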
</dd></dl>
<dl class="method">
<dt id="pyGCluster.Cluster.write_legend">
<tt class="descname">write_legend</tt><big>(</big><em>filename='legend.txt'</em><big>)</big><a class="reference internal" href="_modules/pyGCluster.html#Cluster.write_legend"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyGCluster.Cluster.write_legend" title="Permalink to this definition">¶</a></dt>
<dd><p>Creates a legend for the community node map as a TXT file.
It records the object composition of each cluster of the node map as well as the cluster’s frequencies.
Since this function is internally called by <a class="reference internal" href="#pyGCluster.Cluster.write_dot" title="pyGCluster.Cluster.write_dot"><tt class="xref py py-func docutils literal"><span class="pre">pyGCluster.Cluster.write_dot()</span></tt></a>, it is typically not necessary to call this function.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>filename</strong> (<em>string</em>) – name of the legend TXT file, best given with extension ”.txt”.</td>
</tr>
<tr class="field-even field"><th class="field-name">Return type:</th><td class="field-body">none</td>
</tr>
</tbody>
</table>
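<p>The exact layout of the legend file is not specified here. A sketch assuming one tab-separated row per cluster (cluster ID, member identifiers, frequencies per distance-linkage combination) could look like this; the data layout and both function names are hypothetical.</p>

```python
def legend_lines(clusters, identifiers):
    """clusters: {cluster_id: (object_indices, {dlc: frequency})} (hypothetical layout)."""
    lines = []
    for cluster_id in sorted(clusters):
        indices, freqs = clusters[cluster_id]
        members = ", ".join(identifiers[i] for i in indices)
        freq_text = ", ".join("{0}={1:.4f}".format(dlc, freq)
                              for dlc, freq in sorted(freqs.items()))
        lines.append("{0}\t{1}\t{2}".format(cluster_id, members, freq_text))
    return lines


def write_legend_txt(path, clusters, identifiers):
    # one tab-separated row per cluster
    with open(path, "w") as handle:
        for line in legend_lines(clusters, identifiers):
            handle.write(line + "\n")


identifiers = ["protein_A", "protein_B", "protein_C"]
clusters = {7: ((0, 2), {"ward-euclidean": 0.1, "average-euclidean": 0.25})}
print(legend_lines(clusters, identifiers)[0])
```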
</dd></dl>
</dd></dl>