-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathintro.html
330 lines (317 loc) · 18.9 KB
/
intro.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>1. Introduction — pyGCluster 0.18.4 documentation</title>
<link rel="stylesheet" href="_static/default.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: '',
VERSION: '0.18.4',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true
};
</script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<link rel="top" title="pyGCluster 0.18.4 documentation" href="index.html" />
<link rel="next" title="2. Module pyGCluster" href="pyGCluster.html" />
<link rel="prev" title="Welcome to pyGCluster’s documentation!" href="index.html" />
</head>
<body>
<div class="related">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="pyGCluster.html" title="2. Module pyGCluster"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="index.html" title="Welcome to pyGCluster’s documentation!"
accesskey="P">previous</a> |</li>
<li><a href="index.html">pyGCluster 0.18.4 documentation</a> »</li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body">
<div class="section" id="introduction">
<h1>1. Introduction<a class="headerlink" href="#introduction" title="Permalink to this headline">¶</a></h1>
<p>‘Omics’ technologies yield large datasets, which are commonly subjected to cluster analysis in order to group them into comprehensible communities, i.e. co-regulated groups, which might be functionally related (Si et al., 2011). A critical step in cluster analysis is cluster validation (Handl et al., 2005), the most stringent form of validation being the assessment of exact reproducibility of a cluster in the light of the uncertainty of the data.
This issue is addressed by pyGCluster, an algorithm working in two steps. Firstly it creates many agglomerative hierarchical clusterings (AHCs) of the input data by injecting noise based on the uncertainty of the data and clusters them using different distance linkage combinations (DLCs). Secondly, pyGCluster creates a meta-clustering, i.e. clustering of the resulting, highly reproducible clusters into communities to gain a most complete representation of common patterns in the data. Communities are defined as sets of clusters with a specific pairwise overlap.</p>
<div class="section" id="algorithm-workflow">
<h2>1.1. Algorithm Workflow<a class="headerlink" href="#algorithm-workflow" title="Permalink to this headline">¶</a></h2>
<p>The workflow of pyGCluster can be divided in:</p>
<blockquote>
<div><ul>
<li><dl class="first docutils">
<dt>iterative steps</dt>
<dd><ul class="first last simple">
<li>re-sample the data based on mean and standard deviation</li>
<li>clustering of data using different distance linkage combinations (DLCs)</li>
</ul>
</dd>
</dl>
</li>
<li><p class="first">meta-clustering of highly reproducible clusters into communities, i.e. sets of clusters with a specific overlap</p>
</li>
<li><p class="first">visualize results via node maps, expression maps and expression profiles</p>
</li>
</ul>
</div></blockquote>
<div class="section" id="re-sampling-clustering">
<h3>1.1.1. Re-sampling & clustering<a class="headerlink" href="#re-sampling-clustering" title="Permalink to this headline">¶</a></h3>
<p>For each iteration, a new dataset is generated evoking the re-sampling routine. pyGCluster uses by default a noise injection function that generates a new data set by drawing from normal distributions defined by each data point, i.e. object o in condition l is defined by μ<sub>ol</sub>± σ<sub>ol</sub>.
Clustering is then performed using SciPy or fastcluster routines.</p>
</div>
<div class="section" id="community-construction">
<h3>1.1.2. Community construction<a class="headerlink" href="#community-construction" title="Permalink to this headline">¶</a></h3>
<p>Communities are created after the iterations by a meta-clustering of the most frequent clusters, i.e. top X% or top Y number of clusters. Community construction is performed iteratively through an AHC approach with a specifically developed distance metric (see publication) and complete linkage. Complete linkage was chosen because it insures that all clusters or meta-clusters have overlapping objects. The customized distance metric ensures that a) smaller clusters are merged earlier in the hierarchy (closer to the bottom) and b) clusters that have a smaller overlap to each other than the threshold will merge after the root, i.e. never into the same branch. After each iteration, very closely related clusters (in terms of their object content) are merged in the hierarchy forming one branch or community starting from the root. The final node map shows these iterations and where meta clusters are merged into the community. The closest node to the root in the final node map is the last iteration, in which no change to the community composition was detected. Using this approach the number of final clusters or communities to consider and analyze is reduced.</p>
</div>
<div class="section" id="node-map-and-expression-map-example">
<h3>1.1.3. Node map and expression map example<a class="headerlink" href="#node-map-and-expression-map-example" title="Permalink to this headline">¶</a></h3>
<p>The figures show an example of a node map and expression map generated by pyGCluster. The node map illustrates the data set of Höhner et al. (2013). The node shapes indicate whether a cluster was found using Euclidean distance (squares), correlation distance (circles) or both (triangles). The node color indicates the community membership. The strength of pyGCluster is shown in the green community in which Euclidean and correlation distance identified high frequency clusters (see arrow). Both distance metrics were required to identify all clusters. The black triangle in the middle represents the root node. Since the community construction is performed iteratively, the different iter steps are visible in the node map. For each community, the node closest to the root is the last iteration in which no change in the communities with respect to their composition was detected.</p>
<div class="figure align-center">
<img alt="_images/_Fig_1_revision.png" src="_images/_Fig_1_revision.png" style="width: 500px;" />
</div>
<div class="figure align-center">
<img alt="_images/EXPMRH.png" src="_images/EXPMRH.png" style="width: 500px;" />
</div>
<p>The example expression map is taken from Höhner et al. (2013)</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">Some texts where copied from the original publication and are thus hereby marked as citation</p>
</div>
</div>
</div>
<div class="section" id="general-information">
<h2>1.2. General information<a class="headerlink" href="#general-information" title="Permalink to this headline">¶</a></h2>
<p>Copyright 2011-2013 by:</p>
<blockquote>
<div><div class="line-block">
<div class="line">D. Jaeger,</div>
<div class="line">J. Barth,</div>
<div class="line">A. Niehues,</div>
<div class="line">C. Fufezan</div>
</div>
</div></blockquote>
<p>The latest Documentation was generated on: November 04, 2013</p>
<div class="section" id="contact-information">
<h3>1.2.1. Contact information<a class="headerlink" href="#contact-information" title="Permalink to this headline">¶</a></h3>
<p>Please refer to:</p>
<blockquote>
<div><div class="line-block">
<div class="line">Dr. Christian Fufezan</div>
<div class="line">Institute of Plant Biology and Biotechnology</div>
<div class="line">Schlossplatz 8 , R 110.105</div>
<div class="line">University of Muenster</div>
<div class="line">Germany</div>
<div class="line">eMail: <a class="reference external" href="mailto:christian%40fufezan.net">christian<span>@</span>fufezan<span>.</span>net</a></div>
<div class="line">Tel: +049 251 83 24861</div>
<div class="line"><br /></div>
<div class="line"><a class="reference external" href="http://www.uni-muenster.de/Biologie.IBBP.AGFufezan">http://www.uni-muenster.de/Biologie.IBBP.AGFufezan</a></div>
</div>
</div></blockquote>
</div>
</div>
<div class="section" id="implementation">
<h2>1.3. Implementation<a class="headerlink" href="#implementation" title="Permalink to this headline">¶</a></h2>
<p>pyGCluster requires Python2.7 or higher, is freely available at <a class="reference external" href="http://pyGCluster.github.io">http://pyGCluster.github.io</a> and published under MIT license.</p>
<dl class="docutils">
<dt>pyGCluster dependencies are:</dt>
<dd><div class="first last line-block">
<div class="line">numpy</div>
<div class="line">scipy</div>
<div class="line">fastcluster (optionally)</div>
<div class="line">rpy2 (optionally)</div>
<div class="line">graphviz (optionally)</div>
</div>
</dd>
</dl>
<p>Fastcluster (Müllner,D. (2013)) offers significant speed increase compared to the same SciPy routines.</p>
</div>
<div class="section" id="download">
<h2>1.4. Download<a class="headerlink" href="#download" title="Permalink to this headline">¶</a></h2>
<dl class="docutils">
<dt>Get the latest version via github</dt>
<dd><div class="first last line-block">
<div class="line"><a class="reference external" href="https://github.com/pygcluster/pyGCluster">https://github.com/pygcluster/pyGCluster</a></div>
</div>
</dd>
<dt>or the latest package at</dt>
<dd><div class="first last line-block">
<div class="line"><a class="reference external" href="http://pyGCluster.github.com/dist/pyGCluster.tar.bz2">http://pyGCluster.github.com/dist/pyGCluster.tar.bz2</a></div>
<div class="line"><a class="reference external" href="http://pyGCluster.github.com/dist/pyGCluster.zip">http://pyGCluster.github.com/dist/pyGCluster.zip</a></div>
</div>
</dd>
<dt>The complete Documentation can be found as pdf</dt>
<dd><div class="first last line-block">
<div class="line"><a class="reference external" href="http://pyGCluster.github.com/dist/pyGCluster.pdf">http://pyGCluster.github.com/dist/pyGCluster.pdf</a></div>
</div>
</dd>
</dl>
</div>
<div class="section" id="citation">
<h2>1.5. Citation<a class="headerlink" href="#citation" title="Permalink to this headline">¶</a></h2>
<p>Please cite us when using pyGlcuster in your work.</p>
<p>Jaeger, D., Barth, B., Niehues, A. and Fufezan, C. (2013) pyGCluster, a novel hierarchical clustering approach</p>
<dl class="docutils">
<dt>The original publication can be found here:</dt>
<dd><div class="first last line-block">
<div class="line"><a class="reference external" href="http://bioinformatics.oxfordjournals.org">http://bioinformatics.oxfordjournals.org</a>...</div>
<div class="line"><a class="reference external" href="http://bioinformatics.oxfordjournals.org">http://bioinformatics.oxfordjournals.org</a>...</div>
</div>
</dd>
</dl>
</div>
<div class="section" id="installation">
<h2>1.6. Installation<a class="headerlink" href="#installation" title="Permalink to this headline">¶</a></h2>
<p>Please execute the following command in the pyGCluster folder:</p>
<div class="highlight-python"><pre>sudo python setup.py install</pre>
</div>
<div class="section" id="installation-notes">
<h3>1.6.1. Installation notes<a class="headerlink" href="#installation-notes" title="Permalink to this headline">¶</a></h3>
<p>If Windows XP (SP3) is used please make sure to install SciPy version 0.10.0</p>
</div>
<div class="section" id="functionality-check">
<h3>1.6.2. Functionality check<a class="headerlink" href="#functionality-check" title="Permalink to this headline">¶</a></h3>
<p>After installation, please run the script test_pyGCluster.py from the
exampleScripts folder to check if pyGCluster was installed properly.</p>
<span class="target" id="module-test_pyGCluster"></span><p>Testscript to demonstrate functionality of pyGCluster</p>
<p>A synthetic dataset is used to check the correct installation of pyGCluster.
This dataset cotains 10 ratios (Gene 0-9) which were randonmly sampled between
39.5 and 40.5 in 0.1 steps with a low standard deviation (randonmly sampled
between 0.1 and 1) and 90 ratios (Gene 10-99) which were randonmly sampled
between 3 and 7 in 0.1 steps with a high standard deviation (randonmly sampled
between 0.1 and 5)</p>
<p>5000 iterations are performed and the presence of the most frequent cluster is
checked.</p>
<p>This cluster should contain the Genes 0 to 9.</p>
<p>Usage:</p>
<div class="highlight-python"><pre>./test_pyGCluster.py</pre>
</div>
<p>When the iteration has finished (this should normally take not longer than 20
seconds), the script asks if you want to stop the iteration process or continue:</p>
<div class="highlight-python"><pre>iter_max reached. See convergence plot. Stopping re-sampling if not defined
otherwise ...
... plot of convergence finished.
See plot in "../exampleFiles/functionalityCheck/convergence_plot.pdf".
Enter how many iterations you would like to continue.
(Has to be a multiple of iterstep = 5000)
(enter "0" to stop resampling.)
(enter "-1" to resample until iter_max (= 5000) is reached.)
Enter a number ...</pre>
</div>
<p>Please enter 0 and hit enter (The script will stop and the test will finish).</p>
<p>The results are saved into the folder functionalityCheck.</p>
<p>Additionally expression maps and expression profiles are plotted.</p>
</div>
</div>
<div class="section" id="references">
<h2>1.7. References<a class="headerlink" href="#references" title="Permalink to this headline">¶</a></h2>
<blockquote>
<div><div class="line-block">
<div class="line">Bréhélin,L. et al. (2008) Using repeated measurements to validate hierarchical gene clusters. Bioinformatics, 24, 682-628.</div>
<div class="line">Gansner,E.R. and North,S.C. (2000) An open graph visualization system and its applications to software engineering. Software Pract. Exper., 30, 1203-1233.</div>
<div class="line">Handl,J. et al. (2005) Computational cluster validation in post-genomic data analysis. Bioinformatics, 21, 3201-3212.</div>
<div class="line">Höhner,R. et al. (2013) The metabolic status drives acclimation of iron deficiency responses in Chlamydomonas reinhardtii as revealed by proteomics based hierar-chical clustering and reverse genetics. Mol. Cell. Proteomics, in press.</div>
<div class="line">Müllner,D. (2013) fastcluster: fast hierarchical agglomerative clustering routines for R and Python. J. Stat. Softw., 53, 1-18.</div>
<div class="line">Saeed,A.I. et al. (2003) TM4: A free, open-source system for microarray data man-agement and analysis. Biotechniques, 34, 374-378.</div>
<div class="line">Shannon,P. et al. (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res., 13, 2498-2504.</div>
<div class="line">Si,J. et al. (2011) Model-based clustering for rna-seq data. Joint statistical meeting, Juli 30 - August 4, Florida.</div>
</div>
</div></blockquote>
</div>
</div>
</div>
</div>
</div>
<div class="sphinxsidebar">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table Of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">1. Introduction</a><ul>
<li><a class="reference internal" href="#algorithm-workflow">1.1. Algorithm Workflow</a><ul>
<li><a class="reference internal" href="#re-sampling-clustering">1.1.1. Re-sampling & clustering</a></li>
<li><a class="reference internal" href="#community-construction">1.1.2. Community construction</a></li>
<li><a class="reference internal" href="#node-map-and-expression-map-example">1.1.3. Node map and expression map example</a></li>
</ul>
</li>
<li><a class="reference internal" href="#general-information">1.2. General information</a><ul>
<li><a class="reference internal" href="#contact-information">1.2.1. Contact information</a></li>
</ul>
</li>
<li><a class="reference internal" href="#implementation">1.3. Implementation</a></li>
<li><a class="reference internal" href="#download">1.4. Download</a></li>
<li><a class="reference internal" href="#citation">1.5. Citation</a></li>
<li><a class="reference internal" href="#installation">1.6. Installation</a><ul>
<li><a class="reference internal" href="#installation-notes">1.6.1. Installation notes</a></li>
<li><a class="reference internal" href="#functionality-check">1.6.2. Functionality check</a></li>
</ul>
</li>
<li><a class="reference internal" href="#references">1.7. References</a></li>
</ul>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="index.html"
title="previous chapter">Welcome to pyGCluster’s documentation!</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="pyGCluster.html"
title="next chapter">2. Module pyGCluster</a></p>
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="_sources/intro.txt"
rel="nofollow">Show Source</a></li>
</ul>
<div id="searchbox" style="display: none">
<h3>Quick search</h3>
<form class="search" action="search.html" method="get">
<input type="text" name="q" />
<input type="submit" value="Go" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
<p class="searchtip" style="font-size: 90%">
Enter search terms or a module, class or function name.
</p>
</div>
<script type="text/javascript">$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="pyGCluster.html" title="2. Module pyGCluster"
>next</a> |</li>
<li class="right" >
<a href="index.html" title="Welcome to pyGCluster’s documentation!"
>previous</a> |</li>
<li><a href="index.html">pyGCluster 0.18.4 documentation</a> »</li>
</ul>
</div>
<div class="footer">
© Copyright 2013, Daniel Jaeger, Johannes Barth, Anna Niehues and Christian Fufezan.
Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 1.1.2.
</div>
</body>
</html>