-The Basic Statistics module generates some simple composition
+The Basic statistics module generates some simple composition
statistics for the file analysed.
@@ -21,14 +21,14 @@
Summary
File type: Says whether the file appeared to contain actual base calls or
colorspace data which had to be converted to base calls
Encoding: Says which ASCII encoding of quality values was found in this
-file.
-Total Sequences: A count of the total number of sequences processed.
+file.
+Total Sequences: A count of the total number of sequences processed.
There are two values reported, actual and estimated. At the moment these
will always be the same. In the future it may be possible to analyse just
a subset of sequences and estimate the total number, to speed up the analysis,
but since we have found that problematic sequences are not evenly distributed
through a file we have disabled this for now.
-Filtered Sequences: If running in Casava mode sequences flagged to be
+Filtered Sequences: If running in Casava mode, sequences flagged to be
filtered will be removed from all analyses. The number of such sequences
removed will be reported here. The total sequences count above will not include
-these filtered sequences and will the number of sequences actually used for the
+these filtered sequences and will be the number of sequences actually used for the
@@ -41,12 +41,12 @@
Summary
Warning
-Basic Statistics never raises a warning.
+Basic statistics never raises a warning.
Failure
-Basic Statistics never raises an error.
+Basic statistics never raises an error.
The Kmer Content module will do a generic analysis of all of the Kmers
-in your library to find those which do not have even coverage through
+in your library to find those which do not have even coverage through
the length of your reads. This can find a number of different sources
of bias in the library which can include the presence of read-through
adapter sequences building up on the end of your sequences.
@@ -26,15 +26,15 @@
Summary
be interested.
-One obvious class of sequences which you might want to analyse are
-adapter sequences. It is useful to know if your library contains a
-significant amount of adapter in order to be able to assess whether
-you need to adapter trim or not. Although the Kmer analysis can
+One obvious class of sequences which you might want to analyse is
+adapter sequences. It is useful to know if your library contains a
+significant amount of adapter in order to be able to assess whether
+you need to adapter trim or not. Although the Kmer analysis can
-theoretically spot this kind of contamination it isn't always clear.
+theoretically spot this kind of contamination, it isn't always clear.
This module therefore does a specific search for a set of separately
defined Kmers and will give you a view of the total proportion of your
-library which contain these Kmers. A results trace will always be
-generated for all of the sequences present in the adapter config file
+library which contains these Kmers. A results trace will always be
+generated for all of the sequences present in the adapter config file
so you can see the adapter content of your library, even if it's low.
@@ -48,7 +48,7 @@
Summary
In addition to classic adapter sequences the default configuration also
includes polyA and polyG as sequences to search for. PolyA can be useful
to include when looking at RNA-Seq libraries. PolyG is present as a
-technical artefact in 2-colour illumina libraries where it is produced
+technical artefact in 2-colour Illumina libraries where it is produced
when the signal from the cluster disappears. Both of these sequences
are generally trimmed from the 3' end of sequences, and are therefore
removed in a similar way to adapters, hence their inclusion in the default
@@ -71,7 +71,7 @@
Failure
Common reasons for warnings
Any library where a reasonable proportion of the insert sizes are shorter
-than the read length will trigger this module. This doesn't indicate a
+than the read length will trigger this module. This doesn't indicate a
problem as such - just that the sequences will need to be adapter trimmed
before proceeding with any downstream analysis.
diff --git a/Help/3 Analysis Modules/4 Per Base Sequence Content.html b/Help/3 Analysis Modules/4 Per Base Sequence Content.html
index bae1142..fc7bab3 100644
--- a/Help/3 Analysis Modules/4 Per Base Sequence Content.html
+++ b/Help/3 Analysis Modules/4 Per Base Sequence Content.html
@@ -27,11 +27,11 @@
Summary
It's worth noting that some types of library will always produce biased
-sequence composition, normally at the start of the read. Libraries
+sequence composition, normally at the start of the read. Libraries
produced by priming using random hexamers (including nearly all RNA-Seq libraries)
and those which were fragmented using transposases inherit an intrinsic
-bias in the positions at which reads start. This bias does not concern
-an absolute sequence, but instead provides enrichement of a number of
+bias in the positions at which reads start. This bias does not concern
+an absolute sequence, but instead provides enrichment of a number of
different K-mers at the 5' end of the reads. Whilst this is a true
technical bias, it isn't something which can be corrected by trimming
and in most cases doesn't seem to adversely affect the downstream analysis.
@@ -52,34 +52,35 @@
Failure
Common reasons for warnings
-There are a number of common scenarios which would ellicit a warning
+There are a number of common scenarios which would elicit a warning
or error from this module.
Overrepresented sequences: If there is any evidence of overrepresented
-sequences such as adapter dimers or rRNA in a sample then these sequences
-may bias the overall composition and their sequence will emerge from this plot.
+sequences such as adapter dimers or rRNA in a sample, then these sequences
+may bias the overall composition and their sequence will emerge from this plot.
Biased fragmentation: Any library which is generated based on the ligation
of random hexamers or through tagmentation should theoretically have good
-diversity through the sequence, but experience has shown that these libraries
+diversity through the sequence, but experience has shown that these libraries
-always have a selection bias in around the first 12bp of each run. This is
+always have a selection bias in roughly the first 12bp of each read. This is
due to a biased selection of random primers, but doesn't represent any individually
biased sequences. Nearly all RNA-Seq libraries will fail this module because of
-this bias, but this is not a problem which can be fixed by processing, and it
-doesn't seem to adversely affect the ablity to measure expression.
+this bias, but this is not a problem which can be fixed by processing, and it
+doesn't seem to adversely affect the ability to measure expression.
Biased composition libraries: Some libraries are inherently biased in their
-sequence composition. The most obvious example would be a library which has been
+sequence composition. The most obvious example would be a library which has been
treated with sodium bisulphite which will then have converted most of the cytosines
-to thymines, meaning that the base composition will be almost devoid of cytosines
+to thymines, meaning that the base composition will be almost devoid of cytosines
and will thus trigger an error, despite this being entirely normal for that type of
-library
+library.
-If you are analysing a library which has been aggressivley adapter trimmed
-then you will naturally introduce a composition bias at the end of the reads as
-sequences which happen to match short stretches of adapter are removed, leaving
+
+If you are analysing a library which has been aggressively adapter trimmed
+then you will naturally introduce a composition bias at the end of the reads as
+sequences which happen to match short stretches of adapter are removed, leaving
only sequences which do not match. Sudden deviations in composition at the end
of libraries which have undergone aggressive trimming are therefore likely to be
spurious.
+
diff --git a/Help/3 Analysis Modules/6 Per Base N Content.html b/Help/3 Analysis Modules/6 Per Base N Content.html
index 443a119..0c10209 100644
--- a/Help/3 Analysis Modules/6 Per Base N Content.html
+++ b/Help/3 Analysis Modules/6 Per Base N Content.html
@@ -13,7 +13,7 @@
Per Base N Content
Summary
If a sequencer is unable to make a base call with sufficient confidence
-then it will normally substitute an N rather than a conventional base]
+then it will normally substitute an N rather than a conventional base
call
@@ -23,7 +23,7 @@
Summary
-It's not unusual to see a very low proportion of Ns appearing in a sequence,
+It's not unusual to see a very low proportion of Ns appearing in a sequence,
especially nearer the end of a sequence. However, if this proportion rises
above a few percent it suggests that the analysis pipeline was unable to
interpret the data well enough to make valid base calls.
@@ -43,15 +43,15 @@
Common reasons for warnings
The most common reason for the inclusion of significant proportions of Ns
is a general loss of quality, so the results of this module should be evaluated
-in concert with those of the various quality modules. You should check the
+in concert with those of the various quality modules. You should check the
coverage of a specific bin, since it's possible that the last bin in this analysis
-could contain very few sequences, and an error could be prematurely triggered in
+could contain very few sequences, and an error could be prematurely triggered in
this case.
-Another common scenario is the incidence of a high proportions of N at a small
+Another common scenario is the incidence of a high proportion of Ns at a small
-number of positions early in the library, against a background of generally
-good quality. Such deviations can occur when you have very biased sequence
+number of positions early in the library, against a background of generally
+good quality. Such deviations can occur when you have very biased sequence
composition in the library to the point that base callers can become confused
and make poor calls. This type of problem will be apparent when looking at the
per-base sequence content results.
diff --git a/Help/3 Analysis Modules/7 Sequence Length Distribution.html b/Help/3 Analysis Modules/7 Sequence Length Distribution.html
index 27bbbb9..623b5e5 100644
--- a/Help/3 Analysis Modules/7 Sequence Length Distribution.html
+++ b/Help/3 Analysis Modules/7 Sequence Length Distribution.html
@@ -1,7 +1,7 @@
-Sequence Length Distribution
+Sequence length distribution
-Sequence Length Distribution
+Sequence length distribution
Summary
Some high throughput sequencers generate sequence fragments
of uniform length, but others can contain reads of wildly
-varying lengths. Even within uniform length libraries some
-pipelines will trim sequences to remove poor quality base calls
+varying lengths. Even within uniform-length libraries, some
+pipelines will trim sequences to remove poor quality base calls
from the end.