Full genome (Gf)
The full genome mode (Gf) can be used to simulate intergenic regions and how their lengths are affected by different events.
To better understand this, let us look at the figure below. Here we have an example of a genome with 5 genes and 5 intergenic regions. Their respective lengths are given in the tables on the right. For the sake of clarity, a linear version of the genome is also shown.
We will use this genome to illustrate how events affect genes and intergene regions. But first, let us see how the length of the segment affected by an event is determined. Zombi starts by randomly choosing a position within the intergenes. Notice that intergene region 4 (between D and E) has a length of 0 but can still be chosen, since what is selected is a position between "nucleotides" and not the nucleotides themselves. In the figure below, the sampled position is indicated by the black arrow.
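To make the "position between nucleotides" idea concrete, here is a minimal sketch in Python. It assumes that an intergene of length n offers n + 1 possible positions, so that a length-0 intergene still has exactly one. The function name and this weighting are illustrative assumptions, not necessarily how Zombi implements the sampling.

```python
import random


def sample_intergene_position(intergene_lengths, rng=random):
    """Pick a cut position: returns (intergene_index, offset).

    Assumption of this sketch: an intergene of length n has n + 1 cut
    positions (before the first nucleotide, between consecutive
    nucleotides, after the last one), so length 0 still has one.
    """
    weights = [length + 1 for length in intergene_lengths]
    index = rng.choices(range(len(intergene_lengths)), weights=weights, k=1)[0]
    # randint is inclusive on both ends: offsets 0..length
    offset = rng.randint(0, intergene_lengths[index])
    return index, offset


# Hypothetical intergene lengths; the second one has length 0 but can be chosen.
print(sample_intergene_position([3, 0, 5]))
```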
Second, a direction is chosen randomly (either left or right), and the extension of the event is obtained by sampling from a geometric distribution (more details on this at the end of this explanation). In a first attempt (see the table), the direction is "right" and the extension of the event is 6.
In this case, the event ends within a gene, which is considered a failure. A new direction and a new extension are chosen randomly until the event ends within an intergene region. In the example this happens on the 4th attempt. By default, if there is no success after 50 trials, the event does not take place.
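The retry procedure can be sketched as follows. This is a toy model, not Zombi's code: the genome is a linear string in which letters are gene nucleotides and "." is an intergenic nucleotide, cut positions lie between characters, and an event that runs off either end counts as a failure. All names (is_intergenic, geometric, sample_segment) are hypothetical.

```python
import random

MAX_TRIALS = 50  # default number of attempts mentioned in the text


def is_intergenic(genome, cut):
    """True if the cut position (0..len(genome)) does not fall inside a gene."""
    if cut == 0 or cut == len(genome):
        return True
    left, right = genome[cut - 1], genome[cut]
    # A boundary between two different genes is a length-0 intergene.
    return left == "." or right == "." or left != right


def geometric(p, rng):
    """Number of Bernoulli(p) trials up to and including the first success."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    n = 1
    while rng.random() >= p:
        n += 1
    return n


def sample_segment(genome, start, p, rng=random, max_trials=MAX_TRIALS):
    """Return (left_cut, right_cut), or None if every trial fails."""
    for _ in range(max_trials):
        direction = rng.choice(("left", "right"))
        extension = geometric(p, rng)
        end = start + extension if direction == "right" else start - extension
        # Success only if the event ends inside the genome and in an intergene.
        if 0 <= end <= len(genome) and is_intergenic(genome, end):
            return min(start, end), max(start, end)
    return None  # the event does not take place


print(sample_segment("AAA..BBB...CC", start=4, p=0.3))
```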
Now let us look at the different events, beginning with originations.
Originations of new genes take place at a position within an intergene, but they do not have an extension parameter. The new gene is inserted into the intergene region, which is cut in two.
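As a small illustration of how an origination cuts an intergene in two, the sketch below keeps genes and intergene lengths in two parallel lists, assuming that intergene i follows gene i. The function name and the data layout are assumptions made for this example only.

```python
def originate(genes, intergenes, index, offset, new_gene):
    """Insert new_gene at `offset` inside intergene `index`.

    Returns new (genes, intergenes) lists; the inputs are not modified.
    Assumed layout: intergenes[i] is the region that follows genes[i].
    """
    length = intergenes[index]
    if not 0 <= offset <= length:
        raise ValueError("offset must lie within the intergene")
    new_genes = genes[: index + 1] + [new_gene] + genes[index + 1 :]
    # The intergene is split into a left part and a right part.
    new_intergenes = (
        intergenes[:index] + [offset, length - offset] + intergenes[index + 1 :]
    )
    return new_genes, new_intergenes


genes, intergenes = originate(["A", "B", "C"], [10, 0, 7], index=2, offset=3, new_gene="X")
print(genes, intergenes)  # ['A', 'B', 'C', 'X'] [10, 0, 3, 4]
```

Note that the total intergene length is unchanged: the two new regions add up to the length of the original one.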
When a transposition takes place, the segment containing genes and intergenes is cut and inserted at a different position (outside the cut segment).
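A transposition can be sketched on a toy linear genome written as a string, where letters are gene nucleotides and "." is an intergenic nucleotide. The cut positions i and j and the insertion point k are assumed to have been chosen already and to lie in intergenes; the function name is hypothetical and circularity is ignored.

```python
def transpose(genome, i, j, k):
    """Cut genome[i:j] and reinsert it at cut position k (outside the segment)."""
    if not 0 <= i < j <= len(genome):
        raise ValueError("invalid segment")
    if not (0 <= k <= i or j <= k <= len(genome)):
        raise ValueError("insertion point must be outside the cut segment")
    segment = genome[i:j]
    rest = genome[:i] + genome[j:]
    # If the insertion point was to the right of the segment, shift it left.
    k_rest = k - (j - i) if k >= j else k
    return rest[:k_rest] + segment + rest[k_rest:]


print(transpose("AAA..BBB...CC", 4, 9, 13))  # AAA...CC.BBB.
```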
Inversions change the orientation of the affected genes.
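On a toy string genome (uppercase letters for genes in one orientation, lowercase for the opposite one, "." for intergenic nucleotides), an inversion can be sketched as reversing the segment and flipping the case. This encoding and the function name are assumptions of the example, not part of Zombi.

```python
def invert(genome, i, j):
    """Reverse genome[i:j] and flip the orientation (case) of its genes."""
    if not 0 <= i < j <= len(genome):
        raise ValueError("invalid segment")
    # swapcase leaves "." untouched, so intergenes are only reversed.
    return genome[:i] + genome[i:j][::-1].swapcase() + genome[j:]


print(invert("AAA..BBB...CC..", 4, 14))  # AAA..cc...bbb..
```

Applying the same inversion twice restores the original genome.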
Transfers move genes between different genomes. Replacement transfers are not considered in this model.
Duplications copy the selected segment of the genome and paste it right after its end.
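Using a toy string genome (letters for gene nucleotides, "." for intergenic nucleotides), a duplication is simply a copy of the segment pasted right after its end. The function name and the encoding are assumptions of this sketch; the cut positions are taken as already chosen within intergenes.

```python
def duplicate(genome, i, j):
    """Paste a copy of genome[i:j] immediately after position j."""
    if not 0 <= i < j <= len(genome):
        raise ValueError("invalid segment")
    return genome[:j] + genome[i:j] + genome[j:]


print(duplicate("AAA..BBB...CC", 4, 9))  # AAA..BBB..BBB...CC
```

In this example the duplicated segment carries its flanking intergenic nucleotides with it, so the genome grows by the full length of the segment.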
There are two types of losses. Their relative frequency is set by the parameter PSEUDOGENIZATION (between 0 and 1), which gives the probability that a loss event is also a pseudogenization event.
If it is not, the affected region is removed completely from the genome. Otherwise, the affected genes become intergene regions.
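The two kinds of loss can be sketched on a toy string genome (letters for gene nucleotides, "." for intergenic nucleotides). The value given to PSEUDOGENIZATION below and the function name are illustrative assumptions.

```python
import random

PSEUDOGENIZATION = 0.5  # illustrative value, between 0 and 1


def lose(genome, i, j, pseudogenization=PSEUDOGENIZATION, rng=random):
    """Apply a loss to genome[i:j]."""
    if not 0 <= i < j <= len(genome):
        raise ValueError("invalid segment")
    if rng.random() < pseudogenization:
        # Pseudogenization: the genes turn into intergene, length is kept.
        return genome[:i] + "." * (j - i) + genome[j:]
    # Plain loss: the region is removed completely.
    return genome[:i] + genome[j:]


print(lose("AAA..BBB...CC", 4, 9, pseudogenization=0.0))  # AAA...CC
print(lose("AAA..BBB...CC", 4, 9, pseudogenization=1.0))  # AAA........CC
```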
Besides the files of the basic mode G, Gf also outputs the gene and intergene lengths in the folder G/Genomes, in the files _LENGTHS.tsv.
INTERGENE_LENGTHS
The mean length of the intergene regions is controlled by the parameter INTERGENE_LENGTH. When the simulation starts, the genome is created with a number of genes equal to INITIAL_GENOME_SIZE and an equal number of intergene regions. The length of each intergene is determined by sampling from a flat Dirichlet distribution with k = INITIAL_GENOME_SIZE and multiplying the result by k * INTERGENE_LENGTH, so that the lengths sum to k * INTERGENE_LENGTH and their mean is INTERGENE_LENGTH.
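The initial intergene lengths can be sketched with the standard library alone, using the fact that a flat Dirichlet sample is obtained by normalising k independent Gamma(1, 1) draws. The function name and the rounding to whole nucleotides are assumptions of this example.

```python
import random


def initial_intergene_lengths(initial_genome_size, intergene_length, rng=random):
    """Sample one length per intergene with mean close to intergene_length."""
    k = initial_genome_size
    # Flat Dirichlet(1, ..., 1): normalise k independent Gamma(1, 1) draws.
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(k)]
    total = sum(draws)
    proportions = [d / total for d in draws]  # these sum to 1
    # Rounding to integers is an assumption of this sketch.
    return [round(x * k * intergene_length) for x in proportions]


lengths = initial_intergene_lengths(5, 100)
print(lengths, sum(lengths))  # the sum is close to 5 * 100
```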
The extension of an event is sampled from a geometric distribution with a parameter p equal to event_extension / (intergene_length + mean(gene_length)). So far, this model does not support extensions sampled from non-geometric distributions.
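The formula above can be turned into a short sketch. The numeric values are hypothetical, and the convention that the geometric distribution starts at 1 (number of trials up to the first success) is an assumption of this example.

```python
import math
import random
from statistics import mean


def extension_probability(event_extension, intergene_length, gene_lengths):
    """p = event_extension / (intergene_length + mean(gene_length))."""
    return event_extension / (intergene_length + mean(gene_lengths))


def sample_extension(p, rng=random):
    """Draw from a geometric distribution on {1, 2, ...} by inverse transform."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if p == 1.0:
        return 1
    u = 1.0 - rng.random()  # uniform in (0, 1], avoids log(0)
    return int(math.log(u) / math.log1p(-p)) + 1


# Hypothetical values: extension 1.0, intergenes of 100, genes of about 1000.
p = extension_probability(1.0, 100, [900, 1100, 1000])
print(p, sample_extension(p))
```

With these values the expected extension is 1 / p = 1100, that is, roughly one gene plus one intergene.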