Getting the distance estimation as an output #18

Open
ShaiberAlon opened this issue May 2, 2017 · 4 comments

Comments

@ShaiberAlon
Contributor

When using the -d option (for distance estimation), could there be a way to get the distance estimation as an output (for example, in the SUMMARY file)?

@ShaiberAlon
Contributor Author

ShaiberAlon commented May 3, 2017

My mistake! I see that there is an output '*distanceTable'. Sorry for that.

But I see that all the distances I got are either 1.0 or 0.0. Does that make sense?
Also, is there a way to find out which contigs ended up in which scaffold?
And also, if I understand correctly the file '*network.gexf' has the orientation for each edge. Is there a way to also find, for each contig, to which strand it belongs (i.e. was the sequence of the contig converted to the reverse complement by MeDuSa)?
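
Until such an output exists, one possible workaround is to search the output scaffold for each input contig and for its reverse complement. The following is only a minimal sketch, assuming the contig sequences are copied into the scaffold unchanged apart from a possible reverse complement; `reverse_complement` and `locate_contig` are hypothetical helper names, not part of MeDuSa:

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def locate_contig(contig_seq, scaffold_seq):
    """Return (strand, start) with a 0-based start, or None if not found.

    strand is "+" if the contig appears as given and "-" if only its
    reverse complement appears in the scaffold.
    """
    contig = contig_seq.upper()
    scaffold = scaffold_seq.upper()
    start = scaffold.find(contig)
    if start != -1:
        return ("+", start)
    start = scaffold.find(reverse_complement(contig))
    if start != -1:
        return ("-", start)
    return None


# Toy example: the second contig is stored as its reverse complement.
scaffold = "AAACCGGTT" + "N" * 5 + reverse_complement("GATTACA")
forward_hit = locate_contig("AAACCGGTT", scaffold)
reverse_hit = locate_contig("GATTACA", scaffold)
missing_hit = locate_contig("CCCCCCCC", scaffold)
```

Exact string search is slow for large genomes and ambiguous for repeated or palindromic contigs, so this is only a stopgap.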

@EBosi
Member

EBosi commented May 4, 2017

Hi

> But I see that all the distances I got are either 1.0 or 0.0. Does that make sense?

Can you paste the command line you used here?

> Also, is there a way to find out which contigs ended up in which scaffold?
> And also, if I understand correctly the file '*network.gexf' has the orientation for each edge. Is there a way to also find, for each contig, to which strand it belongs (i.e. was the sequence of the contig converted to the reverse complement by MeDuSa)?

I still have to do that... I could merge both pieces of information into a single file. Could you please tell me what you think is the best way to do it? I'm looking at http://www.ebi.ac.uk/ena for usable formats.

@ShaiberAlon
Contributor Author

Hi,

Thank you very much for your quick response!
The command I used is:
java -jar medusa.jar -f 01_FASTA/ -i p214_sequence-FASTA-estimated-distance.fa -d -v -gexf -o p214_sequence-FASTA-medusa-fixed-estimated-distance.fa

If you wish, I could also provide you with the files I used (but note that I used 106 reference genomes).

As for the format for exporting the information on contig arrangement, I think that a table with the full information on the contigs would be best. I would suggest a table in which each row corresponds to a contig in the input FASTA file, with the following columns: contig_number, contig_name, scaffold_name, contig_orientation, contig_reverse_complemented

Where:
contig_number: a serial number from 1 to N (where N is the number of contigs in the input), following the order of the contigs in the final output FASTA file
contig_name: the name of the contig in the input FASTA file
scaffold_name: the name of the scaffold, in the output FASTA file, that contains the contig
contig_orientation: 1 or -1 according to whether the orientation of the contig was kept or changed.
contig_reverse_complemented: 1 or -1 according to whether the sequence nucleotides were kept or complemented.

In addition, if distance estimation is performed (-d), then I would suggest adding another column estimated_distance_to_next_contig (where the contigs at the end of a scaffold would just have a NaN or NA).
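
To make the suggestion concrete, here is a rough sketch of how such a table could be written as tab-separated text. The column names follow the proposal above; the contig names, scaffold names, and distance value are made-up placeholders, and `write_contig_table` is a hypothetical helper, not an existing MeDuSa function:

```python
import csv
import io

# Columns as proposed in this comment, plus the optional distance column.
COLUMNS = [
    "contig_number",
    "contig_name",
    "scaffold_name",
    "contig_orientation",
    "contig_reverse_complemented",
    "estimated_distance_to_next_contig",
]


def write_contig_table(rows, handle):
    """Write one tab-separated line per contig, preceded by a header line."""
    writer = csv.DictWriter(
        handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Placeholder rows: two contigs joined into one scaffold.
rows = [
    {
        "contig_number": 1,
        "contig_name": "contig_A",
        "scaffold_name": "Scaffold_1",
        "contig_orientation": 1,
        "contig_reverse_complemented": 1,
        "estimated_distance_to_next_contig": 250,
    },
    {
        "contig_number": 2,
        "contig_name": "contig_B",
        "scaffold_name": "Scaffold_1",
        "contig_orientation": -1,
        "contig_reverse_complemented": -1,
        # Last contig of the scaffold: no next contig, so NA.
        "estimated_distance_to_next_contig": "NA",
    },
]

buffer = io.StringIO()
write_contig_table(rows, buffer)
table_text = buffer.getvalue()
print(table_text, end="")
```

A plain header line followed by one row per contig keeps the file easy to load with standard table readers.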

I hope this is helpful!

@oschwengers
Contributor

Hello,
in addition to what @ShaiberAlon mentioned, coordinates of contig x in scaffold y (start, end) would be very helpful.

So maybe you could extend your output with a separate, parsable (.tsv) file containing one line for each input contig:

contig_number, contig_name, scaffold_name, contig_orientation, contig_start, contig_end
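
For illustration, the start and end coordinates could be derived from the contig lengths and the gap lengths between neighbouring contigs. This is only a sketch under my own assumptions: coordinates are 1-based and inclusive, gap lengths are known (for example, a fixed run of Ns or an estimated distance), and `contig_coordinates` is a hypothetical name:

```python
def contig_coordinates(contig_lengths, gap_lengths):
    """Return 1-based inclusive (start, end) pairs for contigs in one scaffold.

    contig_lengths: contig lengths in scaffold order.
    gap_lengths: gap length after each contig except the last one.
    """
    if len(gap_lengths) != max(len(contig_lengths) - 1, 0):
        raise ValueError("expected exactly one gap between each pair of contigs")
    coords = []
    position = 1  # assumption: the scaffold starts at position 1
    for index, length in enumerate(contig_lengths):
        start = position
        end = start + length - 1
        coords.append((start, end))
        if index < len(gap_lengths):
            # The next contig starts right after the gap.
            position = end + gap_lengths[index] + 1
    return coords


# Placeholder lengths: three contigs separated by two 100 bp gaps.
coords = contig_coordinates([1000, 500, 2000], [100, 100])
```

The end coordinate of the last contig then equals the total scaffold length, which gives a quick sanity check against the output FASTA.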

Thanks a lot for this excellent tool!
