Add note about floating-point precision to README.md

althonos · Jun 22, 2020 · 2b4bfc9
1 parent 4a1c733

Showing 2 changed files with 19 additions and 17 deletions.
diff --git a/README.md b/README.md
@@ -15,6 +15,7 @@
 [![GitHub issues](https://img.shields.io/github/issues/althonos/orthoani.svg?style=flat-square&maxAge=600)](https://github.com/althonos/orthoani/issues)
 [![Downloads](https://img.shields.io/badge/dynamic/json?style=flat-square&color=303f9f&maxAge=86400&label=downloads&query=%24.total_downloads&url=https%3A%2F%2Fapi.pepy.tech%2Fapi%2Fprojects%2Forthoani)](https://pepy.tech/project/orthoani)
 
+
 ## 🗺️ Overview
 
 OrthoANI is a metric proposed by [Lee *et al.*](https://doi.org/10.1099/ijsem.0.000760)
@@ -40,6 +41,7 @@ $ pip install orthoani
 `orthoani` also requires the BLAST+ binaries to be installed on your machine
 and available somewhere in your `$PATH`.
 
+
 ## 💡 Example
 
 Use Biopython to load two FASTA files, and then `orthoani.orthoani` to compute
@@ -61,6 +63,7 @@ $ orthoani -q sequence1.fa -r sequence2.fa
 0.5725
 ```
 
+
 ## 🐏 Memory
 
 `orthoani` uses the machine temporary folder to handle BLAST+ input and output
@@ -72,6 +75,13 @@ happens, try changing the value of the `tempfile.tempdir` to a directory that
 is actually located on physical storage.
 
 
+## 📏 Precision
+
+Values computed by this package and the original Java implementation may differ
+slightly because in Java the authors perform rounding of floating-point values
+at the sub-percent level, while this library uses the full values.
+
+
 ## 📜 About
 
 This library is provided under the open-source

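The new README precision note can be illustrated with a small sketch of how rounding before averaging shifts the result. It assumes, hypothetically, that the Java tool rounds each fragment identity to 0.01% (four decimal places of a fraction) before averaging. The exact rounding rule used in the Java code is not stated in this commit, and the identity values below are made up for illustration.

```python
from statistics import fmean


def rounded(value: float, digits: int = 4) -> float:
    # Hypothetical rounding rule: keep `digits` decimals of a fraction in [0, 1],
    # so digits=4 means a resolution of 0.01% identity.
    return round(value, digits)


# Made-up per-fragment identities, for illustration only.
identities = [0.987654, 0.912345, 0.954321]

full = fmean(identities)                          # mean of full-precision values
approx = fmean(rounded(x) for x in identities)    # mean of pre-rounded values

print(full, approx, full - approx)
```

The two means differ only in the sixth decimal place here. That is the kind of small discrepancy the note warns about when comparing this package against the Java implementation.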
diff --git a/orthoani/__init__.py b/orthoani/__init__.py
@@ -86,29 +86,21 @@ def _hits(
         xdrop_gap=150,
         penalty=-1,
         reward=1,
-        num_alignments=1,
+        max_target_seqs=1,
         num_threads=threads,
         outfmt=5,
     )
     output = io.StringIO(cmd()[0])
 
-    identities = {}
+    hits = {}
     for record in NCBIXML.parse(output):
         if record.alignments:
             hsps = record.alignments[0].hsps
-            if all(hsp.align_length > 0.35 * blocksize for hsp in hsps):
-                q = record.query
-                r = record.alignments[0].hit_def
-
-                length = matches = 0
-                for hsp in hsps:
-                    for qn, sn in zip(hsp.query, hsp.sbjct):
-                        if _is_atgc(qn) and _is_atgc(sn):
-                            length += 1
-                            matches += (qn == sn)
-                identities[q, r] = matches / length
-
-    return identities
+            if all(hsp.align_length >= 0.35 * blocksize for hsp in hsps):
+                pos = sum(hsp.identities for hsp in hsps)
+                length = sum(hsp.align_length for hsp in hsps)
+                hits[record.query, record.alignments[0].hit_def] = pos / length
+    return hits
 
 
 def _orthoani(
@@ -133,9 +125,9 @@ def _orthoani(
 
     ani = 0.0
     for hit_q, hit_r in hits.items():
-        ani += (backward[hit_r, hit_q] + forward[hit_q, hit_r]) / 2
+        ani += backward[hit_r, hit_q] + forward[hit_q, hit_r]
     if hits:
-        ani /= len(hits)
+        ani /= len(hits) * 2
     return ani
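The `__init__.py` changes can be summarised in a small, self-contained sketch that calls neither BLAST nor Biopython. `HSP` is a hypothetical stand-in that exposes only the `identities` and `align_length` fields read by the new `_hits` loop. `fragment_identity`, `ani_new` and `ani_old` are illustrative names, not functions of the `orthoani` package. The sketch shows the pooled identity computation. It also checks that the new averaging (sum both directions, then divide by `2 * len(hits)`) matches the old per-pair mean, up to floating-point rounding.

```python
import math
from typing import Dict, List, NamedTuple, Optional, Tuple


class HSP(NamedTuple):
    """Hypothetical stand-in for a Biopython HSP: only the two fields
    read by the new `_hits` loop are modelled."""
    identities: int    # number of identical positions in the HSP
    align_length: int  # alignment length, gap columns included


def fragment_identity(hsps: List[HSP], blocksize: int) -> Optional[float]:
    # Keep the hit only if every HSP spans at least 35% of the fragment
    # (the commit relaxes `>` to `>=` here).
    if not all(hsp.align_length >= 0.35 * blocksize for hsp in hsps):
        return None
    # Pool identical positions and alignment lengths over all HSPs,
    # instead of re-counting A/T/G/C matches base by base.
    matches = sum(hsp.identities for hsp in hsps)
    length = sum(hsp.align_length for hsp in hsps)
    return matches / length


Pair = Tuple[str, str]


def ani_new(forward: Dict[Pair, float], backward: Dict[Pair, float],
            hits: Dict[str, str]) -> float:
    # New form: add both directions, then divide once by 2 * number of hits.
    ani = 0.0
    for q, r in hits.items():
        ani += backward[r, q] + forward[q, r]
    if hits:
        ani /= len(hits) * 2
    return ani


def ani_old(forward: Dict[Pair, float], backward: Dict[Pair, float],
            hits: Dict[str, str]) -> float:
    # Old form: average each reciprocal pair, then average over hits.
    ani = 0.0
    for q, r in hits.items():
        ani += (backward[r, q] + forward[q, r]) / 2
    if hits:
        ani /= len(hits)
    return ani


# Made-up reciprocal identities, for illustration only.
forward = {("q1", "r1"): 0.90, ("q2", "r2"): 0.80}
backward = {("r1", "q1"): 0.95, ("r2", "q2"): 0.85}
hits = {"q1": "r1", "q2": "r2"}

print(fragment_identity([HSP(90, 100), HSP(45, 50)], blocksize=100))  # 0.9
print(fragment_identity([HSP(20, 30)], blocksize=100))                # None
print(math.isclose(ani_new(forward, backward, hits),
                   ani_old(forward, backward, hits)))                 # True
```

The two averaging forms are algebraically equal, so the behavioural changes in this commit come from elsewhere. The first is the identity computation: the old loop skipped columns where either base was not A/T/G/C, while BLAST's `identities` and `align_length` totals count every aligned column, gaps included. The second is the `>=` coverage threshold.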