-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add list of all names to Taxon #18
Conversation
Reviewer's Guide by SourceryThis pull request adds a new feature to the Taxon class in the taxopy library, introducing a 'names' property that provides a list of all names associated with a taxon, including scientific names, common names, and other variants. The implementation involves modifications to the TaxDb class to store and manage multiple names per taxon, and updates to the Taxon class to expose this new information. File-Level Changes
Tips
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @mkuhn - I've reviewed your changes - here's some feedback:
Overall Comments:
- Please update the documentation to reflect the new
names
property and its usage. This will help users understand and utilize this new feature effectively. - Consider the performance implications of storing all names for every taxon. You might want to implement lazy loading of names or provide an option to disable this feature for users who don't need it, to minimize memory usage.
Here's what I looked at during the review
- 🟡 General issues: 2 issues found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.
Hi @mkuhn, Sorry for the delay. I’ve been on post-congress vacation and will be back to work on Thursday. Thanks for the PR! Let's merge it into the main branch :) A few comments:
|
No worries! I could go ahead with my own fork for now.
Hmm, it's probably cleaner to have on internal dictionary, taxid2names which contains a tuple of names for each kind of name. E.g. for Sus scrofa, these are the names from
So there are two common names. To initialize the name of a species, one could then get it with Here are the counts of name types: 25 genbank acronym So it's a limited list.
Just a rough estimate,
Please let me know if I should go ahead and make the internal |
So, would A potential solution is to convert >>> taxid2name[9823]
"Sus scrofa"
>>> taxid2name.common_name[9823]
("pigs <Sus scrofa>", "swine", "wild boar")
>>> taxid2name.acronym[9823]
None # or KeyError Converting |
Ah, I thought that I find the code example a bit confusing. As a Python coder, I wouldn't expect a dictionary(-like object) to have an attribute that is then also a dictionary. Which organization level makes more sense to you, How about:
With |
Good point. I like your suggestion! Do you think you could update the PR with this API? |
I've been writing too much R recently, which rots your brain: a single item in Python is of course not the same as a single-item list, unlike in R. So the API for all names needs to always return a list (while
I've update the code and added it also to |
Thanks you! I'll take a better look at the code in the next couple of days |
Sorry that it took me so long to get to this, @mkuhn. I wasn't in a rush because you mentioned that you didn't need this available in the package to get your work done. I only had one comment about the code. After we clarify that, I'll merge the PR. |
No worries! The other day I got a question on a PR from eight years ago, I had to jog my memory quite a bit on that one. Which comment did you have about the code? I didn't see it. |
Now I feel less bad :) Here's the review: https://github.com/apcamargo/taxopy/pull/18/files/099eb5322f19d1591a9182c066ad142a1a88ed5d#diff-6047f6e01ec3eb44d31bc0b381e7b7771a860b12a1dce2452636673f2ca088acR249 |
ok, I've addressed the comments in the "conversations" dropdown, hope everything is fine now! |
Sorry, @mkuhn, I think I overcomplicated things and you might have missed my comment. In lines 249-252 of if self._merged_dmp:
for oldtaxid, newtaxid in self._oldtaxid2newtaxid.items():
taxid2name[oldtaxid] = taxid2name[newtaxid]
taxid2all_names[oldtaxid] = taxid2all_names[newtaxid] As a result, this block is now executed within the |
Ah, I see! I can't remember this change, but it's more consistent to have it outside the |
Thank you, @mkuhn! |
Hi Antonio, not sure if you will find this useful, but I needed a more complete list of names for a taxon in addition to the scientific name. I didn't update the docs yet. If you'd consider merging this, I can also add something to the documentation.
Summary by Sourcery
Introduce a new feature to the
Taxon
class that allows retrieval of a comprehensive list of names for a taxon, enhancing the existing functionality by including various name types such as scientific, common, and authority names.New Features:
names
to theTaxon
class that provides a list of all names associated with a taxon, including scientific, common, and authority names.Enhancements:
_import_names
method to populate a new dictionarytaxid2names
that stores all names associated with each taxon ID.