
ivokwee (Member) commented Jan 16, 2026

Summary

This PR improves the robustness and consistency of organism name handling and symbol annotation in the pgx-annot.R module.

Key Changes

1. Organism Name Normalization

  • Added normalization in getProbeAnnotation() for common organism names
    • "human" → "Homo sapiens"
    • "mouse" → "Mus musculus"
    • "rat" → "Rattus norvegicus"
    • "dog" / "canis familiaris" → "Canis familiaris"
  • Improved dog matching with the regex canis.*familiaris|^dog$, applied to the lower-cased name, so variants such as "Canis lupus familiaris" also match
  • Applied consistent normalization across getGeneAnnotation() and getHumanOrtholog.biomart()

2. Symbol Handling Improvements

  • Breaking change (minor): Features with missing/NA symbols now use {feature} notation instead of an empty string
    • Example: A probe "ENSG00000123456" with no symbol becomes {ENSG00000123456}
    • Ensures all features have readable, non-empty symbols for downstream analysis
  • Set gene_title to "Unknown feature" for features with missing symbols
  • Preserves the original probe names in the annotation output
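The missing-symbol rule above can be illustrated with a minimal sketch. It assumes the annotation is a data.frame with feature, symbol and gene_title columns. The helper name fill_missing_symbols and those column names are hypothetical and are not taken from pgx-annot.R.

```r
# Hypothetical helper illustrating the {feature} fallback; not the PR's actual code.
fill_missing_symbols <- function(annot) {
  # Treat both NA and empty-string symbols as missing
  missing <- is.na(annot$symbol) | annot$symbol == ""
  # Wrap the original feature ID in braces so it stays readable but visibly unannotated
  annot$symbol[missing] <- paste0("{", annot$feature[missing], "}")
  # Give unannotated features a generic title
  annot$gene_title[missing] <- "Unknown feature"
  annot
}

annot <- data.frame(
  feature    = c("ENSG00000141510", "ENSG00000123456"),
  symbol     = c("TP53", NA),
  gene_title = c("tumor protein p53", NA),
  stringsAsFactors = FALSE
)
res <- fill_missing_symbols(annot)
res$symbol
# returns c("TP53", "{ENSG00000123456}")
```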

3. Error Handling Enhancements

  • Wrapped AnnotationHub::query() in tryCatch() in probe2symbol()
  • Gracefully handle organisms not available in AnnotationHub
  • Prevents crashes when annotation resources are unavailable
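The graceful-degradation pattern can be sketched with a small wrapper. It assumes the caller treats NULL as "no annotation available". safe_query and failing_query are hypothetical names. The query_fun argument stands in for a call such as AnnotationHub::query(ah, pattern), so the block runs without AnnotationHub installed.

```r
# Hypothetical wrapper illustrating the tryCatch() fallback; not the PR's actual code.
# `query_fun` stands in for something like AnnotationHub::query(ah, pattern).
safe_query <- function(query_fun, ...) {
  tryCatch(
    query_fun(...),
    error = function(e) {
      # Report the failure but keep going instead of crashing
      warning("annotation query failed: ", conditionMessage(e), call. = FALSE)
      NULL  # caller is assumed to treat NULL as "no annotation available"
    }
  )
}

# Simulated failure, e.g. an organism that is not available in AnnotationHub
failing_query <- function(organism) stop("no OrgDb found for ", organism)
res <- suppressWarnings(safe_query(failing_query, "Felis catus"))
is.null(res)
# returns TRUE
```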

Files Changed

  • R/pgx-annot.R (27 insertions, 14 deletions)

Impact

These changes improve:

  • Robustness: Better handling of edge cases and missing data
  • Consistency: Standardized organism name formats across functions
  • User Experience: All features now have meaningful display names
  • Error Prevention: Graceful degradation when annotation sources fail

Test Plan

  • Test with various organism name formats (human, Human, Homo sapiens)
  • Test with dog/canine annotations
  • Verify symbol annotation with missing/NA values
  • Test AnnotationHub fallback behavior
  • Confirm backward compatibility for existing workflows

Migration Notes

The main user-visible change is that features without symbols will now display as {feature_id} instead of empty strings. This should improve readability in plots and reports.
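Some downstream scripts may have detected unannotated features by testing for an empty symbol. One possible adjustment is to match the brace pattern instead. The helper is_placeholder_symbol is hypothetical and assumes placeholders always take the form {feature_id}.

```r
# Hypothetical check for {feature_id} placeholder symbols; not part of pgx-annot.R.
is_placeholder_symbol <- function(symbol) {
  # TRUE for strings wrapped in braces; grepl() returns FALSE for NA input
  grepl("^\\{.+\\}$", symbol)
}

is_placeholder_symbol(c("TP53", "{ENSG00000123456}", ""))
# returns c(FALSE, TRUE, FALSE)
```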

🤖 Generated with Claude Code

- Add organism name normalization in getProbeAnnotation (human/mouse/rat/dog)
- Improve dog organism matching with regex pattern (canis.*familiaris|^dog$)
- Fix symbol handling: use {feature} notation for missing/NA symbols instead of an empty string
- Enhance probe2symbol error handling for organisms not in AnnotationHub
- Add try-catch for AnnotationHub queries to prevent crashes
- Preserve original probe names in annotation output
- Set gene_title to "Unknown feature" for features with missing symbols

These changes improve robustness when dealing with various organism name formats
and ensure all features have readable symbols for downstream analysis.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

ESCRI11 self-requested a review January 16, 2026 13:45

ESCRI11 (Contributor) left a comment

Overall looks good. I see an issue throughout the file with this piece of code:

    if (tolower(organism) == "human") organism <- "Homo sapiens"
    if (tolower(organism) == "mouse") organism <- "Mus musculus"
    if (tolower(organism) == "rat") organism <- "Rattus norvegicus"
    if (grepl("canis.*familiaris|^dog$", tolower(organism))) organism <- "Canis familiaris"

I have two questions about it:

  1. Why has it been added yet again at the top of the file? All the functions it calls already contain the same code.
  2. It is not repeated consistently. In some functions it looks like this (the dog case is missing):

    if (tolower(organism) == "human") organism <- "Homo sapiens"
    if (tolower(organism) == "mouse") organism <- "Mus musculus"
    if (tolower(organism) == "rat") organism <- "Rattus norvegicus"

I assume it is in each function in case we call them separately. In that case, that piece of code should be made into an auxiliary function to ensure consistency.

…name handling

- Add normalizeOrganism() function to centralize organism name normalization
- Replace 9 duplicated if-statement blocks with calls to the helper function
- Ensure consistent dog/canine handling across all functions
- Fix "Canis LFamiliaris" typo (now consistently "Canis familiaris")

This addresses the code review feedback requesting DRY (Don't Repeat Yourself)
compliance for organism name normalization logic.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

ivokwee (Member, Author) commented Jan 16, 2026

Addressed Review Feedback

Thanks for the review! I've addressed the concerns about code duplication:

Changes Made

  1. Created a normalizeOrganism() helper function (lines 94-119) that centralizes all organism name normalization logic:

    normalizeOrganism <- function(organism) {
      if (is.null(organism) || is.na(organism)) return(organism)
      org_lower <- tolower(organism)
      if (org_lower == "human") return("Homo sapiens")
      if (org_lower == "mouse") return("Mus musculus")
      if (org_lower == "rat") return("Rattus norvegicus")
      if (grepl("canis.*familiaris|^dog$", org_lower)) return("Canis familiaris")
      organism
    }
  2. Replaced all 9 duplicated code blocks with a single call to organism <- normalizeOrganism(organism) in:

    • getProbeAnnotation()
    • getGeneAnnotation()
    • getGeneAnnotation.ANNOTHUB()
    • getHumanOrtholog.biomart()
    • .getOrgDb()
    • getOrgDb()
    • detect_probetype()
    • getOrganismGO()
    • convert_probetype()
  3. Fixed inconsistencies:

    • All functions now consistently support dog/canine organism names
    • Fixed typo "Canis LFamiliaris" → "Canis familiaris" in getHumanOrtholog.biomart()
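The organism-name cases from the test plan can be checked directly against the helper. This is a minimal sketch. normalizeOrganism is copied verbatim from above so the block runs on its own, and the expected values follow from its branches.

```r
# Copied verbatim from the helper described above
normalizeOrganism <- function(organism) {
  if (is.null(organism) || is.na(organism)) return(organism)
  org_lower <- tolower(organism)
  if (org_lower == "human") return("Homo sapiens")
  if (org_lower == "mouse") return("Mus musculus")
  if (org_lower == "rat") return("Rattus norvegicus")
  if (grepl("canis.*familiaris|^dog$", org_lower)) return("Canis familiaris")
  organism
}

# Common names are matched case-insensitively
stopifnot(identical(normalizeOrganism("Human"), "Homo sapiens"))
stopifnot(identical(normalizeOrganism("Homo sapiens"), "Homo sapiens"))
# Dog variants, including the full "Canis lupus familiaris", collapse to one name
stopifnot(identical(normalizeOrganism("DOG"), "Canis familiaris"))
stopifnot(identical(normalizeOrganism("Canis lupus familiaris"), "Canis familiaris"))
# Unknown names and missing values pass through unchanged
stopifnot(identical(normalizeOrganism("Danio rerio"), "Danio rerio"))
stopifnot(identical(normalizeOrganism(NA), NA))
```

The helper is scalar. If organism has length greater than one, the `||` and `if` conditions raise an error in recent R versions (4.3 and later), so callers are assumed to pass a single organism name.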

Benefits

  • Single source of truth for organism mappings
  • Easy to add new organisms in one place
  • Consistent behavior across all functions
  • Easier to test and maintain


3 participants