
Portuguese stemmer vocabulary tests failing #34

@kljensen

Issue

The Portuguese stemmer vocabulary tests are failing with many mismatches between expected and actual output.

Examples

  • abalado expected: abal, got: abalad
  • abandonado expected: abandon, got: abandonad
  • ação expected: açã, got: aça~o

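The stemmer is not removing common suffixes such as -ado and -ção. In the last example the output also still contains the internal a~ marking for ã, which the algorithm converts back to ã before returning the stem.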
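For reference, the Snowball Portuguese algorithm rewrites ã and õ as a~ and o~ before stemming and turns them back at the end. Below is a minimal Go sketch of that round trip. The function names `markNasals` and `unmarkNasals` are hypothetical and are not taken from this repository, and the `strings.TrimSuffix` call is only a stand-in for the residual suffix step (the real step also checks that the suffix lies in RV).

```go
package main

import (
	"fmt"
	"strings"
)

// markNasals rewrites ã and õ as the two-character forms a~ and o~.
// Hypothetical helper name; mirrors the prelude of the Snowball algorithm.
func markNasals(word string) string {
	return strings.NewReplacer("ã", "a~", "õ", "o~").Replace(word)
}

// unmarkNasals reverses markNasals so that no a~ or o~ leaks into the stem.
// Hypothetical helper name; mirrors the final step of the Snowball algorithm.
func unmarkNasals(word string) string {
	return strings.NewReplacer("a~", "ã", "o~", "õ").Replace(word)
}

func main() {
	marked := markNasals("ação")
	fmt.Println(marked) // the intermediate form seen in the failing output

	// Stand-in for the residual suffix step: drop a trailing "o".
	// The real step only does this when the "o" is inside RV.
	stem := strings.TrimSuffix(marked, "o")
	fmt.Println(unmarkNasals(stem))
}
```

If the actual output still contains a~, it may be worth checking first whether the conversion back to ã and õ runs on every return path.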

Root Cause

The Portuguese stemmer implementation doesn't fully match the official Snowball algorithm specification at https://snowballstem.org/algorithms/portuguese/stemmer.html

What Needs to be Done

  1. Review and correct all stemming steps against the official algorithm
  2. Verify standard suffix removal, verb suffix removal, and residual suffix removal
  3. Ensure RV, R1, and R2 region detection is correct
  4. Check vowel removal and final cleanup steps
  5. Run vocabulary tests until they pass
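
As a starting point for item 3, here is a hedged Go sketch of the region definitions as described in the Snowball specification: R1 is the region after the first non-vowel following a vowel, R2 is the same rule applied inside R1, and RV depends on the first two letters. The names `isVowel`, `regionAfter`, and `rv` are hypothetical and do not come from this repository. The sketch assumes the word has already been lowercased and had its nasal vowels marked, so that ~ counts as a non-vowel.

```go
package main

import (
	"fmt"
	"strings"
)

// Vowels of the Snowball Portuguese algorithm.
const vowels = "aeiouáéíóúâêô"

func isVowel(r rune) bool { return strings.ContainsRune(vowels, r) }

// regionAfter returns the rune index just past the first non-vowel that
// follows a vowel, searching from index `from`. It returns len(w) if no
// such position exists. R1 = regionAfter(w, 0); R2 = regionAfter(w, R1).
func regionAfter(w []rune, from int) int {
	for i := from + 1; i < len(w); i++ {
		if isVowel(w[i-1]) && !isVowel(w[i]) {
			return i + 1
		}
	}
	return len(w)
}

// rv returns the rune index where RV starts, or len(w) if it cannot be found.
func rv(w []rune) int {
	if len(w) < 2 {
		return len(w)
	}
	switch {
	case !isVowel(w[1]):
		// Second letter is a consonant: RV starts after the next vowel.
		for i := 2; i < len(w); i++ {
			if isVowel(w[i]) {
				return i + 1
			}
		}
	case isVowel(w[0]):
		// First two letters are vowels: RV starts after the next consonant.
		for i := 2; i < len(w); i++ {
			if !isVowel(w[i]) {
				return i + 1
			}
		}
	default:
		// Consonant-vowel start: RV starts after the third letter.
		if len(w) >= 3 {
			return 3
		}
	}
	return len(w)
}

func main() {
	for _, word := range []string{"abalado", "abandonado"} {
		w := []rune(word)
		p1 := regionAfter(w, 0)
		p2 := regionAfter(w, p1)
		fmt.Printf("%s: R1=%q R2=%q RV=%q\n",
			word, string(w[p1:]), string(w[p2:]), string(w[rv(w):]))
	}
}
```

For abalado this gives an RV of "lado", so the verb suffix -ado lies inside RV and should be removed, which matches the expected stem abal. Working on runes rather than bytes matters here, because accented letters occupy more than one byte in UTF-8.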

Test Command

go test ./portuguese_vocab -v

Related

This was introduced in the initial implementation of the Portuguese stemmer.
