Issue
The Portuguese stemmer vocabulary tests are failing with many mismatches between expected and actual output.
Examples
abalado expected: abal, got: abalad
abandonado expected: abandon, got: abandonad
ação expected: açã, got: aça~o
The stemmer is not properly removing common suffixes like -ado, -ção, etc.
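The third example may point at a separate problem. The Snowball specification replaces ã and õ with the placeholders a~ and o~ before stemming and converts them back afterwards, so an output of aça~o suggests the placeholder is never restored. A minimal sketch of that round trip follows; the names prelude and postlude are illustrative and not taken from this repository, and the input is assumed to use precomposed characters:

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical helpers: the names are illustrative, not the repository's API.
// Input is assumed to use precomposed ã/õ rather than a combining tilde.
var (
	toPlaceholder   = strings.NewReplacer("ã", "a~", "õ", "o~")
	fromPlaceholder = strings.NewReplacer("a~", "ã", "o~", "õ")
)

// prelude rewrites nasalised vowels as vowel + '~' before stemming.
func prelude(w string) string { return toPlaceholder.Replace(w) }

// postlude restores the nasalised vowels once stemming is done.
func postlude(w string) string { return fromPlaceholder.Replace(w) }

func main() {
	fmt.Println(prelude("ação"))  // form the stemming steps operate on
	fmt.Println(postlude("aça~")) // stem after the final o is removed
}
```

If the stemmer returns a word that still contains a~ or o~, the restoring step is probably either missing or running before the last suffix removal.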
Root Cause
The Portuguese stemmer implementation doesn't fully match the official Snowball algorithm specification at https://snowballstem.org/algorithms/portuguese/stemmer.html
What Needs to be Done
- Review and correct all stemming steps against the official algorithm
- Verify standard suffix removal, verb suffix removal, and residual suffix removal
- Ensure RV, R1, and R2 region detection is correct
- Check vowel removal and final cleanup steps
- Run vocabulary tests until they pass
Test Command
go test ./portuguese_vocab -v
Related
This was introduced in the initial implementation of the Portuguese stemmer.