Commit e15caea

Comma bugfix for En electronics (#332)
* fix bug with commas and electronics

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* update jenkins

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

---------

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent a506e1c commit e15caea

3 files changed: +7 -5 lines changed

Jenkinsfile

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ pipeline {
 
 AR_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/04-24-24-0'
 DE_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/10-23-24-0'
-EN_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/09-04-24-0'
+EN_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/09-25-25-0'
 ES_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/09-25-24-0'
 ES_EN_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/08-30-24-0'
 FR_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/04-07-25-0'

nemo_text_processing/text_normalization/en/taggers/electronic.py

Lines changed: 4 additions & 3 deletions
@@ -127,14 +127,15 @@ def __init__(self, cardinal: GraphFst, deterministic: bool = True):
 
         full_stop_accep = pynini.accep(".")
         dollar_accep = pynini.accep("$")  # Include for the correct transduction of the money graph
-        excluded_symbols = full_stop_accep | dollar_accep
+        excluded_symbols = full_stop_accep | dollar_accep | pynini.accep(",")
         filtered_symbols = pynini.difference(accepted_symbols, excluded_symbols)
         accepted_characters = NEMO_ALPHA | NEMO_DIGIT | filtered_symbols
         domain_component = full_stop_accep + pynini.closure(accepted_characters, 2)
-        graph_domain = (
+        graph_domain = pynutil.add_weight(
             pynutil.insert('domain: "')
             + (pynini.closure(accepted_characters, 1) + pynini.closure(domain_component, 1))
-            + pynutil.insert('"')
+            + pynutil.insert('"'),
+            0.1,
         ).optimize()
 
         graph |= pynutil.add_weight(graph_domain, MIN_NEG_WEIGHT)
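A rough way to see why excluding the comma matters is to approximate the domain pattern with a regular expression. The sketch below is not the pynini grammar: `ASSUMED_SYMBOLS` is a hypothetical stand-in for `accepted_symbols` (which this diff does not show), `NEMO_ALPHA` and `NEMO_DIGIT` are approximated by ASCII letters and digits, and it assumes the comma was among the accepted symbols before the change. Under those assumptions, "5.4," can be read as "5" followed by a domain component ".4,", because "4," satisfies the two-character minimum; once the comma is excluded, that reading is no longer available.

```python
import re

# Hypothetical stand-in for accepted_symbols; the real set is defined
# elsewhere in electronic.py and is not visible in this diff.
ASSUMED_SYMBOLS = "-_/~:,.$"


def domain_pattern(excluded: str) -> "re.Pattern[str]":
    """Regex approximation of graph_domain for a given set of excluded symbols."""
    symbols = "".join(ch for ch in ASSUMED_SYMBOLS if ch not in excluded)
    # Letters, digits, and the remaining symbols (approximates accepted_characters).
    char_class = "[A-Za-z0-9" + re.escape(symbols) + "]"
    # One or more characters, then one or more ".xx" components of length >= 2.
    return re.compile(rf"{char_class}+(?:\.{char_class}{{2,}})+")


before = domain_pattern(".$")   # exclusions before the commit
after = domain_pattern(".$,")   # exclusions after the commit

print(bool(before.fullmatch("5.4,")))       # True: "4," counts as a 2-character component
print(bool(after.fullmatch("5.4,")))        # False: the comma is no longer accepted
print(bool(after.fullmatch("nvidia.com")))  # True: ordinary domains still match
```

The real grammar additionally relies on path weights (the `0.1` and `MIN_NEG_WEIGHT` arguments above) to rank competing analyses, which this regex sketch does not model.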

tests/nemo_text_processing/en/data_text_normalization/test_cases_electronic.txt

Lines changed: 2 additions & 1 deletion
@@ -41,4 +41,5 @@ https://www.nvidia.com/dgx-basepod/~HTTPS colon slash slash WWW dot NVIDIA dot c
 i can use your card ending in 8876~i can use your card ending in eight eight seven six
 upgrade/update~upgrade slash update
 upgrade / update~upgrade slash update
-upgrade/update/downgrade~upgrade slash update slash downgrade
+upgrade/update/downgrade~upgrade slash update slash downgrade
+5.4, or 5.5~five point four, or five point five
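Each line in the test file pairs a written input with its expected normalization, separated by `~`. The sketch below shows one way such lines could be read into pairs; `parse_test_cases` is a hypothetical helper, not the repository's own loader, and it assumes the first `~` on a line is the separator.

```python
from typing import List, Tuple


def parse_test_cases(text: str) -> List[Tuple[str, str]]:
    """Split 'written~spoken' lines into (written, spoken) pairs."""
    cases = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Assumption: the first "~" separates the input from the expected output.
        written, spoken = line.split("~", 1)
        cases.append((written, spoken))
    return cases


sample = (
    "upgrade/update~upgrade slash update\n"
    "5.4, or 5.5~five point four, or five point five\n"
)
for written, spoken in parse_test_cases(sample):
    print(f"{written!r} -> {spoken!r}")
```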

0 commit comments