doc.set_ents uses tokens but Span uses char #7130
I'm trying to remove trailing special characters from the end of entities. However, doc.set_ents uses token indices, but Span uses character offsets. What's a good way to resolve this conflict?
Hi! This is a good question for the discussion forum, so I'll move it there. You might get a message that this thread was closed/locked, but we can still continue the conversation on the open thread there, and it should automatically forward you.
EDIT: (Thanks to @adrianeboyd for pointing out an error in my original response) I am certain that you are long past this question, but your pattern in


You can use ent.start and ent.end instead of ent.start_char and ent.end_char to obtain the token indices of the entity instead of the char indices.

Also, please note that your custom component "remove_specials" should return the doc at the end of its processing.

Finally, I'm not sure this function will work as you intend it when you apply it on a doc with multiple entities in doc.ents, because with its default settings doc.set_ents overwrites the entire set of entities. Instead, you probably want to build up the list of new entities and call set_ents once on the doc right before returning it.
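
For illustration, here is a minimal sketch of how those three points could fit together in spaCy v3. The original component was not posted, so the SPECIALS set, the sample words, and the entity labels are all assumptions made up for this example. The component trims each entity using token indices, collects the results, and calls set_ents once before returning the doc.

```python
from spacy.language import Language
from spacy.tokens import Doc, Span
from spacy.vocab import Vocab

# Hypothetical set of "special" tokens; the original post does not list them.
SPECIALS = {"-", ":", ",", ".", ";", "/"}


@Language.component("remove_specials")
def remove_specials(doc):
    new_ents = []
    for ent in doc.ents:
        end = ent.end  # token index (exclusive), not ent.end_char
        # Step back over trailing tokens that are special characters.
        while end > ent.start and doc[end - 1].text in SPECIALS:
            end -= 1
        # Keep the entity only if at least one token is left.
        if end > ent.start:
            new_ents.append(Span(doc, ent.start, end, label=ent.label_))
    # One call at the end: by default set_ents replaces the existing entities.
    doc.set_ents(new_ents)
    return doc  # a pipeline component has to return the doc


# Build a small doc by hand so the example does not depend on a trained model.
words = ["Acme", "Corp", "-", "hired", "Jane", "Doe", ":", "yesterday"]
doc = Doc(Vocab(), words=words)
doc.set_ents([Span(doc, 0, 3, label="ORG"), Span(doc, 4, 7, label="PERSON")])

doc = remove_specials(doc)
result = [(ent.text, ent.label_) for ent in doc.ents]
print(result)
```

In a real pipeline the component would be added with nlp.add_pipe("remove_specials") after the NER component rather than called by hand. This sketch only strips whole tokens; a special character that is attached to a word inside a single token would need a different approach.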