Fix IndexError when running with splitted text chunks #500
This is a bug report and also a potential fix to the bug.
Description
After splitting a long text and feeding each chunk to an LLM for an entity linking task, the _get_prompt_data function raises "IndexError: list index out of range" when it accesses self._ents_cands_by_shard[i_doc].
The line
self._ents_cands_by_shard = [[] * len(self._ents_cands_by_doc)]
appears intended to initialize a list of a given length, but it does not do that. Multiplying an empty list by any number still yields an empty list, so [[] * 3] evaluates to [[]] rather than [[], [], []]. Please evaluate the change at your discretion.
Before applying the change: error log
After applying the change: all external tests passed
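The behaviour described above can be reproduced in plain Python, outside the library. The sketch below is illustrative only: ents_cands_by_doc is a hypothetical stand-in for self._ents_cands_by_doc, and the list comprehension is one common way to build a list of independent empty lists, not necessarily the exact change made in this PR.

```python
# Hypothetical stand-in for self._ents_cands_by_doc; the contents are made up.
ents_cands_by_doc = [["cand_a"], ["cand_b"], ["cand_c"]]
n_docs = len(ents_cands_by_doc)

# Original expression: `[] * n` is still an empty list,
# so the outer list always has exactly one element.
buggy = [[] * n_docs]
print(buggy)       # [[]]
print(len(buggy))  # 1

# Indexing past the first element reproduces the reported error.
try:
    buggy[1]
except IndexError as err:
    print(f"IndexError: {err}")  # IndexError: list index out of range

# One independent empty list per document.
by_shard = [[] for _ in range(n_docs)]
print(by_shard)  # [[], [], []]

# Pitfall: `[[]] * n` has the right length but repeats the same list object.
aliased = [[]] * n_docs
aliased[0].append("x")
print(aliased)  # [['x'], ['x'], ['x']]
```

The comprehension is preferable to [[]] * n here because each inner list is later filled per document, and shared references would make every entry receive the same items.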
Types of change
Bug fix
Checklist
Ran all tests in tests and usage_examples/tests, and all new and existing tests passed. This includes pytest run with --external and pytest run with --gpu.