Description
We have a trait, FASTTEntity, and it seems that all FAST entities are expected to use it, but it does not help extensibility.
The only thing this trait provides is storing the start and end positions of a node's source code, expressed as a number of characters. This brings a limitation.
In TreeSitter, positions are expressed in bytes, and converting a byte offset into a character offset is very costly.
I have an importer that just instantiates one FAST entity per TreeSitter node, without any additional work, and the vast majority of the import time is spent converting positions from bytes to characters.
Importing a file of almost 900 lines of code took 6.5 s. When I keep the positions in bytes, the import time drops to 100 ms.
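To illustrate why the conversion dominates, here is a minimal Python sketch. It is not the Pharo implementation, and the function names are hypothetical. It assumes the source is UTF-8 encoded and that byte offsets fall on character boundaries, as node boundaries normally do. Each conversion has to inspect every byte before the offset, so converting the start and end of every node costs roughly (number of nodes) x (file size).

```python
def byte_to_char_offset(source: bytes, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset into a character (code point) offset.

    A character starts at every byte that is not a continuation byte
    (0b10xxxxxx), so all bytes before the offset must be inspected.
    """
    return sum(1 for b in source[:byte_offset] if (b & 0xC0) != 0x80)


def convert_node_positions(source: bytes, byte_ranges):
    """Convert (start_byte, end_byte) pairs, one conversion per position.

    Each call rescans the prefix of the file, so the total work grows with
    (number of nodes) x (file size).
    """
    return [
        (byte_to_char_offset(source, start), byte_to_char_offset(source, end))
        for start, end in byte_ranges
    ]


if __name__ == "__main__":
    raw = "café = 1".encode("utf-8")  # "é" takes two bytes in UTF-8
    print(convert_node_positions(raw, [(0, 5), (8, 9)]))  # [(0, 4), (7, 8)]
```

Keeping the byte offsets as TreeSitter reports them skips this per-node scan entirely.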
Since more and more FAST implementations are based on TreeSitter, if we want them to be efficient, we need to save positions in bytes instead of characters. This would also make reading the source text faster.
What I propose is:
- Deprecate FASTTEntity (we do not have a FamixTEntity, so why have one in FAST?)
- Introduce FASTTCharacterRelativeSource, which would be the current FASTTEntity
- Introduce FASTTByteRelativeSource, an alternative implementation based on byte positions
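A rough Python analogue of the proposed split, to show what each variant would store and how a node's source text would be read back. The class names mirror the proposed traits but are hypothetical, as are the 0-based, end-exclusive offsets and the `source_text` method; the real traits would be Pharo traits used by FAST entities.

```python
from dataclasses import dataclass


@dataclass
class CharacterRelativeSource:
    """Hypothetical analogue of FASTTCharacterRelativeSource."""

    start_pos: int  # character (code point) offset, 0-based, inclusive
    end_pos: int    # character offset, exclusive

    def source_text(self, source: str) -> str:
        # Positions index the decoded text directly.
        return source[self.start_pos:self.end_pos]


@dataclass
class ByteRelativeSource:
    """Hypothetical analogue of FASTTByteRelativeSource."""

    start_byte: int  # UTF-8 byte offset, 0-based, inclusive
    end_byte: int    # byte offset, exclusive

    def source_text(self, source: bytes) -> str:
        # Slice the raw bytes and decode only this node's span.
        return source[self.start_byte:self.end_byte].decode("utf-8")


@dataclass
class TreeSitterNodeEntity(ByteRelativeSource):
    """Hypothetical FAST-like entity built by a TreeSitter importer:
    it stores the byte offsets as they are, with no conversion."""

    kind: str = ""


if __name__ == "__main__":
    text = "café = 1"
    raw = text.encode("utf-8")
    number = TreeSitterNodeEntity(start_byte=8, end_byte=9, kind="number")
    print(number.source_text(raw))                          # 1
    print(CharacterRelativeSource(7, 8).source_text(text))  # 1
```

With byte positions, the importer copies offsets straight from the TreeSitter nodes, and reading a node's text only decodes that node's span.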
This would allow TreeSitter-based importers to use FASTTByteRelativeSource and import models much, much faster.
Related issue: Evref-BL/Pharo-Tree-Sitter#32