Fixed parsing bug for USI containing a colon in the interpretation #19
Thank you for adding this. I think the intent was that the string would be parsed from left to right with tokenization going the whole way, instead of hacky splitting on Your solution looks good, but please see my note about
My main reason for giving up on parsing colons is that the specification for the provenance identifiers is entirely unclear: it does not say whether a provenance identifier can contain any kind of bracket. Additionally, parsing the bracket structure of ProForma is doable but quite complicated. You need to support arbitrarily deep nesting, but also ignore any bracket type that is not the outer bracket.
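As a rough illustration of the bracket handling described in this comment (not code from the pull request), here is a minimal Python sketch. It assumes that only the bracket type that opened the outermost group is counted until that group closes, and that every other bracket type inside it is ignored. The function name `top_level_colons` and the example string are hypothetical, and this is not a ProForma parser.

```python
from typing import List

# Opening bracket -> matching closing bracket (assumed set of bracket types).
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}


def top_level_colons(text: str) -> List[int]:
    """Return the indices of colons that sit outside every bracket group."""
    positions: List[int] = []
    opener = closer = None  # bracket type of the current outer group
    depth = 0               # nesting depth of that bracket type only
    for i, ch in enumerate(text):
        if depth == 0:
            if ch in PAIRS:
                # Entering an outer group: remember which type opened it.
                opener, closer = ch, PAIRS[ch]
                depth = 1
            elif ch == ":":
                positions.append(i)
        elif ch == opener:
            depth += 1      # same type nested deeper
        elif ch == closer:
            depth -= 1      # same type closing
        # Any other bracket type inside the group is ignored.
    return positions


print(top_level_colons("A[U:Oxidation]C:PA-28"))
```

Even this simplified version has to track both the depth and the outer bracket type, and it still assumes nothing about which brackets a provenance identifier may contain.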
Okay, so that short-circuit option won't work, but shouldn't it still use Splitting Splitting
Yes indeed it should be. I will update it.
Okay, I can merge whenever you're ready then.
I realised I removed the repository code by splitting on it, so I took the liberty of making it into an enum and using that to better represent all possible repositories. With this I am ready for a merge.
I changed the parsing to recognise the provenance identifiers and to split the last remaining piece (interpretation + provenance) based on that. I first explored a stack-based approach, but that quickly grew in complexity. Additionally, the specification is very unclear about provenance, but I think you know that as well, so I did not want to assume anything about the validity of using any kind of bracket in the provenance identifier. If the specification were updated to disallow all brackets (any of "()[]{}<>") there, the code could be changed to look for the last colon, returning the full tail as the interpretation if it finds any bracket before it finds a colon.
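The last-colon fallback described in this comment could look roughly like the Python sketch below. This is not the merged code: it assumes the hypothetical spec change in which a provenance identifier may not contain any of "()[]{}<>", and the name `split_tail` and the example strings are made up for illustration.

```python
from typing import Optional, Tuple

# Bracket characters assumed to be forbidden in a provenance identifier.
BRACKETS = set("()[]{}<>")


def split_tail(tail: str) -> Tuple[str, Optional[str]]:
    """Split 'interpretation[:provenance]' by scanning from the end."""
    for i in range(len(tail) - 1, -1, -1):
        ch = tail[i]
        if ch in BRACKETS:
            # A bracket was reached before any colon, so the text scanned
            # so far belongs to the interpretation: no provenance present.
            return tail, None
        if ch == ":":
            return tail[:i], tail[i + 1:]
    return tail, None


print(split_tail("EM[Oxidation]EVEESPEK/2:PA-28"))
print(split_tail("EM[U:Oxidation]EVEESPEK/2"))
```

Scanning from the right means a colon inside a modification is never reached when no provenance identifier follows, because the closing bracket of the modification is seen first.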