Tab annotations #1175
@duncanka I'm having a similar nightmare with this. I'm running an annotation server which, for whatever reason, often includes newlines in annotations, and they crash brat completely. Your solution of splitting newlines into separate annotations mostly works, but it breaks if the annotation consists only of newlines (i.e., "\n\n"). I also tried digging around to figure out where the annotations are written to the file so that I could simply escape the text, but I still haven't grasped it completely. What was your reason for avoiding this approach?
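@benolayinka I thought it was impossible to annotate an all-whitespace span anyway; i.e., the UI won't even allow it. I know zero-width spans were implemented a while back, but as far as I can tell the feature is basically non-functional at this point. What's your use case that requires all-newline annotations? It's been a while since I wrote this patch or otherwise tinkered with brat, but I think the main reason I took this approach was simply that it was easier. If you do want to try implementing escaping, I believe the code you want to modify is in annotation.py (particularly TextBoundAnnotationWithText.__str__) and annotator.py (particularly _create_span). But part of my fear with escaping was that there are other corners of the code that unexpectedly rely on the underlying text exactly matching the error-checking text in the .ann files.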
It is impossible to do it in the visual annotator, but I have spaCy hooked
up for automatic annotation, and it sometimes annotates solely newlines.
Since I know these are not useful annotations, I configured the wrapper to
drop annotations which are only newlines, but that's just a hack. Brat
accepts the annotations, displays them, and writes them to the .ann file,
but the newlines are printed as actual newlines, so the next time a user
opens the same file, it crashes.
I was hoping to escape the text annotations to prevent unexpected failures
in the future. I have explicitly solved this particular issue by dropping
those newlines, but the solution is not robust and any other control
characters will probably break it all over again. I believe an escaped
unicode string is the most future-proof solution, and conceptually it seems
like a simple job. I'll keep poking around.
Ben Olayinka
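
For illustration, the escaping idea discussed above could look roughly like the following. This is only a minimal sketch, not code from brat: `escape_ann_text`, `unescape_ann_text`, and `format_textbound` are hypothetical helper names, and the sketch assumes the text field is the last tab-separated field of a one-line record of the form `ID<TAB>Type start end<TAB>text`. It handles only backslash, newline, tab, and carriage return; other control characters would need further entries.

```python
# Hypothetical helpers for illustration; none of these names exist in brat.

# Backslash must be escaped too, so that a literal "\n" (backslash + n) in the
# source text can be told apart from an escaped newline when reading back.
_ESCAPES = {"\\": "\\\\", "\n": "\\n", "\t": "\\t", "\r": "\\r"}
_UNESCAPES = {"\\": "\\", "n": "\n", "t": "\t", "r": "\r"}


def escape_ann_text(text):
    """Replace characters that would break a one-line record with visible escapes."""
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


def unescape_ann_text(escaped):
    """Invert escape_ann_text; unknown escape sequences are left as written."""
    out = []
    chars = iter(escaped)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            out.append(_UNESCAPES.get(nxt, "\\" + nxt))
        else:
            out.append(ch)
    return "".join(out)


def format_textbound(ann_id, ann_type, start, end, text):
    """Build one tab-separated text-bound record with the text field escaped."""
    return "%s\t%s %d %d\t%s" % (ann_id, ann_type, start, end, escape_ann_text(text))
```

Any code that compares the stored text against the document text would have to unescape first, which is exactly the kind of hidden dependency raised earlier in this thread.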
On Sat, Oct 28, 2017 at 8:27 AM, Jesse Dunietz wrote:
@benolayinka <https://github.com/benolayinka> I thought it was impossible
to annotate an all-whitespace span anyway—i.e., the UI won't even allow it.
I know zero-width spans were implemented a while back, but as far as I can
tell the feature is basically non-functional at this point. What's your use
case that requires all-newline annotations?
It's been a while since I wrote this patch or otherwise tinkered with
brat, but I think the main reason I took this approach was simply that it
was easier. If you do want to try implementing escaping, I believe the code
you want to modify is in annotation.py
<https://github.com/nlplab/brat/blob/master/server/src/annotation.py>
(particularly TextBoundAnnotationWithText.__str__
<https://github.com/nlplab/brat/blob/0d21bf91b2b9c2cf12384f3206531a92d8f1f699/server/src/annotation.py#L1529>)
and annotator.py
<https://github.com/nlplab/brat/blob/master/server/src/annotator.py>
(particularly _create_span
<https://github.com/nlplab/brat/blob/0d21bf91b2b9c2cf12384f3206531a92d8f1f699/server/src/annotator.py#L353>).
But part of my fear with escaping was that there are other corners of the
code that unexpectedly rely on the underlying text exactly matching the
error-checking text in the .ann files.
c940b7c to 564a2e5
Previously, creating a multi-fragment annotation that spanned newlines behaved unreliably and sometimes corrupted data. This is now fixed. Additionally, I extended the discontinuous-span technique used to solve #786 so that it also solves #819, adding support for annotations spanning tabs. (Arguably, proper escaping would still be a more elegant solution.)
Unresolved issue: when virtual fragments are created to handle line splits and tabs, deleting/moving a virtual fragment deletes only that one small chunk. It does not delete anything beyond the nearest newline or tab, which may mean part of the larger fragment the user meant to delete still remains.
(Apologies for all the distracting whitespace deletions from editing in Emacs.)
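
As a rough illustration of the fragment-splitting idea described above (not the code in this pull request), a span can be cut at every run of newlines or tabs so that no fragment contains either character. `split_span` is a hypothetical name, and the sketch assumes offsets are character indices into the document text, with the end offset exclusive.

```python
import re

# Runs of characters that cannot appear inside a one-line, tab-separated record.
_BREAKS = re.compile(r"[\n\t]+")


def split_span(doc_text, start, end):
    """Return (start, end) fragments of doc_text[start:end] free of newlines and tabs.

    An empty list means the span held nothing but newlines/tabs, so the
    caller should drop the annotation instead of writing it out.
    """
    fragments = []
    pos = start
    for match in _BREAKS.finditer(doc_text, start, end):
        if match.start() > pos:
            # Keep the text between the previous break and this one.
            fragments.append((pos, match.start()))
        pos = match.end()
    if pos < end:
        fragments.append((pos, end))
    return fragments


def format_offsets(fragments):
    """Join fragments as 'start end;start end', the discontinuous-span offset form."""
    return ";".join("%d %d" % (s, e) for s, e in fragments)
```

Returning an empty list for an all-newline span such as "\n\n" makes that case explicit, so an automatic annotator's output can be filtered before it ever reaches the .ann file.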