Nmoddesc #13

Merged
merged 5 commits into from
Jan 24, 2025

AngledLuffa

Add nmod:desc to various titles that are used as part of a person's name

#9

@nschneid

If you are able to share the Ssurgeon script that could be good for future reference.

@amir-zeldes

Also, it would help if we could keep a list of known titles somewhere; I'm trying to figure out how to implement this for GUM... just ran into "Tsarevna" :)

@AngledLuffa AngledLuffa force-pushed the nmoddesc branch 2 times, most recently from 15e432d to 9f99123 January 23, 2025 16:11
@AngledLuffa
Author

@nschneid Good call. I just put it into the commit messages themselves, e.g. the last three commits, which affected the train set. The test and dev sets I had done by hand.

@AngledLuffa
Author

@amir-zeldes I used this regex based on what I saw when checking by hand:

/(?i:poet|Mr|Mrs|Madam|Commissioner|Messrs|Minister|President|Governor|Chancellor|economist|fellow|Director|philosopher|critic|King|novelist|playwright|Lady|author|Captain)/

But that clearly misses quite a few: other nobility titles, such as Queen or Duke... (actually, it seems I missed Queen myself; I'd better go back and revise that)
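For reuse outside Ssurgeon, the same title check can be sketched in Python. This is a minimal sketch, not part of the PR: it assumes the regex has to match the whole token (my understanding of how Semgrex applies `word:/.../`, hence `re.fullmatch`), and `TITLE_RE` and `is_title` are hypothetical names. Queen and Duke are added per the comment above. With `re.search` instead, "King" would also fire inside "Kingston".

```python
import re

# Title regex from the comment above, plus Queen and Duke.
# The scoped flag (?i:...) makes only this group case-insensitive (Python 3.6+).
TITLE_RE = re.compile(
    r"(?i:poet|Mr|Mrs|Madam|Commissioner|Messrs|Minister|President|Governor|"
    r"Chancellor|economist|fellow|Director|philosopher|critic|King|Queen|Duke|"
    r"novelist|playwright|Lady|author|Captain)"
)


def is_title(word: str) -> bool:
    """Hypothetical helper: True if the *whole* token is one of the listed titles."""
    return TITLE_RE.fullmatch(word) is not None


for w in ["Captain", "captain", "Queen", "Mrs", "Kingston", "authority"]:
    print(w, is_title(w))
```

This prints True for the first four tokens and False for Kingston and authority; "Mrs" still matches because the full-match requirement makes the engine backtrack past the shorter "Mr" alternative.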

Other jobs: historian showed up in PUD and, if I recall correctly, was part of what kicked this whole thing off

"Simple tailor Garak and Captain Sisko were observed loudly fighting shortly after the news of Senator Vreenak's death"

Sports (could be considered part of jobs): defender, forward, goalie, quarterback, etc.

e.g. "Quarterbacks Jalen Hurts and Jayden Daniels meet for the third time this weekend"

@AngledLuffa AngledLuffa force-pushed the nmoddesc branch 2 times, most recently from 135f6be to 31f7a1b January 23, 2025 16:25
@AngledLuffa
Author

Alright, I went back and touched up my current changes to include Queen. Happy to take any other suggestions for titles to search for.

@martinpopel
Member

/(?i:poet|Mr|Mrs|Madam|...

There are way more personal roles. See e.g. the list in PersonalRoles.pm.

@amir-zeldes

Very useful, thanks for pointing it out!

@AngledLuffa
Copy link
Author

Yes, quite helpful, thanks!

I will point out again that quite a few sports roles are missing: winger (or just wing), forward, center, quarterback... trying to cover a variety of team sports here.

@AngledLuffa
Author

Also some female versions of titles: empress, duchess.

@nschneid
Copy link

The script linked from UniversalDependencies/UD_English-EWT#561 has a bunch of them too.

@AngledLuffa
Author

Admiral? ADMIRAL Kirk?

... but in general I found a few missing ones in ParTUT thanks to this list, so thanks again for sharing.

In the phrase "his sister Laure", that is not nmod:desc, right? I should be checking for nmod:poss as part of the search string.

@nschneid

Admiral? ADMIRAL Kirk?

👍

... but in general I found a few missing ones in ParTUT thanks to this list, so thanks again for sharing.

In the phrase "his sister Laure", that is not nmod:desc, right? I should be checking for nmod:poss as part of the search string.

Right, "his sister" and "Laure" are separate nominals connected by appos. So this is different from Sister Laure.

@AngledLuffa AngledLuffa force-pushed the nmoddesc branch 4 times, most recently from 0aa2225 to 4641048 January 24, 2025 00:35
@AngledLuffa
Author

Alright, Martin's list helped me find one more in the dev set that I had missed, and a couple of others in the train set. Thanks for the help! Will merge and call it a day.

@AngledLuffa AngledLuffa merged commit dac117d into dev Jan 24, 2025
…n the train set *without* flipping the other edges... yet

Also flip nmod:desc for sentences where the word in question was the root
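The core of the flip (the title's incoming edge moves to the name, the name's flat/compound/nmod edge to the title is dropped, and the title becomes the name's nmod:desc, with the root case handled the same way) can be sketched on a toy token list. This is a hypothetical Python sketch, not the Ssurgeon script itself: `Token`, `flip_title` and the example sentence are made up, and other dependents of the title (det, amod, ...) are deliberately left in place, since the script handles them with separate rules.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Token:
    id: int       # 1-based position, as in CoNLL-U
    form: str
    head: int     # 0 means the token is the root
    deprel: str


def flip_title(tokens: list[Token], title_id: int, name_id: int) -> None:
    """Hypothetical sketch: make the name the head and the title its nmod:desc.

    Assumes tokens[i].id == i + 1 and that the name is attached to the title
    by a flat, compound or nmod edge (subtypes allowed).
    """
    title = tokens[title_id - 1]
    name = tokens[name_id - 1]
    attached = name.deprel.split(":")[0] in {"flat", "compound", "nmod"}
    if name.head != title_id or not attached:
        raise ValueError("name is not a flat/compound/nmod dependent of the title")
    # The name takes over the title's incoming edge. If the title was the root
    # (head 0, deprel "root"), the name becomes the root.
    name.head, name.deprel = title.head, title.deprel
    # The old title -> name edge is gone; the title now depends on the name.
    title.head, title.deprel = name_id, "nmod:desc"


sent = [
    Token(1, "Captain", 3, "nsubj"),
    Token(2, "Sisko", 1, "flat"),
    Token(3, "left", 0, "root"),
    Token(4, ".", 3, "punct"),
]
flip_title(sent, title_id=1, name_id=2)
for t in sent:
    print(t.id, t.form, t.head, t.deprel)
# 1 Captain 2 nmod:desc
# 2 Sisko 3 nsubj
# 3 left 0 root
# 4 . 3 punct
```

For a fragment where "Captain" is itself the root, the same call leaves "Sisko" with head 0 and deprel root, which is the effect `setRoots newhead` has in the script.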

Will need to carefully adjust the links that used to go to Minister (and similar words), since presumably Prime should not modify Shinzo, for example

Adjust words such as critic without converting any 'the critic ...' phrases (titles with a det dependent are skipped)

Attempt to fix
    "poet and critic John Dryden"
    "Marxist playwright and director Bertolt Brecht"
compare to the original by using the first title as the nmod:desc

The Ssurgeon script used follows.

Note that this clearly indicates a missing feature in Ssurgeon: macros. The same title regex has to be repeated in every expression.
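Until Ssurgeon has macros, one workaround is to keep the title list in a single place and splice the regex into each expression before running the script. The Python below is a hypothetical preprocessing sketch, not an Ssurgeon feature: the `$TITLES` placeholder, `title_regex` and `expand_macros` are all made-up names. It also drops case-insensitive duplicates, such as the second `economist` in the regexes below.

```python
from __future__ import annotations

import re


def title_regex(titles: list[str]) -> str:
    """Build a Semgrex-style /(?i:...)/ word regex, keeping the first spelling of each title."""
    seen: set[str] = set()
    parts: list[str] = []
    for title in titles:
        key = title.lower()
        if key not in seen:
            seen.add(key)
            parts.append(re.escape(title))  # plain words come through unchanged
    return "/(?i:" + "|".join(parts) + ")/"


def expand_macros(template: str, macros: dict[str, str]) -> str:
    """Replace hypothetical $NAME placeholders (uppercase letters and _) with their text.

    A function replacement is used so backslashes in the expansion are not
    reinterpreted. A '$' not followed by an uppercase name, such as the
    '.*$/' anchor in a relation regex, is left untouched.
    """
    return re.sub(r"\$([A-Z_]+)", lambda m: macros[m.group(1)], template)


titles = ["poet", "Mr", "Mrs", "economist", "Captain", "economist"]
template = "{word:$TITLES}=oldhead >/nmod|compound|flat/=dead {}=newhead"
print(expand_macros(template, {"TITLES": title_regex(titles)}))
# {word:/(?i:poet|Mr|Mrs|economist|Captain)/}=oldhead >/nmod|compound|flat/=dead {}=newhead
```

Generating the script this way means a new title (Tsarevna, Admiral, ...) is added once rather than in every expression.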

Also note that an additional clause, "!> /nmod:poss/ {}", could be added to each of the expressions. Two sentences are affected by such a change:

should be updated:
 # text = **his Captain Ahab** in Moby-Dick is a classic tragic hero, inspired by King Lear.
should NOT be updated:
 # text = the following year he was joined by **his sister Laure** and they spent four years away from home.

Rather than work out how to do that in Ssurgeon, if it is even possible, we simply edited these two by hand.

 # in this expression:
 # the 'othertitle' is not a parent via nmod|compound|flat so that the phrase isn't
 # Mr Cox, Mr Hänsch, ...
{word:/(?i:actor|admiral|adviser|economist|father|general|judge|justice|lieutenant|lord|miss|mother|professor|representative|scholar|scientist|sister|writer|Queen|poet|Mr|Mrs|Madam|Commissioner|Messrs|Minister|President|Governor|Chancellor|economist|fellow|Director|philosopher|critic|King|novelist|playwright|Lady|author|Captain)/}=oldhead <conj ({word:/(?i:actor|admiral|adviser|economist|father|general|judge|justice|lieutenant|lord|miss|mother|professor|representative|scholar|scientist|sister|writer|Queen|poet|Mr|Mrs|Madam|Commissioner|Messrs|Minister|President|Governor|Chancellor|economist|fellow|Director|philosopher|critic|King|novelist|playwright|Lady|author|Captain)/}=othertitle <=reln {} !>/nmod|compound|flat/ {} !>det {} !< conj ({} >det {})) >/nmod|compound|flat/=dead {}=newhead . {}=newhead
reattachNamedEdge -edge reln -dep newhead
removeNamedEdge -edge dead
addEdge -gov newhead -dep othertitle -reln nmod:desc

{word:/(?i:actor|admiral|adviser|economist|father|general|judge|justice|lieutenant|lord|miss|mother|professor|representative|scholar|scientist|sister|writer|Queen|poet|Mr|Mrs|Madam|Commissioner|Messrs|Minister|President|Governor|Chancellor|economist|fellow|Director|philosopher|critic|King|novelist|playwright|Lady|author|Captain)/}=oldhead <conj ({word:/(?i:actor|admiral|adviser|economist|father|general|judge|justice|lieutenant|lord|miss|mother|professor|representative|scholar|scientist|sister|writer|Queen|poet|Mr|Mrs|Madam|Commissioner|Messrs|Minister|President|Governor|Chancellor|economist|fellow|Director|philosopher|critic|King|novelist|playwright|Lady|author|Captain)/}=othertitle !< {} !>/nmod|compound|flat/ {} !>det {} !< conj ({} >det {})) >/nmod|compound|flat/=dead {}=newhead . {}=newhead
removeNamedEdge -edge dead
addEdge -gov newhead -dep othertitle -reln nmod:desc
setRoots newhead

{word:/(?i:actor|admiral|adviser|economist|father|general|judge|justice|lieutenant|lord|miss|mother|professor|representative|scholar|scientist|sister|writer|Queen|poet|Mr|Mrs|Madam|Commissioner|Messrs|Minister|President|Governor|Chancellor|economist|fellow|Director|philosopher|critic|King|novelist|playwright|Lady|author|Captain)/}=oldhead <=reln {} >/nmod|compound|flat/=dead {}=newhead . {}=newhead !>det {} !< conj ({} >det {})
reattachNamedEdge -edge reln -dep newhead
removeNamedEdge -edge dead
addEdge -gov newhead -dep oldhead -reln nmod:desc

{word:/(?i:actor|admiral|adviser|economist|father|general|judge|justice|lieutenant|lord|miss|mother|professor|representative|scholar|scientist|sister|writer|Queen|poet|Mr|Mrs|Madam|Commissioner|Messrs|Minister|President|Governor|Chancellor|economist|fellow|Director|philosopher|critic|King|novelist|playwright|Lady|author|Captain)/}=oldhead !< {} >/nmod|compound|flat/=dead {}=newhead . {}=newhead !>det {} !< conj ({} >det {})
removeNamedEdge -edge dead
addEdge -gov newhead -dep oldhead -reln nmod:desc
setRoots newhead
 # make the child come after newhead so that skipped double titles aren't captured
 # eg, "poet and critic John Dryden"
{}=oldhead </nmod:desc/=nmod ({}=newhead .. {}=child) >=reln {}=child
reattachNamedEdge -edge reln -gov newhead
…, depending on what the relation is. Not amod|nmod|acl

 # not nmod|amod|acl because, when checked by hand, those indicated words that modified the title, not the person
 # acl for an instance of "**Managing** Director Dominique Strauss", where "Managing" is modifying "Director"
 # it's not clear that would generalize for other instances of acl
{}=oldhead </nmod:desc/=nmod ({}=newhead) >/^(?!.*nmod|amod|acl).*$/=reln {}=child -- {}=child
reattachNamedEdge -edge reln -gov newhead
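To see which dependents of the old title the last rule moves, its relation regex can be run on its own. A minimal Python sketch: it checks only the relation name and ignores the word-order and nmod:desc conditions of the rule, and `MOVABLE_RELN` and `moves_to_name` are hypothetical names. The negative lookahead rejects any relation containing `nmod` anywhere, or starting with `amod` or `acl`, which is how "Managing" (acl) stays on "Director".

```python
import re

# Relation regex copied from the last rule above. The lookahead fails if the
# relation contains "nmod" anywhere, or starts with "amod" or "acl".
MOVABLE_RELN = re.compile(r"^(?!.*nmod|amod|acl).*$")


def moves_to_name(deprel: str) -> bool:
    """True if a dependent of the title with this relation would be reattached."""
    return MOVABLE_RELN.fullmatch(deprel) is not None


relations = [
    "det", "punct", "compound", "nummod",
    "amod", "acl", "acl:relcl", "nmod", "nmod:poss",
]
for rel in relations:
    print(f"{rel:<10} {'move' if moves_to_name(rel) else 'stay'}")
```

det, punct, compound and nummod come out as move; the amod, acl and nmod variants stay on the title.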
@AngledLuffa AngledLuffa deleted the nmoddesc branch January 29, 2025 22:47
4 participants