Consistent analysis of after all #19

Open
AngledLuffa opened this issue Dec 28, 2024 · 4 comments

@AngledLuffa
Contributor

There are a few different analyses of "after all" in this treebank:

# sent_id = en_lines-ud-train-doc6-2241
# text = We felt it was his day, after all.
1       We      we      PRON    PERS-P1PL-NOM   Case=Nom|Number=Plur|Person=1|PronType=Prs      2       nsubj   _       _
2       felt    feel    VERB    PAST    Mood=Ind|Tense=Past|VerbForm=Fin        0       root    _       _
3       it      it      PRON    PERS-SG _       6       nsubj   _       _
4       was     be      AUX     PAST    Mood=Ind|Number=Sing|Person=1|Tense=Past|VerbForm=Fin   6       cop     _       _
5       his     his     PRON    P3SG-GEN        Case=Gen|Gender=Masc|Number=Sing|Person=3|Poss=Yes|PronType=Prs 6       nmod:poss       _       _
6       day     day     NOUN    SG-NOM  Number=Sing     2       xcomp   _       SpaceAfter=No
7       ,       ,       PUNCT   Comma   _       8       punct   _       _
8       after   after   ADV     _       _       6       advmod  _       _
9       all     all     ADV     _       _       8       fixed   _       SpaceAfter=No
10      .       .       PUNCT   Period  _       2       punct   _       _

# sent_id = en_lines-ud-train-doc3-1011
# text = The United States is, after all, the prime revolutionary country.
1       The     the     DET     DEF     Definite=Def|PronType=Art       2       det     _       _
2       United  United  PROPN   SG-NOM  Number=Sing     12      nsubj   _       _
3       States  States  PROPN   SG-NOM  Number=Plur     2       flat    _       _
4       is      be      AUX     PRES    Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin   12      cop     _       SpaceAfter=No
5       ,       ,       PUNCT   Comma   _       7       punct   _       _
6       after   after   ADP     _       _       7       case    _       _
7       all     all     PRON    TOT-SG  Case=Nom        12      nmod    _       SpaceAfter=No

# sent_id = en_lines-ud-train-doc4-1418
# text = After all, I also was a part of the great cause of these high and just proceedings.
1       After   after   ADP     _       _       2       case    _       _
2       all     all     PRON    TOT-SG  Case=Nom        8       nmod    _       SpaceAfter=No
3       ,       ,       PUNCT   Comma   _       2       punct   _       _
4       I       I       PRON    PERS-P1SG-NOM   Case=Nom|Number=Sing|Person=1|PronType=Prs      8       nsubj   _       _
5       also    also    ADV     _       _       8       advmod  _       _
6       was     be      AUX     PAST    Mood=Ind|Number=Sing|Person=1|Tense=Past|VerbForm=Fin   8       cop     _       _
7       a       a       DET     IND-SG  Definite=Ind|PronType=Art       8       det     _       _
8       part    part    NOUN    SG-NOM  Number=Sing     0       root    _       _
9       of      of      ADP     _       _       12      case    _       _
...

and elsewhere

Actually, none of these quite agrees with what is done in EWT, which treats "all" as a DET in this expression:

# sent_id = weblog-blogspot.com_dakbangla_20050311135387_ENG_20050311_135387-0218
# text = We are, after all, in this together.
1       We      we      PRON    PRP     Case=Nom|Number=Plur|Person=1|PronType=Prs      8       nsubj   8:nsubj _
2       are     be      AUX     VBP     Mood=Ind|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin   8       cop     8:cop   SpaceAfter=No
3       ,       ,       PUNCT   ,       _       2       punct   2:punct _
4       after   after   ADP     IN      _       5       case    5:case  _
5       all     all     DET     DT      PronType=Tot    8       obl     8:obl:after     SpaceAfter=No
6       ,       ,       PUNCT   ,       _       5       punct   5:punct _
7       in      in      ADP     IN      _       8       case    8:case  _
8       this    this    PRON    DT      Number=Sing|PronType=Dem        0       root    0:root  _
9       together        together        ADV     RB      _       8       advmod  8:advmod        SpaceAfter=No
10      .       .       PUNCT   .       _       8       punct   8:punct _

Open to having a PR which unifies these treatments?
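
To see how many variants there are across the whole file, something like the following could tally them. This is only a minimal sketch: `read_sentences` and `tally_after_all` are hypothetical helper names, and it assumes plain tab-separated 10-column CoNLL-U input, skipping multiword-token ranges and empty nodes.

```python
from collections import Counter


def read_sentences(lines):
    """Yield each sentence as a list of 10-column CoNLL-U token rows."""
    sent = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if sent:
                yield sent
                sent = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            # Keep only regular tokens: skip ranges (1-2) and empty nodes (1.1).
            if len(cols) == 10 and cols[0].isdigit():
                sent.append(cols)
    if sent:
        yield sent


def tally_after_all(lines):
    """Count (UPOS, DEPREL) combinations for adjacent 'after' + 'all' tokens."""
    counts = Counter()
    for sent in read_sentences(lines):
        for first, second in zip(sent, sent[1:]):
            if first[1].lower() == "after" and second[1].lower() == "all":
                # Columns: 3 = UPOS, 7 = DEPREL.
                counts[(first[3], first[7], second[3], second[7])] += 1
    return counts
```

Passing an open file handle for a treebank file to `tally_after_all` and printing `most_common()` on the result would list each analysis with its frequency. Note that it matches any adjacent "after all", so non-idiomatic uses would need to be checked by hand.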

@nschneid
Contributor

I would favor the EWT approach. Note also that "all" attaches as obl, not nmod.
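
As a rough illustration of what unifying on the EWT pattern could involve, here is a minimal sketch. `to_ewt_style` is a hypothetical helper, not part of any UD tooling. It assumes each sentence is a list of 10-column CoNLL-U rows with plain integer IDs, always uses `obl` for non-root "all", and leaves XPOS and enhanced dependencies untouched, so the output would still need manual review.

```python
def to_ewt_style(sent):
    """Rewrite each 'after all' pair in place as after/ADP/case + all/DET/obl.

    Handles two input shapes: 'after' heading 'all' (the fixed analysis)
    and 'all' already heading 'after'. Anything else is left unchanged.
    """
    for first, second in zip(sent, sent[1:]):
        if first[1].lower() != "after" or second[1].lower() != "all":
            continue
        after_id, all_id = first[0], second[0]
        if second[6] == after_id:
            # 'after' is the head: promote 'all' to take over its attachment.
            second[6] = first[6]
            # Re-attach other dependents of 'after' (e.g. a comma) to 'all'.
            for row in sent:
                if row is not second and row[6] == after_id:
                    row[6] = all_id
        elif first[6] != all_id:
            # Some other structure: leave it for manual review.
            continue
        # Columns: 3 = UPOS, 5 = FEATS, 6 = HEAD, 7 = DEPREL.
        first[3], first[5], first[6], first[7] = "ADP", "_", all_id, "case"
        second[3], second[5] = "DET", "PronType=Tot"
        second[7] = "root" if second[6] == "0" else "obl"
    return sent
```

Whether `obl` is right in every context depends on what "all" attaches to, which is why the sketch only touches the two shapes it recognizes.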

@LarsAhrenberg
Contributor

Thanks for pointing out the inconsistencies. I will fix them.

As regards the choice of DET vs. PRON for "all", I have followed the same annotation guidelines as for the Swedish treebanks and, apparently, for UD v.1 generally. The UPOS depends on the presence or absence of a head word. One reason is that I want the annotation for English-LinES and Swedish-LinES to be as similar as possible. And I'm not sure that the guidelines for English DET vs. PRON are in total agreement with the general guidelines for DET. What is the "hypothetical modified noun" in cases such as "first of all", "above all", "at all"?

I am also puzzled as to why demonstratives are treated differently from the quantifiers "all", "some", "each", ... in following the v.1 guidelines. The general guidelines say that DETs in comparison to PRONs "are more likely to be used attributively (modifying a noun phrase) than substantively (replacing a noun phrase)". Is there a difference here for English between quantifiers and demonstratives?

I note that German has a stricter division: only words that cannot be used attributively are assigned the UPOS PRON. This means that English, German and Swedish apply different principles for the choice of DET vs. PRON.

It would be easy to make English-LinES follow the current principles; it would just mean changing the UPOS and FEATS to the proposed values wherever there is a deviation. (And it would not be so difficult to change back.) However, I'd prefer to wait some time in the hope that there may be more agreement across the Germanic language family as a whole.

@nschneid
Contributor

You are right that there is an exception for demonstratives in the English guidelines, which otherwise generally follow Penn conventions that never treat "all", "some", etc. as pronouns.

That decision predated me. Perhaps @jnivre can weigh in on whether a uniform definition of DET vs. PRON for Germanic languages is desirable (and worth the disruption to longstanding within-language practices).

@jnivre

jnivre commented Dec 28, 2024

I agree with @LarsAhrenberg that it would be good to discuss this more generally, at least for Germanic languages. I think the special treatment of demonstratives in English is a heritage from the PTB. However, it is not clear that this should carry over to UPOS tags (as opposed to XPOS tags).
