Legal-CGEL Statistics

Analyzing 111 files:

Overview

  • Trees: 46
  • Nodes: 4133
  • Lexical Nodes: 1585 (34.5/tree)
  • Lexical Insertions (nodes whose surface string is empty because of a typo in the source text): 1
  • Gaps: 34
  • Punctuation Tokens: 130
  • Avg Tree Depth: 17.9
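
The per-tree figure in parentheses is the corpus-wide count divided by the number of trees. A minimal sketch of that arithmetic, using only the counts listed above; the helper name `per_tree` is illustrative and is not taken from whatever script generated this file:

```python
# Counts copied from the Overview list.
trees = 46
nodes = 4133
lexical_nodes = 1585


def per_tree(count: int, n_trees: int) -> float:
    """Average a corpus-wide count over the trees, rounded to one decimal."""
    return round(count / n_trees, 1)


lexical_per_tree = per_tree(lexical_nodes, trees)  # 34.5, as reported above
nodes_per_tree = per_tree(nodes, trees)            # 89.8 (not reported above)

print(lexical_per_tree, nodes_per_tree)
```

The same helper applies to any other count in the list, for example gaps or punctuation tokens per tree.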

POS categories

POS count
N 485
D 279
P 279
V 156
Adj 123
V_aux 99
Coordinator 58
Sdr 42
Adv 38
N_pro 26

Lemmas occurring >=5 times, by categories the lemma appears in

  • {D}: a an any the this
  • {Coordinator}: and or
  • {V, V_aux}: be have
  • {V}: authorize provide require
  • {P}: as at by for from in of on under with
  • {N}: Secretary State amount chapter court paragraph person program section subsection time year
  • {P, Sdr}: to
  • {D, Sdr}: that
  • {Adj, D}: such
  • {N, V}: use
  • {N_pro}: it which
  • {V_aux}: may shall
  • {Adv}: not
  • {Adj, V}: appropriate

All lexemes of closed-class categories

  • D: 30, 4, 5,000, 500,000,000, a, all, an, any, both, each, every, five, more, no, once, such, that, the, these, this, two
  • N_pro: it, there, they, which, who, whoever, whose
  • V_aux: be, do, have, is, may, shall, will, would
  • P: after, against, as, as to, at, before, by, during, except, for, from, if, in, including, into, of, on, other than, out, prior, pursuant, than, thereof, throughout, to, under, unless, upon, when, whenever, where, with, within
  • Sdr: that, to, whether or not
  • Coordinator: and, but, or

Nonterminal categories

category count
Nom 626
NP 453
VP 348
PP 293
DP 281
Clause 243
AdjP 129
Coordination 58
Clause_rel 41
AdvP 40
GAP 34
N@flat 1
PP_strand 1

Functions

function count
Head 2450
Mod 343
Obj 317
Comp 286
Det 278
Coordinate 127
Marker 100
Subj 86
(root) 46
Supplement 27
PredComp 26
Prenucleus 14
Comp_ind 10
Postnucleus 8
DisplacedSubj 3
Flat 3
Det-Head 3
Head-Prenucleus 2
Particle 2
Marker-Head 1
Mod-Head 1
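
The three count tables (POS categories, nonterminal categories, functions) can be cross-checked against the Overview totals. The sketch below assumes that every node has exactly one category and exactly one function, with `(root)` standing in for the function of each tree's root; that reading is an assumption, though the counts in this file are consistent with it. All numbers are copied from the tables above.

```python
# Counts copied from the tables in this file.
pos = {
    "N": 485, "D": 279, "P": 279, "V": 156, "Adj": 123, "V_aux": 99,
    "Coordinator": 58, "Sdr": 42, "Adv": 38, "N_pro": 26,
}
nonterminal = {
    "Nom": 626, "NP": 453, "VP": 348, "PP": 293, "DP": 281, "Clause": 243,
    "AdjP": 129, "Coordination": 58, "Clause_rel": 41, "AdvP": 40,
    "GAP": 34, "N@flat": 1, "PP_strand": 1,
}
functions = {
    "Head": 2450, "Mod": 343, "Obj": 317, "Comp": 286, "Det": 278,
    "Coordinate": 127, "Marker": 100, "Subj": 86, "(root)": 46,
    "Supplement": 27, "PredComp": 26, "Prenucleus": 14, "Comp_ind": 10,
    "Postnucleus": 8, "DisplacedSubj": 3, "Flat": 3, "Det-Head": 3,
    "Head-Prenucleus": 2, "Particle": 2, "Marker-Head": 1, "Mod-Head": 1,
}

# Totals from the Overview section.
TREES, TOTAL_NODES, LEXICAL_NODES = 46, 4133, 1585

lexical_total = sum(pos.values())
nonterminal_total = sum(nonterminal.values())
function_total = sum(functions.values())

# Lexical and nonterminal categories together should cover every node,
# and so should the function labels (one per node, roots included).
checks = {
    "POS counts equal lexical nodes": lexical_total == LEXICAL_NODES,
    "categories cover all nodes": lexical_total + nonterminal_total == TOTAL_NODES,
    "functions cover all nodes": function_total == TOTAL_NODES,
    "one root per tree": functions["(root)"] == TREES,
}
for name, ok in checks.items():
    print(f"{name}: {'ok' if ok else 'MISMATCH'}")
```

A mismatch in any line would point to a node that was counted in one table but not another, which is a cheap sanity check to run after regenerating the statistics.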

High Valencies (ternary+, omitting Supplements and Coordinations)

valency count
(VP :Head V :Obj NP :Comp PP) 6
(VP :Head V :Obj GAP :Comp PP) 5
(VP :Head V :PredComp AdjP :Comp PP) 1
(VP :Head V :Comp PP :Comp Clause) 1
(VP :Head Coordination :Obj NP :Comp PP) 1
(VP :Head V :Obj Coordination :Comp PP) 1
(VP :Head V :Particle PP :Comp PP) 1
(VP :Head V :Obj Coordination :Comp Clause) 1
(VP :Head V_aux :Comp Clause :DisplacedSubj NP) 1
(VP :Head V :Particle PP :Obj NP) 1
(VP :Head V :Obj NP :Comp Clause) 1

Nonlexical Categories by Function (excluding nonce categories)

Nom NP VP PP DP Clause AdjP Coordination Clause_rel AdvP GAP PP_strand
Head 533 17 317 12 2 26 6 31 21 2
Mod 60 3 4 82 3 34 99 5 18 35
Obj 302 8 7
Comp 155 120 3 5 1 1 1
Det 6 272
Coordinate 33 28 27 16 15 5 1 2
Subj 64 3 19
(root) 44 2
Supplement 1 18 3 3 2
PredComp 9 1 1 15
Prenucleus 13 1
Comp_ind 3 7
Postnucleus 5 3
Det-Head 3
DisplacedSubj 3
Head-Prenucleus 2
Particle 2
Mod-Head 1
Marker-Head 1

Parent Categories by Function (excluding nonce categories and root)

Nom NP VP PP Clause DP AdjP Coordination Clause_rel AdvP N PP_strand
Head 620 453 348 293 243 281 129 1 41 40 1
Mod 252 4 59 4 12 3 7 2
Obj 72 245
Comp 103 141 25 2 14 1
Det 278
Coordinate 127
Marker 12 14 43 7 14 3 1 6
Subj 66 20
Supplement 3 4 7 2 7 2 1 1
PredComp 21 5
Prenucleus 1 13
Comp_ind 10
Postnucleus 1 2 3 2
Flat 3
Det-Head 3
DisplacedSubj 3
Head-Prenucleus 2
Particle 2
Marker-Head 1
Mod-Head 1