Turkish PropBank (TRopBank) is a corpus of over 17,000 Turkish verbs, each annotated with its syntactic arguments and thematic roles. Arguments are pieces of essential information attached to a verb (such as subject or object), and thematic roles are semantic classifications associated with these arguments (such as agent or patient). This resource allows the syntax layer to be matched with the semantics layer when processing Turkish data.
In the field of semantic role labeling (SRL), PropBank is one of the resources most widely recognized by the computational linguistics community. PropBank is a bank of propositions in which the predicate-argument information of a corpus is annotated and the semantic roles or arguments that each verb can take are specified.
Each verb has a frame file, which contains the arguments applicable to that verb. A frame file may include more than one roleset, one for each sense of the given verb. In the roleset of a verb sense, the argument labels Arg0 to Arg5 are described according to the meaning of the verb. In the example below, the predicate is “announce” from PropBank: Arg0 is the “announcer”, Arg1 is the “entity announced”, and ArgM-TMP is the “time attribute”.
[ARG0 Türk Hava Yolları] [ARG1 indirimli satışlarını] [ARGM-TMP bu Pazartesi] [PREDICATE açıkladı].
[ARG0 Turkish Airlines] [PREDICATE announced] [ARG1 its discounted fares] [ARGM-TMP this Monday].
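To make the bracket notation in the example concrete, the following is a minimal sketch that splits such an annotated sentence into label and text pairs. The function name `parseAnnotatedSentence` and the idea of treating the bracketed line as an input format are assumptions made for illustration; this notation is not a file format read by nlptoolkit-propbank.

```javascript
// Illustrative helper (hypothetical, not part of nlptoolkit-propbank):
// splits "[LABEL some words] [LABEL ...]" into {label, text} objects.
function parseAnnotatedSentence(annotated) {
    const spans = [];
    // One span is "[", a label without spaces, whitespace, the text, then "]".
    const pattern = /\[(\S+)\s+([^\]]+)\]/g;
    for (const match of annotated.matchAll(pattern)) {
        spans.push({ label: match[1], text: match[2].trim() });
    }
    return spans;
}

const exampleSpans = parseAnnotatedSentence(
    "[ARG0 Türk Hava Yolları] [ARG1 indirimli satışlarını] [ARGM-TMP bu Pazartesi] [PREDICATE açıkladı]."
);
for (const span of exampleSpans) {
    console.log(span.label + ": " + span.text);
}
```

Each resulting object pairs one role label with the words it covers, which is the same information the brackets carry in the example sentence.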
The following table shows typical semantic role types. Only Arg0 and Arg1 indicate the same thematic roles across different verbs: Arg0 stands for the Agent or Causer, and Arg1 for the Patient or Theme. The remaining numbered arguments vary from verb to verb; they can stand for Instrument, Start point, End point, Beneficiary, or Attribute. In addition, PropBank uses ArgM labels as modifiers indicating time, location, goal, cause, and so on. These roles are not specific to a single verb group; they generalize over the entire corpus.
| Tag | Meaning |
|---|---|
| Arg0 | Agent or Causer |
| Arg1 | Patient or Theme |
| Arg2 | Instrument, start point, end point, beneficiary, or attribute |
| ArgM-EXT | Extent |
| ArgM-LOC | Locatives |
| ArgM-CAU | Cause |
| ArgM-MNR | Manner |
| ArgM-DIS | Discourse |
| ArgM-ADV | Adverbials |
| ArgM-DIR | Directionals |
| ArgM-PNC | Purpose |
| ArgM-TMP | Temporals |
- Directional modifiers give information regarding the path of motion in the sentence. Directional modifiers may be mistakenly tagged as locatives.
- Locatives are used for the place where the action takes place.
- Manners define how the action is performed.
- Extent markers represent the amount of change that occurs in the action.
- Temporal modifiers indicate the time of the action.
- Reciprocals are reflexives that refer to other arguments, like “himself,” “itself,” “together,” “each other,” and “both.”
- Secondary predication markers are used for adjuncts of the predicate that themselves carry predicate structure.
- Purpose clauses show the motivation for the action. Cause clauses simply show the reason for an action.
- Discourse markers connect the sentence to the previous sentence, such as “also,” “however,” “as well,” and “but.”
- Adverbials are used for syntactic elements that modify the sentence and are not labeled with one of the modifier tags stated above.
- “Will,” “may,” “can,” “must,” “shall,” “might,” “should,” “could,” “would,” and also “going (to),” “have (to),” and “used (to)” are modality adjuncts of the predicate and tagged as modal in PropBank.
- Negation is used to tag negative markers of the sentences.
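The distinction between numbered arguments and ArgM modifiers can be sketched as a small lookup. The function name `describeLabel` is hypothetical, and the REC, PRD, MOD, and NEG suffixes are assumed here to be the usual PropBank abbreviations for reciprocals, secondary predication, modals, and negation, since the table above does not list them.

```javascript
// Modifier suffixes and their meanings. DIR, LOC, MNR, EXT, TMP, PNC, CAU, DIS
// and ADV come from the table; REC, PRD, MOD and NEG are assumed abbreviations.
const MODIFIER_MEANINGS = new Map([
    ["DIR", "Directional"], ["LOC", "Locative"], ["MNR", "Manner"],
    ["EXT", "Extent"], ["TMP", "Temporal"], ["REC", "Reciprocal"],
    ["PRD", "Secondary predication"], ["PNC", "Purpose"], ["CAU", "Cause"],
    ["DIS", "Discourse"], ["ADV", "Adverbial"], ["MOD", "Modal"],
    ["NEG", "Negation"]
]);

// Hypothetical helper: classifies a label as a numbered argument (Arg0..Arg5),
// a known modifier (ArgM-XXX), or unknown.
function describeLabel(label) {
    const upper = label.toUpperCase();
    const numbered = /^ARG([0-5])$/.exec(upper);
    if (numbered) {
        return { kind: "numbered", index: Number(numbered[1]) };
    }
    const modifier = /^ARGM-([A-Z]+)$/.exec(upper);
    if (modifier && MODIFIER_MEANINGS.has(modifier[1])) {
        return { kind: "modifier", meaning: MODIFIER_MEANINGS.get(modifier[1]) };
    }
    return { kind: "unknown" };
}

console.log(describeLabel("Arg1"));
console.log(describeLabel("ArgM-TMP"));
```

Numbered arguments only return their index because, apart from Arg0 and Arg1, their meaning depends on the verb's own frameset.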
The structure of a sample frameset is as follows:
<FRAMESET id="TR10-0006410">
<ARG name="ARG0">Engeli kaldıran kişi</ARG>
<ARG name="ARG1">Engelini kaldırdığı şey</ARG>
</FRAMESET>
Each entry in the frame file is enclosed by <FRAMESET> and </FRAMESET> tags. The id attribute is the unique identifier of the frameset, which is the same as the ID of the corresponding verb sense in the synset file. <ARG> tags denote the semantic roles of the corresponding frame.
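As a rough illustration of this structure, the sketch below pulls the id and the arguments out of the sample frameset with regular expressions. This is an assumption-laden shortcut that only handles this simple shape; it is not how the package reads frame files, and `parseFrameset` is a hypothetical name.

```javascript
// Hypothetical helper: extracts the id and ARG entries from one FRAMESET element.
// Regular expressions are enough for this flat sample, not for general XML.
function parseFrameset(xml) {
    const idMatch = /<FRAMESET\s+id="([^"]*)"\s*>/.exec(xml);
    if (!idMatch) {
        return null; // no FRAMESET element found
    }
    const args = [];
    const argPattern = /<ARG\s+name="([^"]*)"\s*>([^<]*)<\/ARG>/g;
    for (const match of xml.matchAll(argPattern)) {
        args.push({ name: match[1], definition: match[2].trim() });
    }
    return { id: idMatch[1], args: args };
}

const sampleXml = `<FRAMESET id="TR10-0006410">
<ARG name="ARG0">Engeli kaldıran kişi</ARG>
<ARG name="ARG1">Engelini kaldırdığı şey</ARG>
</FRAMESET>`;

const sampleFrameset = parseFrameset(sampleXml);
console.log(sampleFrameset.id);
for (const arg of sampleFrameset.args) {
    console.log(arg.name + ": " + arg.definition);
}
```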
You can also see the Python, Cython, Swift, C++, C, Java, PHP, or C# repository.
To check if you have a compatible version of Node.js installed, use the following command:
node -v
You can find the latest version of Node.js here.
Install the latest version of Git.
npm install nlptoolkit-propbank
To work on the code, create a fork from the GitHub page. Use Git to clone the code to your local machine, or use the line below on Ubuntu:
git clone <your-fork-git-link>
A directory named after the repository will be created. Alternatively, use the link below to explore the code:
git clone https://github.com/starlangsoftware/propbank-js.git
Steps for opening the cloned project:
- Start the IDE
- Select File | Open from the main menu
- Choose the PropBank-Js file
- Select the open as project option
- In a couple of seconds, the dependencies will be downloaded.
To read the frameset list and keep all framesets in memory:
let a = new FramesetList()
To iterate over the framesets one by one:
for (let i = 0; i < a.size(); i++){
let frameset = a.getFrameset(i)
}
To find the frameset that belongs to a verb:
let frameset = a.getFrameSet("TUR10-1234560")
To get all arguments of a frameset:
getFramesetArguments(): Array<FramesetArgument>
@inproceedings{kara-etal-2020-tropbank,
title = "{TR}op{B}ank: {T}urkish {P}rop{B}ank V2.0",
author = {Kara, Neslihan and
Aslan, Deniz Baran and
Mar{\c{s}}an, B{\"u}{\c{s}}ra and
Bakay, {\"O}zge and
Ak, Koray and
Y{\i}ld{\i}z, Olcay Taner},
booktitle = "Proceedings of the 12th Language Resources and Evaluation Conference",
month = may,
year = "2020",
address = "Marseille, France",
publisher = "European Language Resources Association",
url = "https://www.aclweb.org/anthology/2020.lrec-1.336",
pages = "2763--2772",
ISBN = "979-10-95546-34-4",
}
- The main and types fields matter when this package is imported.
"main": "dist/index.js",
"types": "dist/index.d.ts",
- Dependencies should be listed exhaustively: give indirect references as well as direct ones, and include every package used directly in the code.
"dependencies": {
"nlptoolkit-corpus": "^1.0.12",
"nlptoolkit-dictionary": "^1.0.14",
"nlptoolkit-morphologicalanalysis": "^1.0.19",
"nlptoolkit-xmlparser": "^1.0.7"
}
- Compiler flags currently include nodeNext for importing.
"compilerOptions": {
"outDir": "dist",
"module": "nodeNext",
"sourceMap": true,
"noImplicitAny": true,
"removeComments": false,
"declaration": true,
},
- tests, node_modules and dist should be excluded.
"exclude": [
"tests",
"node_modules",
"dist"
]
- The index file should export all TypeScript classes.
export * from "./CategoryType"
export * from "./InterlingualDependencyType"
export * from "./InterlingualRelation"
export * from "./Literal"
- Add data files to the project folder. Subprojects should include all data files of the parent projects.
- Classes should be defined as exported.
export class JCN extends ICSimilarity{
- Do not forget to comment each function.
/**
* Computes JCN wordnet similarity metric between two synsets.
* @param synSet1 First synset
* @param synSet2 Second synset
* @return JCN wordnet similarity metric between two synsets
*/
computeSimilarity(synSet1: SynSet, synSet2: SynSet): number {
- Function names should follow camel case.
setSynSetId(synSetId: string){
- Write getter and setter methods.
getRelation(index: number): Relation{
setName(name: string){
- Use standard JavaScript test style.
describe('SimilarityPathTest', function() {
it('testComputeSimilarity', function() {
let turkish = new WordNet();
let similarityPath = new SimilarityPath(turkish);
assert.strictEqual(32.0, similarityPath.computeSimilarity(turkish.getSynSetWithId("TUR10-0656390"), turkish.getSynSetWithId("TUR10-0600460")));
assert.strictEqual(13.0, similarityPath.computeSimilarity(turkish.getSynSetWithId("TUR10-0412120"), turkish.getSynSetWithId("TUR10-0755370")));
assert.strictEqual(13.0, similarityPath.computeSimilarity(turkish.getSynSetWithId("TUR10-0195110"), turkish.getSynSetWithId("TUR10-0822980")));
});
});
- Enumerated types should be declared with enum.
export enum CategoryType {
MATHEMATICS, SPORT, MUSIC, SLANG, BOTANIC,
PLURAL, MARINE, HISTORY, THEOLOGY, ZOOLOGY,
METAPHOR, PSYCHOLOGY, ASTRONOMY, GEOGRAPHY, GRAMMAR,
MILITARY, PHYSICS, PHILOSOPHY, MEDICAL, THEATER,
ECONOMY, LAW, ANATOMY, GEOMETRY, BUSINESS,
PEDAGOGY, TECHNOLOGY, LOGIC, LITERATURE, CINEMA,
TELEVISION, ARCHITECTURE, TECHNICAL, SOCIOLOGY, BIOLOGY,
CHEMISTRY, GEOLOGY, INFORMATICS, PHYSIOLOGY, METEOROLOGY,
MINERALOGY
}
- If a class needs multiple constructors, define them as methods named constructor1, constructor2, ..., and call the appropriate one from the actual constructor.
constructor1(symbol: any){
constructor2(symbol: any, multipleFile: MultipleFile) {
constructor(symbol: any, multipleFile: MultipleFile = undefined) {
if (multipleFile == undefined){
this.constructor1(symbol);
} else {
this.constructor2(symbol, multipleFile);
}
}
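The constructor convention above can be tried out in isolation with a complete, runnable class. The class `RoleLabel` and its fields are hypothetical and exist only to illustrate the dispatch; the sketch is written in plain JavaScript, so the type annotations from the fragment above are omitted.

```javascript
// Hypothetical class illustrating the constructor1/constructor2 convention.
class RoleLabel {
    // Variant used when only a symbol is given.
    constructor1(symbol) {
        this.symbol = symbol;
        this.source = "default";
    }

    // Variant used when a source file name is also given.
    constructor2(symbol, fileName) {
        this.symbol = symbol;
        this.source = fileName;
    }

    // The real constructor only decides which variant to run.
    constructor(symbol, fileName = undefined) {
        if (fileName === undefined) {
            this.constructor1(symbol);
        } else {
            this.constructor2(symbol, fileName);
        }
    }
}

const withoutFile = new RoleLabel("ARG0");
const withFile = new RoleLabel("ARG1", "frames.xml");
console.log(withoutFile.source, withFile.source);
```

Keeping the dispatch in one place means each variant stays a plain method, which mirrors how overloaded constructors look in the other language ports of the library.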
- Imports should use the import statement and reference the package in node_modules.
import {Corpus} from "nlptoolkit-corpus/dist/Corpus";
import {Sentence} from "nlptoolkit-corpus/dist/Sentence";
- Use the xmlparser package for parsing XML files.
var doc = new XmlDocument("test.xml")
doc.parse()
let root = doc.getFirstChild()
let firstChild = root.getFirstChild()
