The Romanian Social Media Sexist Language UD Treebank is a reference treebank in Universal Dependencies (UD) format for Romanian sexist language. It is currently small and comprises a subset of tweets sourced from CoRoSeOf.
The Romanian Social Media Sexist Language UD Treebank is a specialized linguistic resource for analyzing sexist language in Romanian social media. It contains 210 annotated tweets selected from CoRoSeOf, offering insight into how sexist language appears in social media discourse. As part of the UD_Romanian-TueCL project, it is the first UD treebank to specifically address sexist language in the social media genre, filling a gap in Romanian linguistic resources. The project is a work in progress, and the treebank is updated regularly.
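UD treebanks are distributed as CoNLL-U files: one word per line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), `#` comment lines for sentence-level metadata, and a blank line after each sentence. The following is a minimal standard-library sketch of reading such data, assuming the usual CoNLL-U layout; the function name `parse_conllu` is hypothetical, and the sample sentence is an invented illustration, not a tweet from this treebank. For real work, an established CoNLL-U library is likely the safer choice.

```python
from typing import Dict, List

# The ten CoNLL-U columns, in the order defined by the UD format.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Hypothetical helper: split CoNLL-U text into sentences of word-level token dicts."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comments (e.g. "# sent_id", "# text") are skipped in this sketch.
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1"); keep syntactic words only.
        if "-" in cols[0] or "." in cols[0]:
            continue
        current.append(dict(zip(FIELDS, cols)))
    # Flush a final sentence that is not followed by a blank line.
    if current:
        sentences.append(current)
    return sentences


# Invented toy sentence ("She reads."); XPOS is "_" because this treebank provides no XPOS.
sample = "\n".join([
    "# text = Ea citește.",
    "1\tEa\tel\tPRON\t_\tCase=Nom|Gender=Fem|Number=Sing|Person=3|PronType=Prs\t2\tnsubj\t_\t_",
    "2\tcitește\tciti\tVERB\t_\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin\t0\troot\t_\tSpaceAfter=No",
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
    "",
])

sents = parse_conllu(sample)
print([tok["upos"] for tok in sents[0]])  # ['PRON', 'VERB', 'PUNCT']
```

Returning plain dictionaries keeps the sketch dependency-free; HEAD values stay as strings, so they need converting with `int()` before building a tree.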
The creation of this treebank was made possible through the initiative of Dr. Çağrı Çöltekin, lecturer at the University of Tuebingen, as part of a course project focused on low-resourced languages. Although Romanian is not a low-resourced language, it lacked a UD-compliant social media corpus. Diana C. Hoefels constructed and annotated the corpus, while Dr. Çağrı Çöltekin provided review, advised on the guidelines, and authored the documentation.
- For a quantitative and qualitative analysis of the sourced samples, see CoRoSeOf - An Annotated Corpus of Romanian Sexist and Offensive Tweets.
- 2024-05-15 v2.14
- Initial release in Universal Dependencies.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.14
License: CC BY-SA 4.0
Includes text: yes
Genre: social
Lemmas: manual native
UPOS: manual native
XPOS: not available
Features: manual native
Relations: manual native
Contributors: Hoefels, Diana; Çöltekin, Çağrı
Contributing: here
Contact: [email protected] or [email protected], [email protected]
===============================================================================
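Because the block above is meant to be read by tools, here is a minimal sketch of how a script might extract its key-value pairs. It is not the official UD validation tooling; the function name `parse_ud_metadata` is hypothetical, and the delimiter handling (a start line beginning with `=== Machine-readable metadata`, an end line of `=` signs, `Key: value` lines in between) is an assumption based on the layout shown here.

```python
from typing import Dict


def parse_ud_metadata(readme: str) -> Dict[str, str]:
    """Hypothetical helper: collect 'Key: value' pairs from the metadata section."""
    meta: Dict[str, str] = {}
    in_block = False
    for line in readme.splitlines():
        stripped = line.strip()
        if stripped.startswith("=== Machine-readable metadata"):
            # Opening delimiter: start collecting from the next line.
            in_block = True
            continue
        if in_block and stripped.startswith("==="):
            # Closing line of "=" signs ends the section.
            break
        if in_block and ":" in stripped:
            # Split on the first colon only, so values may themselves contain colons.
            key, value = stripped.split(":", 1)
            meta[key.strip()] = value.strip()
    return meta


# Excerpt of the metadata block above.
readme = "\n".join([
    "=== Machine-readable metadata (DO NOT REMOVE!) ====",
    "Data available since: UD v2.14",
    "License: CC BY-SA 4.0",
    "Lemmas: manual native",
    "XPOS: not available",
    "Contributors: Hoefels, Diana; Çöltekin, Çağrı",
    "===================================================",
])

meta = parse_ud_metadata(readme)
print(meta["XPOS"])  # not available
```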