xapian: update stopwords from Xapian upstream repo
rsto committed May 15, 2019
1 parent 3902425 commit 107f9f0
Showing 29 changed files with 2,673 additions and 2,122 deletions.
2 changes: 1 addition & 1 deletion imap/xapian_wrap.cpp
@@ -57,7 +57,7 @@ static Xapian::Stopper *get_stopper()
struct buf buf = BUF_INITIALIZER;
buf_setcstr(&buf, swpath);
// XXX doesn't play nice with WIN32 paths
-buf_appendcstr(&buf, "/english.list");
+buf_appendcstr(&buf, "/english.txt");

// Open the stopword file
errno = 0;
31 changes: 21 additions & 10 deletions languages/stopwords/arabic.list → languages/stopwords/arabic.txt
@@ -1,3 +1,9 @@

| An Arabic stop word list. Comments begin with vertical bar. Each stop
| word is at the start of a line.
| (This is not an official Snowball stop word list, basically inspired from
| Arabic Stop Words Project)

إذ
إذا
إذما
@@ -25,18 +31,20 @@
إليكن
أم
أما
أما
إما
أن
إن
أنا
إنا
أنا
أنت
أنتم
أنتما
أنتن
إنما
إنه
أنى
أنى
آه
آها
أو
@@ -45,17 +53,19 @@
أوه
آي
أي
أيها
إي
أين
أين
أينما
إيه
أيها
بخ
بس
بعد
بعض
بك
بكم
بكم
بكما
بكن
بل
@@ -70,8 +80,8 @@
بهما
بهن
بي
بيد
بين
بيد | though
تلك
تلكم
تلكما
@@ -108,7 +118,7 @@
ذينك
ريث
سوف
سوى
سوى | except
شتان
عدا
عسى
@@ -119,7 +129,7 @@
عما
عن
عند
غير
غير | except
فإذا
فإن
فلا
@@ -144,13 +154,14 @@
كليكما
كليهما
كم
كم
كما
كي
كيت
كيف
كيفما
لا
لاسيما
لاسيما | especially
لدى
لست
لستم
@@ -188,15 +199,15 @@
ليسوا
ما
ماذا
متى
متى | when
مذ
مع
مما
ممن
من
منذ
منه
منها
منذ
مه
مهما
نحن
@@ -229,10 +240,10 @@
هيا
هيت
هيهات
وإذ
وإذا
والذي
والذين
وإذ
وإذا
وإن
ولا
ولكن
94 changes: 0 additions & 94 deletions languages/stopwords/danish.list

This file was deleted.

102 changes: 102 additions & 0 deletions languages/stopwords/danish.txt
@@ -0,0 +1,102 @@

| A Danish stop word list. Comments begin with vertical bar. Each stop
| word is at the start of a line.

| This is a ranked list (commonest to rarest) of stopwords derived from
| a large text sample.


og | and
i | in
jeg | I
det | that (dem. pronoun)/it (pers. pronoun)
at | that (in front of a sentence)/to (with infinitive)
en | a/an
den | it (pers. pronoun)/that (dem. pronoun)
til | to/at/for/until/against/by/of/into, more
er | present tense of "to be"
som | who, as
på | on/upon/in/on/at/to/after/of/with/for, on
de | they
med | with/by/in, along
han | he
af | of/by/from/off/for/in/with/on, off
for | at/for/to/from/by/of/ago, in front/before, because
ikke | not
der | who/which, there/those
var | past tense of "to be"
mig | me/myself
sig | oneself/himself/herself/itself/themselves
men | but
et | a/an/one, one (number), someone/somebody/one
har | present tense of "to have"
om | round/about/for/in/a, about/around/down, if
vi | we
min | my
havde | past tense of "to have"
ham | him
hun | she
nu | now
over | over/above/across/by/beyond/past/on/about, over/past
da | then, when/as/since
fra | from/off/since, off, since
du | you
ud | out
sin | his/her/its/one's
dem | them
os | us/ourselves
op | up
man | you/one
hans | his
hvor | where
eller | or
hvad | what
skal | must/shall etc.
selv | myself/yourself/herself/ourselves etc., even
her | here
alle | all/everyone/everybody etc.
vil | will (verb)
blev | past tense of "to stay/to remain/to get/to become"
kunne | could
ind | in
når | when
være | present tense of "to be"
dog | however/yet/after all
noget | something
ville | would
jo | you know/you see (adv), yes
deres | their/theirs
efter | after/behind/according to/for/by/from, later/afterwards
ned | down
skulle | should
denne | this
end | than
dette | this
mit | my/mine
også | also
under | under/beneath/below/during, below/underneath
have | have
dig | you
anden | other
hende | her
mine | my
alt | everything
meget | much/very, plenty of
sit | his, her, its, one's
sine | his, her, its, one's
vor | our
mod | against
disse | these
hvis | if
din | your/yours
nogle | some
hos | by/at
blive | be/become
mange | many
ad | by/through
bliver | present tense of "to be/to become"
hendes | her/hers
været | be
thi | for (conj)
jer | you
sådan | such, like this/like that
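
The header comments in these lists describe the file format: a vertical bar starts a comment, and stop words sit at the start of a line. A minimal sketch of a reader for that format follows. It assumes plain UTF-8 input and uses only the C++ standard library. The function name parse_stopwords is hypothetical and is not taken from Cyrus or Xapian, and this is not necessarily how get_stopper() in imap/xapian_wrap.cpp reads the file.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of Cyrus or Xapian): collects the stop words
// from a stream written in the "word | comment" format of these lists.
std::vector<std::string> parse_stopwords(std::istream &in)
{
    std::vector<std::string> words;
    std::string line;

    while (std::getline(in, line)) {
        // Everything from the first vertical bar onward is a comment.
        const std::string::size_type bar = line.find('|');
        if (bar != std::string::npos)
            line.erase(bar);

        // Assumption: any whitespace-separated token left on the line is a
        // stop word. Blank and comment-only lines yield no tokens.
        std::istringstream tokens(line);
        std::string word;
        while (tokens >> word)
            words.push_back(word);
    }
    return words;
}
```

Given an open std::ifstream for a file such as languages/stopwords/danish.txt, the returned vector would hold only the words (og, i, jeg, ...) with the English glosses dropped. Splitting on whitespace also discards the column padding and any trailing carriage return.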