Extracting events based on the frequency of words found in a news corpus. The event extraction system rests on a simple idea: verbs and non-proper nouns that appear infrequently in a general-purpose corpus are distinctive words that indicate probable events.
The steps are as follows:
- Remove stopwords
- Ignore proper nouns
- Check the frequency of both the original word in the text and its root word to decide whether the word is uncommon in the general corpus and therefore a viable candidate for an event word.
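The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `event_candidates`, the small `STOPWORDS` set, the `threshold` value, and the choice to require that both the word and its root be rare are all assumptions. Tokens are assumed to arrive already tagged with Penn Treebank style part-of-speech tags, and root words are assumed to come from a lemmatizer, represented here by a plain dictionary.

```python
# Hypothetical, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "in", "on", "of", "and", "is", "was", "were"}


def event_candidates(tagged_tokens, norm_freq, lemmas, threshold=1e-4):
    """Return words that look like event words.

    tagged_tokens: list of (word, pos) pairs with Penn Treebank style tags (assumed).
    norm_freq: dict mapping a lowercase word to its normalized frequency
               in the general-purpose corpus.
    lemmas: dict mapping a lowercase word to its root word (assumed to be
            produced by a lemmatizer elsewhere).
    threshold: hypothetical cutoff below which a word counts as rare.
    """
    candidates = []
    for word, pos in tagged_tokens:
        lowered = word.lower()
        # Step 1: remove stopwords.
        if lowered in STOPWORDS:
            continue
        # Step 2: ignore proper nouns.
        if pos in ("NNP", "NNPS"):
            continue
        # Keep only verbs and common nouns.
        if not (pos.startswith("VB") or pos in ("NN", "NNS")):
            continue
        # Step 3: check the frequency of the word and of its root.
        # Words missing from the dictionary are treated as frequency 0.
        root = lemmas.get(lowered, lowered)
        word_is_rare = norm_freq.get(lowered, 0.0) < threshold
        root_is_rare = norm_freq.get(root, 0.0) < threshold
        if word_is_rare and root_is_rare:
            candidates.append(word)
    return candidates
```

Called with a tagged sentence such as "The volcano erupted in Iceland", the sketch skips "The" and "in" as stopwords, skips "Iceland" as a proper noun, and keeps "volcano" and "erupted" only if their frequencies fall below the threshold.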
The advantage of this approach is that it does not need expensive hardware with GPUs or complex, computation-heavy deep learning algorithms. The frequency dictionary is calculated just once, which takes less than 10 minutes. Inference is also extremely fast because the normalized frequencies are stored in a dictionary. So both training, in the traditional sense, and inference are extremely fast.
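As a rough illustration of why both stages are cheap, the sketch below builds a normalized frequency dictionary in a single pass over a corpus. The function name `build_normalized_frequency`, the simple regex tokenizer, and normalizing by the total token count are assumptions; the project may tokenize and normalize differently.

```python
import re
from collections import Counter


def build_normalized_frequency(documents):
    """Count lowercase word tokens across documents and divide by the total.

    documents: iterable of strings (the general-purpose corpus).
    Returns a dict mapping each word to count / total_tokens.
    """
    counts = Counter()
    for doc in documents:
        # Simple tokenizer (an assumption): runs of ASCII letters only.
        counts.update(re.findall(r"[a-z]+", doc.lower()))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}
```

Once the dictionary is built, "inference" for a word is a single lookup such as `freq.get("erupted", 0.0)`, which is why no GPU or model forward pass is involved.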
Pros of this approach:
- Fast training
- Low resource requirements
- Fast inference
- Performance appears comparable to deep learning based approaches, but this still awaits thorough testing
Cons:
- The dataset used in this project may have limitations