Hebrew NLP Models, Tools, Commercial and Online Services

Contents

  • Yonti Levin's Hebrew Tokenizer [Python] {MIT} - A very simple Python tokenizer for Hebrew text. No batteries included - no dependencies needed!
  • Hebrew Tokenizer {?} - Eyal Gruss's Hebrew tokenizer. A field-tested Hebrew tokenizer for dirty texts (ben-yehuda project, bible, cc100, mc4, opensubs, oscar, twitter) focused on multi-word expression extraction.
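To illustrate the kind of work these tokenizers do, here is a minimal stdlib-only sketch that splits undotted Hebrew text into Hebrew words (including quote-marked acronyms), numbers, and punctuation. It is an invented stand-in for illustration, not the actual API of either library:

```python
import re

# Illustrative stand-in for a Hebrew tokenizer (not the actual API of the
# libraries above): Hebrew words (incl. quote-marked acronyms like צה"ל),
# numbers, and single punctuation symbols.
TOKEN_RE = re.compile(
    r"[\u05D0-\u05EA]+(?:[\"'][\u05D0-\u05EA]+)*"  # Hebrew word / acronym
    r"|\d+(?:\.\d+)?"                              # integer or decimal number
    r"|\S"                                         # any other non-space symbol
)

def tokenize(text):
    return TOKEN_RE.findall(text)

print(tokenize('שלום, עולם! 42'))  # ['שלום', ',', 'עולם', '!', '42']
```

Real tokenizers additionally handle mixed scripts, dates, URLs, and multi-word expressions, which is where the libraries above earn their keep.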
  • RFTokenizer [Python] {Apache License 2.0} - A highly accurate morphological segmenter to break up complex word forms
  • The MILA Morphological Analysis Tool [?] {GPLv3} - Takes as input undotted Hebrew text (formatted either as plain text or as tokenized XML following MILA's standards). The Analyzer then returns, for each token, all the possible morphological analyses of the token, reflecting part of speech, transliteration, gender, number, definiteness, and possessive suffix. Free for non-commercial use. (temporarily down)
  • The MILA Morphological Disambiguation Tool [?] {GPLv3} - Takes as input morphologically-analyzed text and uses a Hidden Markov Model (HMM) to assign scores for each analysis, considering contextual information from the rest of the sentence. For a given token, all analyses deemed impossible are given scores of 0; all n analyses deemed possible are given positive scores. Free for non-commercial use. (temporarily down)
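As a toy illustration of the HMM scoring described above (with invented tags and probabilities, not MILA's actual model or data): each token contributes only its possible analyses, and the best-scoring tag sequence is recovered with Viterbi-style dynamic programming.

```python
def viterbi(tokens, candidates, start, trans, emit):
    """Return (score, tags) for the best-scoring analysis sequence.

    candidates maps each token to its possible tags; analyses deemed
    impossible simply do not appear, mirroring the score-0 behavior above.
    """
    first = tokens[0]
    paths = {t: (start.get(t, 0.0) * emit.get((t, first), 0.0), [t])
             for t in candidates[first]}
    for tok in tokens[1:]:
        new = {}
        for tag in candidates[tok]:
            prev = max(paths, key=lambda p: paths[p][0] * trans.get((p, tag), 0.0))
            score = (paths[prev][0] * trans.get((prev, tag), 0.0)
                     * emit.get((tag, tok), 0.0))
            new[tag] = (score, paths[prev][1] + [tag])
        paths = new
    return max(paths.values(), key=lambda sp: sp[0])

# A two-token example; the first token is ambiguous between two analyses.
# All probabilities are invented for the demonstration.
tokens = ['הספר', 'יפה']
candidates = {'הספר': ['NOUN', 'VERB'], 'יפה': ['ADJ']}
start = {'NOUN': 0.6, 'VERB': 0.4}
emit = {('NOUN', 'הספר'): 0.5, ('VERB', 'הספר'): 0.3, ('ADJ', 'יפה'): 0.9}
trans = {('NOUN', 'ADJ'): 0.5, ('VERB', 'ADJ'): 0.1}
score, tags = viterbi(tokens, candidates, start, trans, emit)
print(tags)   # ['NOUN', 'ADJ']
```

The contextual information enters through the transition probabilities: the VERB analysis is possible in isolation but loses once the following adjective is taken into account.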
  • BGU Tagger: Morphological Tagging of Hebrew [Java] {?} - Morphological Analysis, Disambiguation.
  • AlephBERT {Apache License 2.0} - A large pre-trained language model for Modern Hebrew, publicly available, pre-trained on OSCAR, texts of Hebrew tweets, and all of Hebrew Wikipedia, published by the OnlpLab team. The model obtains state-of-the-art results on segmentation, Part of Speech tagging, Named Entity Recognition, and Sentiment Analysis. Github: https://github.com/OnlpLab/AlephBERT
  • AlephBERTGimmel {CC0 1.0} - a new Hebrew pre-trained language model, trained on the same dataset as the previous SOTA Hebrew PLM AlephBERT, consisting of approximately 2 billion words of text but with a substantially increased vocabulary of 128,000 word pieces. Published as a collaboration of the OnlpLab team and Dicta. Github: https://github.com/Dicta-Israel-Center-for-Text-Analysis/alephbertgimmel
  • TavBERT {MIT} - a BERT-style masked language model over character sequences, published by Omri Keren, Tal Avinari, Prof. Reut Tsarfaty and Dr. Omer Levy.
  • Verb Inflector [Java] {Apache License 2.0} - A generation mechanism, created as part of Eran Tomer's ([email protected]) Master thesis, which produces vocalized and morphologically tagged Hebrew verbs given a non-vocalized verb in base-form and an indication of which pattern the verb follows.
  • HebPipe [Python] {Apache License 2.0} - End-to-end pipeline for Hebrew NLP using off the shelf tools, including morphological analysis, tagging, lemmatization, parsing and more.
  • YAP morpho-syntactic parser [Go] {Apache License 2.0} - Morphological analysis, disambiguation, and dependency parsing. The morphological analyzer relies on the BGU Lexicon. [original repository] Demo
  • SPMRL to UD {Apache License 2.0} - Converts YAP's output from the SPMRL scheme to UD v2.
  • HebMorph [Lucene] {AGPL-3.0} - An open-source effort to make Hebrew properly searchable by various IR software libraries. Includes Hebrew Analyzer for Lucene.
  • Hspell [?] {AGPL-3.0} - Free Hebrew linguistic project including spell checker and morphological analyzer. HspellPy [Python] {AGPL-3.0} - Python wrapper for Hspell.
  • DictaBERT-morph {CC BY 4.0} - A fine-tuned model for the morphological tagging task.
  • OtoBERT {CC BY 4.0} - Designed specifically for identifying suffixed verbal forms in Modern Hebrew.
  • Shtey Shekel {MIT} - Wikiproject for correcting grammar mistakes. (Heuristic) positive annotations can be derived from the query.
  • Legal-HeBERT {?} - A BERT model for Hebrew legal and legislative domains, intended to improve legal NLP research and tool development in Hebrew. Avichay Chriqui, Dr. Inbal Yahav Shenberger and Dr. Ittai Bar-Siman-Tov released two versions of Legal-HeBERT: the first is a fine-tuned version of HeBERT applied to legal and legislative documents; the second uses HeBERT's architecture guidelines to train a BERT model from scratch.
  • NeMo-text-processing {Apache License 2.0} - Verbit extended the NeMo-text-processing Python package with WFST-based Hebrew inverse text normalization (ITN). ITN is part of the Automatic Speech Recognition (ASR) post-processing pipeline and converts spoken form to written form to improve text readability.
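For intuition, a toy inverse-text-normalization step might rewrite spoken-form number words as digits. The mapping below is invented and covers only a handful of words; the WFST-based package handles full grammars (dates, currency, ordinals, and so on):

```python
# Toy ITN: rewrite spoken-form Hebrew number words as digits.
# The vocabulary here is a tiny invented sample, not NeMo's grammar.
HE_NUMBERS = {'אפס': '0', 'אחת': '1', 'שתיים': '2', 'שלוש': '3', 'ארבע': '4',
              'חמש': '5', 'שש': '6', 'שבע': '7', 'שמונה': '8', 'תשע': '9',
              'עשר': '10'}

def itn(text):
    return ' '.join(HE_NUMBERS.get(word, word) for word in text.split())

print(itn('פגישה בשעה שלוש'))  # פגישה בשעה 3
```

A real WFST composes such rewrites with context rules, so that, e.g., number words inside idioms are left untouched.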
  • HeRo {?} - RoBERTa based language model for Hebrew; Fine-tuned for sentiment analysis, named entity recognition and question answering.
  • LongHeRo {?} - State-of-the-art Longformer language model for Hebrew.
  • Universal Language Model Fine-tuning for Text Classification (ULMFiT) in Hebrew - The weights (i.e., a trained model) for a Hebrew version of Howard and Ruder's ULMFiT model. Trained on the Hebrew Wikipedia corpus.
  • Hebrew Psychological Lexicons {Apache License 2.0} - Easy-to-use Python interface for Hebrew clinical psychology text analysis. Useful for various psychology applications such as detecting emotional state, well-being, or relationship quality in conversation, identifying topics (e.g., family, work), and many more.
  • HebTTS [Python] {Apache License 2.0} - A diacritics-free (niqqud-free) language-modeling approach to Hebrew Text-To-Speech (TTS). The model operates on discrete speech representations and is conditioned on a word-piece tokenizer.
  • Text-Fabric [Python] {CC BY-NC 4.0} - A Python package for browsing and processing ancient corpora, focused on the Hebrew Bible Database.
  • Hebrew OCR with Nikud [Python] {?} - A program to convert Hebrew text files (without Nikud) to text files with the correct Nikud. Developed by Adi Oz and Vered Shani.
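Niqqud marks are Unicode combining characters, so the reverse direction (removing niqqud to obtain the undotted text that many of the tools in this list expect as input) fits in a few stdlib lines. This is a generic sketch, not part of the tool above:

```python
import unicodedata

def strip_nikud(text):
    """Remove Hebrew points (niqqud), which are combining marks (category Mn)."""
    decomposed = unicodedata.normalize('NFD', text)
    return ''.join(ch for ch in decomposed if unicodedata.category(ch) != 'Mn')

# שָׁלוֹם written with explicit point characters, for clarity:
dotted = '\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd'
print(strip_nikud(dotted))  # שלום
```

The hard direction, which the OCR tool addresses, is restoring the points: that requires a language model, since the same undotted string can correspond to several dotted readings.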
  • Hebrew-Mistral-7B-200K {Apache License 2.0} - An open-source Large Language Model (LLM) pretrained on Hebrew and English, created by Yam Peleg. It has 7 billion parameters and a 200K context length, and is based on Mistral-7B-v0.1 from Mistral. It has an extended Hebrew tokenizer with 64,000 tokens and is continuously pretrained from Mistral-7B on tokens in both English and Hebrew. The resulting model is a powerful general-purpose language model suitable for a wide range of natural language processing tasks, with a focus on Hebrew language understanding and generation.
  • Dicta-LM 2.0 Collection {Apache License 2.0} - Generative language models specifically optimized for Hebrew.
  • word2word {Apache License 2.0} - Easy-to-use Python interface for accessing top-k word translations and for building a new bilingual lexicon from a custom parallel corpus.
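The lexicon-building idea can be sketched with plain co-occurrence counting over a tiny, invented parallel corpus. word2word itself uses stronger statistics than raw counts, so this is only the core intuition:

```python
from collections import Counter, defaultdict

# Invented three-pair English-Hebrew parallel corpus for illustration.
parallel = [
    ('the book', 'הספר'),
    ('the boy', 'הילד'),
    ('a book', 'ספר'),
]

# Count how often each source word co-occurs with each target word.
cooc = defaultdict(Counter)
for en, he in parallel:
    for e in en.split():
        for h in he.split():
            cooc[e][h] += 1

# 'book' co-occurs once each with הספר and ספר; ranking such counts
# (with better statistics and much more data) yields the lexicon.
print(dict(cooc['book']))
```

Raw co-occurrence over-weights frequent function words like 'the', which is exactly the problem the package's scoring is designed to correct.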
  • HeArBERT {?} - A bilingual BERT for Arabic and Hebrew, pretrained on the respective parts of the OSCAR corpus.
  • HeBERT {MIT} - A Hebrew pretrained language model for polarity analysis and emotion recognition, published by Dr. Inbal Yahav Shenberger and Avichay Chriqui. It is based on Google's BERT architecture in the BERT-Base configuration. HeBERT was trained on three datasets: OSCAR, a Hebrew dump of Wikipedia, and emotion User Generated Content (UGC) data collected for the purpose of the study. The model was evaluated on downstream tasks: HebEMO (an emotion recognition model) and sentiment analysis. (https://huggingface.co/avichr/heBERT)
  • BEREL {?} - BERT Embeddings for Rabbinic-Encoded Language - DICTA's pre-trained language model (PLM) for Rabbinic Hebrew.
  • DictaBERT {CC BY 4.0} - A base model pretrained with the masked-language-modeling objective.
  • Criminal Sentence Classification {OpenRAIL} - This project classifies key aspects of criminal cases within the Israeli legal framework. The project leverages a few-shot learning approach for accurate sentence classification relevant to sentencing decisions.
  • MsBERT {CC BY 4.0} - A dedicated pretrained BERT model, dubbed MsBERT (short for Manuscript BERT), designed from the ground up to handle Hebrew manuscript text. MsBERT substantially outperforms existing Hebrew BERT models at predicting missing words in fragmentary Hebrew manuscript transcriptions across multiple genres, as well as at differentiating between quoted passages and exegetical elaborations. The authors provide MsBERT for free download and unrestricted use, together with an interactive, user-friendly website that lets manuscript scholars apply it to reconstructing fragmentary Hebrew manuscripts.
  • Hebrew GPT neo {MIT} - Doron Adler's Hebrew text generation model based on EleutherAI's gpt-neo.
  • DICTA {CC-BY-SA 4.0} - Analytical tools for Jewish texts. They also have a GitHub organization.
  • wordfreq 3.0.3 {MIT} - A Python library for looking up the frequencies of words in 44 languages, including Hebrew. The Hebrew data is based on Wikipedia, OPUS OpenSubtitles 2018, SUBTLEX, Google Books Ngrams 2012, web text from OSCAR, and Twitter.
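wordfreq reports word frequencies on the Zipf scale, defined as the base-10 logarithm of the word's frequency per billion words, which is simple enough to verify by hand:

```python
import math

def zipf_from_per_million(count_per_million):
    """Zipf value = log10 of the word's frequency per billion words."""
    return math.log10(count_per_million * 1000)

# A word appearing 1,000 times per million words has Zipf 6.0;
# one appearing once per million words has Zipf 3.0.
print(zipf_from_per_million(1000), zipf_from_per_million(1))  # 6.0 3.0
```

The logarithmic scale keeps common and rare words on comparable footing: each Zipf point is a tenfold change in frequency.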
  • Eyfo - A commercial engine for search and entity tagging in Hebrew.
  • Melingo's ICA (Intelligent Content Analysis) - A text analysis and textual categorized entity extraction API for Hebrew, Arabic and Farsi texts.
  • Genius - Automatic analysis of free text in Hebrew.
  • AlmaReader - Online text-to-speech service for Hebrew.
  • Amnon The Transcriber - a WhatsApp bot that receives a voice note and transcribes it to text.
  • Callee - a WhatsApp bot that receives a voice note, transcribes it to text, and also summarizes it (as text).
  • verbit.ai - Transcription.
  • Text Analytics for health containers
  • Hebrew-Nlp