This is a text spotting model that simultaneously detects and recognizes text. The model detects symbol sequences separated by spaces and performs recognition without a dictionary. It is built on top of the Mask-RCNN framework with an additional attention-based text recognition head.
The symbol set is alphanumeric: `0123456789abcdefghijklmnopqrstuvwxyz`.
This model is the 2D attention-based GRU decoder of the text recognition head.
Metric | Value |
---|---|
Word spotting hmean ICDAR2015, without a dictionary | 59.04% |
GFlops | 0.002 |
MParams | 0.273 |
Source framework | PyTorch* |
The word spotting hmean is defined and measured according to the Incidental Scene Text (ICDAR2015) challenge.
- Name: `encoder_outputs`, shape: [1x(28*28)x256]. Encoded text recognition features.
- Name: `prev_symbol`, shape: [1x1]. Index in alphabet of previously generated symbol.
- Name: `prev_hidden`, shape: [1x1x256]. Previous hidden state of GRU.
- Name: `output`, shape: [1x38]. Encoded text recognition features.
- Name: `hidden`, shape: [1x1x256]. Current hidden state of GRU.
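
The tensors listed above suggest a step-by-step decoding loop: each call consumes the previous symbol and hidden state and returns scores for the next symbol plus an updated hidden state. The following is a minimal sketch of greedy decoding under several assumptions that this document does not confirm: that `output` holds per-class scores over 38 classes, that indices 0 and 1 are start-of-sequence and end-of-sequence tokens, that the remaining 36 indices follow the order of the symbol set above, that the initial hidden state is all zeros, and that 28 steps is a reasonable length limit. `decoder_step` and `make_stub_decoder` are hypothetical helpers; the stub stands in for real inference so the example runs without the model.

```python
import numpy as np

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"
SOS, EOS = 0, 1      # assumed indices of the special tokens
NUM_CLASSES = 38     # matches the [1x38] output shape
HIDDEN_SIZE = 256    # matches the [1x1x256] hidden state shape
MAX_LEN = 28         # assumed upper bound on decoded length


def greedy_decode(decoder_step, encoder_outputs, max_len=MAX_LEN):
    """Run the decoder one symbol at a time, always taking the best class."""
    prev_symbol = np.full((1, 1), SOS, dtype=np.int64)
    prev_hidden = np.zeros((1, 1, HIDDEN_SIZE), dtype=np.float32)  # assumed init
    chars = []
    for _ in range(max_len):
        output, hidden = decoder_step(encoder_outputs, prev_symbol, prev_hidden)
        idx = int(np.argmax(output, axis=1)[0])
        if idx == EOS:
            break
        if idx != SOS:
            # Assumed layout: classes 2..37 map onto ALPHABET in order.
            chars.append(ALPHABET[idx - 2])
        prev_symbol = np.full((1, 1), idx, dtype=np.int64)
        prev_hidden = hidden
    return "".join(chars)


def make_stub_decoder(indices):
    """Hypothetical stand-in for the real model: replays a scripted sequence.

    The step counter is stored in the first element of the hidden state.
    """
    def step(encoder_outputs, prev_symbol, prev_hidden):
        t = int(prev_hidden[0, 0, 0])
        output = np.zeros((1, NUM_CLASSES), dtype=np.float32)
        output[0, indices[t]] = 1.0
        hidden = prev_hidden.copy()
        hidden[0, 0, 0] = t + 1
        return output, hidden
    return step


# Dummy encoder features with the documented shape [1x(28*28)x256].
encoder_outputs = np.zeros((1, 28 * 28, 256), dtype=np.float32)
# Scripted classes for "a", "b", "1", then EOS under the assumed layout.
stub = make_stub_decoder([12, 13, 3, EOS])
text = greedy_decode(stub, encoder_outputs)
print(text)
```

With a real model, `decoder_step` would wrap one inference call that feeds `encoder_outputs`, `prev_symbol`, and `prev_hidden` and reads back `output` and `hidden`. The token indices and alphabet order should be checked against the model's actual configuration before relying on this mapping.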
[*] Other names and brands may be claimed as the property of others.