Description
Using the demo code from the website (except I don't specify the document date), and the input "Call 090-1234-5678", I get this output:
1234-5678 [from char offset 5 to 18] --> (1234-XX-XX,5678-XX-XX,PT38955312H)
The Timex attributes are:
{tid=t3, value=PT38955312H, type=DURATION, beginPoint=t1, endPoint=t2}
This happens to be a Japanese phone number, so I understand that it doesn't work right out of the box. However, for text matching \d\d\d\d-\d\d\d\d, I would expect the resulting timex to be a range from year 1234 to year 5678, not a duration consisting of the number of hours in that span. The referenced Timex spec calls this an "anchored duration" and shows it being broken into multiple annotations (page 13), which would be easier to handle.
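For reference, here is a quick sanity check of where the hour count seems to come from. This is only a sketch in Python, not SUTime code; the helper name `hours_between_years` is made up for illustration, and it assumes the two endpoints are treated as January 1 of each year in the proleptic Gregorian calendar.

```python
from datetime import date


def hours_between_years(start_year: int, end_year: int) -> int:
    """Hours from Jan 1 of start_year to Jan 1 of end_year.

    Assumption: both endpoints are midnight on January 1, and the calendar
    is proleptic Gregorian (which is what Python's datetime.date uses).
    """
    days = (date(end_year, 1, 1) - date(start_year, 1, 1)).days
    return days * 24


# 1234 and 5678 are the two "years" SUTime pulled out of the phone number.
print(hours_between_years(1234, 5678))  # 38955312, matching PT38955312H
```

The result matches the reported value exactly, which is consistent with "1234-5678" being read as a year range and then collapsed into a single duration.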
Even if this turns out to be unfixable, can you help me find the rule that produces it so I can delete it? I see plenty of duration rules in english.sutime, but they also seem to require other words to match (like "the" and "year").