Description
I think this is something people know, but it is not explicitly stated: Can a record have multiple extension-fields of the same type?
Section 5.1 of the 1.1 spec says "WARC named fields of the same type shall not be repeated in the same WARC record (for example, a WARC record shall not have several WARC-Date or several WARC-Target-URI), except as noted (e.g. WARC-Concurrent-To)." However it makes no explicit mention of whether multiple extension-fields of the same type are allowed. It does say "WARC processing software shall ignore fields with unrecognized names" which could mean it is allowed.
I think the answer is yes, but this is not stated anywhere. The one example I've found so far of multiple extension-fields of the same type on the same record is #42, the proposed WARC-Protocol field. The examples there use two WARC-Protocol fields on one record (one for TLS, one for HTTP). Presumably at some point this will become a named field with language in the spec like WARC-Concurrent-To has, which would still leave the general question unanswered.
A reason to address multiple extension-fields of the same type explicitly is to avoid implementation issues. I suspect most WARC parsing software stores extension-fields in a dictionary/hash keyed on the field name, which does not allow duplicate keys. Faced with repeated fields, implementations will behave differently (first value wins, last value wins, etc.). I personally hit this when parsing records with multiple WARC-Protocol fields.
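To make the divergence concrete, here is a minimal sketch in Python. It is not taken from any existing WARC library: the function names, the header block, and the WARC-Protocol values are all hypothetical, and the parsing is deliberately simplified (no line folding, no case-insensitive name matching).

```python
from collections import defaultdict

# Hypothetical header block; the WARC-Protocol values are illustrative only.
RAW_HEADERS = (
    "WARC-Type: response\r\n"
    "WARC-Protocol: tls/1.3\r\n"
    "WARC-Protocol: http/1.1\r\n"
)


def split_fields(raw):
    """Yield (name, value) pairs, one per non-empty header line."""
    for line in raw.split("\r\n"):
        if not line:
            continue
        name, _, value = line.partition(":")
        yield name.strip(), value.strip()


def parse_last_wins(raw):
    """Plain dict assignment: a repeated field overwrites the earlier value."""
    fields = {}
    for name, value in split_fields(raw):
        fields[name] = value
    return fields


def parse_first_wins(raw):
    """setdefault keeps the first value and silently drops later repeats."""
    fields = {}
    for name, value in split_fields(raw):
        fields.setdefault(name, value)
    return fields


def parse_multi(raw):
    """Keep every value, in order of appearance, as a list per field name."""
    fields = defaultdict(list)
    for name, value in split_fields(raw):
        fields[name].append(value)
    return dict(fields)
```

With the sample block, `parse_last_wins` keeps only `http/1.1`, `parse_first_wins` keeps only `tls/1.3`, and `parse_multi` keeps both. The first two parsers lose data without raising any error, which is why the ambiguity is easy to miss.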
Perhaps this should be stated explicitly in the spec. Or does the "ignore fields with unrecognized names" language already cover it?