Multiple extension-fields of the same type on the same record? #95

@acidus99

I think this is something people know, but it is not explicitly stated: Can a record have multiple extension-fields of the same type?

Section 5.1 of the 1.1 spec says "WARC named fields of the same type shall not be repeated in the same WARC record (for example, a WARC record shall not have several WARC-Date or several WARC-Target-URI), except as noted (e.g. WARC-Concurrent-To)." However, it makes no explicit mention of whether multiple extension-fields of the same type are allowed. It does say "WARC processing software shall ignore fields with unrecognized names", which could be read as allowing it.

I think the answer is yes, but this is not stated anywhere. The one example I've found so far of multiple extension-fields of the same type on the same record is #42, the proposed WARC-Protocol field. That proposal shows examples using two WARC-Protocol fields on one record (for TLS and HTTP). Presumably at some point this will become a named field with language in the spec like WARC-Concurrent-To has, which would still leave the general question unanswered.

A reason to explicitly discuss multiple extension-fields of the same type is to avoid implementation issues. I suspect most WARC parsing software stores extension-fields in a dictionary/hash keyed on the field name, where duplicate keys are not allowed. Implementations will then behave differently when a field repeats (first value wins, last value wins, etc.). I personally hit this when parsing records with multiple WARC-Protocol fields.
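
To make the divergence concrete, here is a minimal Python sketch. It is not taken from any real WARC library: the function names are hypothetical and the WARC-Protocol values are only illustrative. It parses the same header lines three ways: last value wins, first value wins, and a list per field name that keeps every value.

```python
from collections import defaultdict

# Illustrative header lines only; the WARC-Protocol values are assumptions
# made for this sketch, not values quoted from the proposal in #42.
HEADER_LINES = [
    "WARC-Type: response",
    "WARC-Protocol: http/1.1",
    "WARC-Protocol: tls/1.3",
]


def split_field(line):
    """Split 'Name: value' into (name, value).

    Simplified: ignores line folding and does not case-fold field names.
    """
    name, _, value = line.partition(":")
    return name.strip(), value.strip()


def parse_last_wins(lines):
    """Plain dict assignment: a repeated field overwrites the earlier value."""
    fields = {}
    for line in lines:
        name, value = split_field(line)
        fields[name] = value
    return fields


def parse_first_wins(lines):
    """setdefault keeps the first value and silently drops later ones."""
    fields = {}
    for line in lines:
        name, value = split_field(line)
        fields.setdefault(name, value)
    return fields


def parse_multi(lines):
    """One list per field name, so repeated fields are all preserved in order."""
    fields = defaultdict(list)
    for line in lines:
        name, value = split_field(line)
        fields[name].append(value)
    return dict(fields)


if __name__ == "__main__":
    print(parse_last_wins(HEADER_LINES)["WARC-Protocol"])
    print(parse_first_wins(HEADER_LINES)["WARC-Protocol"])
    print(parse_multi(HEADER_LINES)["WARC-Protocol"])
```

The two dictionary-based parsers return different single values for the same record, and neither reports that a value was dropped. Only the list-per-name variant preserves both fields.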

Perhaps this should be explicitly stated, or does the "ignore fields with unrecognized names" language already cover it?
