warcio cannot write wet files #150

@mraslann

I am trying to use warcio to write WET files that hold text-only conversion records, but I can't find a way to write a record with warcio without making a live web request.

This is a sample of what I am trying to do:

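import requests
from warcio.statusandheaders import StatusAndHeaders
from warcio.warcwriter import WARCWriter

def create_wet_file(url):
    # Fetches `url` live and stores it as a WARC 'response' record
    # (HTTP headers + raw body), not as a text-only 'conversion' record
    with open("BigData/test1.wet.gz", 'wb') as wet:
        writer = WARCWriter(wet, gzip=True)
        try:
            resp = requests.get(url, headers={'Accept-Encoding': 'identity, gzip'}, stream=True)
            headers_list = resp.raw.headers.items()
            # Record the real status line instead of assuming '200 OK'
            status = f'{resp.status_code} {resp.reason}'
            headers = StatusAndHeaders(status, headers_list, protocol='HTTP/1.0')
            record = writer.create_warc_record(uri=url, record_type='response', payload=resp.raw, http_headers=headers)
            writer.write_record(record)
        except requests.exceptions.ConnectionError as e:
            print(e)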

Is this a limitation in warcio, or is there a way around it?

Thank you for any pointers.