Getting errors when updating #9

@coenberns

When I tried to update to the latest databases, I first got an error saying that the MetaNetX URL was wrong, so I changed the URLs in the __database_processing.py file. However, I still get the following error when the script tries to decode the chebi_compounds.tsv file (full traceback below):

20250407_12:01:13 Processing file /xx/xx/miniconda3/envs/metadraft3/lib/python3.10/site-packages/mergem/downloads/chebi_compounds.tsv
Traceback (most recent call last):
  File "/xx/xx/miniconda3/envs/metadraft3/bin/mergem", line 8, in <module>
    sys.exit(main())
  File "/xx/xx/miniconda3/envs/metadraft3/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/xx/xx/miniconda3/envs/metadraft3/lib/python3.10/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
  File "/xx/xx/miniconda3/envs/metadraft3/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/xx/xx/miniconda3/envs/metadraft3/lib/python3.10/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/xx/xx/miniconda3/envs/metadraft3/lib/python3.10/site-packages/mergem/cli.py", line 55, in main
    mergem.update_id_mapper()
  File "/xx/xx/miniconda3/envs/metadraft3/lib/python3.10/site-packages/mergem/__model_handling.py", line 139, in update_id_mapper
    build_id_mapping(delete_database_files)
  File "/xx/xx/miniconda3/envs/metadraft3/lib/python3.10/site-packages/mergem/__database_processing.py", line 1000, in build_id_mapping
    process_met_file(chebi_compounds_filename, chebi_compounds_names)
  File "/xx/xx/miniconda3/envs/metadraft3/lib/python3.10/site-packages/mergem/__database_processing.py", line 241, in process_met_file
    for line in db_file:
  File "/xx/xx/miniconda3/envs/metadraft3/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 7629: invalid start byte
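
The same kind of error can be reproduced outside mergem. The sketch below is only my assumption about what is happening, not mergem's actual code: it writes a small file containing a stray 0xad byte, reads it line by line in text mode as UTF-8 (as the traceback suggests process_met_file does), and then reads it again with errors="replace". The names read_lines and demo and the sample content are hypothetical.

```python
import os
import tempfile


def read_lines(path, errors="strict"):
    """Read a text file line by line, decoding it as UTF-8."""
    with open(path, encoding="utf-8", errors=errors) as db_file:
        return [line.rstrip("\n") for line in db_file]


def demo():
    """Show strict vs. tolerant decoding of a file with a stray 0xad byte."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.tsv")
        # Hypothetical sample data: 0xad is not a valid UTF-8 start byte.
        with open(path, "wb") as fh:
            fh.write(b"ID\tNAME\n1\tfoo\xadbar\n")

        try:
            read_lines(path)  # strict decoding, expected to fail
            strict_ok = True
        except UnicodeDecodeError:
            strict_ok = False

        # errors="replace" substitutes U+FFFD for undecodable bytes.
        tolerant = read_lines(path, errors="replace")
    return strict_ok, tolerant
```

Reading with errors="replace" avoids the crash but silently alters the affected names, so it may not be an acceptable fix for building the ID mapping.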

I also checked whether there actually is a 0xad byte at this position, using:

dd if=chebi_compounds.tsv bs=1 skip=7600 count=64 | xxd

However, this did not show the byte referred to in the error. My system is Linux, so could this be related to the default encoding being UTF-8? Let me know if you have seen this error before and how to solve it! 😄
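
One thing I am not sure about: the position in the error message may be relative to the chunk the decoder was working on rather than to the start of the file, which would explain why dd at offset 7600 shows nothing unusual. A minimal sketch of a more direct check, assuming the whole file fits in memory: it scans the raw bytes and reports the absolute offset of each position where UTF-8 decoding fails. The names find_invalid_utf8 and scan_file are hypothetical and not part of mergem.

```python
def find_invalid_utf8(data: bytes, limit: int = 10) -> list[tuple[int, int]]:
    """Return (absolute_offset, byte_value) for bytes that break UTF-8 decoding.

    Stops after `limit` hits so a badly encoded file does not take forever.
    """
    hits = []
    pos = 0
    while pos < len(data) and len(hits) < limit:
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the data is valid UTF-8
        except UnicodeDecodeError as err:
            bad = pos + err.start  # err.start is relative to the slice
            hits.append((bad, data[bad]))
            pos = bad + 1  # resume scanning just after the offending byte
    return hits


def scan_file(path: str, limit: int = 10) -> list[tuple[int, int]]:
    """Read a file as raw bytes and report its first invalid UTF-8 offsets."""
    with open(path, "rb") as fh:
        return find_invalid_utf8(fh.read(), limit)
```

Calling scan_file("chebi_compounds.tsv") from the downloads directory would give offsets that can be passed to dd as the skip value to inspect the surrounding bytes.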
