When I tried to update to the latest DBs, I previously got an error saying that the MetaNetX URL was wrong, so I changed the URLs in the `__database_processing.py` file. However, I still get the following error when the script tries to decode the `chebi_compounds.tsv` file:
```
20250407_12:01:13 Processing file /xx/xx/miniconda3/envs/metadraft3/lib/python3.10/site-packages/mergem/downloads/chebi_compounds.tsv
Traceback (most recent call last):
  File "/xx/xx/miniconda3/envs/metadraft3/bin/mergem", line 8, in <module>
    sys.exit(main())
  File "/xx/xx/miniconda3/envs/metadraft3/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/xx/xx/miniconda3/envs/metadraft3/lib/python3.10/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
  File "/xx/xx/miniconda3/envs/metadraft3/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/xx/xx/miniconda3/envs/metadraft3/lib/python3.10/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/xx/xx/miniconda3/envs/metadraft3/lib/python3.10/site-packages/mergem/cli.py", line 55, in main
    mergem.update_id_mapper()
  File "/xx/xx/miniconda3/envs/metadraft3/lib/python3.10/site-packages/mergem/__model_handling.py", line 139, in update_id_mapper
    build_id_mapping(delete_database_files)
  File "/xx/xx/miniconda3/envs/metadraft3/lib/python3.10/site-packages/mergem/__database_processing.py", line 1000, in build_id_mapping
    process_met_file(chebi_compounds_filename, chebi_compounds_names)
  File "/xx/xx/miniconda3/envs/metadraft3/lib/python3.10/site-packages/mergem/__database_processing.py", line 241, in process_met_file
    for line in db_file:
  File "/xx/xx/miniconda3/envs/metadraft3/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 7629: invalid start byte
```
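For context, 0xAD falls in the range 0x80 to 0xBF, which UTF-8 reserves for continuation bytes, so it can never begin a character; that is exactly what "invalid start byte" means. In Latin-1/cp1252, 0xAD is the soft hyphen (U+00AD). Whether `chebi_compounds.tsv` really contains Latin-1 text is my assumption, not something I have confirmed. A minimal sketch with a hypothetical sample string and a hypothetical helper `first_utf8_error`:

```python
# Hypothetical sample: a compound name stored as Latin-1 bytes,
# where 0xAD encodes the SOFT HYPHEN character (U+00AD).
sample = "2-amino\u00adpropanoate".encode("latin-1")


def first_utf8_error(raw: bytes):
    """Return (offset, byte value, reason) for the first invalid UTF-8 byte, or None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, raw[err.start], err.reason
    return None


print(first_utf8_error(sample))  # (7, 173, 'invalid start byte')

# Every byte is valid Latin-1, so this decode always succeeds.
print(sample.decode("latin-1") == "2-amino\u00adpropanoate")  # True

# errors="replace" avoids the crash but turns the byte into U+FFFD.
print(sample.decode("utf-8", errors="replace"))
```

If the file turns out to be mixed or Latin-1 encoded, opening it with an explicit `encoding=` or `errors=` argument would avoid the crash, though I have not checked where mergem opens the file.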
I also checked whether there actually is a 0xad byte at this position by running:

```shell
dd if=chebi_compounds.tsv bs=1 skip=7600 count=64 | xxd
```

But the output did not show the referenced byte. My system is Linux, so could this have to do with the default encoding being UTF-8? Let me know if you have seen this error before and how to solve it! 😄
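One possible reason the `dd` check finds nothing: `for line in db_file` decodes the file in chunks, and as far as I can tell from `codecs.py`, the "position 7629" in the error is an index into the chunk currently being decoded, not an absolute file offset. A minimal sketch (the helper names `find_invalid_utf8`/`show_context` are hypothetical) that reads the file in binary mode and reports absolute offsets, which could then be passed to `dd skip=`:

```python
import sys


def find_invalid_utf8(data: bytes, limit: int = 10) -> list[int]:
    """Return absolute offsets of up to `limit` bytes that are not valid UTF-8."""
    offsets = []
    pos = 0
    while len(offsets) < limit:
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the data decodes cleanly
        except UnicodeDecodeError as err:
            # err.start/err.end are relative to data[pos:], so shift by pos.
            offsets.append(pos + err.start)
            pos += err.end  # resume just past the offending byte(s)
    return offsets


def show_context(data: bytes, offset: int, width: int = 16) -> str:
    """Hex-dump the bytes around `offset`, similar to the dd | xxd check."""
    start = max(0, offset - width)
    return data[start:offset + width].hex(" ")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_bad_bytes.py chebi_compounds.tsv
    with open(sys.argv[1], "rb") as fh:  # binary mode: no decoding happens
        content = fh.read()
    for off in find_invalid_utf8(content):
        print(f"offset {off}: byte 0x{content[off]:02x}  {show_context(content, off)}")
```

Reading the whole file into memory is a simplification; for a very large download, scanning in fixed-size blocks would be gentler on memory.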