Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] hex::dec::lzma_decompress reports an error when decompressing a small buffer #2033

Closed
1 task
AlexGuo1998 opened this issue Dec 27, 2024 · 2 comments
Closed
1 task
Labels
bug Something isn't working

Comments

@AlexGuo1998
Copy link
Contributor

AlexGuo1998 commented Dec 27, 2024

Operating System

Windows

What's the issue you encountered?

When decompressing a short lzma-compressed buffer, hex::dec::lzma_decompress reports an error, but the data is written to the section with an incorrect size of 100 bytes.

How can the issue be reproduced?

  1. Generate an xz compressed file with the following Python code:
import lzma

decompressed = b'123'
compressed = lzma.compress(decompressed, format=lzma.FORMAT_XZ, preset=1)
with open('123.xz', 'wb') as f:
    f.write(compressed)

The hexdump of the result 123.xz file is:

FD 37 7A 58 5A 00 00 04 E6 D6 B4 46 02 00 21 01 10 00 00 00 A8 70 8E 86 01 00 02 31 32 33 00 00 61 C5 1C 07 44 28 23 30 00 01 1B 03 0B 2F B9 10 1F B6 F3 7D 01 00 00 00 00 04 59 5A
  1. Load the file into ImHex, apply the following pattern:
import std.mem;
import std.io;
import hex.dec;

u8 compressed[std::mem::size()] @ 0;
std::mem::Section decompressed = std::mem::create_section("decompressed");
bool ok = hex::dec::lzma_decompress(compressed, decompressed);
std::print("ok={}", ok);
  1. ok=false is printed to the console. There is a "decompressed" section of 100 bytes, of which the first 3 bytes are 31 32 33("123") and others 00.

The expected result should be: ok=true, and the "decompressed" section contains only the first 3 bytes.

ImHex Version

1.36.0

ImHex Build Type

  • Nightly or built from sources

Installation type

MSI

Additional context?

The corresponding code is here:

/* lzma_decompress(compressed_pattern, section_id) */
ContentRegistry::PatternLanguage::addFunction(nsHexDec, "lzma_decompress", FunctionParameterCount::exactly(2), [](Evaluator *evaluator, auto params) -> std::optional<Token::Literal> {
#if IMHEX_FEATURE_ENABLED(LIBLZMA)
auto compressedData = getCompressedData(evaluator, params[0]);
auto &section = evaluator->getSection(params[1].toUnsigned());
lzma_stream stream = LZMA_STREAM_INIT;
if (lzma_auto_decoder(&stream, 0x10000, LZMA_IGNORE_CHECK) != Z_OK) {
return false;
}
section.resize(100);
stream.avail_in = compressedData.size();
stream.avail_out = section.size();
stream.next_in = compressedData.data();
stream.next_out = section.data();
ON_SCOPE_EXIT {
lzma_end(&stream);
};
while (stream.avail_in != 0) {
auto res = lzma_code(&stream, LZMA_RUN);
if (res == LZMA_STREAM_END) {
section.resize(section.size() - stream.avail_out);
break;
}
if (res == LZMA_MEMLIMIT_ERROR) {
auto usage = lzma_memusage(&stream);
lzma_memlimit_set(&stream, usage);
res = lzma_code(&stream, LZMA_RUN);
}
if (res != LZMA_OK)
return false;
if (stream.avail_out != 0)
break;
const auto prevSectionSize = section.size();
section.resize(prevSectionSize * 2);
stream.next_out = section.data() + prevSectionSize;
stream.avail_out = prevSectionSize;
}
return true;
#else
std::ignore = evaluator;
std::ignore = params;
err::E0012.throwError("hex::dec::lzma_decompress is not available. Please recompile ImHex with liblzma support.");
#endif
});

After some debugging, it seems like first lzma_code() returns LZMA_MEMLIMIT_ERROR(6), but after adjusting the limit, it returns LZMA_STREAM_END(1), and is incorrectly treated as an error.


To fix this once and for all, noticing earlier at lzma_auto_decoder()

if (lzma_auto_decoder(&stream, 0x10000, LZMA_IGNORE_CHECK) != Z_OK) {

the decoding memlimit is set to 0x10000 (64KiB) which is too small for any .xz files. According to man xz, common decompression memory requirements are between 1MiB and 65MiB.

We are always increasing the memory limit anyways, so it shouldn't be set in the first place. According to the liblzma doc, UINT64_MAX should be used to disable the limiter. In this way we don't have to process the LZMA_MEMLIMIT_ERROR error.

Or if we want to be safe, set it to something like 1GiB and treat LZMA_MEMLIMIT_ERROR as a real error.

Which one do you prefer? I can write a PR for this.

@AlexGuo1998 AlexGuo1998 added the bug Something isn't working label Dec 27, 2024
@WerWolv
Copy link
Owner

WerWolv commented Dec 27, 2024

Thanks for investigating! Yeah I think I didn't understand the API correctly.
I believe having some limitation is good to have, otherwise it will be pretty easy to crash ImHex with specifically crafted inputs so something like a 1GiB limit sounds like a good idea. A PR would be highly appreciated!

AlexGuo1998 added a commit to AlexGuo1998/ImHex that referenced this issue Dec 31, 2024
As discussed in WerWolv#2033, we should limit the memory usage to a reasonably sane
value (1GiB here), rather than always increasing it when exceeded. We also
print a warning to the console in this case. This also avoided the bug when
decompressing a small buffer.
AlexGuo1998 added a commit to AlexGuo1998/ImHex that referenced this issue Jan 2, 2025
As discussed in WerWolv#2033, we should limit the memory usage to a reasonably sane
value (1GiB here), rather than always increasing it when exceeded. We also
print a warning to the console in this case. This also avoided the bug when
decompressing a small buffer.
@paxcut
Copy link
Contributor

paxcut commented Jan 2, 2025

After some debugging, it seems like first lzma_code() returns LZMA_MEMLIMIT_ERROR(6), but after adjusting the limit, it returns LZMA_STREAM_END(1), and is incorrectly treated as an error.

I was the person who fixed the fact that it was impossible to decompress any file using the previous code. As reported in the PR the problem was caused by not calling lzma_memusage() to adjust the limit using the value returned. That eliminated the error and now lzma implementation is able to decompress any size input. However there is a minimum size file imposed to 100 bytes which in practice makes sense because below 100 bytes the resulting 'compressed' file is actually bigger. LZMA_STREAM_END is an indicator that your compressed file is bigger then the original.

If you decompress a real lzma compressed file you will see the the code returns true and the file will have the correct size, so setting the limit to 1 Gb is not only unnecessary, but will limit the files that can be decompressed just so you can decompress files that should not have been compressed to begin with. How is that an improvement?

WerWolv pushed a commit that referenced this issue Jan 2, 2025
<!--
Please provide as much information as possible about what your PR aims
to do.
PRs with no description will most likely be closed until more
information is provided.
If you're planing on changing fundamental behaviour or add big new
features, please open a GitHub Issue first before starting to work on
it.
If it's not something big and you still want to contact us about it,
feel free to do so !
-->

### Problem description
<!-- Describe the bug that you fixed/feature request that you
implemented, or link to an existing issue describing it -->

See #2033 (`hex::dec::lzma_decompress` reports an error when
decompressing a small buffer).

### Implementation description
<!-- Explain what you did to correct the problem -->

Set the LZMA decompressor memory limit to 1GiB fixed. Print a warning
when exceeded, and abort with returning `false`.

### Screenshots
<!-- If your change is visual, take a screenshot showing it. Ideally,
make before/after sceenshots -->

Normal result when decompressing a small buffer


![image](https://github.com/user-attachments/assets/5b9e6b07-464e-41f6-bdc7-f5b1cd351069)

Warning message when `memlimit` is exceeded: (Set to 64B here for a
demo)
> W: lzma_decompress memory usage 1114168 bytes would exceed the limit
(64 bytes), aborting


![image](https://github.com/user-attachments/assets/04abf3ef-1d29-4846-bb41-214695c7d09c)

### Additional things
<!-- Anything else you would like to say -->

Is the warning wording OK? I'm not a native English speaker so please
change it if you want to.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants