[Bug] `hex::dec::lzma_decompress` reports an error when decompressing a small buffer #2033

AlexGuo1998 · 2024-12-27T07:01:10Z

Operating System

Windows

What's the issue you encountered?

When decompressing a short lzma-compressed buffer, hex::dec::lzma_decompress reports an error, but the data is written to the section with an incorrect size of 100 bytes.

How can the issue be reproduced?

Generate an xz compressed file with the following Python code:

import lzma

decompressed = b'123'
compressed = lzma.compress(decompressed, format=lzma.FORMAT_XZ, preset=1)
with open('123.xz', 'wb') as f:
    f.write(compressed)

The hexdump of the result 123.xz file is:

FD 37 7A 58 5A 00 00 04 E6 D6 B4 46 02 00 21 01 10 00 00 00 A8 70 8E 86 01 00 02 31 32 33 00 00 61 C5 1C 07 44 28 23 30 00 01 1B 03 0B 2F B9 10 1F B6 F3 7D 01 00 00 00 00 04 59 5A

Load the file into ImHex, apply the following pattern:

import std.mem;
import std.io;
import hex.dec;

u8 compressed[std::mem::size()] @ 0;
std::mem::Section decompressed = std::mem::create_section("decompressed");
bool ok = hex::dec::lzma_decompress(compressed, decompressed);
std::print("ok={}", ok);

ok=false is printed to the console. There is a "decompressed" section of 100 bytes, of which the first 3 bytes are 31 32 33("123") and others 00.

The expected result should be: ok=true, and the "decompressed" section contains only the first 3 bytes.

ImHex Version

1.36.0

ImHex Build Type

Nightly or built from sources

Installation type

MSI

Additional context?

The corresponding code is here:

ImHex/plugins/decompress/source/content/pl_functions.cpp

Lines 151 to 204 in 6005af1

    
           /* lzma_decompress(compressed_pattern, section_id) */ 
        
           ContentRegistry::PatternLanguage::addFunction(nsHexDec, "lzma_decompress", FunctionParameterCount::exactly(2), [](Evaluator *evaluator, auto params) -> std::optional<Token::Literal> { 
        
               #if IMHEX_FEATURE_ENABLED(LIBLZMA) 
        
                   auto compressedData = getCompressedData(evaluator, params[0]); 
        
                   auto &section = evaluator->getSection(params[1].toUnsigned()); 
        
                   lzma_stream stream = LZMA_STREAM_INIT; 
        
                   if (lzma_auto_decoder(&stream, 0x10000, LZMA_IGNORE_CHECK) != Z_OK) { 
        
                       return false; 
        
                   } 
        
                   section.resize(100); 
        
                   stream.avail_in = compressedData.size(); 
        
                   stream.avail_out = section.size(); 
        
                   stream.next_in = compressedData.data(); 
        
                   stream.next_out = section.data(); 
        
                   ON_SCOPE_EXIT { 
        
                       lzma_end(&stream); 
        
                   }; 
        
                   while (stream.avail_in != 0) { 
        
                       auto res = lzma_code(&stream, LZMA_RUN); 
        
                       if (res == LZMA_STREAM_END) { 
        
                           section.resize(section.size() - stream.avail_out); 
        
                           break; 
        
                       } 
        
                       if (res == LZMA_MEMLIMIT_ERROR) { 
        
                           auto usage = lzma_memusage(&stream); 
        
                           lzma_memlimit_set(&stream, usage); 
        
                           res = lzma_code(&stream, LZMA_RUN); 
        
                       } 
        
                       if (res != LZMA_OK) 
        
                           return false; 
        
                       if (stream.avail_out != 0) 
        
                           break; 
        
                       const auto prevSectionSize = section.size(); 
        
                       section.resize(prevSectionSize * 2); 
        
                       stream.next_out  = section.data() + prevSectionSize; 
        
                       stream.avail_out = prevSectionSize; 
        
                   } 
        
                   return true; 
        
               #else 
        
                   std::ignore = evaluator; 
        
                   std::ignore = params; 
        
                   err::E0012.throwError("hex::dec::lzma_decompress is not available. Please recompile ImHex with liblzma support."); 
        
               #endif 
        
           });

After some debugging, it seems like first lzma_code() returns LZMA_MEMLIMIT_ERROR(6), but after adjusting the limit, it returns LZMA_STREAM_END(1), and is incorrectly treated as an error.

To fix this once and for all, noticing earlier at lzma_auto_decoder()

ImHex/plugins/decompress/source/content/pl_functions.cpp

Line 158 in 6005af1

if (lzma_auto_decoder(&stream, 0x10000, LZMA_IGNORE_CHECK) != Z_OK) {

the decoding memlimit is set to 0x10000 (64KiB) which is too small for any .xz files. According to man xz, common decompression memory requirements are between 1MiB and 65MiB.

We are always increasing the memory limit anyways, so it shouldn't be set in the first place. According to the liblzma doc, UINT64_MAX should be used to disable the limiter. In this way we don't have to process the LZMA_MEMLIMIT_ERROR error.

Or if we want to be safe, set it to something like 1GiB and treat LZMA_MEMLIMIT_ERROR as a real error.

Which one do you prefer? I can write a PR for this.

The text was updated successfully, but these errors were encountered:

WerWolv · 2024-12-27T09:12:25Z

Thanks for investigating! Yeah I think I didn't understand the API correctly.
I believe having some limitation is good to have, otherwise it will be pretty easy to crash ImHex with specifically crafted inputs so something like a 1GiB limit sounds like a good idea. A PR would be highly appreciated!

As discussed in WerWolv#2033, we should limit the memory usage to a reasonably sane value (1GiB here), rather than always increasing it when exceeded. We also print a warning to the console in this case. This also avoided the bug when decompressing a small buffer.

paxcut · 2025-01-02T03:52:23Z

After some debugging, it seems like first lzma_code() returns LZMA_MEMLIMIT_ERROR(6), but after adjusting the limit, it returns LZMA_STREAM_END(1), and is incorrectly treated as an error.

I was the person who fixed the fact that it was impossible to decompress any file using the previous code. As reported in the PR the problem was caused by not calling lzma_memusage() to adjust the limit using the value returned. That eliminated the error and now lzma implementation is able to decompress any size input. However there is a minimum size file imposed to 100 bytes which in practice makes sense because below 100 bytes the resulting 'compressed' file is actually bigger. LZMA_STREAM_END is an indicator that your compressed file is bigger then the original.

If you decompress a real lzma compressed file you will see the the code returns true and the file will have the correct size, so setting the limit to 1 Gb is not only unnecessary, but will limit the files that can be decompressed just so you can decompress files that should not have been compressed to begin with. How is that an improvement?

### Problem description  See #2033 (`hex::dec::lzma_decompress` reports an error when decompressing a small buffer). ### Implementation description  Set the LZMA decompressor memory limit to 1GiB fixed. Print a warning when exceeded, and abort with returning `false`. ### Screenshots  Normal result when decompressing a small buffer ![image](https://github.com/user-attachments/assets/5b9e6b07-464e-41f6-bdc7-f5b1cd351069) Warning message when `memlimit` is exceeded: (Set to 64B here for a demo) > W: lzma_decompress memory usage 1114168 bytes would exceed the limit (64 bytes), aborting ![image](https://github.com/user-attachments/assets/04abf3ef-1d29-4846-bb41-214695c7d09c) ### Additional things  Is the warning wording OK? I'm not a native English speaker so please change it if you want to.

AlexGuo1998 added the bug Something isn't working label Dec 27, 2024

AlexGuo1998 mentioned this issue Jan 2, 2025

fix: Set LZMA decompressor memory limit to 1GiB fixed #2044

Merged

AlexGuo1998 closed this as completed Jan 2, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug] `hex::dec::lzma_decompress` reports an error when decompressing a small buffer #2033

[Bug] `hex::dec::lzma_decompress` reports an error when decompressing a small buffer #2033

AlexGuo1998 commented Dec 27, 2024 •

edited

Loading

WerWolv commented Dec 27, 2024

paxcut commented Jan 2, 2025

[Bug] hex::dec::lzma_decompress reports an error when decompressing a small buffer #2033

[Bug] hex::dec::lzma_decompress reports an error when decompressing a small buffer #2033

Comments

AlexGuo1998 commented Dec 27, 2024 • edited Loading

Operating System

What's the issue you encountered?

How can the issue be reproduced?

ImHex Version

ImHex Build Type

Installation type

Additional context?

WerWolv commented Dec 27, 2024

paxcut commented Jan 2, 2025

[Bug] `hex::dec::lzma_decompress` reports an error when decompressing a small buffer #2033

[Bug] `hex::dec::lzma_decompress` reports an error when decompressing a small buffer #2033

AlexGuo1998 commented Dec 27, 2024 •

edited

Loading