Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read_ref: allow zero size reads at any offset #758

Merged
merged 1 commit into from
Feb 22, 2025

Conversation

th0rex
Copy link
Contributor

@th0rex th0rex commented Feb 20, 2025

Hi, this PR changes the implementation of read_bytes_at in ReadRef for &[u8] to allow zero size reads for any offset, even if that offset is out of bounds.

This makes it match the behavior of ReadRef for ReadCache, and also makes e.g. ElfSegment::bytes work for zero sized segments as seen in real world ELF files.

Please let me know if you have any comments or would like me to change something, or if you want to keep the current behavior, I'm also happy to just carry this patch in a fork, or to add this behavior in a custom ReadRef implementation only, if there is some reason for disallowing zero sized reads that are out of bounds.

Commit message copied below with more detail:


Some ELF files do have segments with size zero, at offsets that are invalid / out of bounds. This commit changes the default ReadRef implementation for &[u8] to permit reads that are out of bounds, if the requested length is zero, by returning &[].

Examples of such files are many ARM ELF files in e.g. debian repositories, for example

/usr/lib/debug/.build-id/bd/dd2eaf3326ffce6d173666b5f3e62a376e123a.debug in package libgedit-gfls-1-0-dbgsym_0.2.1-2_arm64.deb:

~ cp /usr/lib/debug/.build-id/bd/dd2eaf3326ffce6d173666b5f3e62a376e123a.debug /tmp/bad_file.elf
~ r2 /tmp/bad_file.elf
[0x0000027c]> iI~binsz
binsz    64184
[0x0000027c]> iSS
[Segments]

nth paddr        size vaddr        vsize perm type name
―――――――――――――――――――――――――――――――――――――――――――――――――――――――
0   0x00000000  0x27c 0x00000000  0x55e0 -r-x MAP  LOAD0
1   0x0000fab8    0x0 0x0001fab8   0x5c0 -rw- MAP  LOAD1
2   0x0000fab8    0x0 0x0001fb28   0x240 -rw- MAP  DYNAMIC
[...]
8   0x0000fab8    0x0 0x0001fab8   0x548 -r-- MAP  GNU_RELRO

This file has multiple segments starting at 0xfab8 (the end of the file), with a physical size of 0, while the file size is also 0xfab8 (64184).

The current ReadRef implementation for &[u8] causes ProgramHeader::data to fail for them, causing e.g. ElfSegment::bytes to also fail.

While a caller could handle this error, or one could provide their own type implementing ReadRef this way, I think it makes sense to do this by default, since no data is actually being read out of bounds, but with the current implementation we still try to index out of bounds and then error.

Notably with this change this also matches the implementation of ReadRef for ReadCache which already has a similar check implemented for read_bytes_at.

@philipc
Copy link
Contributor

philipc commented Feb 21, 2025

This seems to happen whenever objcopy --only-keep-debug is used. It would be useful to add a test case for this by running objcopy on https://github.com/gimli-rs/object-testfiles/blob/master/elf/base

I'm not opposed to this fix, especially since it matches ReadCache, but I'm not certain about what the ideal behaviour should be. For example, we could also fix this particular case by doing the check in ProgramHeader::data. This would allow us to handle this commonly occurring case while still detecting files that are corrupt in other ways.

It's decision between whether this crate is only meant to handle valid files, or whether it should do best effort parsing for any file, no matter how corrupt, or whether it should be something in between.

th0rex added a commit to th0rex/object-testfiles that referenced this pull request Feb 21, 2025
Created by running

```
cd elf
objcopy --only-keep-debug base debug-only
```

To be used to add a test for gimli-rs/object#758
@th0rex
Copy link
Contributor Author

th0rex commented Feb 21, 2025

It would be useful to add a test case for this by running objcopy on https://github.com/gimli-rs/object-testfiles/blob/master/elf/base

Made a PR for adding the file gimli-rs/object-testfiles#21. Will update the submodule and add a test once that PR is merged.

but I'm not certain about what the ideal behaviour should be

I'm not sure either.

For example, we could also fix this particular case by doing the check in ProgramHeader::data

Yes, this would work as well, and I can do that if you want me to.

It's decision between whether this crate is only meant to handle valid files, or whether it should do best effort parsing for any file, no matter how corrupt, or whether it should be something in between.

Yeah, and it also depends on how defines "valid". Is an offset that is out of bounds but never read from (because size is zero) invalid?

I'm happy with any decision, we can also just change this in ProgramHeader::data (though I suspect there might be other places that could allow for zero sized data and similarly be valid, e.g., sections would suffer from the same issue, so I think fixing it in ReadRef makes more sense, especially since it aligns the behavior with what the ReadCache impl is doing).

th0rex added a commit to th0rex/object-testfiles that referenced this pull request Feb 21, 2025
Created by running

```
cd elf
objcopy --only-keep-debug base debug-only
```

To be used to add a test for gimli-rs/object#758
@philipc
Copy link
Contributor

philipc commented Feb 21, 2025

I'm fine with it being fixed in ReadRef.

Some ELF files do have segments with size zero, at offsets that are
invalid / out of bounds. This commit changes the default `ReadRef`
implementation for `&[u8]` to permit reads that are out of bounds, if
the requested length is zero, by returning `&[]`.

Examples of such files are many ARM ELF files in e.g. debian
repositories, for example

`/usr/lib/debug/.build-id/bd/dd2eaf3326ffce6d173666b5f3e62a376e123a.debug`
in package `libgedit-gfls-1-0-dbgsym_0.2.1-2_arm64.deb`:

```
~ cp /usr/lib/debug/.build-id/bd/dd2eaf3326ffce6d173666b5f3e62a376e123a.debug /tmp/bad_file.elf
~ r2 /tmp/bad_file.elf
[0x0000027c]> iI~binsz
binsz    64184
[0x0000027c]> iSS
[Segments]

nth paddr        size vaddr        vsize perm type name
―――――――――――――――――――――――――――――――――――――――――――――――――――――――
0   0x00000000  0x27c 0x00000000  0x55e0 -r-x MAP  LOAD0
1   0x0000fab8    0x0 0x0001fab8   0x5c0 -rw- MAP  LOAD1
2   0x0000fab8    0x0 0x0001fb28   0x240 -rw- MAP  DYNAMIC
[...]
8   0x0000fab8    0x0 0x0001fab8   0x548 -r-- MAP  GNU_RELRO
```

This file has multiple segments starting at `0xfab8` (the end of the
file), with a physical size of `0`, while the file size is also `0xfab8`
(64184).

The current `ReadRef` implementation for `&[u8]` causes
`ProgramHeader::data` to fail for them, causing e.g. `ElfSegment::bytes`
to also fail.

While a caller could handle this error, or one could provide their own
type implementing `ReadRef` this way, I think it makes sense to do this
by default, since no data is *actually* being read out of bounds, but
with the current implementation we still try to index out of bounds and
then error.

Notably with this change this also matches the implementation of
`ReadRef` for `ReadCache` which already has a similar check implemented
for `read_bytes_at`.
@th0rex
Copy link
Contributor Author

th0rex commented Feb 21, 2025

Done, added a unit test

@philipc philipc merged commit e764a2d into gimli-rs:master Feb 22, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants