rgw: copy object encryption fixes #54543
base: main
👍👍
src/rgw/rgw_op.cc
Outdated
| oproc = std::make_unique<RGWCOE_proc_from_filters>(RGWCOE_proc_from_filters(*filter)); | ||
| return *oproc; |
RGWCOE_make_filter_pipeline maintains ownership of these filters long after copy_obj_data() returns and its AtomicObjectProcessor destructs. that leaves us with a dangling reference, which may or may not end up being a problem
can we find a way for get_filter() to transfer their ownership to the caller? if everything lived on copy_obj_data()'s stack, the correct order of destruction would happen naturally
that could be some class CopyFilterStack : public DataProcessor that owns these filters, and whose process() method just forwards to the last filter added. get_filter() would return that as a unique_ptr<DataProcessor>
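A rough sketch of that CopyFilterStack idea, under stated assumptions: the DataProcessor below is a simplified stand-in that takes std::string instead of a bufferlist, and Sink, Upper, emplace() and the get_filter() signature are hypothetical names used only for illustration, not the code in this PR. The explicit destructor pops filters newest-first so that no filter outlives the stage it points at.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for rgw's DataProcessor (assumption: the real
// interface takes a bufferlist, not std::string).
struct DataProcessor {
  virtual ~DataProcessor() = default;
  virtual int process(std::string&& data, uint64_t offset) = 0;
};

// Hypothetical terminal stage that records what reaches the end of the chain.
struct Sink : DataProcessor {
  std::string out;
  int process(std::string&& data, uint64_t) override {
    out += data;
    return 0;
  }
};

// Hypothetical filter: upper-cases the data, then hands it to the next stage.
struct Upper : DataProcessor {
  DataProcessor* next;
  explicit Upper(DataProcessor* n) : next(n) {}
  int process(std::string&& data, uint64_t offset) override {
    for (auto& c : data) {
      c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return next->process(std::move(data), offset);
  }
};

// Owns every filter in the chain and forwards process() to the last one added.
class CopyFilterStack : public DataProcessor {
  std::vector<std::unique_ptr<DataProcessor>> filters;

 public:
  // Construct a filter in place; it may point at the previous top or at a
  // sink owned by the caller.
  template <typename T, typename... Args>
  T& emplace(Args&&... args) {
    auto p = std::make_unique<T>(std::forward<Args>(args)...);
    T& ref = *p;
    filters.push_back(std::move(p));
    return ref;
  }

  int process(std::string&& data, uint64_t offset) override {
    if (filters.empty()) return -1;  // nothing to forward to
    return filters.back()->process(std::move(data), offset);
  }

  // Destroy newest-first; element destruction order inside std::vector is
  // not something to rely on.
  ~CopyFilterStack() override {
    while (!filters.empty()) filters.pop_back();
  }
};

// Hypothetical get_filter(): ownership of the whole chain moves to the caller.
std::unique_ptr<DataProcessor> get_filter(DataProcessor* sink) {
  auto stack = std::make_unique<CopyFilterStack>();
  stack->emplace<Upper>(sink);
  return stack;
}
```

If the caller declares the sink first and the returned stack second, both live on the same stack frame and the filters are destroyed before the processor they write into.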
| &cb, | ||
| this, | ||
| s->yield); | ||
| } catch (int caught_errno) { |
if exceptions are really necessary, please throw something that derives from std::exception. we tend to use catch (const std::exception&) as a catch-all which depends on that
there's a std::system_error exception that wraps an error code:
    try {
      throw std::system_error(-r, std::system_category());
    } catch (const std::system_error& e) {
      // log as e.what()
      op_ret = -e.code().value();
    }
copy_object() returns int already, so these exceptions probably could have been caught and converted to error codes long before this
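One way to do that conversion at a single boundary, as a minimal sketch: throw_errno() and catch_to_errno() are hypothetical helpers, not existing rgw functions, and mapping non-system_error exceptions to -EIO is an assumption made for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <exception>
#include <stdexcept>
#include <system_error>

// Hypothetical helper: a lower layer turns a negative errno into an
// exception that derives from std::exception.
inline void throw_errno(int r) {
  throw std::system_error(-r, std::system_category());
}

// Hypothetical boundary wrapper: run fn and translate any exception back
// into a negative errno-style return code.
template <typename F>
int catch_to_errno(F&& fn) {
  try {
    return fn();
  } catch (const std::system_error& e) {
    return -e.code().value();
  } catch (const std::exception&) {
    return -EIO;  // assumption: generic fallback for everything else
  }
}
```

A caller that already returns int could then wrap the throwing section, for example `op_ret = catch_to_errno([&] { return do_copy(); });`, where do_copy() is a placeholder name.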
| /** | ||
| * @brief Read filter when copying data from object to another. | ||
| */ | ||
| class ObjectFilter { |
DataProcessor is the filter interface; this one seems more like a DataProcessorFactory maybe?
|
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
b450a9d to 54205e9
|
I've pushed a minor update: this takes care of the loose ends, builds, and in very sketchy preliminary testing, seems to do something possibly correct. I've not addressed any of the issues people raised above yet. |
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
|
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
|
ping @mdw-at-linuxbox |
|
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
54205e9 to 3d0d30a
|
I've just updated this PR with my current upstream fixes for copy object encryption. This will need rebasing to the latest "main" to make it current. I also have a standalone python script to test this, which should be somehow folded into teuthology. That's not quite trivial because it needs some specific things set up in rgw (keystone, storage classes), so the teuthology tooling will need to do that to run the tests. |
ideally this stuff would go in s3-tests with the rest of our sse test coverage. s3-tests does support storage classes, and the rgw suite does know how to create/configure them (ex https://github.com/ceph/ceph/blob/main/qa/suites/rgw/verify/overrides.yaml#L18-L22) in s3-tests, see |
|
Hi @mdw-at-linuxbox - I've added some tests here ceph/s3-tests#595. Do you think that would be enough for this? |
|
ping @mdw-at-linuxbox |
|
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
|
This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution! |
daca7e9 to 2292920
src/rgw/rgw_op.cc
Outdated
| DoutPrefixProvider *dpp; | ||
| boost::optional<RGWGetObj_Decompress> decompress; | ||
| bool partial_content = false; | ||
| std::map<std::string, std::string> crypt_http_responses; // XXX who consumes? |
@cbodley - It might be related to this that there is no reference to these HTTP responses to be reflected to the client, so all test assertions are failing due to missing appropriate HTTP responses.
2025-06-12T19:05:26.709 INFO:teuthology.orchestra.run.smithi122.stdout:r = {'CopyObjectResult': {'ETag': '"7fc56270e7a70fa81a5935b72eacbe29"', 'LastModified': datetime.datetime(2025, 6, 12, 19,...cle)', ...}, 'HTTPStatusCode': 200, 'HostId': '', 'RequestId': 'tx0000014f2d56ad167c911-00684b24a6-4260-default', ...}}
2025-06-12T19:05:26.709 INFO:teuthology.orchestra.run.smithi122.stdout:
2025-06-12T19:05:26.709 INFO:teuthology.orchestra.run.smithi122.stdout: 'assert': lambda r: (
2025-06-12T19:05:26.709 INFO:teuthology.orchestra.run.smithi122.stdout:> r['ResponseMetadata']['HTTPHeaders']['x-amz-server-side-encryption-customer-algorithm'] == 'AES256' and
2025-06-12T19:05:26.709 INFO:teuthology.orchestra.run.smithi122.stdout: r['ResponseMetadata']['HTTPHeaders']['x-amz-server-side-encryption-customer-key-md5'] == 'arxBvwY2V4SiOne6yppVPQ=='
2025-06-12T19:05:26.709 INFO:teuthology.orchestra.run.smithi122.stdout: )
2025-06-12T19:05:26.709 INFO:teuthology.orchestra.run.smithi122.stdout: },
2025-06-12T19:05:26.709 INFO:teuthology.orchestra.run.smithi122.stdout: 'sse-kms': {
2025-06-12T19:05:26.709 INFO:teuthology.orchestra.run.smithi122.stdout: 'args': {
2025-06-12T19:05:26.710 INFO:teuthology.orchestra.run.smithi122.stdout: 'ServerSideEncryption': 'aws:kms',
2025-06-12T19:05:26.710 INFO:teuthology.orchestra.run.smithi122.stdout: 'SSEKMSKeyId': lambda: get_secondary_kms_keyid()
2025-06-12T19:05:26.710 INFO:teuthology.orchestra.run.smithi122.stdout: },
2025-06-12T19:05:26.710 INFO:teuthology.orchestra.run.smithi122.stdout: 'assert': lambda r: (
2025-06-12T19:05:26.710 INFO:teuthology.orchestra.run.smithi122.stdout: r['ResponseMetadata']['HTTPHeaders']['x-amz-server-side-encryption'] == 'aws:kms' and
2025-06-12T19:05:26.710 INFO:teuthology.orchestra.run.smithi122.stdout: r['ResponseMetadata']['HTTPHeaders']['x-amz-server-side-encryption-aws-kms-key-id'] == get_secondary_kms_keyid()
2025-06-12T19:05:26.710 INFO:teuthology.orchestra.run.smithi122.stdout: )
2025-06-12T19:05:26.710 INFO:teuthology.orchestra.run.smithi122.stdout: }
2025-06-12T19:05:26.711 INFO:teuthology.orchestra.run.smithi122.stdout: }
2025-06-12T19:05:26.711 INFO:teuthology.orchestra.run.smithi122.stdout:E KeyError: 'x-amz-server-side-encryption-customer-algorithm'
|
rescheduled the rgw:crypt subsuite in https://pulpito.ceph.com/cbodley-2025-06-18_16:49:07-rgw:crypt-mdw-main-coe-20-distro-default-smithi/ using packages from https://shaman.ceph.com/builds/ceph/mdw-main-coe-20/ against the updated tests from ceph/s3-tests#595 |
2292920 to 64d7c69
|
I pushed a copy of this to the wrong branch (mdw-main-fscrypt, totally unrelated branch) last friday. This is rebased on the latest, and includes fixes for attributes on copy object and completemultipartupload. When I went through the aws documentation and compared that to what I was generating, these were the two operations that came up short. These fixes should generate attributes consistent with the aws s3 rest api documentation. The next step is to run this against seenafallah's s3 tests (clwluvw/enc-copy) - I haven't managed to get a clean run of that yet, but I got some indication that what I'm generating may not satisfy those tests. I'm going to look at those tests more closely next to figure out what they're actually expecting, and also how to run just those tests without all the other tests. |
64d7c69 to 04a33e2
|
I've pushed an update here. It fixes a few problems that showed up with enc-copy. While "copy-enc" passed, there were various issues with multipart uploads and sse-c. So, this is rebased on the latest main, and passed this test against s3-tests branch "clwluvw/enc-copy" for me: |
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
04a33e2 to d0e5d3a
|
I've rebased this to fix the merge conflict, and verified it builds. |
|
I've revised the title and removed the "draft" tag. Not sure what to do about the "pull checklist / verify" check failing, haven't seen that before. Testing this requires the "enc_copy" branch of s3-tests, which has to happen separately (and presumably after) this is committed. |
|
i pushed a rebased version of ceph/s3-tests#595 to https://github.com/cbodley/s3-tests/commits/wip-23264, and a suite-branch (based on mdw-main-coe-28) to https://github.com/cbodley/ceph/commits/wip-23264 that points to that s3tests repo/branch. qa pending in https://pulpito.ceph.com/cbodley-2025-09-24_18:46:02-rgw-mdw-main-coe-28-distro-default-gibba/ |
|
from https://jenkins.ceph.com/job/ceph-pull-requests/167549/
|
lots of "Failed to fetch package version" failures, so i started a rerun in https://pulpito.ceph.com/cbodley-2025-09-25_12:52:40-rgw-mdw-main-coe-28-distro-default-gibba/. however, there were a bunch of s3tests failures in the original run |
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
If another bug tells the compression filter to decompress more data than is actually present, an "end_of_buffer" exception is thrown. The thrown exception unwinds the stack, including a completion that is pending. The resulting core dump indicates a failure with this completion rather than the end of buffer exception, which is misleading and not useful. With this change, radosgw does not abort, and instead logs a somewhat useful message before returning an "unknown" error to the client. Fixes: https://tracker.ceph.com/issues/23264 Signed-off-by: Marcus Watts <[email protected]>
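The pattern this commit message describes can be sketched roughly as below. This is an illustration under assumptions, not the actual change: end_of_buffer here is a simplified stand-in for the buffer library's exception, read_range() and filter_chunk() are hypothetical names, and -EIO is a placeholder for whatever code the real fix returns.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <exception>
#include <iostream>
#include <string>

// Simplified stand-in for the buffer library's end_of_buffer exception.
struct end_of_buffer : std::exception {
  const char* what() const noexcept override { return "end of buffer"; }
};

// Hypothetical reader: returns len bytes starting at off, and throws if the
// caller asks for more data than the buffer holds.
std::string read_range(const std::string& buf, std::size_t off, std::size_t len) {
  if (off > buf.size() || len > buf.size() - off) throw end_of_buffer{};
  return buf.substr(off, len);
}

// Catch at the filter boundary so the exception never unwinds through a
// pending completion; log it and return an error code instead.
int filter_chunk(const std::string& buf, std::size_t off, std::size_t len,
                 std::string& out) {
  try {
    out = read_range(buf, off, len);
    return 0;
  } catch (const std::exception& e) {
    std::cerr << "filter failed: " << e.what() << '\n';
    return -EIO;  // assumption: placeholder error code
  }
}
```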
This contains code to allow copyobject to copy encrypted objects. It includes additional data paths to communicate data from the rest layer down to the sal layer to handle decrypting objects. The data paths include logic to use filter chains from get and put that process encryption and compression. There are several hacks to deal with quirks of the filter chains. The "get" path has to propagate flushes around the chain, because a flush isn't guaranteed to propagate through it. Also the "get" and "put" chains have conflicting uses of the buffer list logic, so the buffer list has to be copied so that they don't step on each other's toes. Fixes: https://tracker.ceph.com/issues/23264 Signed-off-by: Marcus Watts <[email protected]>
Lifecycle transition can copy objects to a different storage tier. When this happens, since the object is repacked, the original manifest is invalidated. It is necessary to store a special "parts_len" attribute to fix this. There was code in PutObj to handle this, but that was only used for multisite replication; it is not used by the lifecycle transition code. This fix adds similar logic to the lifecycle transition code path to make the same thing happen. Fixes: https://tracker.ceph.com/issues/23264 Signed-off-by: Marcus Watts <[email protected]>
While 'STANDARD' is a valid storage class, it is not supposed to ever be returned when fetching an object. This change suppresses storing 'STANDARD' as the attribute value, so that objects explicitly created with 'STANDARD' will in fact be indistinguishable from those where it was implicitly set. Fixes: https://tracker.ceph.com/issues/67786 Signed-off-by: Marcus Watts <[email protected]>
When an object is copied, it should only be depending on data in the request to determine the storage class, and if it is not specified, it should default to 'STANDARD'. In radosgw, this means that this is another attribute (similar to encryption) that should not be merged from the source object. Fixes: https://tracker.ceph.com/issues/67787 Signed-off-by: Marcus Watts <[email protected]>
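The two storage-class rules in the commit messages above can be illustrated with a small sketch. Everything here is an assumption for illustration: the attribute key, the plain std::map used for attributes, and the helper names are hypothetical and do not reflect the actual rgw code paths.

```cpp
#include <cassert>
#include <map>
#include <string>

using Attrs = std::map<std::string, std::string>;

// Illustrative attribute key (assumption, not taken from this PR).
static const std::string kStorageClassAttr = "user.rgw.storage_class";

// Rule 1: never persist 'STANDARD'; an absent attribute already means STANDARD.
void set_storage_class(Attrs& attrs, const std::string& requested) {
  if (requested.empty() || requested == "STANDARD") {
    attrs.erase(kStorageClassAttr);
  } else {
    attrs[kStorageClassAttr] = requested;
  }
}

// Rule 2: on copy, the storage class comes only from the request and is
// never merged from the source object's attributes.
Attrs attrs_for_copy(const Attrs& src, const std::string& requested) {
  Attrs dest = src;
  dest.erase(kStorageClassAttr);
  set_storage_class(dest, requested);
  return dest;
}

// Readers treat a missing attribute as the implicit default.
std::string effective_storage_class(const Attrs& attrs) {
  auto it = attrs.find(kStorageClassAttr);
  return it == attrs.end() ? "STANDARD" : it->second;
}
```

With this shape, an object created with an explicit 'STANDARD' and one created with no storage class end up with identical attributes.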
d0e5d3a to e0d2a64
This is a "review only" version of copy object encryption fixes "phase 1".
2 commits: first adds all the code changes in, the second moves some object files around for cmake.
missing: a few loose ends in rgw_op.cc for decrypt. teuthology tests. documentation?
re-encryption will be "phase 2", following the pattern set here for decrypt.