@mdw-at-linuxbox
Collaborator

This is a "review only" version of copy object encryption fixes "phase 1".

2 commits: first adds all the code changes in, the second moves some object files around for cmake.

missing: a few loose ends in rgw_op.c for decrypt. teuthology tests. documentation?

re-encryption will be "phase 2", following the pattern set here for decrypt.

Contributor

@cbodley cbodley left a comment

👍👍

Comment on lines 5731 to 6194
oproc = std::make_unique<RGWCOE_proc_from_filters>(RGWCOE_proc_from_filters(*filter));
return *oproc;
Contributor

RGWCOE_make_filter_pipeline maintains ownership of these filters long after copy_obj_data() returns and its AtomicObjectProcessor destructs. that leaves us with a dangling reference, which may or may not end up being a problem

can we find a way for get_filter() to transfer their ownership to the caller? if everything lived on copy_obj_data()'s stack, the correct order of destruction would happen naturally

that could be some class CopyFilterStack : public DataProcessor that owns these filters, and whose process() method just forwards to the last filter added. get_filter() would return that as a unique_ptr<DataProcessor>
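
A rough sketch of the CopyFilterStack idea, under simplifying assumptions: `DataProcessor` here is a stand-in that takes a `std::string` (the real rgw interface works on buffer lists), and `Sink`, `TagFilter`, `emplace()` and this `get_filter()` signature are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the filter interface (assumption: simplified signature).
struct DataProcessor {
  virtual ~DataProcessor() = default;
  virtual int process(std::string&& data, std::uint64_t offset) = 0;
};

// Hypothetical end of the chain; plays the role of the object processor
// that lives on the caller's stack.
struct Sink : DataProcessor {
  std::string out;
  int process(std::string&& data, std::uint64_t) override {
    out += data;
    return 0;
  }
};

// Hypothetical filter: prepends a tag and forwards to the next processor.
struct TagFilter : DataProcessor {
  DataProcessor* next;
  std::string tag;
  TagFilter(DataProcessor* n, std::string t) : next(n), tag(std::move(t)) {}
  int process(std::string&& data, std::uint64_t offset) override {
    return next->process(tag + data, offset);
  }
};

// Owns every filter in the chain and forwards process() to the last one added.
class CopyFilterStack : public DataProcessor {
  std::vector<std::unique_ptr<DataProcessor>> filters;
 public:
  ~CopyFilterStack() override {
    // Destroy in reverse order: later filters point at earlier ones.
    while (!filters.empty()) filters.pop_back();
  }
  // Construct a filter in place and return a raw pointer so the next
  // filter can wrap it; ownership stays with the stack.
  template <typename T, typename... Args>
  T* emplace(Args&&... args) {
    auto f = std::make_unique<T>(std::forward<Args>(args)...);
    T* raw = f.get();
    filters.push_back(std::move(f));
    return raw;
  }
  int process(std::string&& data, std::uint64_t offset) override {
    assert(!filters.empty());
    return filters.back()->process(std::move(data), offset);
  }
};

// Ownership of the whole chain is transferred to the caller.
std::unique_ptr<DataProcessor> get_filter(DataProcessor& sink) {
  auto stack = std::make_unique<CopyFilterStack>();
  DataProcessor* inner = stack->emplace<TagFilter>(&sink, "a:");
  stack->emplace<TagFilter>(inner, "b:");
  return stack;
}
```

If the caller declares the sink first and the returned filter second, the filter chain is destroyed before the sink it points at, which is the destruction order asked for here.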

&cb,
this,
s->yield);
} catch (int caught_errno) {
Contributor

if exceptions are really necessary, please throw something that derives from std::exception. we tend to use catch (const std::exception&) as a catch-all which depends on that

there's a std::system_error exception that wraps an error code:

try {
  throw std::system_error(-r, std::system_category());
} catch (const std::system_error& e) {
  // log as e.what()
  op_ret = -e.code().value();
}

copy_object() returns int already, so these exceptions probably could have been caught and converted to error codes long before this
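
A minimal sketch of catching at a function boundary and converting to an error code. `guarded` is a hypothetical helper name, not something in rgw, and the sketch assumes errors are reported as negative errno values, as in the snippet above.

```cpp
#include <cerrno>
#include <exception>
#include <system_error>
#include <utility>

// Hypothetical helper: run fn() and translate exceptions derived from
// std::exception into a negative errno-style return code, so a caller
// that returns int does not let them escape.
template <typename Fn>
int guarded(Fn&& fn) {
  try {
    return std::forward<Fn>(fn)();
  } catch (const std::system_error& e) {
    // e.code().value() is the positive errno given at the throw site
    return -e.code().value();
  } catch (const std::exception&) {
    // any other std::exception: report a generic I/O error
    return -EIO;
  }
}
```

Something thrown as a bare `int` would pass straight through both handlers, which is the reason for asking that thrown types derive from std::exception.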

/**
* @brief Read filter when copying data from object to another.
*/
class ObjectFilter {
Contributor

DataProcessor is the filter interface; this one seems more like a DataProcessorFactory maybe?

@github-actions

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@mdw-at-linuxbox
Collaborator Author

I've pushed a minor update: this takes care of the loose ends, builds, and in very sketchy preliminary testing, seems to do something possibly correct. I've not addressed any of the issues people raised above yet.

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@github-actions

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.

@github-actions github-actions bot added the stale label Jun 18, 2024
@cbodley cbodley removed the stale label Jul 10, 2024
@cbodley
Contributor

cbodley commented Jul 22, 2024

ping @mdw-at-linuxbox

@github-actions

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.

@mdw-at-linuxbox
Collaborator Author

I've just updated this PR with my current upstream fixes for copy object encryption. This will need rebasing to the latest "main" to make it current.

I also have a standalone python script to test this, which should be somehow folded into teuthology. That's not quite trivial because it needs some specific things set up in rgw (keystone, storage classes), so the teuthology tooling will need to do that to run the tests.

@github-actions github-actions bot removed the stale label Sep 25, 2024
@cbodley
Contributor

cbodley commented Sep 26, 2024

> I also have a standalone python script to test this, which should be somehow folded into teuthology. That's not quite trivial because it needs some specific things set up in rgw (keystone, storage classes), so the teuthology tooling will need to do that to run the tests.

ideally this stuff would go in s3-tests with the rest of our sse test coverage. s3-tests does support storage classes, and the rgw suite does know how to create/configure them (ex https://github.com/ceph/ceph/blob/main/qa/suites/rgw/verify/overrides.yaml#L18-L22)

in s3-tests, see test_lifecycle_transition() for an example that uses those storage classes: https://github.com/ceph/s3-tests/blob/master/s3tests_boto3/functional/test_s3.py#L9091-L9092

@clwluvw
Member

clwluvw commented Oct 13, 2024

Hi @mdw-at-linuxbox - I've added some tests here ceph/s3-tests#595. Do you think that would be enough for this?

@cbodley
Contributor

cbodley commented Nov 19, 2024

ping @mdw-at-linuxbox

@github-actions

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.

@github-actions github-actions bot added the stale label Jan 18, 2025
@github-actions

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

DoutPrefixProvider *dpp;
boost::optional<RGWGetObj_Decompress> decompress;
bool partial_content = false;
std::map<std::string, std::string> crypt_http_responses; // XXX who consumes?
Member

@clwluvw clwluvw Jun 12, 2025

@cbodley - This may be related: nothing appears to consume these HTTP responses and reflect them back to the client, so all the test assertions fail because the expected HTTP response headers are missing.

2025-06-12T19:05:26.709 INFO:teuthology.orchestra.run.smithi122.stdout:r = {'CopyObjectResult': {'ETag': '"7fc56270e7a70fa81a5935b72eacbe29"', 'LastModified': datetime.datetime(2025, 6, 12, 19,...cle)', ...}, 'HTTPStatusCode': 200, 'HostId': '', 'RequestId': 'tx0000014f2d56ad167c911-00684b24a6-4260-default', ...}}
2025-06-12T19:05:26.709 INFO:teuthology.orchestra.run.smithi122.stdout:
2025-06-12T19:05:26.709 INFO:teuthology.orchestra.run.smithi122.stdout:            'assert': lambda r: (
2025-06-12T19:05:26.709 INFO:teuthology.orchestra.run.smithi122.stdout:>               r['ResponseMetadata']['HTTPHeaders']['x-amz-server-side-encryption-customer-algorithm'] == 'AES256' and
2025-06-12T19:05:26.709 INFO:teuthology.orchestra.run.smithi122.stdout:                r['ResponseMetadata']['HTTPHeaders']['x-amz-server-side-encryption-customer-key-md5'] == 'arxBvwY2V4SiOne6yppVPQ=='
2025-06-12T19:05:26.709 INFO:teuthology.orchestra.run.smithi122.stdout:            )
2025-06-12T19:05:26.709 INFO:teuthology.orchestra.run.smithi122.stdout:        },
2025-06-12T19:05:26.709 INFO:teuthology.orchestra.run.smithi122.stdout:        'sse-kms': {
2025-06-12T19:05:26.709 INFO:teuthology.orchestra.run.smithi122.stdout:            'args': {
2025-06-12T19:05:26.710 INFO:teuthology.orchestra.run.smithi122.stdout:                'ServerSideEncryption': 'aws:kms',
2025-06-12T19:05:26.710 INFO:teuthology.orchestra.run.smithi122.stdout:                'SSEKMSKeyId': lambda: get_secondary_kms_keyid()
2025-06-12T19:05:26.710 INFO:teuthology.orchestra.run.smithi122.stdout:            },
2025-06-12T19:05:26.710 INFO:teuthology.orchestra.run.smithi122.stdout:            'assert': lambda r: (
2025-06-12T19:05:26.710 INFO:teuthology.orchestra.run.smithi122.stdout:                r['ResponseMetadata']['HTTPHeaders']['x-amz-server-side-encryption'] == 'aws:kms' and
2025-06-12T19:05:26.710 INFO:teuthology.orchestra.run.smithi122.stdout:                r['ResponseMetadata']['HTTPHeaders']['x-amz-server-side-encryption-aws-kms-key-id'] == get_secondary_kms_keyid()
2025-06-12T19:05:26.710 INFO:teuthology.orchestra.run.smithi122.stdout:            )
2025-06-12T19:05:26.710 INFO:teuthology.orchestra.run.smithi122.stdout:        }
2025-06-12T19:05:26.711 INFO:teuthology.orchestra.run.smithi122.stdout:    }
2025-06-12T19:05:26.711 INFO:teuthology.orchestra.run.smithi122.stdout:E   KeyError: 'x-amz-server-side-encryption-customer-algorithm'

@cbodley
Contributor

cbodley commented Jun 18, 2025

@mdw-at-linuxbox
Collaborator Author

I pushed a copy of this to the wrong branch (mdw-main-fscrypt, a totally unrelated branch) last Friday. This is rebased on the latest, and includes fixes for attributes on copy object and completemultipartupload.

When I went through the AWS documentation and compared it to what I was generating, these were the two operations that came up short. These fixes should generate attributes consistent with the AWS S3 REST API documentation.

The next step is to run this against seenafallah's s3 tests (clwluvw/enc-copy) - I haven't managed to get a clean run of that yet, but I got some indication that what I'm generating may not satisfy those tests. I'm going to look at those tests more closely next to figure out what they're actually expecting, and also how to run just those tests without all the other tests.

@mdw-at-linuxbox
Collaborator Author

I've pushed an update here. It fixes a few problems that showed up with enc-copy. While "copy-enc" passed, there were various issues with multipart uploads and sse-c.

So, this is rebased on the latest main, and passed this test against s3-tests branch "clwluvw/enc-copy" for me:
S3TEST_CONF=$HOME/t/s3tst1/s3-tests.conf pytest -k 'copy_enc or lifecycle_transition_encrypted or copy_part_enc'
This was with 3 storage classes, resulting in 744 items, which took 28 minutes to run.

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@mdw-at-linuxbox
Collaborator Author

I've rebased this to fix the merge conflict, and verified it builds.
I've also pushed this to ceph-ci as "mdw-main-coe-28"

@mdw-at-linuxbox mdw-at-linuxbox changed the title from "DNM copy object encryption fixes -- phase 1" to "copy object encryption fixes" Sep 24, 2025
@mdw-at-linuxbox mdw-at-linuxbox marked this pull request as ready for review September 24, 2025 15:59
@mdw-at-linuxbox
Collaborator Author

I've revised the title and removed the "draft" tag. Not sure what to do about the "pull checklist / verify" check failing, haven't seen that before. Testing this requires the "enc_copy" branch of s3-tests, which has to happen separately (and presumably after) this is committed.

@cbodley
Contributor

cbodley commented Sep 24, 2025

i pushed a rebased version of ceph/s3-tests#595 to https://github.com/cbodley/s3-tests/commits/wip-23264, and a suite-branch (based on mdw-main-coe-28) to https://github.com/cbodley/ceph/commits/wip-23264 that points to that s3tests repo/branch

qa pending in https://pulpito.ceph.com/cbodley-2025-09-24_18:46:02-rgw-mdw-main-coe-28-distro-default-gibba/

@cbodley cbodley changed the title from "copy object encryption fixes" to "rgw: copy object encryption fixes" Sep 24, 2025
@cbodley
Contributor

cbodley commented Sep 24, 2025

from https://jenkins.ceph.com/job/ceph-pull-requests/167549/

The following tests FAILED:
244 - unittest_rgw_crypto (Failed)

[ FAILED ] 2 tests, listed below:
[ FAILED ] TestRGWCrypto.verify_RGWGetObj_BlockDecrypt_ranges
[ FAILED ] TestRGWCrypto.verify_RGWGetObj_BlockDecrypt_chunks

@cbodley
Contributor

cbodley commented Sep 25, 2025

> qa pending in https://pulpito.ceph.com/cbodley-2025-09-24_18:46:02-rgw-mdw-main-coe-28-distro-default-gibba/

lots of "Failed to fetch package version" failures, so i started a rerun in https://pulpito.ceph.com/cbodley-2025-09-25_12:52:40-rgw-mdw-main-coe-28-distro-default-gibba/

however, there were a bunch of s3tests failures in the original run

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

If another bug tells the compression filter to decompress more
data than is actually present, an "end_of_buffer" exception
is thrown.  The thrown exception unwinds the stack,
including a completion that is still pending.  The resulting core dump
points at a failure in this completion rather than at the end of buffer
exception, which is misleading and not useful.

With this change, radosgw does not abort, and instead logs
a somewhat useful message before returning an "unknown" error
to the client.

Fixes: https://tracker.ceph.com/issues/23264

Signed-off-by: Marcus Watts <[email protected]>

This contains code to allow copyobject to copy encrypted objects.

It includes additional data paths to communicate data from the
rest layer down to the sal layer to handle decrypting
objects.  The data paths include logic to use filter chains
from get and put that process encryption and compression.
There are several hacks to deal with quirks of the filter chains.
The "get" path has to propagate flushes around the chain,
because a flush isn't guaranteed to propagate through it.
Also the "get" and "put" chains have conflicting uses of the
buffer list logic, so the buffer list has to be copied so that
they don't step on each other's toes.

Fixes: https://tracker.ceph.com/issues/23264

Signed-off-by: Marcus Watts <[email protected]>

Lifecycle transition can copy objects to a different storage tier.
When this happens, since the object is repacked, the original
manifest is invalidated.  It is necessary to store a special
"parts_len" attribute to fix this.  There was code in PutObj
to handle this, but that was only used for multisite replication;
it is not used by the lifecycle transition code.  This fix
adds similar logic to the lifecycle transition code path to make the
same thing happen.

Fixes: https://tracker.ceph.com/issues/23264

Signed-off-by: Marcus Watts <[email protected]>

While 'STANDARD' is a valid storage class, it is not supposed
to ever be returned when fetching an object.  This change suppresses
storing 'STANDARD' as the attribute value, so that objects
explicitly created with 'STANDARD' will in fact be indistinguishable
from those where it was implicitly set.

Fixes: https://tracker.ceph.com/issues/67786

Signed-off-by: Marcus Watts <[email protected]>

When an object is copied, it should only be depending on data
in the request to determine the storage class, and if it is
not specified, it should default to 'STANDARD'.  In radosgw,
this means that this is another attribute (similar to encryption)
that should not be merged from the source object.

Fixes: https://tracker.ceph.com/issues/67787

Signed-off-by: Marcus Watts <[email protected]>
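
The rule described in the last two commit messages (storage class and encryption attributes come only from the request, and 'STANDARD' is never stored explicitly) could be sketched roughly as below. This is an illustration, not the code in this PR: `build_copy_attrs` is a hypothetical helper, and the attribute key strings are assumed for the example.

```cpp
#include <map>
#include <string>

using Attrs = std::map<std::string, std::string>;

// Assumed attribute keys, for illustration only.
const std::string ATTR_STORAGE_CLASS = "user.rgw.storage_class";
const std::string ATTR_CRYPT_PREFIX = "user.rgw.crypt.";

// Hypothetical helper: build the destination attributes for a copy.
Attrs build_copy_attrs(const Attrs& src, const Attrs& request) {
  Attrs out;
  // Carry over source attributes, except storage class and encryption:
  // those must come from the request alone.
  for (const auto& kv : src) {
    if (kv.first == ATTR_STORAGE_CLASS) continue;
    if (kv.first.compare(0, ATTR_CRYPT_PREFIX.size(), ATTR_CRYPT_PREFIX) == 0) continue;
    out.emplace(kv.first, kv.second);
  }
  // Apply request attributes, but never store an explicit 'STANDARD':
  // an absent storage class already means 'STANDARD'.
  for (const auto& kv : request) {
    if (kv.first == ATTR_STORAGE_CLASS &&
        (kv.second.empty() || kv.second == "STANDARD")) {
      continue;
    }
    out[kv.first] = kv.second;
  }
  return out;
}
```

With this shape, a copy that does not name a storage class ends up with no storage class attribute at all, regardless of what the source object had.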
Labels

build/ops, needs-test, pinned, rgw, tests
