bssl: Could not obtain wait context for job #287

Open
sharksforarms opened this issue Nov 13, 2023 · 4 comments

Comments

@sharksforarms
Contributor

sharksforarms commented Nov 13, 2023

Hello, I've observed a SIGSEGV under the following condition: an error path returns without clearing the current job/tlv.

I added some instrumentation to my application, which uses the Intel QAT Engine:

// rsa decrypt error
h2o[151965]: DEBUG [bssl_qat_async_start_job:479] saving job ctx:0x6030002489b8 job:0x6060001e3208 waitctx:0x6060001e3268
h2o[151965]: DEBUG [bssl_offload_create_request:1362] nb job:0x6030002489b8
h2o[151965]: [/build/deps/neverbleed/neverbleed.c] bssl_offload_decrypt:RSA decrypt failure:-:error:04000070:RSA routines:OPENSSL_internal:DATA_LEN_NOT_EQUAL_TO_MOD_LEN ossl err  qat_hw_rsa.c 1664
h2o[151965]: [openssl-privsep] RSA decrypt failure
h2o[151965]: DEBUG [bssl_offload_free_request:698] nb finish job:0x6030002489b8
h2o[151965]: DEBUG [bssl_qat_async_finish_job:500] clearing ctx:0x6030002489b8 job:0x6060001e3208 waitctx:0x6060001e3268

// rsa sign
h2o[151965]: DEBUG [bssl_qat_async_start_job:479] saving job ctx:0x6030002489e8 job:0x6060001e32c8 waitctx:0x6060001e3328
h2o[151965]: DEBUG [bssl_offload_create_request:1362] nb job:0x6030002489e8
h2o[151965]: DEBUG [bssl_offload_digestsign:1418] rsa:0x611000059c88
h2o[151965]: DEBUG [qat_rsa_priv_enc:1021] here
h2o[151965]: DEBUG [qat_rsa_priv_enc:1106] here error
h2o[151965]: DEBUG [qat_rsa_priv_sign:1635] len:0, ASYNC_get_current_job(): 0x6060001e3208, (nil)

h2o[151965]: received SIGSEGV: si_code=1 (SEGV_MAPERR: Address not mapped to object) si_addr=(nil)
h2o[151965]: received SIGSEGV (cont): rip=(nil) rsp=0x7fe1bab6b688 rbp=(nil)

We can see that ASYNC_get_current_job() returns 0x6060001e3208, which is the previous job (the failed RSA decryption).

Here's how the API is being used, at a high level, in the sequence that triggers the issue:

// rsa decrypt
async_ctx = bssl_qat_async_start_job()
    bssl_qat_async_save_current_job // --> sets the current job
meth = bssl_engine_get_rsa_method();
meth->decrypt(...)
    // in_len != rsa_size --> OPENSSL_PUT_ERROR, return error
bssl_qat_async_finish_job(async_ctx)
// note: the current job is still set here...

// rsa sign
async_ctx = bssl_qat_async_start_job()
meth = bssl_engine_get_rsa_method();
meth->sign_raw(...) 
    qat_rsa_priv_sign
        qat_rsa_priv_enc
          qat_rsa_decrypt
              qat_init_op_done
                  job = ASYNC_get_current_job() // this is the previous job, which has been freed/zeroed
              qat_setup_async_event_notification(job)
                  ASYNC_get_wait_ctx(job) is NULL --> WARN("Could not obtain wait context for job\n");
        ASYNC_current_job_last_check_and_get // this is the previous job, which has been freed/zeroed
            job->tlv_destructor // --> SIGSEGV
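
The sequence above can be reduced to a small model. This is only a sketch: `mock_job`, `current_job_slot` and the `mock_*` functions are hypothetical stand-ins, not the engine's types or API, and a plain static variable stands in for the per-thread slot. It also assumes that saving a job does not overwrite an already-occupied slot; that assumption is made only to match the log, where the sign path still sees the decrypt job.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a job; not the engine's real structure. */
typedef struct mock_job {
    int id;
    int finished;   /* set once the caller has finished the job */
} mock_job;

/* Stand-in for the per-thread "current job" slot. */
static mock_job *current_job_slot = NULL;

/* Assumption: saving only fills an empty slot, so a stale entry wins. */
static void mock_save_current_job(mock_job *job) {
    if (current_job_slot == NULL)
        current_job_slot = job;
}

/* Models the reported behaviour: the job is finished, the slot is not cleared. */
static void mock_finish_job(mock_job *job) {
    job->finished = 1;
}

/* Returns the id of the job that the sign path would observe. */
int job_seen_by_sign_after_failed_decrypt(void) {
    mock_job decrypt_job = {1, 0};
    mock_job sign_job = {2, 0};
    int seen_id;

    current_job_slot = NULL;

    /* rsa decrypt: the job is saved, the operation fails and returns early */
    mock_save_current_job(&decrypt_job);
    mock_finish_job(&decrypt_job);

    /* rsa sign: the slot is still occupied, so the new job is not recorded */
    mock_save_current_job(&sign_job);

    assert(current_job_slot != NULL);
    seen_id = current_job_slot->id;

    current_job_slot = NULL;   /* do not leave a pointer to a local behind */
    return seen_id;
}
```

In this model `job_seen_by_sign_after_failed_decrypt()` returns 1, the id of the already-finished decrypt job, which mirrors the stale pointer seen in the log.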

Engine Logs:

[DEBUG][4211829.817191] PID [304707] Thread [7ff7a2c5e700][qat_hw_rsa.c:198:qat_rsa_decrypt()] - Started
[WARN][4211829.817195] PID [304707] Thread [7ff7a2c5e700][qat_events.c:129:qat_setup_async_event_notification()] Could not obtain wait context for job
[WARN][4211829.817200] PID [304707] Thread [7ff7a2c5e700][qat_hw_rsa.c:210:qat_rsa_decrypt()] Failed to setup async event notifications
[WARN][4211829.817206] PID [304707] Thread [7ff7a2c5e700][qat_hw_rsa.c:1017:qat_rsa_priv_enc()] Failure in qat_rsa_decrypt  fallback = 0

I can see two potential solutions:

  1. (diff) Call _ret = ASYNC_current_job_last_check_and_get(); before the early error returns.
  2. Call tlv_destructor (if set) in bssl_qat_async_finish_job.
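
Both options can be sketched against a small model. Everything named `mock_*` here is hypothetical and only illustrates the shape of each change; it is not the engine's code. The sketch assumes that the "last check and get" step runs the job's destructor hook and releases the current-job slot, which is what the trace and the description above suggest but is not confirmed here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a job with a cleanup hook. */
typedef struct mock_job {
    int id;
    void (*tlv_destructor)(void *);
} mock_job;

static mock_job *current_job_slot = NULL;   /* stand-in for the per-thread slot */

static void mock_tlv_destructor(void *arg) { (void)arg; }

/* Assumed behaviour: run the hook if set, then release the slot. */
static mock_job *mock_last_check_and_get(void) {
    mock_job *job = current_job_slot;
    if (job != NULL && job->tlv_destructor != NULL)
        job->tlv_destructor(job);
    current_job_slot = NULL;
    return job;
}

/* Option 1: every exit from the operation, including errors, releases the slot. */
static int mock_priv_enc(int in_len, int rsa_size) {
    int ret = 0;
    if (in_len != rsa_size)
        goto cleanup;           /* early error still reaches the release below */
    ret = 1;                    /* the real work would happen here */
cleanup:
    (void)mock_last_check_and_get();
    return ret;
}

/* Option 2: the finish step releases the slot if it still holds this job. */
static void mock_finish_job(mock_job *job) {
    assert(job != NULL);
    if (current_job_slot == job)
        (void)mock_last_check_and_get();
}

int slot_is_clear_after_option1(void) {
    mock_job job = {1, mock_tlv_destructor};
    current_job_slot = &job;
    if (mock_priv_enc(10, 256) != 0)   /* length mismatch, so an error is expected */
        return 0;
    return current_job_slot == NULL;
}

int slot_is_clear_after_option2(void) {
    mock_job job = {2, mock_tlv_destructor};
    current_job_slot = &job;
    /* the operation returned early here without touching the slot */
    mock_finish_job(&job);
    return current_job_slot == NULL;
}
```

Option 1 keeps the cleanup next to the code that uses the job, but every early return has to be routed through it. Option 2 puts the cleanup in one place on the caller's side, so an error path added later cannot leave the slot populated.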

Any thoughts?

@Yogaraj-Alamenda
Contributor

@sharksforarms This looks like an issue in the scenario you describe. Can you please raise a PR for the changes mentioned in the diff? We will evaluate it and get back to you with feedback.

@sharksforarms
Contributor Author

PR is here: https://github.com/intel/QAT_Engine/pull/291/files

Thanks!

@Yogaraj-Alamenda
Contributor

Thanks. We will check and let you know.

@krithikx
Contributor

@sharksforarms, we followed the steps you shared but were not able to reproduce the issue. Do you have a test application that reproduces the scenario? Also, which BoringSSL version are you using?
