bssl: Could not obtain wait context for job #287

Open
@sharksforarms

Hello, I've observed a SIGSEGV under the following conditions: an error path returns without clearing the current job/tlv.

I added some instrumentation to my application, which uses the Intel QAT engine:

// rsa decrypt error
h2o[151965]: DEBUG [bssl_qat_async_start_job:479] saving job ctx:0x6030002489b8 job:0x6060001e3208 waitctx:0x6060001e3268
h2o[151965]: DEBUG [bssl_offload_create_request:1362] nb job:0x6030002489b8
h2o[151965]: [/build/deps/neverbleed/neverbleed.c] bssl_offload_decrypt:RSA decrypt failure:-:error:04000070:RSA routines:OPENSSL_internal:DATA_LEN_NOT_EQUAL_TO_MOD_LEN ossl err  qat_hw_rsa.c 1664
h2o[151965]: [openssl-privsep] RSA decrypt failure
h2o[151965]: DEBUG [bssl_offload_free_request:698] nb finish job:0x6030002489b8
h2o[151965]: DEBUG [bssl_qat_async_finish_job:500] clearing ctx:0x6030002489b8 job:0x6060001e3208 waitctx:0x6060001e3268

// rsa sign
h2o[151965]: DEBUG [bssl_qat_async_start_job:479] saving job ctx:0x6030002489e8 job:0x6060001e32c8 waitctx:0x6060001e3328
h2o[151965]: DEBUG [bssl_offload_create_request:1362] nb job:0x6030002489e8
h2o[151965]: DEBUG [bssl_offload_digestsign:1418] rsa:0x611000059c88
h2o[151965]: DEBUG [qat_rsa_priv_enc:1021] here
h2o[151965]: DEBUG [qat_rsa_priv_enc:1106] here error
h2o[151965]: DEBUG [qat_rsa_priv_sign:1635] len:0, ASYNC_get_current_job(): 0x6060001e3208, (nil)

h2o[151965]: received SIGSEGV: si_code=1 (SEGV_MAPERR: Address not mapped to object) si_addr=(nil)
h2o[151965]: received SIGSEGV (cont): rip=(nil) rsp=0x7fe1bab6b688 rbp=(nil)

We can see that ASYNC_get_current_job() returns 0x6060001e3208, which is the previous job (the failed RSA decryption).

Here's how the API is being used, at a high level, in the sequence that causes the issue:

// rsa decrypt
async_ctx = bssl_qat_async_start_job()
    bssl_qat_async_save_current_job // --> sets the current job
meth = bssl_engine_get_rsa_method();
meth->decrypt(...)
    // in_len != rsa_size --> OPENSSL_PUT_ERROR, return error
bssl_qat_async_finish_job(async_ctx)
// note: the current job is still set here...

// rsa sign
async_ctx = bssl_qat_async_start_job()
meth = bssl_engine_get_rsa_method();
meth->sign_raw(...) 
    qat_rsa_priv_sign
        qat_rsa_priv_enc
          qat_rsa_decrypt
              qat_init_op_done
                  job = ASYNC_get_current_job() // this is the previous job, which has been freed/zeroed
              qat_setup_async_event_notification(job)
                  ASYNC_get_wait_ctx(job) is NULL --> WARN("Could not obtain wait context for job\n");
        ASYNC_current_job_last_check_and_get // this is the previous job, which has been freed/zeroed
            job->tlv_destructor // --> SIGSEGV
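
The sequence above can be modelled in a few lines of plain C. This is only a sketch of the failure mode, not engine code: `model_job`, `model_start_job`, `model_finish_job_buggy`, `model_get_current_job` and `model_current_job_is_stale` are hypothetical names, the struct is not the real job layout, and the rule that a start only records a job when none is recorded is an assumption made to reproduce what the logs show (the real mechanism may differ). The job is zeroed in place rather than freed so the model stays well defined.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a job; NOT the engine's real job layout. */
typedef struct model_job {
    int id;
    void *wait_ctx;                  /* NULL once the job is torn down */
    void (*tlv_destructor)(void *);  /* NULL once the job is torn down */
} model_job;

static model_job *model_current;     /* models the per-thread "current job" */
static int model_wait_ctx_storage;   /* any non-NULL address will do */

void model_noop_destructor(void *arg) { (void)arg; }

/* ASSUMPTION: a start only records the job when no job is recorded yet,
 * which matches the old job still being returned during the sign call. */
void model_start_job(model_job *job, int id)
{
    assert(job != NULL);
    job->id = id;
    job->wait_ctx = &model_wait_ctx_storage;
    job->tlv_destructor = model_noop_destructor;
    if (model_current == NULL)
        model_current = job;
}

/* Finish on the error path as described in the issue: the job is zeroed,
 * but the current-job pointer is left untouched. */
void model_finish_job_buggy(model_job *job)
{
    assert(job != NULL);
    *job = (model_job){0};
}

model_job *model_get_current_job(void)
{
    return model_current;
}

/* Returns 1 when the recorded job has already been torn down, i.e. when
 * asking for its wait context yields NULL and calling its destructor
 * would jump to address 0. */
int model_current_job_is_stale(void)
{
    const model_job *job = model_get_current_job();
    return job != NULL &&
           (job->wait_ctx == NULL || job->tlv_destructor == NULL);
}
```

Starting job 1, finishing it through `model_finish_job_buggy`, and then starting job 2 leaves `model_get_current_job()` pointing at job 1 with a NULL `wait_ctx` and a NULL `tlv_destructor`, which lines up with both the "Could not obtain wait context" warning and the `rip=(nil)` crash.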

Engine Logs:

[DEBUG][4211829.817191] PID [304707] Thread [7ff7a2c5e700][qat_hw_rsa.c:198:qat_rsa_decrypt()] - Started
[WARN][4211829.817195] PID [304707] Thread [7ff7a2c5e700][qat_events.c:129:qat_setup_async_event_notification()] Could not obtain wait context for job
[WARN][4211829.817200] PID [304707] Thread [7ff7a2c5e700][qat_hw_rsa.c:210:qat_rsa_decrypt()] Failed to setup async event notifications
[WARN][4211829.817206] PID [304707] Thread [7ff7a2c5e700][qat_hw_rsa.c:1017:qat_rsa_priv_enc()] Failure in qat_rsa_decrypt  fallback = 0

I can see two potential solutions:

  1. (diff) Call _ret = ASYNC_current_job_last_check_and_get(); before those early error returns.
  2. Call tlv_destructor (if set) in bssl_qat_async_finish_job.
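
To make option 2 concrete, here is a minimal sketch of a finish routine that runs the destructor before tearing the job down. It is a standalone model, not a patch: `sketch_job`, `sketch_start_job`, `sketch_finish_job`, `sketch_tlv_destructor` and the `sketch_current` slot are all hypothetical, and it assumes the destructor's job is to forget the per-thread current job, which may not match what the engine's `tlv_destructor` actually does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a job; NOT the engine's real job layout. */
typedef struct sketch_job {
    void *wait_ctx;
    void (*tlv_destructor)(void *);
} sketch_job;

static sketch_job *sketch_current;    /* would be thread-local in real code */
static int sketch_destructor_calls;   /* counter used only for inspection */

/* ASSUMPTION: the destructor forgets the current job if it is this one. */
void sketch_tlv_destructor(void *arg)
{
    if (sketch_current == (sketch_job *)arg)
        sketch_current = NULL;
    sketch_destructor_calls++;
}

void sketch_start_job(sketch_job *job, void *wait_ctx)
{
    assert(job != NULL);
    job->wait_ctx = wait_ctx;
    job->tlv_destructor = sketch_tlv_destructor;
    if (sketch_current == NULL)
        sketch_current = job;
}

/* Option 2: run the destructor (if set) before the job is zeroed, so no
 * later operation on this thread can pick up a torn-down job. */
void sketch_finish_job(sketch_job *job)
{
    assert(job != NULL);
    if (job->tlv_destructor != NULL)
        job->tlv_destructor(job);
    *job = (sketch_job){0};
}
```

With this shape, a finish on any path (success or error) leaves the slot empty, so the next start records its own job. Option 1 keeps the cleanup next to each early return instead. Option 2 centralises it, at the cost of the finish routine needing to know about the destructor.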

Any thoughts?
