Need to remediate when osquery cannot create database lock file #2004

RebeccaMahany · 2024-12-17T15:48:09Z

I saw this error recently in the automated tests:

I1217 06:16:53.786639 2924 rocksdb.cpp:181] Rocksdb open failed (5:0) IO error: Failed to create lock file: some-root-dir\osquery.db/LOCK: The process cannot access the file because it is being used by another process.

At least at the time of the error, there was only one osquery process. It's not clear from the logs what happened, but I think the osquery process didn't get a chance to terminate completely and the lock file got left behind.

launcher cannot recover when the osquery process fails to start up fully with an error like this -- the only remediation currently available is manual.

We should a) improve the shutdown routine to make sure that launcher doesn't get into this situation, and b) update the osquery runner to detect this state and take corrective action.

Thoughts re: a) improve the shutdown routine to make sure that launcher doesn't get into this situation:

Do we need a longer timeout for the shutdown routines? (This timeout is not enforced by the runner or instance -- it comes from the rungroup and/or from whatever is managing the launcher service.) I think a timeout during shutdown is the most likely explanation for this issue. In this particular example, the issue occurred on Windows, so it may be something specific to the Windows service.
Do we want to introduce some kind of ordering to the shutdown routines? I think it's possible we could be running into this issue because some of the shutdown routines happen simultaneously with the osquery process shutdown, and this disrupts clean shutdown.
Do we need to change how we're shutting down osquery to shut it down more gracefully? I don't think there's much more we can do here, but maybe I'm missing something.
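
To make the timeout and ordering questions above concrete, here is a minimal sketch of a "stop, wait, then force" routine. It is not launcher's actual shutdown code: `stopWithGrace` and its parameters are hypothetical names. The interrupt and kill steps are passed in as callbacks because the right mechanism differs per platform (for example, Go's `os.Process.Signal(os.Interrupt)` is not implemented on Windows). `exited` is assumed to be fed by a goroutine that calls `Wait` on the osquery process.

```go
package main

import (
	"fmt"
	"time"
)

// stopWithGrace is a hypothetical helper, not launcher's real code.
// It asks the process to stop via interrupt, waits up to grace for a
// value on exited, and falls back to kill if the process is still running.
// It returns true when the process had to be killed.
func stopWithGrace(interrupt, kill func() error, exited <-chan error, grace time.Duration) (bool, error) {
	if err := interrupt(); err == nil {
		timer := time.NewTimer(grace)
		defer timer.Stop()

		select {
		case <-exited:
			// Clean exit: the process released its handles on its own.
			return false, nil
		case <-timer.C:
			// Grace period elapsed; fall through to the forced kill.
		}
	}

	if err := kill(); err != nil {
		return true, fmt.Errorf("kill after %s grace period: %w", grace, err)
	}

	// Block until the process is really gone, so that anything ordered
	// after this call (e.g. starting a new osquery process) cannot race
	// with the old one. A real implementation may want to bound this wait.
	<-exited
	return true, nil
}
```

The point of the sketch is the ordering: nothing that touches the osquery database runs until `stopWithGrace` has returned, which only helps if the caller's own shutdown timeout is longer than `grace`.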

Thoughts re: b) update the osquery runner to detect this state and take corrective action:

Is it a good idea for the osquery instance to remove the lock file (if one exists) before starting up a new osquery process? (I think we would prefer this to having the osquery instance remove the lock file on shutdown because the shutdown tasks run more or less simultaneously, so we could end up with e.g. the instance removing the lock file before the osquery process gets the chance to shut itself down cleanly.)
The osquery instance and osquery runner both currently have no visibility into the osquery logs -- they're processed by the log adapter, which is entirely separate. However, here and previously (when trying to figure out how to handle launcher falling back to an old version of osquery that is incompatible with the database -- some details here) we've tentatively wanted the osquery instance to be able to respond to particular logs from osquery. Do we want to move log processing into the osquery instance? Or open up some line of communication between the instance and the log adapter?
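
As a starting point for b), here is a minimal sketch of the two detection signals discussed above: checking for the lock file before startup, and recognizing the failure in an osquery log line. `lockFileAge` and `isLockFileError` are hypothetical names, not existing launcher functions. The path layout (`<root dir>/osquery.db/LOCK`) and the matched substrings are taken from the error quoted at the top of this issue. The sketch only reports; it deliberately does not remove anything, since whether removal is safe is still an open question.

```go
package main

import (
	"errors"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
	"time"
)

// lockFileAge reports whether <rootDir>/osquery.db/LOCK exists and, if so,
// how long ago it was last modified. Hypothetical helper for illustration.
// Existence alone is only a hint, not proof that the lock is stale.
func lockFileAge(rootDir string) (bool, time.Duration, error) {
	info, err := os.Stat(filepath.Join(rootDir, "osquery.db", "LOCK"))
	if errors.Is(err, fs.ErrNotExist) {
		// No lock file at all: nothing to report.
		return false, 0, nil
	}
	if err != nil {
		return false, 0, err
	}
	return true, time.Since(info.ModTime()), nil
}

// isLockFileError reports whether an osquery log line looks like the
// RocksDB lock failure quoted in this issue. The substrings come from
// that single observed log line, so other variants may exist.
func isLockFileError(logLine string) bool {
	return strings.Contains(logLine, "Rocksdb open failed") &&
		strings.Contains(logLine, "Failed to create lock file")
}
```

`lockFileAge` could run in the osquery instance just before launching the process, while `isLockFileError` fits wherever log lines are already being read (today, the log adapter).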

RebeccaMahany added the features-improvements label Dec 17, 2024

RebeccaMahany mentioned this issue Dec 17, 2024

Check for and log information about stale osquery database lock files #2006

Merged

RebeccaMahany mentioned this issue Dec 31, 2024

Move lockfile logging to osquery log adapter #2018

Merged
