Need to remediate when osquery cannot create database lock file #2004

Open
RebeccaMahany opened this issue Dec 17, 2024 · 0 comments
Labels
features-improvements Features and Improvements

RebeccaMahany commented Dec 17, 2024

I saw this error recently in the automated tests:

I1217 06:16:53.786639 2924 rocksdb.cpp:181] Rocksdb open failed (5:0) IO error: Failed to create lock file: some-root-dir\osquery.db/LOCK: The process cannot access the file because it is being used by another process.

At least at the time of the error, there was only one osquery process. It's not clear from the logs what happened, but I think the osquery process didn't get a chance to terminate completely and the lock file got left behind.

launcher cannot recover when the osquery process fails to start up fully with an error like this; the only remediation currently available is manual.

We should a) improve the shutdown routine to make sure that launcher doesn't get into this situation, and b) update the osquery runner to detect this state and take corrective action.

Thoughts re: a) improve the shutdown routine to make sure that launcher doesn't get into this situation:

  • Do we need a longer timeout for the shutdown routines? (This timeout is not enforced by the runner or instance -- it comes from the rungroup and/or from whatever is managing the launcher service.) I think timeout during shutdown is the most likely explanation for this issue. In this particular example, the issue occurred on Windows, so it may be something specific to the Windows service.
  • Do we want to introduce some kind of ordering to the shutdown routines? I think it's possible we could be running into this issue because some of the shutdown routines happen simultaneously with the osquery process shutdown, and this disrupts clean shutdown.
  • Do we need to change how we're shutting down osquery to shut it down more gracefully? I don't think there's much more we can do here, but maybe I'm missing something.

Thoughts re: b) update the osquery runner to detect this state and take corrective action:

  • Is it a good idea for the osquery instance to remove the lock file (if one exists) before starting up a new osquery process? (I think we would prefer this to having the osquery instance remove the lock file on shutdown because the shutdown tasks run more or less simultaneously, so we could end up with e.g. the instance removing the lock file before the osquery process gets the chance to shut itself down cleanly.)
  • The osquery instance and osquery runner both currently have no visibility into the osquery logs -- they're processed by the log adapter, which is entirely separate. However, here and previously (when trying to figure out how to handle launcher falling back to an old version of osquery that is incompatible with the database -- some details here) we've tentatively wanted the osquery instance to be able to respond to particular logs from osquery. Do we want to move log processing into the osquery instance? Or open up some line of communication between the instance and the log adapter?