I1217 06:16:53.786639 2924 rocksdb.cpp:181] Rocksdb open failed (5:0) IO error: Failed to create lock file: some-root-dir\osquery.db/LOCK: The process cannot access the file because it is being used by another process.
At least at the time of the error, there was only one osquery process. It's not clear from the logs what happened, but I think the osquery process didn't get a chance to terminate completely and the lock file got left behind.
Launcher cannot recover when the osquery process fails to start up fully with an error like this; the only remediation currently available is manual.
We should a) improve the shutdown routine to make sure that launcher doesn't get into this situation, and b) update the osquery runner to detect this state and take corrective action.
Thoughts re: a) improve the shutdown routine to make sure that launcher doesn't get into this situation:
Do we need a longer timeout for the shutdown routines? (This timeout is not enforced by the runner or instance -- it comes from the rungroup and/or from whatever is managing the launcher service.) I think timeout during shutdown is the most likely explanation for this issue. In this particular example, the issue occurred on Windows, so it may be something specific to the Windows service.
Do we want to introduce some kind of ordering to the shutdown routines? I think it's possible we could be running into this issue because some of the shutdown routines happen simultaneously with the osquery process shutdown, and this disrupts clean shutdown.
Do we need to change how we're shutting down osquery to shut it down more gracefully? I don't think there's much more we can do here, but maybe I'm missing something.
Thoughts re: b) update the osquery runner to detect this state and take corrective action:
Is it a good idea for the osquery instance to remove the lock file (if one exists) before starting up a new osquery process? (I think we would prefer this to having the osquery instance remove the lock file on shutdown because the shutdown tasks run more or less simultaneously, so we could end up with e.g. the instance removing the lock file before the osquery process gets the chance to shut itself down cleanly.)
The osquery instance and osquery runner both currently have no visibility into the osquery logs -- they're processed by the log adapter, which is entirely separate. However, here and previously (when trying to figure out how to handle launcher falling back to an old version of osquery that is incompatible with the database -- some details here) we've tentatively wanted the osquery instance to be able to respond to particular logs from osquery. Do we want to move log processing into the osquery instance? Or open up some line of communication between the instance and the log adapter?
I saw this error recently in the automated tests: