Skip to content

Conversation

@dovgopoly
Copy link
Collaborator

@dovgopoly dovgopoly commented Jan 6, 2026

Closes #8738.

This PR refactors the bcli plugin from async to sync execution. It introduces a new run_bitcoin_cli function that blocks until bitcoin-cli completes.

All RPC commands are converted to use this synchronous pattern.

The PR also adds a command_err helper for consistent error handling, improves waitpid error reporting, fixes fd leaks by properly closing file descriptors, and removes all the now-unused async infrastructure (pending queues, priority scheduling, retry callbacks).

Tests are fixed accordingly.

plugins/bcli.c Outdated
res->output_len = strlen(res->output);

/* Wait for child to exit */
while (waitpid(child, &status, 0) < 0 && errno == EINTR);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to have something like this here:

	while (waitpid(child, &status, 0) < 0) {
		if (errno == EINTR)
			continue;
		plugin_err(plugin, "waitpid(%s) failed: %s",
			   res->args, strerror(errno));
	}

Copy link
Collaborator Author

@dovgopoly dovgopoly Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, added this.

Also I think I forgot to close a from descriptor in wait_and_check_bitcoind and run_bitcoin_cliv. Added this as well.

* we can attempt to prevent a crash if the 'getchaininfo' function returns
* a lower height than the one we already know, by waiting for a short period.
* However, I currently don't have a better idea on how to handle this situation. */
u32 *height UNUSED;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to leave this as it is and not use the optional last_height parameter. We're planning on getting rid of retries altogether inside bcli and have lightningd solely deal with retrying. So it feels a bit clunky to me to add retrying based on the last_height and ALSO have it be blocking to the rest of the plugin. We were never using it before and I don't think anyone has complained about this (yet) so it's probably not fatal?

Comment on lines +1138 to +680
/* As of at least v0.15.1.0, bitcoind returns "success" but an empty
string on a spent txout. */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't see this documented in the bitcoin-cli documentation https://developer.bitcoin.org/reference/rpc/gettxout.html , it might be good to test it out on cli to see if it every comes up with something else that's not an empty string but something like the string null?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be alright. I've found an issue raising this.

plugins/bcli.c Outdated
Comment on lines 496 to 504
static struct command_result *command_err_badjson(struct command *cmd,
struct bcli_result *res,
const char *errmsg)
{
char *err = tal_fmt(cmd, "%s: bad JSON: %s (%.*s)",
res->args, errmsg, (int)res->output_len, res->output);
return command_done_err(cmd, BCLI_ERROR, err, NULL);
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function is actually pretty good to have! Despite the plugin now being synchronous, it doesn't have to die, which plugin_err would do.

if (res->exitstatus != 0)
return estimatefees_null_response(cmd);

tokens = json_parse_simple(res->output, res->output, res->output_len);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this pattern quite absurd because res->output is not a ctx! But it's used all over bcli in other places so it's not your fault at all! Have asked @rustyrussell about this and if it is intentional.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asked him about this, this works okay in these scenarios because the lifetime of the pointers are the same so once the parent pointer is cleaned up any children are also cleaned up at the same time.

@dovgopoly dovgopoly force-pushed the feat/refactor-bcli branch 2 times, most recently from 2faf7ec to 8512a1d Compare January 9, 2026 22:31
@dovgopoly dovgopoly requested a review from sangbida January 9, 2026 22:33
@dovgopoly dovgopoly marked this pull request as ready for review January 10, 2026 15:25
@dovgopoly dovgopoly force-pushed the feat/refactor-bcli branch 4 times, most recently from bbc4e8d to b3632a5 Compare January 13, 2026 14:10
Also rename command_err_badjson to generic command_err helper, since error messages aren't always about bad JSON (e.g., "command failed" for non-zero exit).
Add check_bitcoin_error to detect errors from bcli plugin and call fatal with a clear message identifying the failing method.
# We block l3 from seeing close, so it will try to reestablish.
def no_new_blocks(req):
return {"error": "go away"}
return {"error": {"code": -8, "message": "Block height out of range"}}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we revert with another code, it kills lightningd instantly, not making retry anymore.
I think it's expected to return a null response to the lightningd here.

Copy link
Collaborator Author

@dovgopoly dovgopoly Jan 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After dddf995 it doesn't break the tests and we can remove these changes, but I decided to leave it as it is.

dovgopoly and others added 3 commits January 13, 2026 23:07
Changelog-Changed: bcli plugin now uses synchronous execution, simplifying bitcoin backend communication and improving error handling reliability.
if (res->exitstatus == 8)
return getrawblockbyheight_notfound(cmd);
return command_err(cmd, res, "command failed");
return getrawblockbyheight_notfound(cmd);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think returning a null response should be alright. I don't see the case we need to return error here. We can return an empty block and try again later. Tests are waiting the same behavior.

}
}

sleep(1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that the sleep was there in the async implementation of the plugin but I would add a test that runs this under load. So try to getrawblockbyheight and also try to run bcli commands at the same time.

Comment on lines +114 to +132
/* Check if response is an error and fail with clear message if so */
static void check_bitcoin_error(struct bitcoind *bitcoind, const char *buf,
const jsmntok_t *toks, const char *method)
{
const jsmntok_t *err_tok, *msg_tok;

err_tok = json_get_member(buf, toks, "error");
if (!err_tok)
return;

msg_tok = json_get_member(buf, err_tok, "message");
if (msg_tok)
fatal("%s: %.*s", method,
json_tok_full_len(msg_tok), json_tok_full(buf, msg_tok));
else
fatal("%s: %.*s", method,
json_tok_full_len(err_tok), json_tok_full(buf, err_tok));
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the motivation behind this function. This seems like it kills bitcoind if an error response or a timeout happens from bitcoind. It's not uncommon for bitcoind (the actual bitcoin core daemon) to go out for lunch and have a timeout.

@rustyrussell checked his node and these were top 5 bitcoind timeouts in msec:
21812
22053
22406
23600
30535
60324

Separate from this, commands like estimatefees and getblockhash can fail when they are syncing. In our meeting, Rusty also suggested that it would be good to change that function from a check_bitcoin_error to something like check/get_bitcoin_result? That way you can do a lot of common result handling inside that function and as a result of that tidy up some of the code.

If we're doing this we should probably be a bit more discerning with our retry behaviour:

sendrawtransaction - This one is probably okay without a retry because we have rebroadcasting mechanisms already in lightningd.

getchaininfo - Lightningd retry loops rely on getchaininfo succeeding I think, so I wonder if this needs more internal retries or if it's okay for this to just die on one missed rpc call?

getrawblockbyheight - we have a retry on this one

estimatefees - This does fail during sync but if it returns the estimatefees_null_response() that's probably okay since it's not an error?

getutxoout - I think this one is fine as well because we only crash on a broken json.

}

strip_trailing_whitespace(res->output, res->output_len);
if (strlen(res->output) != 64)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: magic number


return command_still_pending(cmd);
/* Try fetching from peers if bitcoind >= 23.0.0 */
if (bitcoind->version >= 230000) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: magic number

const char *method,
...)
static LAST_ARG_NULL struct bcli_result *
run_bitcoin_cli(const tal_t *ctx,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use this function to refactor wait_and_check_bitcoind

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor bcli Plugin from Asynchronous Multi-threaded to Synchronous Single-threaded Architecture

2 participants