Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bitcoin-mine segfault when receiving SubmitSolution on release sv2-tp-0.1.11-ipc #72

Open
plebhash opened this issue Nov 30, 2024 · 26 comments

Comments

@plebhash
Copy link

plebhash commented Nov 30, 2024

after moving on from #71 by building release tag sv2-tp-0.1.11-ipc, I was able to successfully launch bitcoin-node and bitcoin-mine and connect a SRI Pool to it

then, I launched a tProxy with config-examples/tproxy-config-local-pool-example.toml config file, while changing the following:

-min_individual_miner_hashrate=10_000_000_000_000.0
+min_individual_miner_hashrate=12_000.0 # CPU mining on a Macbook Pro M2 doing 12 kH/s

only a few seconds after launching minerd, I could see that MULTIPLE blocks were found on Pool logs, and then Pool lost connection with TP:

2024-11-30T04:58:12.429275Z  INFO pool_sv2::lib::template_receiver: Connected to template distribution server at 127.0.0.1:8442
2024-11-30T04:58:12.431489Z  INFO roles_logic_sv2::handlers::common: Received SetupConnectionSuccess: version=2, flags=0
2024-11-30T04:58:12.431641Z  INFO pool_sv2::lib::mining_pool: PUB KEY: Ok([TxOut { value: 0, script_pubkey: Script(OP_0 OP_PUSHBYTES_20 ebe1b7dcc293ccaa0ee743a86f89df8258c208fc) }])
2024-11-30T04:58:12.431699Z  INFO pool_sv2::lib::mining_pool: Starting up pool listener
2024-11-30T04:58:12.431867Z  INFO pool_sv2::lib::mining_pool: Listening for encrypted connection on: 0.0.0.0:34254
2024-11-30T04:58:15.651875Z  INFO roles_logic_sv2::handlers::template_distribution: Received NewTemplate with id: 1, is future: true
2024-11-30T04:58:15.652165Z  INFO roles_logic_sv2::handlers::template_distribution: Received SetNewPrevHash for template: 1
2024-11-30T04:58:23.975206Z  INFO roles_logic_sv2::handlers::common: Received SetupConnection: version=2, flags=100
2024-11-30T04:58:23.975749Z  INFO roles_logic_sv2::handlers::mining: Received OpenExtendedMiningChannel from: ABC with id: 0
2024-11-30T04:58:29.996117Z ERROR roles_logic_sv2::channel_logic::channel_factory: Share does not meet any target: Extended(SubmitSharesExtended { channel_id: 1, sequence_number: 0, job_id: 1, nonce: 2667577408, ntime: 1732942692, version: 536870912, extranonce: Owned([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]) })
2024-11-30T04:58:30.027898Z  INFO roles_logic_sv2::channel_logic::channel_factory: Share hash meet bitcoin target: "000002caa6e0a78d276e2d691a4dd04a7e0a0dc4ddbe0377177afaf87726e28a"
2024-11-30T04:58:30.028031Z ERROR roles_logic_sv2::channel_logic::channel_factory: Share does not meet any target: Extended(SubmitSharesExtended { channel_id: 1, sequence_number: 0, job_id: 1, nonce: 836676202, ntime: 1732942692, version: 536870912, extranonce: Owned([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]) })
2024-11-30T04:58:30.028058Z ERROR roles_logic_sv2::channel_logic::channel_factory: Share does not meet any target: Extended(SubmitSharesExtended { channel_id: 1, sequence_number: 0, job_id: 1, nonce: 3483456789, ntime: 1732942692, version: 536870912, extranonce: Owned([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]) })
2024-11-30T04:58:30.028132Z  INFO pool_sv2::lib::template_receiver: Sending Solution to TP: SubmitSolution { template_id: 1, version: 536870912, header_timestamp: 1732942692, header_nonce: 2262608490, coinbase_tx: Owned([2, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 55, 2, 139, 1, 0, 83, 116, 114, 97, 116, 117, 109, 32, 118, 50, 32, 83, 82, 73, 32, 80, 111, 111, 108, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 2, 0, 242, 5, 42, 1, 0, 0, 0, 22, 0, 20, 235, 225, 183, 220, 194, 147, 204, 170, 14, 231, 67, 168, 111, 137, 223, 130, 88, 194, 8, 252, 0, 0, 0, 0, 0, 0, 0, 0, 38, 106, 36, 170, 33, 169, 237, 226, 246, 28, 63, 113, 209, 222, 253, 63, 169, 153, 223, 163, 105, 83, 117, 92, 105, 6, 137, 121, 153, 98, 180, 139, 235, 216, 54, 151, 78, 140, 249, 1, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) }
2024-11-30T04:58:30.128245Z ERROR roles_logic_sv2::channel_logic::channel_factory: Share does not meet any target: Extended(SubmitSharesExtended { channel_id: 1, sequence_number: 0, job_id: 1, nonce: 2833820778, ntime: 1732942692, version: 536870912, extranonce: Owned([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]) })
...
2024-11-30T04:58:30.533451Z  INFO roles_logic_sv2::channel_logic::channel_factory: Share hash meet bitcoin target: "0000020be23be74ac923aa5c4993507e0a73dc25b8e5a0869163d571fa43afa3"
2024-11-30T04:58:30.533532Z  INFO pool_sv2::lib::template_receiver: Sending Solution to TP: SubmitSolution { template_id: 1, version: 536870912, header_timestamp: 1732942692, header_nonce: 929368576, coinbase_tx: Owned([2, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 55, 2, 139, 1, 0, 83, 116, 114, 97, 116, 117, 109, 32, 118, 50, 32, 83, 82, 73, 32, 80, 111, 111, 108, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 2, 0, 242, 5, 42, 1, 0, 0, 0, 22, 0, 20, 235, 225, 183, 220, 194, 147, 204, 170, 14, 231, 67, 168, 111, 137, 223, 130, 88, 194, 8, 252, 0, 0, 0, 0, 0, 0, 0, 0, 38, 106, 36, 170, 33, 169, 237, 226, 246, 28, 63, 113, 209, 222, 253, 63, 169, 153, 223, 163, 105, 83, 117, 92, 105, 6, 137, 121, 153, 98, 180, 139, 235, 216, 54, 151, 78, 140, 249, 1, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) }
2024-11-30T04:58:30.540348Z ERROR roles_logic_sv2::channel_logic::channel_factory: Share does not meet any target: Extended(SubmitSharesExtended { channel_id: 1, sequence_number: 0, job_id: 1, nonce: 3756263808, ntime: 1732942692, version: 536870912, extranonce: Owned([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]) })
...
2024-11-30T04:58:33.812914Z ERROR network_helpers_sv2::noise_connection_tokio: Disconnected from client while reading : Connection reset by peer (os error 54) - 127.0.0.1:8442
2024-11-30T04:58:33.813122Z ERROR pool_sv2::lib: SHUTDOWN from Upstream: Channel recv failed: `RecvError`
Try to reconnecting or connecting to a new upstream

and bitcoin-mine broke with a segfault:

./build/src/bitcoin-mine -debug=sv2 -loglevel=sv2:trace -signet -sv2port=8442
2024-11-30T04:47:37Z Ignoring unknown configuration value server
2024-11-30T04:47:37Z Ignoring unknown configuration value signet.connect
2024-11-30T04:47:37Z Ignoring unknown configuration value signet.server
2024-11-30T04:47:37Z Ignoring unknown configuration value signet.rpcuser
2024-11-30T04:47:37Z Ignoring unknown configuration value signet.rpcpassword
2024-11-30T04:47:37Z Signet with challenge 51
2024-11-30T04:47:37Z Default data directory /Users/plebhash/Library/Application Support/Bitcoin
2024-11-30T04:47:37Z Using data directory /Users/plebhash/Library/Application Support/Bitcoin/signet
2024-11-30T04:47:37Z Config file: /Users/plebhash/Library/Application Support/Bitcoin/bitcoin.conf
2024-11-30T04:47:37Z Config file arg: [signet] debug="rpc"
2024-11-30T04:47:37Z Config file arg: [signet] debug="sv2"
2024-11-30T04:47:37Z Config file arg: [signet] loglevel="sv2:debug"
2024-11-30T04:47:37Z Config file arg: [signet] signetchallenge="51"
2024-11-30T04:47:37Z Config file arg: [signet] sv2port="8442"
2024-11-30T04:47:37Z Command-line arg: debug="sv2"
2024-11-30T04:47:37Z Command-line arg: loglevel="sv2:trace"
2024-11-30T04:47:37Z Command-line arg: signet=""
2024-11-30T04:47:37Z Command-line arg: sv2port="8442"
Connected to bitcoin-node
2024-11-30T04:47:37Z [sv2] Generated static key, saved to /Users/plebhash/Library/Application Support/Bitcoin/signet/sv2_static_key
2024-11-30T04:47:37Z [sv2:info] Static key: 036cb18488d0b3e5f05bc20ce26d182d06fbc65ae86f18f67c78eb3b5f5c8eeace
2024-11-30T04:47:37Z [sv2] Generated authority key, saved to /Users/plebhash/Library/Application Support/Bitcoin/signet/sv2_authority_key
2024-11-30T04:47:37Z Template Provider authority key: 9bn6KvBNmp29GJ1WZvCwGm7HuACHQc8Uniq9gR6rPWeFeoL3oem
2024-11-30T04:47:37Z [sv2:info] Template Provider listening on 127.0.0.1:8442
2024-11-30T04:47:37Z net thread start
2024-11-30T04:47:37Z sv2 thread start
2024-11-30T04:47:40Z [sv2] Received 0x00 SetupConnection from client id=0
2024-11-30T04:47:40Z [sv2] Send 0x01 SetupConnection.Success to client id=0
2024-11-30T04:47:40Z [sv2] Received 0x70 CoinbaseOutputDataSize from client id=0
2024-11-30T04:47:40Z [sv2] coinbase_output_max_additional_size=1 bytes
2024-11-30T04:47:41Z [sv2] Send 0x71 NewTemplate id=1 future=1 to client id=0
2024-11-30T04:47:41Z [sv2] Send 0x72 SetNewPrevHash to client id=0
2024-11-30T04:49:01Z [sv2] Received 0x76 SubmitSolution from client id=0
[1]    91293 segmentation fault  ./build/src/bitcoin-mine -debug=sv2 -loglevel=sv2:trace -signet -sv2port=8442
@plebhash
Copy link
Author

since testnet4 is being attacked, tests above are being performed against a custom signet, here's a config file to connect to it:

[signet]
signetchallenge=51      # OP_TRUE
connect=75.119.150.111  # Genesis node
server=1
rpcuser=username
rpcpassword=password
sv2port=8442
debug=rpc
debug=sv2
loglevel=sv2:debug

@plebhash
Copy link
Author

plebhash commented Nov 30, 2024

actually, the fact that Pool sent multiple SubmitSolution were sent in a small time window is irrelevant

I raised tProxy downstream difficulty on config-examples/tproxy-config-local-pool-example.toml:

-min_individual_miner_hashrate=10_000_000_000_000.0
+min_individual_miner_hashrate=50_000_000.0 # relatively high diff for CPU mining on a Macbook Pro M2 doing 12 kH/s

as soon as Pool sends SubmitSolution, bitcoin-mine breaks

@plebhash
Copy link
Author

ahh, message above is also not too accurate

I was able to propagate multiple blocks before breaking bitcoin-mine, so I'm not sure anymore that there's a distinguished patterns on time between solutions

@Sjors
Copy link
Owner

Sjors commented Dec 2, 2024

Thanks, I hadn't tested SubmitSolution in the "wild" myself with the last release because I had not had time to setup a custom signet. Will connect to yours and try it.

@Sjors
Copy link
Owner

Sjors commented Dec 2, 2024

I'm using roles/target/release/mining_device to mine directly against the pool role on my M4 MacBook Pro. It found one block so far, so seems a bit slow. No crash yet, but I'll let it run.

@Sjors
Copy link
Owner

Sjors commented Dec 2, 2024

It also seems like it finds one valid share SubmitSharesSuccess and then subsequently only invalid shares Received SubmitSharesError with error code difficulty-too-low, until you restart.

@Sjors
Copy link
Owner

Sjors commented Dec 2, 2024

I used https://github.com/pooler/cpuminer with the translator instead. It seems to work a little bit better and mine faster, though not dramatically fast. Getting about two blocks per minute. No crashes.

Maybe I should try my S9 again... though last time I tried that it had difficulty with signet.

@Sjors
Copy link
Owner

Sjors commented Dec 2, 2024

Yup, my S9 with ancient BraiinsOS still trips over signet work, it just keeps rebooting.

Let's try the BitAxe...

@plebhash
Copy link
Author

plebhash commented Dec 2, 2024

make sure you get tProxy to set reasonable difficulties according to the different hashrates by tweaking min_individual_miner_hashrate on tProxy config

otherwise you could unintentionally skew your testing and create blindspots

@Sjors
Copy link
Owner

Sjors commented Dec 2, 2024

I ended setting it to 300_000_000 instead of the BitAxe real ~700_000_000_000 capability, so it doesn't try to achieve a needless high difficulty.

Managed to produce a crash:

2024-12-02T12:26:55Z [sv2] Template with id=10 is no longer in cache

It seems like miner sent a solution for an earlier block here.

@plebhash
Copy link
Author

plebhash commented Dec 2, 2024

is that related to the segfault that was originally reported on this issue?

@Sjors
Copy link
Owner

Sjors commented Dec 2, 2024

Please use loglevel=sv2:trace instead of info.

That said, the "no longer in cache" message is at the debug level so if you had the same issue, you would have seen it...

@Sjors
Copy link
Owner

Sjors commented Dec 2, 2024

So it might be a different error, though I wouldn't be surprised if it's related. I think this cache clearer prematurely.

@Sjors
Copy link
Owner

Sjors commented Dec 2, 2024

This is odd though...

It seems that either the pool or the translator role is holding on to outdated templates much too long.

2024-12-02T12:04:21Z [sv2] Send 0x71 NewTemplate id=10 future=1 to client id=2
2024-12-02T12:04:21Z [sv2] Send 0x72 SetNewPrevHash to client id=2

Twenty minutes went by as I shut down the S9 and connected a BitAxe to the pool instead.

2024-12-02T12:26:49Z [sv2] Received 0x76 SubmitSolution from client id=2
2024-12-02T12:26:49Z [sv2:trace] version=557875200, timestamp=1733141061, nonce=4008181900

It looks like it gave the same template to the BitAxe. Maybe that makes sense.

The TP then creates two templates (which is strange):

2024-12-02T12:26:50Z [sv2] Send 0x71 NewTemplate id=11 future=1 to client id=2
2024-12-02T12:26:50Z [sv2] Send 0x72 SetNewPrevHash to client id=2
2024-12-02T12:26:50Z [sv2] Send 0x71 NewTemplate id=12 future=1 to client id=2
2024-12-02T12:26:50Z [sv2] Send 0x72 SetNewPrevHash to client id=2

Four seconds later it gets a solution that references the old template:

2024-12-02T12:26:54Z [sv2] Received 0x76 SubmitSolution from client id=2
2024-12-02T12:26:54Z [sv2:trace] version=557875200, timestamp=1733141061, nonce=4008181900
2024-12-02T12:26:55Z [sv2] Template with id=10 is no longer in cache

I guess the problem here is that pool didn't have a worker connected when a new block came in. And so when a new worked did connect, it didn't ask for a new template.

(no that makes no sense either)

In any case this shouldn't crash the TP.

@Sjors
Copy link
Owner

Sjors commented Dec 2, 2024

I restarted the node, template provider, pool, translator and the restarted the BitAxe. Now it doesn't seem to start mining.

The translator occastionally logs:

2024-12-02T12:52:01.027887Z  INFO roles_logic_sv2::handlers::mining: Received SetTarget for channel id: 1
2024-12-02T12:52:01.027940Z  INFO translator_sv2::lib::upstream_sv2::upstream: SetTarget: SetTarget { channel_id: 1, maximum_target: Ref([155, 131, 89, 174, 113, 157, 142, 153, 143, 206, 108, 210, 5, 190, 130, 33, 206, 180, 229, 218, 92, 102, 15, 94, 170, 252, 214, 98, 2, 0, 0, 0]) }

But AxeOS shows a hash rate of zero.

@Sjors
Copy link
Owner

Sjors commented Dec 2, 2024

When I restart the BitAxe once more I get another "[sv2] Template with id=1 is no longer in cache" crash. Again it's two SubmitSolution messages a few seconds apart with the second one crashing with "Template with id=1 is no longer in cache".

The TP shouldn't crash.

But I'm also suprised the BitAxe doesn't start working NewTemplate id=2 for at least two seconds. It seems there's a delay where the pool only receives the new template message two seconds after the TP sent it. That explains why the BitAxe isn't working on the new job.

Maybe this is a case of the operating system or either application keeping something in the send / receive buffer a bit too long. I noticed a similar problem recently in the tests on the Bitcoin Core side.

@Sjors
Copy link
Owner

Sjors commented Dec 2, 2024

@vasild this might be the same problem I encountered here: #49 (comment)

@Sjors
Copy link
Owner

Sjors commented Dec 2, 2024

@plebhash the sv2-ipc-multisubmit branch in #73 fixes the crash for me, hopefully for you as well.

However you'll get a lot of stale blocks on a "fast" miner.

I still need to look into what's causing the delay in new templates getting sent to the pool. Will bake a new release once that's also fixed.

@Sjors
Copy link
Owner

Sjors commented Dec 2, 2024

There's definitely something weird going on. Not sure if it's networking code though.

If I run with -debug=ipc I see that we receive a 0x76 SubmitSolution message, but then bitcoin_mine waits almost exactly 30 seconds before IPC client send BlockTemplate.submitSolution$Params to the node process.

It takes the node only two milliseconds to update the tip, plus another millisecond to construct a new template and return it via waitNext. The TP then does some back and forth and 10 milisceond later constructs the 0x71 NewTemplate and 0x72 SetNewPrevHash messages.

But it waits a full second before actually sending it. During that second it send a waitNext() and waited for its result. So that part seems to be a concurrency problem.

30 seconds happens to be the default -sv2interval, so I tried setting it to 10, which reduces the interval but not exactly 10s.

@Sjors
Copy link
Owner

Sjors commented Dec 2, 2024

Narrowed it down a bit, the LOCK(m_tp_mutex); in Sv2TemplateProvider::SubmitSolution takes a very long time.

And we're holding m_tp_mutex all the way through a 1 second waitNext().

@Sjors
Copy link
Owner

Sjors commented Dec 2, 2024

I pushed another commit to #73 which should fix this concurrency issue. Running some more tests before baking a new release.

I was able to mine 15 blocks in a row with my BitAxe without stales.

I suspect it will also fix the test slowness, which I'm also checking now.

@Sjors
Copy link
Owner

Sjors commented Dec 2, 2024

@plebhash looking at your log again, I suspect you did have the same issue. It's submitting two difference nonces for the same template id, which would cause a crash.

That should no longer happen, and the node should now be quicker at giving you a new template.

@Sjors
Copy link
Owner

Sjors commented Dec 2, 2024

The sv2_template_provider_tests still times out on CI, which makes me suspect there's a similar bug hiding in the shadows. I'll see if I can find it, but otherwise will bake a new release anyway.

@Sjors
Copy link
Owner

Sjors commented Dec 2, 2024

I found a solution for the slow tests as well, though not related to this crash. I needed to add a 1ms sleep to the main loop in ThreadSv2Handler.

@Sjors
Copy link
Owner

Sjors commented Dec 16, 2024

Let me know if you're still seeing crashes on the last release and/or the sv2 / sv2-ipc branches.

@Sjors
Copy link
Owner

Sjors commented Dec 24, 2024

Perfection?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants