Description
Why you need it?
When the channel between a TM and the TC disconnects, the most likely cause is that the TM has crashed. A crashed TM can no longer decide the outcome of its global transaction, yet the call chain may already have reached its final step. Suppose the TM has called 10 microservices: in AT mode a large amount of data is already locked, and the default timeout before the transaction is rolled back is 60 seconds. A TM crash can therefore leave that data locked and unavailable for up to 60 seconds, which is quite severe.
Proposed solution:
- Handle the `channelInactive` event on the TC and guard the behavior with a switch. A network problem can look like a TM crash, and a misjudgment would roll back a healthy transaction, so the switch should be off by default (no early rollback of the global transaction).
- Only the TC that handled the transaction's begin request should be allowed to roll back early. Other TCs should not roll back even if they detect that the TM has disconnected. This avoids concurrent rollbacks and avoids a misjudgment when only one TC has lost its connection to the TM, so the granularity can be limited to the TC that began the transaction.
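
The two rules above can be sketched in plain Java. This is only an illustration of the decision logic, not the project's implementation: `EarlyRollbackSketch`, `GlobalTx`, `begin`, and `onTmChannelInactive` are hypothetical names, the Netty handler is replaced by a direct method call, and the sketch assumes each TC can see which TC began a transaction and which TM channel it belongs to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the proposed early-rollback decision; not a real TC class. */
final class EarlyRollbackSketch {

    /** Minimal stand-in for a global transaction as seen by a TC (assumed shape). */
    static final class GlobalTx {
        final String xid;
        final String beginTcId;   // TC that handled the begin request
        final String tmChannelId; // channel of the TM that began the transaction

        GlobalTx(String xid, String beginTcId, String tmChannelId) {
            this.xid = xid;
            this.beginTcId = beginTcId;
            this.tmChannelId = tmChannelId;
        }
    }

    private final String localTcId;
    private final boolean earlyRollbackEnabled; // the proposed switch; callers should default it to false
    private final Map<String, GlobalTx> active = new ConcurrentHashMap<>();

    EarlyRollbackSketch(String localTcId, boolean earlyRollbackEnabled) {
        this.localTcId = localTcId;
        this.earlyRollbackEnabled = earlyRollbackEnabled;
    }

    /** Records an active global transaction visible to this TC. */
    void begin(String xid, String beginTcId, String tmChannelId) {
        active.put(xid, new GlobalTx(xid, beginTcId, tmChannelId));
    }

    int activeCount() {
        return active.size();
    }

    /**
     * Stand-in for what a channelInactive handler would call.
     * Returns the xids this TC decides to roll back early.
     */
    List<String> onTmChannelInactive(String tmChannelId) {
        List<String> rolledBack = new ArrayList<>();
        if (!earlyRollbackEnabled) {
            return rolledBack; // switch off: wait for the normal timeout rollback
        }
        for (GlobalTx tx : active.values()) {
            boolean sameTm = tx.tmChannelId.equals(tmChannelId);
            boolean beganHere = tx.beginTcId.equals(localTcId);
            // Only the TC that began the transaction may roll it back early.
            if (sameTm && beganHere && active.remove(tx.xid, tx)) {
                rolledBack.add(tx.xid); // a real TC would trigger the global rollback here
            }
        }
        return rolledBack;
    }
}
```

With the switch off, a disconnect changes nothing and the transaction still falls back to the timeout rollback. With it on, only transactions begun by the local TC for the disconnected TM are selected, so a second TC that sees the same disconnect leaves them alone.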