@alfonso-escribano

I have seen cases where deleting an IP address while there are established connections leaves those connections dangling.
This patch tries to avoid those cases.
It kills connections twice: first before deleting the IP address, to inform the clients, and again after deleting it, to kill any connections that were initiated between the first kill and the IP deletion.
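
The kill, delete, kill ordering described above can be sketched roughly as follows. This is not the code from the patch: `run`, `remove_ip_and_kill_connections`, the `DRY_RUN` switch and the example address, prefix and device are all hypothetical, and `ss -K src ADDR` is assumed to be the filter used to select sockets bound to the address.

```sh
# Sketch only: names below are hypothetical, not taken from the patch.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

remove_ip_and_kill_connections() {
    local ipaddr="$1"   # address being removed, e.g. 192.0.2.10
    local prefix="$2"   # prefix length, e.g. 24
    local dev="$3"      # interface, e.g. eth0

    # 1. Kill existing connections so the clients are informed right away.
    run ss -K src "$ipaddr"
    # 2. Remove the address from the interface.
    run ip addr del "$ipaddr/$prefix" dev "$dev"
    # 3. Kill anything that connected between steps 1 and 2.
    run ss -K src "$ipaddr"
}
```

Running `DRY_RUN=1; remove_ip_and_kill_connections 192.0.2.10 24 eth0` prints the three commands without touching the system; a real run needs root.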

Another approach could be to kill the connections in the function "delete_interface", before "addr delete".

To be able to use "ss -K", you need a kernel with config option CONFIG_INET_DIAG_DESTROY set.
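
One way to check for that kernel option ahead of time is to look it up in the kernel config file. A minimal sketch, assuming the config is available as a plain or gzip-compressed file; `has_inet_diag_destroy` is a hypothetical helper, not part of the agent:

```sh
# Sketch only: has_inet_diag_destroy is a hypothetical helper.
# Succeeds if the given kernel config file enables CONFIG_INET_DIAG_DESTROY.
has_inet_diag_destroy() {
    local config="$1"
    case "$config" in
        *.gz) zcat "$config" ;;   # e.g. /proc/config.gz
        *)    cat "$config" ;;    # e.g. /boot/config-<release>
    esac | grep -q '^CONFIG_INET_DIAG_DESTROY=y'
}
```

Typical inputs would be `/boot/config-$(uname -r)` or, on kernels that expose it, `/proc/config.gz`; which one exists depends on the distribution.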

@knet-jenkins

knet-jenkins bot commented Sep 25, 2025

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2076/1/input

@oalbrigt changed the title from "Add support to kill dangling ip connections" to "IPaddr2: add support to kill dangling IP connections" on Sep 25, 2025
local ss_line=""
local ss_out_loglevel="info"

local ipaddr="$1"

This should be the first line of the function.
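
A sketch of the ordering the reviewer is asking for, with the positional argument captured before the other locals. Only the three variable declarations come from the snippet under review; the function name and the `echo` body are hypothetical placeholders.

```sh
# Sketch only: function name and body are hypothetical.
kill_ip_connections() {
    local ipaddr="$1"              # positional argument first
    local ss_line=""
    local ss_out_loglevel="info"

    echo "would kill connections for $ipaddr (log level: $ss_out_loglevel)"
}
```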

@knet-jenkins

knet-jenkins bot commented Sep 25, 2025

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2076/2/input

Also fix ss_output variable missing '$'
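
For readers unfamiliar with this class of bug, a minimal sketch of what a missing '$' does in shell. The affected line of the patch is not shown in this conversation, so the sample value, the message text and both function names are hypothetical:

```sh
# Sketch only: sample value and function names are made up for illustration.
ss_output="ESTAB 0 0 192.0.2.10:5432 198.51.100.7:40112"

log_buggy() { echo "killed: ss_output"; }    # prints the literal word ss_output
log_fixed() { echo "killed: $ss_output"; }   # expands the variable
```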
@knet-jenkins

knet-jenkins bot commented Sep 25, 2025

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2076/3/input

@knet-jenkins

knet-jenkins bot commented Sep 25, 2025

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2076/4/input

@oalbrigt
Contributor

oalbrigt commented Sep 25, 2025

What is the use-case for this?

Usually the service (also run by the cluster) would try to close the connections cleanly before the IPaddr2 resource is stopped, if you have them in a group or have constraints that start the service after the IP resource.

If this is because of connections in TIME_WAIT state in ss, these are already cleanly terminated and are just waiting to receive the FIN packet from the peer after sending FIN.

@oalbrigt
Contributor

oalbrigt commented Sep 25, 2025

Correction: it doesn't go against the RFC, as it sends an RST packet to both endpoints, but it seems to be mostly needed to clean up "dead" connections from running processes, which doesn't seem to match the usual use cases for this agent.

Also forcefully removing the TIME_WAIT connections goes against the RFC: https://www.rfc-editor.org/rfc/rfc1122#page-87.

When a connection is closed actively, it MUST linger in
TIME-WAIT state for a time 2xMSL (Maximum Segment Lifetime).
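
To see whether the sockets on an address are in TIME-WAIT or still established, the `ss -tan` output can be summarised per state. A minimal sketch; `count_tcp_states` is a hypothetical helper and it assumes the state is the first column of each line after the header:

```sh
# Sketch only: count_tcp_states is a hypothetical helper.
# Reads `ss -tan` output on stdin and prints one "STATE COUNT" line per state.
count_tcp_states() {
    awk 'NR > 1 { count[$1]++ } END { for (s in count) print s, count[s] }' | sort
}
```

For example, `ss -tan src 192.0.2.10 | count_tcp_states` would summarise the sockets bound to that (example) address.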

@alfonso-escribano
Author

> What is the use-case for this?
>
> Usually the service (also run by the cluster) would try to close the connections cleanly before the IPaddr2 resource is stopped, if you have them in a group or have constraints that start the service after the IP resource.
>
> If this is because of connections in TIME_WAIT state in ss, these are already cleanly terminated and are just waiting to receive the FIN packet from the peer after sending FIN.

The use case is this (I think there are several cases where this patch can be very useful).

We have two nodes running PostgreSQL servers with streaming replication: one is a hot standby serving read-only queries and the other is the primary serving read/write queries. In front of those servers we run pgbouncer, one process on each node, with the same configuration on both nodes.
We want to change where clients connect for read-only queries depending on server load or other parameters.

When we want to move the IP address for read-only queries from one server to the other, we don't want to stop either pgbouncer or PostgreSQL. We could stop the read-only pgbouncer before moving the IP address, but the problem appears when both IP addresses are on the same server (with only one pgbouncer for all connections) and we want to move the read-only IP address to the other server: we don't want to stop the pgbouncer that handles both the read-write and the read-only queries.

In that case, if we move the IP address, the connections to that address are left dangling on the server and also on the clients.
If we "kill" them with "ss -K", we avoid dangling connections on both the server and the clients.

We have tried to improve this behavior on Linux by tuning TCP parameters such as tcp_keepalive and others, but when we remove the IP address from the server it doesn't work as expected; it depends on the server process.
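
For context on why keepalive tuning may not be enough: keepalive only applies to sockets where the application has enabled SO_KEEPALIVE, and even then detection takes roughly tcp_keepalive_time + tcp_keepalive_intvl * tcp_keepalive_probes seconds. A minimal sketch of that arithmetic; `keepalive_detect_seconds` is a hypothetical helper:

```sh
# Sketch only: keepalive_detect_seconds is a hypothetical helper.
# Approximate time for keepalive to declare an idle peer dead:
#   idle time before the first probe + interval * number of probes
keepalive_detect_seconds() {
    local idle="$1"     # net.ipv4.tcp_keepalive_time
    local intvl="$2"    # net.ipv4.tcp_keepalive_intvl
    local probes="$3"   # net.ipv4.tcp_keepalive_probes
    echo $(( idle + intvl * probes ))
}
```

With the usual Linux defaults (7200, 75, 9) this gives 7875 seconds, a little over two hours. The live values can be read with `sysctl -n net.ipv4.tcp_keepalive_time` and its two siblings.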

With this patch we are sure that all connections to the IP address are closed before it is deleted, and clients react quickly to the connection reset.

Thinking about it another way:
We want to remove, move, or take down an IP address on a cluster node. We know that clients cannot keep the same connection to that IP address, so the quickest way to inform them is to forcibly kill the connections from the server side.

Another use case where this can be very useful:
If you want a floating IP address shared between several nodes, you don't want to stop or restart services, only move the IP address between servers.
I have seen cases where the IP address is removed from a server but the connection remains in the established state even though it is impossible to send or receive any packets. In those cases the client is left hanging until some timeout expires, possibly more than 15 minutes.
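
One plausible source of a wait of that length (an assumption, not something stated in this thread) is the TCP retransmission limit, net.ipv4.tcp_retries2. A rough sketch of the arithmetic, assuming the retransmission timer starts at 200 ms, doubles after each attempt and is capped at 120 s; `retrans_timeout_ms` is a hypothetical helper:

```sh
# Sketch only: retrans_timeout_ms is a hypothetical helper.
# Sums the waits for the given number of retransmissions plus the final wait,
# with exponential backoff from rto (ms) capped at rto_max (ms).
retrans_timeout_ms() {
    local retries="$1"
    local rto="${2:-200}"
    local rto_max="${3:-120000}"
    local total=0
    local i=0
    while [ "$i" -le "$retries" ]; do
        total=$(( total + rto ))
        rto=$(( rto * 2 ))
        if [ "$rto" -gt "$rto_max" ]; then
            rto="$rto_max"
        fi
        i=$(( i + 1 ))
    done
    echo "$total"
}
```

With the default of 15 retries this gives 924600 ms, about 15.4 minutes, which the kernel documentation describes as a lower bound for the effective timeout.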

I know it's unusual to need this behavior, but I'm also sure this option will be very useful for many use cases.
Also, it's an optional parameter: if you don't use it, nothing changes.
