Hello, I am new to Rust development and Tokio. I am currently trying to solve an issue in another project: sometimes, if that program runs for a long time, some of its sockets linger indefinitely in the FIN_WAIT2 state.
As far as I understand, that state lingers because we shut down our write side of the socket, the peer ACKed our FIN, and we are now waiting for their FIN so we can close our reader.
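For context, here is a minimal sketch (not the project's actual code, just my mental model) of the half-close that produces this state, assuming a split `tokio::net::TcpStream`:

```rust
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpStream;

// Illustration only: how FIN_WAIT2 arises from a half-close.
async fn half_close(stream: TcpStream) -> std::io::Result<()> {
    let (mut reader, mut writer) = stream.into_split();
    // Shutting down the write half sends our FIN; once the peer ACKs it,
    // the local socket sits in FIN_WAIT2 until the peer sends its own FIN.
    writer.shutdown().await?;
    // The peer's FIN shows up here as read() returning Ok(0). If the peer
    // never closes its side, this loop (and the FIN_WAIT2 state) never ends.
    let mut buf = [0u8; 1024];
    while reader.read(&mut buf).await? != 0 {}
    Ok(())
}
```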
Currently this is the loop we use for our read half, where we call run() in a spawned task:
```rust
// Assumed imports (the project uses Tokio); PeerError, ReaderMessage,
// P2PMessageHeader, UtreexoBlock, RawNetworkMessage and the (de)serialize
// helpers are defined elsewhere in the project.
use tokio::io::{AsyncRead, AsyncReadExt};
use tokio::sync::mpsc::UnboundedSender;

pub struct TcpStreamActor<T: AsyncRead + Unpin> {
    pub stream: T,
    pub sender: UnboundedSender<ReaderMessage>,
}

impl<T: AsyncRead + Unpin> TcpStreamActor<T> {
    async fn inner(&mut self) -> std::result::Result<(), PeerError> {
        loop {
            let mut data: Vec<u8> = vec![0; 24];
            // Read the header first, to learn the payload size
            self.stream.read_exact(&mut data).await?;
            let header: P2PMessageHeader = deserialize_partial(&data)?.0;
            // Network message too big
            if header.length > (1024 * 1024 * 32) as u32 {
                return Err(PeerError::MessageTooBig);
            }
            data.resize(24 + header.length as usize, 0);
            // Read everything else
            self.stream.read_exact(&mut data[24..]).await?;
            // Intercept block messages ("block" command)
            if header._command[0..5] == [0x62, 0x6c, 0x6f, 0x63, 0x6b] {
                let mut block_data = vec![0; header.length as usize];
                block_data.copy_from_slice(&data[24..]);
                let message: UtreexoBlock = deserialize(&block_data)?;
                self.sender.send(ReaderMessage::Block(message))?;
            }
            let message: RawNetworkMessage = deserialize(&data)?;
            self.sender.send(ReaderMessage::Message(message))?;
        }
    }

    pub async fn run(mut self) -> Result<()> {
        let err = self.inner().await;
        if let Err(err) = err {
            self.sender.send(ReaderMessage::Error(err))?;
        }
        Ok(())
    }
}
```
If something goes wrong with the network, or if the peer's implementation misbehaves, we might never receive another message from that peer.
My question here is: in this case, would the read_exact() call hang forever? There is no timeout enforced here, and as I read in the poll_read documentation, if no data is received, poll_read() stays Pending until data actually arrives. If we never receive a message, we never get the output of that future.
Is my chain of thought correct here? Would that be indicative of why the FIN_WAIT2 state sometimes arises? Because if that's true, the reader half will never be dropped, keeping the connection open.
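This is the kind of thing I have been considering; a minimal sketch of bounding the read with tokio::time::timeout, assuming the same header-read step as in `inner()` above (the 120-second limit is an arbitrary placeholder, not a value from the project):

```rust
use tokio::io::{AsyncRead, AsyncReadExt};
use tokio::time::{timeout, Duration};

// Hypothetical helper: read the 24-byte header, but give up if nothing arrives.
async fn read_header<T: AsyncRead + Unpin>(stream: &mut T) -> std::io::Result<Vec<u8>> {
    let mut data = vec![0u8; 24];
    match timeout(Duration::from_secs(120), stream.read_exact(&mut data)).await {
        Ok(res) => {
            res?; // propagate genuine I/O errors (including EOF)
            Ok(data)
        }
        // The read future was still Pending when the deadline elapsed: no bytes
        // arrived, so the peer can be treated as stalled and the reader dropped.
        Err(_elapsed) => Err(std::io::Error::new(
            std::io::ErrorKind::TimedOut,
            "peer sent no data before the deadline",
        )),
    }
}
```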
Also, would a CancellationToken work in this situation, applied where the task is spawned? Or would a timeout be more suitable here?
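For the CancellationToken idea, this is roughly the shape I have in mind; a hedged sketch assuming the tokio-util crate and reusing the TcpStreamActor above (the function and variable names are mine, not the project's):

```rust
use tokio_util::sync::CancellationToken;

// Hypothetical wrapper around the spawned task: race the read loop against cancellation.
async fn run_with_cancel<T>(actor: TcpStreamActor<T>, token: CancellationToken)
where
    T: tokio::io::AsyncRead + Unpin,
{
    tokio::select! {
        // Resolves as soon as someone calls `token.cancel()`, even while
        // `read_exact()` inside `inner()` is still Pending; the `actor.run()`
        // future (and the stream it owns) is dropped at that point.
        _ = token.cancelled() => {}
        res = actor.run() => { let _ = res; }
    }
}
```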
EDIT: This is happening on Linux machines only at the moment; at least, this behaviour has been reported by Linux users only.