unable to create_distributed_table() #7798

34code · 2024-12-20T18:12:44Z

Here is the error I see (more than 5 times now, each time after running for about 2 hours):

NOTICE:  Copying data from local table...
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

And here are the logs from the server at the same time:

2024-12-20 17:43:07.787 UTC [129221] LOG:  invalid length of startup packet
2024-12-20 17:43:09.224 UTC [129222] LOG:  invalid length of startup packet
2024-12-20 17:43:10.845 UTC [129223] LOG:  invalid length of startup packet
2024-12-20 17:43:12.078 UTC [129224] LOG:  invalid length of startup packet
2024-12-20 18:09:24.342 UTC [130501] FATAL:  unsupported frontend protocol 65363.19778: server supports 3.0 to 3.0
2024-12-20 18:09:25.015 UTC [130504] LOG:  invalid length of startup packet
2024-12-20 18:09:25.196 UTC [130505] LOG:  invalid length of startup packet
2024-12-20 18:09:25.440 UTC [130514] LOG:  invalid length of startup packet
2024-12-20 18:09:25.522 UTC [130515] LOG:  invalid length of startup packet
2024-12-20 18:09:25.627 UTC [130516] LOG:  invalid length of startup packet
2024-12-20 18:09:25.753 UTC [130517] LOG:  invalid length of startup packet
2024-12-20 18:09:25.831 UTC [130518] LOG:  invalid length of startup packet
2024-12-20 18:09:25.907 UTC [130519] LOG:  invalid length of startup packet
2024-12-20 18:09:26.031 UTC [130520] LOG:  invalid length of startup packet
2024-12-20 18:09:26.133 UTC [130521] LOG:  invalid length of startup packet
2024-12-20 18:09:26.422 UTC [130522] LOG:  invalid length of startup packet
2024-12-20 18:09:26.677 UTC [130523] LOG:  invalid length of startup packet
2024-12-20 18:09:28.188 UTC [130503] LOG:  could not receive data from client: Connection reset by peer

About 70GB of data gets transferred to the worker nodes (I saw disk usage grow by that much with "df -h" on the worker nodes), but the operation never completes successfully.

34code · 2024-12-20T23:35:04Z

Is it possible that there is a default timeout of 2 hours for queries?

onurctirtir · 2024-12-23T10:08:55Z

I'm not aware of Citus setting such a default statement timeout for distributed queries / operations, so I'll check the code to see if we're doing this somewhere for internal COPY connections.

Also, are you seeing "canceling statement due to statement timeout", "canceling statement due to lock timeout", or similar messages in your PG logs during COPY, before distributed table creation breaks?

onurctirtir · 2024-12-23T10:10:11Z

But all in all, error messages like these are not really good:

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

onurctirtir · 2024-12-23T10:12:59Z

Would you mind sharing the CREATE TABLE command as well as the exact SELECT .. create_distributed_table(..) .. call that you're making?

If sharing the exact commands is not possible, some obfuscated version that reveals the column types, column default expressions and indexes etc. would also help a lot.

34code · 2024-12-24T09:34:44Z

I think I've isolated it to a node/OS-related timeout. I was able to use tmux and docker exec to get into the coordinator node, and it ran fine without the timeout when using psql. However, when I connect from my laptop or a remote server, it always times out.

Here is my create table:

CREATE TABLE com_walmart_prices (
    id uuid NOT NULL DEFAULT uuid_generate_v4(),
    source character varying DEFAULT 'Walmart.com'::character varying,
    condition character varying,
    amount character varying,
    buy_link text,
    barcode character varying NOT NULL,
    created_at timestamp(6) without time zone NOT NULL,
    amount_override character varying
);

And this is the create distributed table command:

SELECT create_distributed_table('com_walmart_prices', 'barcode');

Re: the logs, do you mean just copying the docker logs from the coordinator node while the command is running, to assess why it's timing out?

34code · 2024-12-24T09:36:57Z

I had all the data on the coordinator node, which wasn't ideal, but it all fit somehow. In the future I would prefer not to have to write the data to the coordinator node first, as the tables will grow larger than the coordinator node alone can hold. Maybe it's an issue with my pg_backup script, which comes from vanilla Postgres.

34code · 2024-12-24T09:39:52Z

These are the relevant logs I found on the coordinator node:

2024-12-20 06:53:06.739 UTC [95332] ERROR:  canceling statement due to user request
2024-12-20 06:53:06.739 UTC [95332] STATEMENT:  SELECT create_distributed_table('com_walmart_prices', 'barcode');

2024-12-20 06:53:06.739 UTC [95332] LOG:  could not send data to client: Connection reset by peer
2024-12-20 06:53:06.739 UTC [95332] FATAL:  connection to client lost
2024-12-20 06:53:06.871 UTC [97222] LOG:  PID 95332 in cancel request did not match any process
2024-12-20 06:53:28.364 UTC [95330] ERROR:  canceling statement due to user request
2024-12-20 06:53:28.364 UTC [95330] STATEMENT:  SELECT create_distributed_table('com_walmart_prices', 'barcode');

2024-12-20 06:53:38.307 UTC [97232] ERROR:  canceling statement due to user request
2024-12-20 06:53:38.307 UTC [97232] STATEMENT:  SELECT create_distributed_table('com_walmart_prices', 'barcode');
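
The coordinator log lines above can be sorted by cause with a small script, which helps tell a server-side timeout apart from a cancel that arrived as a request. The following is only a minimal sketch: the function name `count_cancel_messages` and the label names are made up for this example, and it assumes the log is available as a list of text lines. The message substrings themselves are standard PostgreSQL log messages.

```python
from collections import Counter

# Substrings of standard PostgreSQL log messages. The dictionary keys are
# hypothetical labels chosen for this example only.
NEEDLES = {
    "statement_timeout": "canceling statement due to statement timeout",
    "lock_timeout": "canceling statement due to lock timeout",
    "user_request": "canceling statement due to user request",
    "client_lost": "connection to client lost",
}


def count_cancel_messages(lines):
    """Count how many log lines contain each known message substring."""
    counts = Counter()
    for line in lines:
        for label, needle in NEEDLES.items():
            if needle in line:
                counts[label] += 1
    return counts


# Two lines taken from the coordinator log quoted in this thread.
sample = [
    "2024-12-20 06:53:06.739 UTC [95332] ERROR:  canceling statement due to user request",
    "2024-12-20 06:53:06.739 UTC [95332] FATAL:  connection to client lost",
]
print(count_cancel_messages(sample))
```

In the lines quoted here, the cancellations are logged as "user request" rather than "statement timeout", which is what PostgreSQL logs when a cancel request is received instead of `statement_timeout` firing.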

onurctirtir · 2024-12-25T12:32:32Z

> I think I've isolated it to a node/OS-related timeout. I was able to use tmux and docker exec to get into the coordinator node, and it ran fine without the timeout when using psql. However, when I connect from my laptop or a remote server, it always times out.

Yes, those logs made me think that one of the issues here is related to a statement timeout.

However, the following messages still don't look good and seem to be a separate problem:

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

I'll look into this a bit more and will try to reproduce the issue.
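
Since the command completes when run with psql on the coordinator itself but fails from a laptop or remote server, one thing worth trying is enabling TCP keepalives on the client connection. This is a sketch under an assumption the thread does not confirm: that something on the network path drops the connection while the client sits idle waiting for the result. `keepalives`, `keepalives_idle`, `keepalives_interval` and `keepalives_count` are libpq connection parameters; the helper name `keepalive_conninfo`, the host name, and the chosen intervals are hypothetical.

```python
def keepalive_conninfo(host, dbname, user, port=5432,
                       idle_s=60, interval_s=10, count=5):
    """Build a libpq keyword/value connection string with TCP keepalives on.

    Assumes simple values without spaces or quotes, so no escaping is done.
    """
    params = {
        "host": host,
        "port": port,
        "dbname": dbname,
        "user": user,
        "keepalives": 1,                    # turn client-side keepalives on
        "keepalives_idle": idle_s,          # idle seconds before first probe
        "keepalives_interval": interval_s,  # seconds between probes
        "keepalives_count": count,          # lost probes before giving up
    }
    return " ".join(f"{key}={value}" for key, value in params.items())


# Hypothetical host, database and user names, for illustration only.
print(keepalive_conninfo("coordinator.example", "postgres", "postgres"))
```

The resulting string can be passed to psql as its connection argument, or to any libpq-based driver. Running the command inside tmux on the coordinator, as already done above, avoids the remote connection altogether.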
