unable to create_distributed_table() #7798
Is there a possibility that there is a default timeout of 2 hours for queries?
I'm not aware of Citus specifying such a default statement timeout for distributed queries / operations, so I'll check the code to see whether we're doing this somewhere for the internal COPY connections. I'm also wondering: are you seeing "canceling statement due to statement timeout", "canceling statement due to lock timeout", or similar messages in your PG logs during COPY, before the distributed table creation breaks?
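For reference, a minimal way to check which timeouts are currently in effect on the coordinator (plain PostgreSQL settings, nothing Citus-specific assumed):

```sql
-- Values currently in effect for this session.
SHOW statement_timeout;
SHOW lock_timeout;
SHOW idle_in_transaction_session_timeout;

-- Where a non-default value comes from (postgresql.conf, ALTER SYSTEM, role, database, ...).
SELECT name, setting, source
FROM pg_settings
WHERE name IN ('statement_timeout', 'lock_timeout', 'idle_in_transaction_session_timeout');
```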
But all in all, such error messages are not really good.
Would you mind sharing the exact CREATE TABLE and create_distributed_table() commands? If sharing the exact commands is not possible, an obfuscated version that reveals the column types, column default expressions, indexes, etc. would also help a lot.
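For example, a hypothetical obfuscated version along these lines (all table and column names here are invented) would already be enough:

```sql
-- Obfuscated example: what matters is the column types, defaults, and indexes.
CREATE TABLE events (
    tenant_id   bigint NOT NULL,
    event_id    bigint GENERATED BY DEFAULT AS IDENTITY,
    created_at  timestamptz NOT NULL DEFAULT now(),
    payload     jsonb,
    PRIMARY KEY (tenant_id, event_id)
);
CREATE INDEX events_created_at_idx ON events (created_at);

-- The distribution column actually used in your case may of course differ.
SELECT create_distributed_table('events', 'tenant_id');
```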
I think I've isolated it to a node/OS-related timeout. I was able to tmux and docker exec into the coordinator node, and it ran fine without the timeout when using that approach.
Here is my create table:
And this is the create distributed table command:
Re: the logs, do you mean just copying the docker logs from the coordinator node while the command is running, to assess why it's timing out?
I had all the data on the coordinator node, which wasn't ideal, but it all fit somehow. In the future I would prefer not to have to write the data to the coordinator node first, as the tables will grow larger than the coordinator node alone can hold. Maybe it's an issue with my pg_backup script, which is coming from vanilla Postgres.
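To avoid staging the data on the coordinator first, one pattern (a minimal sketch with made-up table, column, and file names; not verified against this particular setup) is to create and distribute the empty table before loading, so COPY routes rows directly to the worker shards:

```sql
-- Create the empty table and distribute it before any data is loaded.
CREATE TABLE events (
    tenant_id bigint NOT NULL,
    payload   jsonb
);
SELECT create_distributed_table('events', 'tenant_id');

-- COPY through the coordinator now streams rows to the worker shards
-- instead of storing them as a regular local table on the coordinator.
\copy events FROM 'events.csv' WITH (FORMAT csv)
```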
These are the relevant logs I found on the coordinator node:
Yes, such logs made me think that one of the issues here is related to a statement timeout. However, these still don't really look good, and they look like a separate problem.
I'll look into this a bit more and will try to reproduce the issue.
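If statement_timeout does turn out to be the culprit, a quick way to rule it out (a sketch; the table and column names below are placeholders) is to disable the timeouts just for the session that runs the command:

```sql
-- Disable the timeouts for this session only, then retry the operation.
SET statement_timeout TO 0;   -- 0 = disabled
SET lock_timeout TO 0;        -- likewise, if lock timeouts appear in the logs
SELECT create_distributed_table('my_table', 'my_distribution_column');
```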
Here is the error I see (more than 5 times now, after each run of about 2 hours):
And here are the logs from the server at the same time:
It's about 70 GB of data transferred to the worker nodes (I saw disk usage increase by that much with "df -h" on the worker nodes), but the operation never completes successfully.
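As a side note, instead of "df -h" on each worker, the total size of a distributed table can also be checked from the coordinator; a minimal sketch (the table name is a placeholder, and this assumes the citus_total_relation_size UDF that ships with Citus):

```sql
-- Total size of the distributed table across all worker shards, including indexes.
SELECT pg_size_pretty(citus_total_relation_size('events'));
```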