SNOW-1859664: Issues uploading data via PUT to s3 on driver v1.8 or above (including 1.12.1) #1279
Comments
hi - thanks for filing this issue with us. One likely change influencing the behaviour you're seeing between 1.7.1 and 1.8.0 is #991, where we stopped swallowing the errors we got back from cloud storage :) So no extra error was introduced; we simply surface the error now, an error that was always there, instead of silently ignoring it. I just confirmed this by testing on my end: a Snowflake PUT consists of two phases, first uploading the file to the internal stage and then verifying the upload (the HeadObject request you see in the error).
The second operation may be what is failing in your case. The file itself is likely uploaded (see phase 1); it is just the verification step that seems to fail. As a next step, you can enable DEBUG logging and check where exactly that verification request fails.
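To illustrate what that verification step is, below is roughly the equivalent aws-sdk-go-v2 call the driver makes against the stage bucket; the bucket and key names here are hypothetical placeholders, not the real stage values.

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	ctx := context.Background()

	// The driver uses short-lived stage credentials; default credentials are
	// used here only to keep the sketch self-contained.
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	client := s3.NewFromConfig(cfg)

	// The verification phase is essentially a HEAD request on the uploaded object.
	// A "write: broken pipe" at this point means something closed the connection
	// mid-request, after the upload itself may already have succeeded.
	_, err = client.HeadObject(ctx, &s3.HeadObjectInput{
		Bucket: aws.String("example-stage-bucket"),       // hypothetical
		Key:    aws.String("stages/example/data.csv.gz"), // hypothetical
	})
	if err != nil {
		log.Fatalf("HeadObject failed: %v", err)
	}
	log.Println("object verified")
}
```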
Also, instead of fixing the underlying issue on the infrastructure, you can decide to continue ignoring PUT/GET errors by setting RaisePutGetError: false. Please let me know how this went.
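As a minimal sketch of how that option is set via the context (the stage name and file path below are hypothetical):

```go
import (
	"context"
	"database/sql"

	sf "github.com/snowflakedb/gosnowflake"
)

// putIgnoringPutGetErrors runs a PUT with RaisePutGetError disabled, i.e. the
// pre-1.7.2 behaviour of not surfacing errors from the cloud storage calls.
func putIgnoringPutGetErrors(db *sql.DB) error {
	ctx := sf.WithFileTransferOptions(context.Background(),
		&sf.SnowflakeFileTransferOptions{RaisePutGetError: false})

	// Hypothetical file and stage; adjust to your setup.
	_, err := db.ExecContext(ctx, "PUT file:///tmp/data.csv @~/my_stage")
	return err
}
```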
I can confirm the behaviour. I assume all of the hosts are 100% Snowflake owned, so there should be nothing on the receiving side blocking this? Based on what you are saying: would we still get the detailed log output when setting RaisePutGetError: false?
But ignoring or surfacing the same errors does not fix those errors. If they were originating from the Snowflake end, I would expect tons of issues reported to us from other users, but there aren't any (besides this one). I still would like to re-do my test in the same Snowflake deployment as yours, just to be sure. Can you please share in which Snowflake deployment you're experiencing this problem?
This is for the *.eu-central-1.snowflakecomputing.com Snowflake deployment. The connections are coming from a data center (not AWS), so maybe there is something blocked on the side of the Snowflake deployment, but I am very certain that it is not something that is blocked from the data center. I understand that hiding the error doesn't fix it... I just want a somewhat safe way to get this into a release, so that I do not have to roll back five minutes after seeing the first issue, given that this is apparently not a critical element right now.
Perfect - my tests were run on the exact same deployment and worked, so I can confirm it works on the Snowflake side. The next step is probably for you to work together with your network team and trace down at which hop the socket gets closed, which in turn leads to the broken pipe error. We can of course help by giving hints here; I already suggested some tools for it. Since you mentioned the source is a data center, I would also suggest checking with your network team whether you're perhaps using an S3 Gateway, which can contribute to this behaviour if it is not configured to transparently allow all kinds of requests to Snowflake's internal stage.
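One simple check that can be run from the affected host is a plain HEAD request against the stage endpoint, independent of the driver; the URL below is a hypothetical placeholder, the real one can be taken from the driver's DEBUG logs.

```go
package main

import (
	"log"
	"net/http"
)

func main() {
	// Hypothetical stage endpoint; use the real S3 URL from the driver's DEBUG logs.
	url := "https://example-stage-bucket.s3.eu-central-1.amazonaws.com/"

	resp, err := http.Head(url)
	if err != nil {
		// A broken pipe or connection reset here points at something on the
		// network path (firewall, proxy, S3 gateway) closing the connection.
		log.Fatalf("HEAD failed: %v", err)
	}
	defer resp.Body.Close()
	log.Printf("HEAD status: %s", resp.Status)
}
```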
That is entirely correct and I agree with you. I guess you need to consider: are the files actually uploaded to the Snowflake internal stage correctly with 1.8.0 / 1.12.1 too, the same way as with 1.7.1? You can verify it e.g. by downloading the same file again with GET. I think that's all I can add to this issue for now. If you need Snowflake's help on looking into information which you might not want to share here (the driver's logs, which can contain sensitive information, the packet capture, etc.), you can file an official case with Snowflake Support and we can take it from there. Please do understand, however, that Snowflake cannot fix the issue, regardless of whether it is filed on GitHub or in an official Support ticket, when it is related to non-Snowflake infrastructure. Thank you for your kind understanding.
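A sketch of that kind of round-trip check, downloading the staged file back with GET via the Go driver; the stage and local paths are hypothetical.

```go
import (
	"context"
	"database/sql"
	"log"
)

// verifyUpload PUTs a file and then pulls it back with GET. If the GET succeeds
// and the content matches the original, the upload itself worked even if the
// PUT reported a verification error. Stage and paths are hypothetical.
func verifyUpload(db *sql.DB) {
	ctx := context.Background()
	if _, err := db.ExecContext(ctx, "PUT file:///tmp/data.csv @~/verify_stage"); err != nil {
		log.Printf("PUT reported: %v", err) // may be only the verification phase failing
	}
	if _, err := db.ExecContext(ctx, "GET @~/verify_stage/data.csv.gz file:///tmp/roundtrip/"); err != nil {
		log.Fatalf("GET failed: %v", err)
	}
	log.Println("compare /tmp/roundtrip/data.csv.gz with the original file")
}
```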
No worries, I understand. This information has already been very valuable in understanding the underlying issue better. I am 100% certain that we do not have an S3 proxy in play here, and I also checked that HEAD requests are generally working from the machines making those requests. That's why I assumed that it is probably either a change in what the driver does or something not working on your end. So from our data center side there is nothing blocking this. I will look into it and provide more details if it turns out that it might be driver related after all (which I doubt, based on your information).
What you just wrote is interesting:
This is new information. So, if the HEAD requests are generally working but you have a certain particular flow which always (or sometimes) fails and can generate the issue, then I would like to test the exact same flow on my setup and see if I can reproduce it locally from a different network. Do you perhaps have a reproduction setup you can share, which I can try on my end to see if it reproduces for me? Until now I was testing general functionality with a small file, but apparently general functionality is working for you too. Without this bit of information I was under the impression that the general functionality doesn't work; the error suggested as much,
meaning that even after 3 attempts the HeadObject request still fails. Any information might be helpful, e.g. whether the file must be bigger than X MB to trigger the issue, any details at all. Best of course would be a repro code snippet, a runnable program, or a shared repro GitHub repo, if that's an option here. Thank you in advance!
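For reference, a skeleton of the kind of runnable reproduction that helps here; the DSN, file path and stage below are hypothetical placeholders.

```go
package main

import (
	"database/sql"
	"log"
	"os"

	_ "github.com/snowflakedb/gosnowflake"
)

func main() {
	// Hypothetical DSN; in the failing setup this points at the Privatelink deployment.
	dsn := os.Getenv("SNOWFLAKE_DSN")

	db, err := sql.Open("snowflake", dsn)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Plain PUT of a local file to the user stage; vary the file size and path
	// to match the failing case.
	if _, err := db.Exec("PUT file:///tmp/repro.csv @~/repro_stage"); err != nil {
		log.Fatalf("PUT failed: %v", err)
	}
	log.Println("PUT succeeded")
}
```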
Hi, regarding this one:
This is perfectly fine. It is the only way to use PUTs with the Go driver, and you can set this value via the context passed to the query.
I ran it that way. Sure, I can try to stitch together an example case.
Happy new year! 🚀 So I stitched together an example. It fails when trying to do the OCSP check, so there might be some issue in the network setup that prevents that information from being retrieved.
When going back down to v1.7.1 I am not getting any of that, although I have sometimes seen OCSP cache errors making things slower; those were issues with DNS resolution, which was occasionally a bit glitchy. On v1.12.1 it seems to just get stuck and keep retrying that forever.
So, assuming that it is OCSP, I tried to disable it following https://community.snowflake.com/s/article/How-to-turn-off-OCSP-checking-in-Snowflake-client-drivers and added the suggested connection parameter, which did not seem to take effect. In order to ensure that nothing else is the matter, I then actually disabled the OCSP checks directly in the vendored gosnowflake code... and it worked. I also checked the domains: during the connection setup 2 1/2 years ago from our data center (routed through some AWS setup on our end), everything under the corresponding Privatelink domain was part of the setup, including the OCSP domain.
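For context, the article above disables OCSP via a connection flag; assuming the parameter in question is insecureMode, a hedged sketch of setting it programmatically looks like this (account and credentials are placeholders).

```go
import (
	"database/sql"

	sf "github.com/snowflakedb/gosnowflake"
)

// openWithoutOCSP builds a DSN with OCSP (certificate revocation) checking
// disabled. Assumption: insecureMode is the flag from the linked article.
func openWithoutOCSP() (*sql.DB, error) {
	cfg := sf.Config{
		Account:      "myaccount.eu-central-1.privatelink", // hypothetical
		User:         "myuser",                             // hypothetical
		Password:     "mypassword",                         // hypothetical
		InsecureMode: true,
	}
	dsn, err := sf.DSN(&cfg)
	if err != nil {
		return nil, err
	}
	return sql.Open("snowflake", dsn)
}
```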
hey @tobischo, Happy New Year to you too! I'm currently on leave and will be back around the middle of next week to look into your reproduction and this comment, but I wanted to come here quickly to thank you for the effort of putting the reproduction together and sharing the initial observations. Really appreciated! Some quick remarks without deep analysis:
Indeed the expected behaviour (when either of the above flags is specified) is to use a different HTTP transport which has the OCSP related checks disabled altogether, not even attempting them. I will return to this issue next week; again, thank you very much, this is very helpful!
Spent some time looking into this issue, and again I would like to thank you for your help here @tobischo, super insightful; I believe with your help we made progress on this very elusive issue. Using your reproduction, I set up a more simplified version which allowed me to reproduce the problem with every gosnowflake version above 1.7.1 (starting with 1.7.2). This gives us a good estimate of where and when the issue started happening: between 1.7.1 and 1.7.2, where we started surfacing the errors coming back from cloud storage instead of swallowing them.
Indeed it only seems to affect the stage-related (PUT/GET) requests. The driver still tries to do OCSP for the stage, then gets into trouble and a long retry loop when, due to user-side misconfiguration, the OCSP Privatelink hostname is not configured. During my reproduction attempts I sometimes (non-deterministically) got an error
which looks like the broken pipe from your original report. There seems to be no other way to avoid it (besides manually patching the vendored driver code, as you did) than the following workarounds. Preferred 'workaround' (== should be the case from day 0): configure DNS so that the OCSP Privatelink hostname resolves for the clients, as is required for Privatelink deployments anyway.
This fully eliminates the delay.

Optional workaround, which doesn't require DNS configuration: this still introduces ~2 extra minutes of delay into the PUT, but ignores the resulting error:

ctx := sf.WithFileTransferOptions(context.Background(), &sf.SnowflakeFileTransferOptions{RaisePutGetError: false})

Both work with 1.12.1. We'll work on fixing this error, because with the OCSP-disabling flags set, the driver should not attempt OCSP for the stage at all. Very big thanks for your assistance in this issue!
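For the preferred workaround, a quick way to confirm that the OCSP Privatelink hostname actually resolves from the client host is a plain DNS lookup; the hostname below is a hypothetical placeholder.

```go
package main

import (
	"log"
	"net"
)

func main() {
	// Hypothetical OCSP Privatelink hostname; substitute your own account's.
	host := "ocsp.myaccount.eu-central-1.privatelink.snowflakecomputing.com"

	addrs, err := net.LookupHost(host)
	if err != nil {
		log.Fatalf("%s does not resolve: %v", host, err)
	}
	log.Printf("%s resolves to %v", host, addrs)
}
```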
Right now I am attempting to get the OCSP links set up correctly, but unfortunately I am not managing the DNS myself and I do not want to change the hosts file right now. Adding the additional 2 minutes is not really desirable either. As long as there are no plans to make changes on the server side that are incompatible with client library v1.7.1, I will stick with 1.7.1 for now and update once possible, either by having the OCSP endpoints accessible correctly or once a fix is available that would allow usage with OCSP checking disabled. Thank you for the great response and clear outline of the different options @sfc-gh-dszmolka. I assume I will get an update about the fix by just keeping watch on this issue?
Yes, please follow this issue; I'll post progress updates here when there are any. Thank you for bearing with us while this is fixed!
Fix in preparation: #1288
The fix is merged and will be part of the next release (it can already be tested by installing the driver from the master branch).
What version of the Go driver are you using?
1.12.1
What operating system and processor architecture are you using?
Debian Linux, x86
What version of Go are you using?
1.23.3
Server version? E.g. 1.90.1
8.46.1
We have been using the Snowflake Go driver on v1.7.1 for quite a long time now, as we first encountered issues with v1.8. Back then we downgraded and did not follow up on the matter, under the assumption that it might just be a client issue and would be fixed in the next versions. Recently we updated to 1.12.1 to verify that this is working and received the following error on the client:
Failed to upload data to snowflake via PUT: 264003: unexpected error while retrieving header: operation error S3: HeadObject, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , HostID: , request send failed, Head "https://<s3url>": write tcp <internal IP>-><aws ip>:443: write: broken pipe
Downgrading back to 1.7.1 fixes the issue for now.
What did you expect to see?
v1.12.1 to work the same as v1.7.1
Can you set logging to DEBUG and collect the logs?
Not easily