
API download function seems unreliable :( #375

Open
justinclift opened this issue Jun 18, 2024 · 3 comments
@justinclift
Member

As part of post-migration testing of things in the new data centre, I'm running some sanity tests to verify things are working ok.

However, it appears the API download() function doesn't work properly with larger database files:

$ curl -F apikey="MY_API_KEY" -F dbowner="justinclift" -F dbname="National Statistics Postcode Lookup UK.sqlite" -o stuff.db https://api.dbhub.io/v1/download
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 43  771M   43  337M  100   442  33.3M     43  0:00:23  0:00:10  0:00:13 34.1M
curl: (18) transfer closed with 454725632 bytes remaining to read

That's happening multiple times in a row with databases of non-trivial size.

The backend logs aren't showing any useful detail about the problem either:

Jun 18 01:53:16 api systemd[1]: Started dbhub-api.service - DBHub.io API daemon.
Jun 18 01:53:16 api api[535]: 2024/06/18 01:53:16 API server: request log opened: /var/log/dbhub/api.log
[...]
Jun 18 01:54:02 api api[535]: 2024/06/18 01:54:02 Error returning DB file: write tcp 103.216.220.135:443->203.12.2.188:41960: i/o timeout
Jun 18 01:54:19 api api[535]: 2024/06/18 01:54:19 Error returning DB file: write tcp 103.216.220.135:443->203.12.2.188:37868: i/o timeout
Jun 18 01:55:08 api api[535]: 2024/06/18 01:55:08 Error returning DB file: write tcp 103.216.220.135:443->203.12.2.188:54474: i/o timeout
Jun 18 01:56:16 api api[535]: 2024/06/18 01:56:16 Error returning DB file: write tcp 103.216.220.135:443->203.12.2.188:60412: i/o timeout

It's happening on both the old and new API servers(!), and none of MinIO, Memcached, or PostgreSQL is showing any errors either.

After the migration is complete, we'd better take a look at this.
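
Not something the logs state directly, but one plausible explanation for a `write tcp ...: i/o timeout` that only affects large responses: if the API daemon is a Go net/http server with a fixed `WriteTimeout`, that timeout covers the entire response body rather than individual writes, so any download taking longer than the configured value dies mid-transfer regardless of CPU/RAM headroom. A minimal sketch of the idea (the handler and timeout values are hypothetical, not DBHub.io's actual code), including the Go 1.20+ `http.ResponseController` escape hatch for streaming handlers:

```go
package main

import (
	"net/http"
	"time"
)

func downloadHandler(w http.ResponseWriter, r *http.Request) {
	// With Go 1.20+, a handler that streams large files can extend (or
	// clear) the write deadline for just this response, overriding the
	// server-wide WriteTimeout:
	rc := http.NewResponseController(w)
	_ = rc.SetWriteDeadline(time.Time{}) // no deadline for this download

	// ... stream the database file to w here ...
}

func main() {
	srv := &http.Server{
		Addr:         ":443",
		Handler:      http.HandlerFunc(downloadHandler),
		ReadTimeout:  30 * time.Second,
		// Hypothetical value; a server-wide WriteTimeout like this would
		// cut off every response that takes longer than ~30s to send.
		WriteTimeout: 30 * time.Second,
	}
	_ = srv.ListenAndServeTLS("cert.pem", "key.pem")
}
```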

@justinclift
Member Author

Note that it doesn't appear to be a lack of resources, at least not CPU (plentiful), disk I/O (plentiful), or memory. I tripled the RAM given to the API VM under test (from 8GB to 24GB), and the failure happens at exactly the same spot.

@justinclift
Member Author

Live databases are just as bad, timing out after about 20 seconds:

$ curl -F apikey="MY_API_KEY" -F dbowner="justinclift" -F dbname="National_Statistics_Postcode_Lookup_UK-August_2015.sqlite" -o stuff.db https://api.dbhub.io/v1/download
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   454    0     0  100   454      0     11  0:00:41  0:00:40  0:00:01     0
curl: (56) OpenSSL SSL_read: OpenSSL/3.0.11: error:0A000119:SSL routines::decryption failed or bad record mac, errno 0

$ curl -F apikey="MY_API_KEY" -F dbowner="justinclift" -F dbname="National_Statistics_Postcode_Lookup_UK-August_2015.sqlite" -o stuff.db https://api.dbhub.io/v1/download
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   454    0     0  100   454      0     13  0:00:34  0:00:33  0:00:01     0
curl: (56) OpenSSL SSL_read: OpenSSL/3.0.11: error:0A000119:SSL routines::decryption failed or bad record mac, errno 0

Small databases work fine though:

$ curl -F apikey="MY_API_KEY" -F dbowner="justinclift" -F dbname="Join Testing-live.sqlite" -o stuff.db https://api.dbhub.io/v1/download
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 16805  100 16384  100   421  24693    634 --:--:-- --:--:-- --:--:-- 25308

$ ls -la stuff.db 
-rw-r--r-- 1 jc jc 16384 Jun 18 12:20 stuff.db

$ file stuff.db 
stuff.db: SQLite 3.x database, last written using SQLite version 3046000, file counter 1, database pages 4, cookie 0x15, schema 4, UTF-8, version-valid-for 1
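
For completeness, a rough Go equivalent of the failing curl command, handy for reproducing the problem outside curl and for logging exactly how many bytes arrive before the connection drops. The form field names mirror the `-F` flags above; the output filename, environment variable, and client timeout are arbitrary choices for this sketch:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"os"
	"time"
)

func main() {
	// Build the multipart form body, matching the curl -F fields.
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	w.WriteField("apikey", os.Getenv("DBHUB_API_KEY"))
	w.WriteField("dbowner", "justinclift")
	w.WriteField("dbname", "National Statistics Postcode Lookup UK.sqlite")
	w.Close()

	req, err := http.NewRequest("POST", "https://api.dbhub.io/v1/download", &body)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", w.FormDataContentType())

	client := &http.Client{Timeout: 30 * time.Minute}
	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, err := os.Create("stuff.db")
	if err != nil {
		panic(err)
	}
	defer out.Close()

	// io.Copy reports how far the transfer got before any error.
	n, err := io.Copy(out, resp.Body)
	fmt.Printf("status=%s content-length=%d received=%d err=%v\n",
		resp.Status, resp.ContentLength, n, err)
}
```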

@justinclift
Member Author

justinclift commented Jul 10, 2024

As a data point, I just changed the version of Go being used for this from the 1.19 provided by the Debian 12 packages to the latest official Go release... and the problem still remains.

Looks like it's an actual problem with either our code or one of the components in use.

The backend server daemon itself shows this at the time of the broken download connection:

Jul 10 09:08:20 api api[630]: 2024/07/10 09:08:20 Error returning DB file: write tcp 103.216.220.135:443->[snipped]:39018: i/o timeout

It does that once per failed download attempt, so 3 attempts gave:

Jul 10 09:03:06 api api[630]: 2024/07/10 09:03:06 Error returning DB file: write tcp 103.216.220.135:443->[snipped]:33600: i/o timeout
Jul 10 09:03:46 api api[630]: 2024/07/10 09:03:46 Error returning DB file: write tcp 103.216.220.135:443->[snipped]:56536: i/o timeout
Jul 10 09:08:20 api api[630]: 2024/07/10 09:08:20 Error returning DB file: write tcp 103.216.220.135:443->[snipped]:39018: i/o timeout

Next, I'll see if the problem shows up when running from my local development workstation.

Probably not today though.
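
In the meantime, a small amount of instrumentation might make the backend log more useful, since right now each failure only produces the one `i/o timeout` line. A hedged sketch (hypothetical helper names, not existing DBHub.io code) of a ResponseWriter wrapper that records how many bytes were written before the failure, which would show whether large downloads always stall at roughly the same offset:

```go
package middleware

import (
	"log"
	"net/http"
)

// countingWriter tracks how many response body bytes were written.
// Note: a plain wrapper like this hides optional interfaces such as
// http.Flusher, which is acceptable for debugging but not for production.
type countingWriter struct {
	http.ResponseWriter
	bytes int64
}

func (c *countingWriter) Write(p []byte) (int, error) {
	n, err := c.ResponseWriter.Write(p)
	c.bytes += int64(n)
	return n, err
}

// WithByteCount logs the bytes sent for each request once the handler
// returns, including requests that failed partway through the response.
func WithByteCount(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		cw := &countingWriter{ResponseWriter: w}
		next.ServeHTTP(cw, r)
		log.Printf("%s %s: %d bytes written", r.Method, r.URL.Path, cw.bytes)
	})
}
```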
