
bug: api endpoints dealing with blocks are too slow in stg #1382

Closed
altergui opened this issue Sep 6, 2024 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@altergui
Contributor

altergui commented Sep 6, 2024

since merging #1329, performance has regressed significantly. this was not noticed in dev (<10k blocks indexed), only in stage (>1M blocks indexed), so it must be due to the table size.
could you take a look, @mvdan?

-- name: SearchBlocks :many
SELECT
  b.*,
  COUNT(t.block_index) AS tx_count,
  COUNT(*) OVER() AS total_count
FROM blocks AS b
LEFT JOIN transactions AS t
  ON b.height = t.block_height
WHERE (
  (sqlc.arg(chain_id) = '' OR b.chain_id = sqlc.arg(chain_id))
  AND LENGTH(sqlc.arg(hash_substr)) <= 64 -- if the passed arg is longer, just abort the query
  AND (
    sqlc.arg(hash_substr) = ''
    OR (LENGTH(sqlc.arg(hash_substr)) = 64 AND LOWER(HEX(b.hash)) = LOWER(sqlc.arg(hash_substr)))
    OR (LENGTH(sqlc.arg(hash_substr)) < 64 AND INSTR(LOWER(HEX(b.hash)), LOWER(sqlc.arg(hash_substr))) > 0)
    -- TODO: consider keeping a hash_hex column for faster searches
  )
  AND (sqlc.arg(proposer_address) = '' OR LOWER(HEX(b.proposer_address)) = LOWER(sqlc.arg(proposer_address)))
)
GROUP BY b.height
ORDER BY b.height DESC
LIMIT sqlc.arg(limit)
OFFSET sqlc.arg(offset);
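As a side note on the TODO above: a hedged sketch of what a `hash_hex` column could look like. The schema below is a toy stand-in (not the indexer's real schema), using a SQLite generated column (requires SQLite >= 3.31) plus an index, so the exact-match branch (`LENGTH = 64`) becomes an index seek instead of computing `HEX(b.hash)` for every row:

```python
import sqlite3

# Toy schema sketching the TODO: hash_hex is a generated column
# mirroring lower(hex(hash)), with an index on top of it.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE blocks (
        height   INTEGER PRIMARY KEY,
        hash     BLOB,
        hash_hex TEXT GENERATED ALWAYS AS (lower(hex(hash))) VIRTUAL
    )
""")
con.execute("CREATE INDEX blocks_hash_hex ON blocks(hash_hex)")

# The exact-match lookup now resolves via the index (a SEARCH in the
# query plan) rather than a full-table SCAN.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT height FROM blocks WHERE hash_hex = ?",
    ("ab" * 32,),
).fetchall()
print(plan)
```

The substring branch (`INSTR`) still cannot use an index, but the common exact-hash lookup would.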

@altergui altergui added the bug Something isn't working label Sep 6, 2024
@altergui
Contributor Author

altergui commented Sep 6, 2024

curl -s "https://api-stg.vocdoni.net/v2/chain/blocks?limit=1" is affected

but fetching one specific block, like curl -s "https://api-stg.vocdoni.net/v2/chain/blocks/1" is not

@mvdan
Contributor

mvdan commented Sep 8, 2024

With the backup SQL file you provided, I wrote a small benchmark to measure how slow the query is, mimicking your REST query with ?limit=1:

func BenchmarkTODO(b *testing.B) {
	app := vochain.TestBaseApplication(b)

	idx, err := New(app, Options{DataDir: b.TempDir(), ExpectBackupRestore: true})
	qt.Assert(b, err, qt.IsNil)
	err = idx.RestoreBackup("testdata/backup.sql")
	qt.Assert(b, err, qt.IsNil)

	b.ReportAllocs()
	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		blocks, total, err := idx.BlockList(1, 0, "", "", "")
		qt.Assert(b, err, qt.IsNil)
		qt.Assert(b, blocks, qt.HasLen, 1)
		qt.Assert(b, blocks[0].TxCount, qt.Equals, int64(0))
		qt.Assert(b, total, qt.Equals, uint64(1635162))
	}
}

So there is some good news and some bad news. Let's start with the good news: simply removing the COUNT(*) OVER() AS total_count from the query gives you a near-infinite speed-up, from ~1.5s to ~0.02ms on my laptop:

       │       old        │                 new                 │
       │      sec/op      │   sec/op     vs base                │
TODO-8   1476149.90µ ± 4%   26.43µ ± 4%  -100.00% (p=0.002 n=6)

The bad news is that counting over an entire table in sqlite will always be slow for huge tables. Even in the best case scenario, where we just count all rows via SELECT COUNT(*) FROM blocks, I still measure a runtime of about ~7.5ms, or about 300x slower than the 0.02ms for getting a single block as above. This scales with the size of the table - which is why you only noticed the slow-down with staging, which has about 1.6 million indexed blocks.

And counting over the entire table scales even worse with more complex queries like yours, because the WHERE, the GROUP BY, and the LEFT JOIN all have to apply to every row being counted. So not only are you scanning the entire table (which already takes about 300x as long), but you're also doing extra work for every row in it, which takes you from milliseconds to seconds - another 1000x slow-down.

You basically cannot have filtering (or searching) SQL queries which also count the total number of results, because for SQL to compute that total, it needs to search the entire table. So any LIMIT or OFFSET parameters are not saving any work - the query will still have to churn CPU to search the entire table.
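To illustrate the point with a toy table (not the indexer's schema): even with LIMIT 1, adding a COUNT(*) OVER() window forces SQLite to visit every row before the first one can be returned, while the plain query is a single index seek:

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE blocks (height INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO blocks VALUES (?)", ((i,) for i in range(200_000)))

# Plain LIMIT 1: one B-tree seek on the primary key.
t0 = time.perf_counter()
con.execute("SELECT height FROM blocks ORDER BY height DESC LIMIT 1").fetchall()
plain = time.perf_counter() - t0

# Same query with a window count: all 200k rows must be scanned
# before the single requested row comes back.
t0 = time.perf_counter()
con.execute(
    "SELECT height, COUNT(*) OVER() FROM blocks ORDER BY height DESC LIMIT 1"
).fetchall()
windowed = time.perf_counter() - t0

print(windowed > plain)  # True: the window scan dominates
```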

I think the simplest solution for this query is to drop the COUNT(*) OVER(). I understand it's probably wanted for the sake of providing a total number of pages for the pagination in the UI, and this sort of hack is probably OK for queries on processes or entities, where the total size of the table is manageable (in the order of thousands of entries). But for blocks, where we very easily get into the millions, then it's just not feasible.
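If the UI still wants some notion of a total for pagination, one possible workaround (my suggestion, not something decided in this thread) exploits the fact that block heights are dense and sequential: MAX over an INTEGER PRIMARY KEY is a single B-tree seek, so it can stand in for the unfiltered row count without scanning the table. A minimal sketch, assuming heights run 1..N with no gaps:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE blocks (height INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO blocks VALUES (?)", ((i,) for i in range(1, 1001)))

# MAX(height) on the primary key is an O(log n) index seek, not a
# full-table COUNT(*); with dense heights it equals the row count.
total, = con.execute("SELECT COALESCE(MAX(height), 0) FROM blocks").fetchone()
print(total)  # 1000
```

This only works for the unfiltered block list, of course - a filtered search still cannot produce an exact total cheaply.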

@altergui
Contributor Author

altergui commented Sep 9, 2024

thanks!
fixed in c3cf08a

@altergui altergui closed this as completed Sep 9, 2024