
bug: api endpoints dealing with blocks are too slow in stg #1382

Closed
altergui opened this issue Sep 6, 2024 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@altergui
Contributor

altergui commented Sep 6, 2024

since merging #1329, performance has regressed significantly. this was not noticed in dev (<10k blocks indexed), only in stage (>1M blocks indexed), so it must be due to the table size.
could you take a look, @mvdan?

-- name: SearchBlocks :many
SELECT
  b.*,
  COUNT(t.block_index) AS tx_count,
  COUNT(*) OVER() AS total_count
FROM blocks AS b
LEFT JOIN transactions AS t
  ON b.height = t.block_height
WHERE (
  (sqlc.arg(chain_id) = '' OR b.chain_id = sqlc.arg(chain_id))
  AND LENGTH(sqlc.arg(hash_substr)) <= 64 -- if the passed arg is longer, just abort the query
  AND (
    sqlc.arg(hash_substr) = ''
    OR (LENGTH(sqlc.arg(hash_substr)) = 64 AND LOWER(HEX(b.hash)) = LOWER(sqlc.arg(hash_substr)))
    OR (LENGTH(sqlc.arg(hash_substr)) < 64 AND INSTR(LOWER(HEX(b.hash)), LOWER(sqlc.arg(hash_substr))) > 0)
    -- TODO: consider keeping a hash_hex column for faster searches
  )
  AND (sqlc.arg(proposer_address) = '' OR LOWER(HEX(b.proposer_address)) = LOWER(sqlc.arg(proposer_address)))
)
GROUP BY b.height
ORDER BY b.height DESC
LIMIT sqlc.arg(limit)
OFFSET sqlc.arg(offset);
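As a side note on the TODO above: a hedged sketch of what a `hash_hex` column could look like. The schema below is a toy stand-in (not the indexer's real schema), using a SQLite generated column (requires SQLite >= 3.31) plus an index, so the exact-match branch (`LENGTH = 64`) becomes an index seek instead of computing `HEX(b.hash)` for every row:

```python
import sqlite3

# Toy schema sketching the TODO: hash_hex is a generated column
# mirroring lower(hex(hash)), with an index on top of it.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE blocks (
        height   INTEGER PRIMARY KEY,
        hash     BLOB,
        hash_hex TEXT GENERATED ALWAYS AS (lower(hex(hash))) VIRTUAL
    )
""")
con.execute("CREATE INDEX blocks_hash_hex ON blocks(hash_hex)")

# The exact-match lookup now resolves via the index (a SEARCH in the
# query plan) rather than a full-table SCAN.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT height FROM blocks WHERE hash_hex = ?",
    ("ab" * 32,),
).fetchall()
print(plan)
```

The substring branch (`INSTR`) still cannot use an index, but the common exact-hash lookup would.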

@altergui altergui added the bug Something isn't working label Sep 6, 2024
@altergui
Contributor Author

altergui commented Sep 6, 2024

curl -s "https://api-stg.vocdoni.net/v2/chain/blocks?limit=1" is affected

but fetching one specific block, like curl -s "https://api-stg.vocdoni.net/v2/chain/blocks/1" is not

@mvdan
Contributor

mvdan commented Sep 8, 2024

With the backup SQL file you provided, I wrote a small benchmark to measure how slow the query is, mimicking your REST query with ?limit=1:

func BenchmarkTODO(b *testing.B) {
	app := vochain.TestBaseApplication(b)

	idx, err := New(app, Options{DataDir: b.TempDir(), ExpectBackupRestore: true})
	qt.Assert(b, err, qt.IsNil)
	err = idx.RestoreBackup("testdata/backup.sql")
	qt.Assert(b, err, qt.IsNil)

	b.ReportAllocs()
	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		blocks, total, err := idx.BlockList(1, 0, "", "", "")
		qt.Assert(b, err, qt.IsNil)
		qt.Assert(b, blocks, qt.HasLen, 1)
		qt.Assert(b, blocks[0].TxCount, qt.Equals, int64(0))
		qt.Assert(b, total, qt.Equals, uint64(1635162))
	}
}

So there is some good news and some bad news. Let's start with the good news: simply removing the COUNT(*) OVER() AS total_count from the query gives you a near-infinite speed-up, from ~1.5s to ~0.02ms on my laptop:

       │       old        │                 new                 │
       │      sec/op      │   sec/op     vs base                │
TODO-8   1476149.90µ ± 4%   26.43µ ± 4%  -100.00% (p=0.002 n=6)

The bad news is that counting over an entire table in sqlite will always be slow for huge tables. Even in the best case scenario, where we just count all rows via SELECT COUNT(*) FROM blocks, I still measure a runtime of about ~7.5ms, or about 300x slower than the 0.02ms for getting a single block as above. This scales with the size of the table - which is why you only noticed the slow-down with staging, which has about 1.6 million indexed blocks.

And counting over the entire table scales even worse with more complex queries like yours, because the WHERE, the GROUP BY, and the LEFT JOIN all have to apply to every row being counted. So not only are you scanning the entire table (which already takes about 300x as long), but you're also doing extra work for every row in it, which takes you from milliseconds to seconds - another 1000x slow-down.

You basically cannot have filtering (or searching) SQL queries which also count the total number of results, because for SQL to compute that total, it needs to search the entire table. So any LIMIT or OFFSET parameters are not saving any work - the query will still have to churn CPU to search the entire table.
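To illustrate the point with a toy table (not the indexer's schema): even with LIMIT 1, adding a COUNT(*) OVER() window forces SQLite to visit every row before the first one can be returned, while the plain query is a single index seek:

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE blocks (height INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO blocks VALUES (?)", ((i,) for i in range(200_000)))

# Plain LIMIT 1: one B-tree seek on the primary key.
t0 = time.perf_counter()
con.execute("SELECT height FROM blocks ORDER BY height DESC LIMIT 1").fetchall()
plain = time.perf_counter() - t0

# Same query with a window count: all 200k rows must be scanned
# before the single requested row comes back.
t0 = time.perf_counter()
con.execute(
    "SELECT height, COUNT(*) OVER() FROM blocks ORDER BY height DESC LIMIT 1"
).fetchall()
windowed = time.perf_counter() - t0

print(windowed > plain)  # True: the window scan dominates
```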

I think the simplest solution for this query is to drop the COUNT(*) OVER(). I understand it's probably wanted for the sake of providing a total number of pages for the pagination in the UI, and this sort of hack is probably OK for queries on processes or entities, where the total size of the table is manageable (in the order of thousands of entries). But for blocks, where we very easily get into the millions, then it's just not feasible.
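If the UI still wants some notion of a total for pagination, one possible workaround (my suggestion, not something decided in this thread) exploits the fact that block heights are dense and sequential: MAX over an INTEGER PRIMARY KEY is a single B-tree seek, so it can stand in for the unfiltered row count without scanning the table. A minimal sketch, assuming heights run 1..N with no gaps:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE blocks (height INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO blocks VALUES (?)", ((i,) for i in range(1, 1001)))

# MAX(height) on the primary key is an O(log n) index seek, not a
# full-table COUNT(*); with dense heights it equals the row count.
total, = con.execute("SELECT COALESCE(MAX(height), 0) FROM blocks").fetchone()
print(total)  # 1000
```

This only works for the unfiltered block list, of course - a filtered search still cannot produce an exact total cheaply.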

@altergui
Contributor Author

altergui commented Sep 9, 2024

thanks!
fixed in c3cf08a

@altergui altergui closed this as completed Sep 9, 2024