sqlite: allow pragmas and statements needed to run the query optimizer #3483

Open: justin-mp wants to merge 1 commit into main from jmp/sqlite-optimize

justin-mp
Contributor

https://www.sqlite.org/lang_analyze.html describes several ways to help the SQLite query planner choose better plans. Most folks using Durable Objects will probably be best served by simply running PRAGMA optimize after making schema changes.

A couple of notes about this commit:

  • PRAGMA optimize may call ANALYZE on all tables, including the internal _cf_ tables, so we have to allow ANALYZE even on otherwise disallowed table names.

  • Calls to ANALYZE, either directly or via PRAGMA optimize, cause the creation of the sqlite_stat1 table. This caused some noise in sql-test.js as I had to update tests that depend on (a) the particular set of tables in the database and (b) the particular size of the database.
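
The second note can be reproduced outside of workerd. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the Durable Objects SQL API; the `users` table, its index, and the `table_names` helper are made up for illustration. It lists the tables and the page count before and after a plain ANALYZE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE INDEX users_email ON users (email)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(100)],
)


def table_names(c):
    """Return the names of all tables recorded in the schema table."""
    rows = c.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


tables_before = table_names(conn)
pages_before = conn.execute("PRAGMA page_count").fetchone()[0]

# ANALYZE gathers statistics and stores them in sqlite_stat1,
# creating that table if it does not exist yet.
conn.execute("ANALYZE")

tables_after = table_names(conn)
pages_after = conn.execute("PRAGMA page_count").fetchone()[0]

print("tables before:", tables_before)
print("tables after: ", tables_after)
print("pages before/after:", pages_before, pages_after)
```

Any test that asserts on the exact table list or the database size would see both values change once ANALYZE has run, which matches the sql-test.js updates described above.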

@justin-mp justin-mp requested review from a team as code owners February 6, 2025 20:11
@justin-mp justin-mp force-pushed the jmp/sqlite-optimize branch from aa67bb3 to 1e99152 Compare February 6, 2025 20:32
Member

@kentonv kentonv left a comment

I wonder if we should automatically run PRAGMA optimize when DOs shut down? Similar to how we take care of enabling autovacuum automatically, etc.

case SQLITE_ANALYZE: /* Table Name NULL */
KJ_ASSERT(param2 == kj::none);
// We allow all names (including names where isAllowedName() would return false) because a
// SQLite ANALYZE statement with no parameters will analyze all tables, including otherwise
Member

If I understand correctly, ANALYZE doesn't return any information to the caller; it just records statistics for future query planning. Is that right? If so, it neither reveals nor modifies the contents of hidden tables, so there's no need to restrict it. Might be worth spelling that out in the comment here.
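
The rule being discussed can be sketched with an authorizer callback in Python's built-in sqlite3 module. This is only an illustration, not the workerd implementation: `is_allowed_name` is a hypothetical stand-in for `isAllowedName()`, and `_cf_example` is a made-up hidden table. The callback lets SQLITE_ANALYZE through for every table name while still denying direct reads of hidden tables.

```python
import sqlite3


def is_allowed_name(name):
    # Hypothetical stand-in for workerd's isAllowedName():
    # treat any name starting with "_cf_" as hidden.
    return not name.lower().startswith("_cf_")


analyzed = []  # table names the authorizer saw for SQLITE_ANALYZE


def authorizer(action, arg1, arg2, db_name, source):
    if action == sqlite3.SQLITE_ANALYZE:
        # arg1 is the table name; arg2 is unused for this action.
        # Allowed for all names: ANALYZE returns no rows to the caller.
        analyzed.append(arg1)
        return sqlite3.SQLITE_OK
    if action == sqlite3.SQLITE_READ and arg1 is not None:
        # Direct reads of hidden tables stay forbidden.
        if not is_allowed_name(arg1):
            return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE _cf_example (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.set_authorizer(authorizer)

# A bare ANALYZE visits every table, hidden ones included.
conn.execute("ANALYZE")

try:
    conn.execute("SELECT v FROM _cf_example").fetchall()
    hidden_read_blocked = False
except sqlite3.DatabaseError:
    hidden_read_blocked = True

print("analyzed:", analyzed)
print("hidden read blocked:", hidden_read_blocked)
```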

Contributor Author

Your understanding is correct and I've updated the comment.

One non-obvious thing is that the statistics ANALYZE records are stored in a table (sqlite_stat1) that users can modify. This means customers can steer the query planner into suboptimal plans. The worst a customer could likely do is waste CPU, but allowing ANALYZE does open up a rather large surface.
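
Both halves of this exchange can be checked with a short sketch, again using Python's built-in sqlite3 module rather than the Durable Objects API; the table `t` and index `t_a` are made up for illustration. ANALYZE yields no result rows, and the statistics it writes can afterwards be overwritten with ordinary SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.execute("CREATE INDEX t_a ON t (a)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(i % 10, str(i)) for i in range(1000)],
)

# ANALYZE hands nothing back to the caller.
analyze_rows = conn.execute("ANALYZE").fetchall()

# The gathered statistics live in an ordinary, user-visible table.
honest_stat = conn.execute(
    "SELECT stat FROM sqlite_stat1 WHERE idx = 't_a'"
).fetchone()[0]

# A user can overwrite them, here claiming the index is nearly unique.
# The planner consults these numbers the next time it loads statistics.
conn.execute("UPDATE sqlite_stat1 SET stat = '1000 1' WHERE idx = 't_a'")
forged_stat = conn.execute(
    "SELECT stat FROM sqlite_stat1 WHERE idx = 't_a'"
).fetchone()[0]

print("rows returned by ANALYZE:", analyze_rows)
print("honest stat:", honest_stat)
print("forged stat:", forged_stat)
```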

@justin-mp justin-mp force-pushed the jmp/sqlite-optimize branch from 1e99152 to e2afa2f Compare February 7, 2025 20:35
@justin-mp
Contributor Author

> I wonder if we should automatically run PRAGMA optimize when DOs shut down? Similar to how we take care of enabling autovacuum automatically, etc.

I do think we want either D1 or DO to run PRAGMA optimize automatically, and I have an internal ticket filed to do so. This PR was meant as a quick stopgap until the automatic version exists.

The pros of doing the automatic version now are that (1) we probably wouldn't need to allowlist PRAGMA optimize and the ANALYZE statement, and (2) customers wouldn't have to worry about doing this themselves.

The downsides are that (1) it takes more work to figure out when we can call PRAGMA optimize on shutdown (probably in the SqliteDatabase destructor, as long as we can still write to the DB when the destructor runs), and (2) the sqlite_stat1 table would appear in all DOs, even those that never use ANALYZE.
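
As a rough sketch of the automatic approach, the wrapper below runs PRAGMA optimize just before closing the connection. It is written with Python's built-in sqlite3 module and is not the workerd design: `OptimizingDatabase` is a hypothetical stand-in for a class like SqliteDatabase, and the `events` table is made up. Whether sqlite_stat1 exists afterwards depends on the SQLite version and on which queries ran, so the sketch only reports it.

```python
import os
import sqlite3
import tempfile


class OptimizingDatabase:
    """Hypothetical wrapper that optimizes on shutdown."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)

    def close(self):
        try:
            # Must run while the database is still writable, because
            # PRAGMA optimize may execute ANALYZE and store statistics.
            self.conn.execute("PRAGMA optimize")
            self.conn.commit()
        finally:
            self.conn.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.db")

    db = OptimizingDatabase(path)
    db.conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT)")
    db.conn.execute("CREATE INDEX events_kind ON events (kind)")
    db.conn.executemany(
        "INSERT INTO events (kind) VALUES (?)",
        [("click" if i % 2 else "view",) for i in range(500)],
    )
    db.conn.execute("SELECT count(*) FROM events WHERE kind = 'click'").fetchall()
    db.conn.commit()
    db.close()

    # Reopen to see what the shutdown step left behind.
    check = sqlite3.connect(path)
    tables = [
        row[0]
        for row in check.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    ]
    row_count = check.execute("SELECT count(*) FROM events").fetchone()[0]
    check.close()

print("tables after shutdown:", tables)
print("sqlite_stat1 present:", "sqlite_stat1" in tables)
```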

I'm happy to drop this in favor of the automatic approach if we think that's worth waiting for compared to getting something out sooner.
