documentation, cleaner prompts
chrisclark committed Aug 27, 2024
1 parent 6586b63 commit 2ed69f6
Showing 5 changed files with 85 additions and 44 deletions.
41 changes: 34 additions & 7 deletions HISTORY.rst
@@ -7,13 +7,40 @@ This project adheres to `Semantic Versioning <https://semver.org/>`_.

vNext
===========================
* `#660`_: Userspace connection migration. This should be an invisible change, but represents a significant refactor of how connections function.
Instead of a weird blend of DatabaseConnection models and underlying Django models (which were the original Explorer connections),
this migrates all connections to DatabaseConnection models and implements proper foreign keys to them on the Query and QueryLog models.
A data migration creates new DatabaseConnection models based on the configured settings.EXPLORER_CONNECTIONS.
Going forward, admins can create new Django-backed DatabaseConnection models by registering the connection in EXPLORER_CONNECTIONS, and then creating a
DatabaseConnection model using the Django admin or the user-facing /connections/new/ form, and entering the Django DB alias and setting the connection type to "Django Connection"

* Keyboard shortcut for formatting the SQL in the editor.

- Cmd+Shift+F (Windows: Ctrl+Shift+F)
- The format button is now a small icon toward the bottom-right of the SQL editor.

* `#664`_: Improvements to the AI SQL Assistant:

- Table Annotations: Write persistent table annotations with descriptive information that will get injected into the
prompt for the assistant. For example, if a table is commonly joined to another table through a non-obvious foreign
key, you can tell the assistant about it in plain English, as an annotation to that table. Every time that table is
deemed 'relevant' to an assistant request, that annotation will be included alongside the schema and sample data.
- Few-Shot Examples: Using the small checkbox on the bottom-right of any saved query, you can designate certain
queries as "few-shot examples". When making an assistant request, any designated few-shot examples that reference
the same tables as your assistant request will get included as 'reference sql' in the prompt for the LLM.
- Autocomplete / multiselect when selecting which tables' info to send to the SQL Assistant. Much easier to use and
more keyboard-focused.
- Relevant tables are added client-side visually, in real time, based on what's in the SQL editor. The dependency on
sql_metadata is therefore removed, as server-side SQL parsing is no longer necessary.
- Improved system prompt that emphasizes the particular SQL dialect being used.
- Addresses issue #657.

* `#660`_: Userspace connection migration.

- This should be an invisible change, but represents a significant refactor of how connections function. Instead of a
weird blend of DatabaseConnection models and underlying Django models (which were the original Explorer
connections), this migrates all connections to DatabaseConnection models and implements proper foreign keys to them
on the Query and QueryLog models. A data migration creates new DatabaseConnection models based on the configured
settings.EXPLORER_CONNECTIONS. Going forward, admins can create new Django-backed DatabaseConnection models by
registering the connection in EXPLORER_CONNECTIONS, and then creating a DatabaseConnection model using the Django
admin or the user-facing /connections/new/ form, and entering the Django DB alias and setting the connection type
to "Django Connection".
- The Query.connection and QueryLog.connection fields are deprecated and will be removed in a future release. They
are kept around in this release in case there is an unforeseen issue with the migration. Preserving the fields for
now ensures there is no data loss in the event that a rollback to an earlier version is required.

`5.2.0`_ (2024-08-19)
===========================
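To illustrate the registration step described in the #660 entry, here is a minimal settings sketch. It assumes `EXPLORER_CONNECTIONS` maps a user-facing display name to a Django database alias; every alias, engine, name, and host below is hypothetical. Registering the alias is only half of the step: per the entry above, a DatabaseConnection with type "Django Connection" still has to be created in the Django admin or via the /connections/new/ form.

```python
# settings.py (fragment): a hypothetical two-database setup.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "app.sqlite3",
    },
    # Hypothetical read-only reporting replica to expose in Explorer.
    "reporting": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "reporting",
        "HOST": "replica.internal",
    },
}

# Assumed format: display name -> key in DATABASES.
EXPLORER_CONNECTIONS = {
    "Default": "default",
    "Reporting replica": "reporting",
}
EXPLORER_DEFAULT_CONNECTION = "default"

# Fail fast if an alias registered with Explorer is not a configured database.
missing = [alias for alias in EXPLORER_CONNECTIONS.values() if alias not in DATABASES]
assert not missing, f"Unknown DB aliases in EXPLORER_CONNECTIONS: {missing}"
```

The check at the bottom is optional; it simply catches a typo in an alias before the data migration tries to build DatabaseConnection rows from it.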
17 changes: 14 additions & 3 deletions docs/features.rst
@@ -5,7 +5,19 @@ SQL Assistant
-------------
- Built in integration with OpenAI (or the LLM of your choosing)
to quickly get help with your query, with relevant schema
automatically injected into the prompt. Simple, effective.
automatically injected into the prompt.
- The assistant tries hard to get relevant context into the prompt sent to the LLM, alongside your explicit request. You
can choose tables to include explicitly, and any tables you reference in your SQL are included automatically as
well. When a table is "included", the prompt will contain the table's schema, 3 sample rows, any Table
Annotations you have added, and any designated "few-shot examples". More on each of those below.
- Table Annotations: Write persistent table annotations with descriptive information that will get injected into the
prompt for the assistant. For example, if a table is commonly joined to another table through a non-obvious foreign
key, you can tell the assistant about it in plain English, as an annotation to that table. Every time that table is
deemed 'relevant' to an assistant request, that annotation will be included alongside the schema and sample data.
- Few-shot examples: Using the small checkbox on the bottom-right of any saved query, you can designate queries as
"Assistant Examples". When making an assistant request, the 'included tables' are intersected with tables referenced
by designated Example queries, those queries are injected into the prompt, and the LLM is told that these
are good reference queries.

Database Support
----------------
@@ -222,8 +234,7 @@ Power tips
view.
- Command+Enter and Ctrl+Enter will execute a query when typing in
the SQL editor area.
- Hit the "Format" button to format and clean up your SQL (this is
non-validating -- just formatting).
- Cmd+Shift+F (Windows: Ctrl+Shift+F) to format the SQL in the editor.
- Use the Query Logs feature to share one-time queries that aren't
worth creating a persistent query for. Just run your SQL in the
playground, then navigate to ``/logs`` and share the link
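The few-shot behavior described in the features docs above (included tables intersected with the tables referenced by flagged example queries) can be sketched without Django. `ExampleQuery`, `relevant_examples`, and `few_shot_chunk` are hypothetical names; the case-insensitive substring match is an assumption standing in for however `get_relevant_few_shots` builds its query filter. Only the section header string is copied from the diff.

```python
from dataclasses import dataclass


@dataclass
class ExampleQuery:
    # Hypothetical stand-in for a saved Query that may be flagged as a few-shot example.
    title: str
    description: str
    sql: str
    few_shot: bool = True


def relevant_examples(examples, included_tables):
    """Return flagged examples whose SQL mentions at least one included table.

    Assumption: a case-insensitive substring check decides "references".
    """
    tables = [t.lower() for t in included_tables]
    return [
        ex for ex in examples
        if ex.few_shot and any(t in ex.sql.lower() for t in tables)
    ]


def few_shot_chunk(examples, included_tables):
    matches = relevant_examples(examples, included_tables)
    if not matches:
        return None  # no chunk, so nothing is added to the prompt
    header = "## Relevant example queries, written by expert SQL analysts ##\n"
    return header + "\n\n".join(
        f"Description: {ex.title} - {ex.description}\nSQL:\n{ex.sql}" for ex in matches
    )


examples = [
    ExampleQuery("Orders by day", "Daily order counts",
                 "SELECT day, COUNT(*) FROM orders GROUP BY day"),
    ExampleQuery("Active users", "Users seen this week",
                 "SELECT * FROM users WHERE seen > now() - interval '7 days'"),
    ExampleQuery("Draft", "Not an example", "SELECT * FROM orders", few_shot=False),
]
print(few_shot_chunk(examples, ["Orders"]))
```

A substring match is cheap but loose: a table named `orders` would also match `preorders`, so a word-boundary regex would be a stricter choice.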
45 changes: 25 additions & 20 deletions explorer/assistant/utils.py
@@ -42,7 +42,7 @@ def table_schema(db_connection, table_name):
schema = schema_info(db_connection)
s = [table for table in schema if table[0] == table_name]
if len(s):
return s[0]
return s[0][1]


def sample_rows_from_table(connection, table_name):
@@ -72,8 +72,8 @@ def sample_rows_from_table(connection, table_name):
new_val = field
if isinstance(field, str) and len(field) > MAX_FIELD_SAMPLE_SIZE:
new_val = field[:MAX_FIELD_SAMPLE_SIZE] + "..." # Truncate and add ellipsis
elif isinstance(field, (bytes, bytearray)) and len(field) > MAX_FIELD_SAMPLE_SIZE:
new_val = field[:MAX_FIELD_SAMPLE_SIZE] + b"..." # Truncate binary data
elif isinstance(field, (bytes, bytearray)):
new_val = "<binary_data>"
processed_row.append(new_val)
ret.append(processed_row)

@@ -96,8 +96,7 @@ def format_rows_from_table(rows):

def build_system_prompt(flavor):
bsp = ExplorerValue.objects.get_item(ExplorerValue.ASSISTANT_SYSTEM_PROMPT).value
bsp += f"""\n\nYou are an expert at writing SQL, specifically for {flavor}, and account for the nuances
of this dialect of SQL. You always respond with valid {flavor} SQL."""
bsp += f"\nYou are an expert at writing SQL, specifically for {flavor}, and account for the nuances of this dialect of SQL. You always respond with valid {flavor} SQL." # noqa
return bsp


@@ -125,17 +124,30 @@ def get_relevant_few_shots(db_connection, included_tables):
).filter(query_conditions)


def get_few_shot_chunk(db_connection, included_tables):
included_tables = [t.lower() for t in included_tables]
few_shot_examples = get_relevant_few_shots(db_connection, included_tables)
if few_shot_examples:
return "## Relevant example queries, written by expert SQL analysts ##\n" + "\n\n".join(
[f"Description: {fs.title} - {fs.description}\nSQL:\n{fs.sql}"
for fs in few_shot_examples.all()]
)


@dataclass
class TablePromptData:
name: str
schema: str
schema: list
sample: list
annotation: TableDescription

def render(self):
fmt_schema = "\n".join([str(field) for field in self.schema])
ret = f"""## Information for Table '{self.name}' ##
Schema:\n{self.schema}
Sample rows:\n{format_rows_from_table(self.sample)}"""
Schema:\n{fmt_schema}
Sample rows:\n{format_rows_from_table(self.sample)}"""
if self.annotation:
ret += f"\nUsage Notes:\n{self.annotation.description}"
return ret
@@ -144,8 +156,8 @@ def render(self):
def build_prompt(db_connection, assistant_request, included_tables, query_error=None, sql=None):
included_tables = [t.lower() for t in included_tables]

error_chunk = f"## Query Error ##\n{query_error}" if query_error else ""
sql_chunk = f"## Existing User-Written SQL ##\n{sql}" if sql else ""
error_chunk = f"## Query Error ##\n{query_error}" if query_error else None
sql_chunk = f"## Existing User-Written SQL ##\n{sql}" if sql else None
request_chunk = f"## User's Request to Assistant ##\n{assistant_request}"
table_chunks = [
TablePromptData(
@@ -156,19 +168,12 @@ def build_prompt(db_connection, assistant_request, included_tables, query_error=
).render()
for t in included_tables
]
few_shot_chunk = get_few_shot_chunk(db_connection, included_tables)

few_shot_examples = get_relevant_few_shots(db_connection, included_tables)
if few_shot_examples:
few_shot_chunk = "## Relevant example queries, written by expert SQL analysts ##\n" + "\n\n".join(
[f"""Description: {fs.title} - {fs.description}
SQL:\n{fs.sql}"""
for fs in few_shot_examples.all()]
)
else:
few_shot_chunk = ""
chunks = [error_chunk, sql_chunk, *table_chunks, few_shot_chunk, request_chunk]

prompt = {
"system": build_system_prompt(db_connection.as_django_connection().vendor),
"user": "\n\n".join([error_chunk, sql_chunk, *table_chunks, few_shot_chunk, request_chunk]),
"user": "\n\n".join([c for c in chunks if c]),
}
return prompt
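The `build_prompt` refactor above comes down to two ideas: the schema is now a list of `(column, type)` tuples rendered one per line, and optional chunks are `None` and filtered out before joining, so the user prompt no longer contains empty sections. A standalone sketch of that assembly follows; `TableInfo` and `build_user_prompt` are hypothetical names, and the pipe-separated row format is an assumption because `format_rows_from_table` is not shown in this diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableInfo:
    # Hypothetical, simplified analogue of TablePromptData from the diff.
    name: str
    schema: list                      # list of (column, type) tuples
    sample: list                      # header row followed by data rows
    annotation: Optional[str] = None  # plain-text usage notes, if any

    def render(self) -> str:
        fmt_schema = "\n".join(str(col) for col in self.schema)
        # Assumed row format: values joined with " | ".
        fmt_rows = "\n".join(" | ".join(str(v) for v in row) for row in self.sample)
        ret = (f"## Information for Table '{self.name}' ##\n"
               f"Schema:\n{fmt_schema}\n"
               f"Sample rows:\n{fmt_rows}")
        if self.annotation:
            ret += f"\nUsage Notes:\n{self.annotation}"
        return ret


def build_user_prompt(request, tables, sql=None, query_error=None, few_shot=None):
    error_chunk = f"## Query Error ##\n{query_error}" if query_error else None
    sql_chunk = f"## Existing User-Written SQL ##\n{sql}" if sql else None
    request_chunk = f"## User's Request to Assistant ##\n{request}"
    chunks = [error_chunk, sql_chunk, *(t.render() for t in tables), few_shot, request_chunk]
    # Dropping None/empty chunks avoids runs of blank lines between sections.
    return "\n\n".join(c for c in chunks if c)


users = TableInfo(
    "users",
    schema=[("id", "AutoField"), ("email", "CharField")],
    sample=[["id", "email"], [1, "a@example.com"]],
    annotation="Join to orders via orders.buyer_id",
)
prompt = build_user_prompt("Count users per email domain", [users])
print(prompt)
```

With the old version, an empty error or SQL chunk was `""` and still took part in the join, which left stray blank gaps at the top of the prompt; filtering on truthiness removes them.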
2 changes: 1 addition & 1 deletion explorer/src/js/uploads.js
@@ -52,7 +52,7 @@ export function setupUploads() {
}

let xhr = new XMLHttpRequest();
xhr.open('POST', `${window.baseUrlPath}upload/`, true);
xhr.open('POST', `${window.baseUrlPath}connections/upload/`, true);
xhr.setRequestHeader('X-CSRFToken', getCsrfToken());

xhr.upload.onprogress = function(event) {
24 changes: 11 additions & 13 deletions explorer/tests/test_assistant.py
@@ -39,23 +39,19 @@ def setUp(self):
}

@patch("explorer.assistant.utils.openai_client")
@patch("explorer.assistant.utils.num_tokens_from_string")
def test_do_modify_query(self, mocked_num_tokens, mocked_openai_client):
def test_do_modify_query(self, mocked_openai_client):
from explorer.assistant.views import run_assistant

# create.return_value should match: resp.choices[0].message
mocked_openai_client.return_value.chat.completions.create.return_value = Mock(
choices=[Mock(message=Mock(content="smart computer"))])
mocked_num_tokens.return_value = 100
resp = run_assistant(self.request_data, None)
self.assertEqual(resp, "smart computer")

@patch("explorer.assistant.utils.openai_client")
@patch("explorer.assistant.utils.num_tokens_from_string")
def test_assistant_help(self, mocked_num_tokens, mocked_openai_client):
def test_assistant_help(self, mocked_openai_client):
mocked_openai_client.return_value.chat.completions.create.return_value = Mock(
choices=[Mock(message=Mock(content="smart computer"))])
mocked_num_tokens.return_value = 100
resp = self.client.post(reverse("assistant"),
data=json.dumps(self.request_data),
content_type="application/json")
@@ -73,8 +69,9 @@ def test_build_prompt_with_vendor_only(self, mock_get_item):
self.assertIn("sqlite", result["system"])

@patch("explorer.assistant.utils.sample_rows_from_table", return_value="sample data")
@patch("explorer.assistant.utils.table_schema", return_value=[])
@patch("explorer.models.ExplorerValue.objects.get_item")
def test_build_prompt_with_sql_and_annotation(self, mock_get_item, mock_sample_rows):
def test_build_prompt_with_sql_and_annotation(self, mock_get_item, mock_table_schema, mock_sample_rows):
mock_get_item.return_value.value = "system prompt"

included_tables = ["foo"]
@@ -86,8 +83,9 @@ def test_build_prompt_with_sql_and_annotation(self, mock_get_item, mock_sample_r
self.assertIn("Usage Notes:\nannotated", result["user"])

@patch("explorer.assistant.utils.sample_rows_from_table", return_value="sample data")
@patch("explorer.assistant.utils.table_schema", return_value=[])
@patch("explorer.models.ExplorerValue.objects.get_item")
def test_build_prompt_with_few_shot(self, mock_get_item, mock_sample_rows):
def test_build_prompt_with_few_shot(self, mock_get_item, mock_table_schema, mock_sample_rows):
mock_get_item.return_value.value = "system prompt"

included_tables = ["magic"]
@@ -154,7 +152,7 @@ def test_truncates_long_strings(self):
self.assertEqual(row[0], "a" * 200 + "...")
self.assertEqual(row[1], "short string")

def test_truncates_long_binary_data(self):
def test_binary_data(self):
long_binary = b"a" * 600

# Mock database connection and cursor
@@ -169,8 +167,8 @@ def test_truncates_long_binary_data(self):
header, row = ret

self.assertEqual(header, ["col1", "col2"])
self.assertEqual(row[0], b"a" * 200 + b"...")
self.assertEqual(row[1], b"short binary")
self.assertEqual(row[0], "<binary_data>")
self.assertEqual(row[1], "<binary_data>")

def test_handles_various_data_types(self):
# Mock database connection and cursor
@@ -212,7 +210,7 @@ def test_format_rows_from_table(self):
def test_schema_info_from_table_names(self):
from explorer.assistant.utils import table_schema
ret = table_schema(default_db_connection(), "explorer_query")
expected = ("explorer_query", [
expected = [
("id", "AutoField"),
("title", "CharField"),
("sql", "TextField"),
@@ -223,7 +221,7 @@ def test_schema_info_from_table_names(self):
("snapshot", "BooleanField"),
("connection", "CharField"),
("database_connection_id", "IntegerField"),
("few_shot", "BooleanField")])
("few_shot", "BooleanField")]
self.assertEqual(ret, expected)
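The updated `test_binary_data` expectations pin down the new sampling rule: long strings are truncated with an ellipsis, and any binary value, regardless of length, becomes the `"<binary_data>"` placeholder instead of truncated bytes. Below is a minimal sketch of that per-cell rule. It assumes `MAX_FIELD_SAMPLE_SIZE` is 200, inferred from the `"a" * 200 + "..."` assertion rather than shown in the diff; `sanitize_sample_value` and `sanitize_rows` are hypothetical names.

```python
MAX_FIELD_SAMPLE_SIZE = 200  # assumed value, inferred from the truncation test


def sanitize_sample_value(value):
    """Make one sampled cell safe to embed in an LLM prompt."""
    if isinstance(value, str) and len(value) > MAX_FIELD_SAMPLE_SIZE:
        # Keep the first MAX_FIELD_SAMPLE_SIZE characters and mark the cut.
        return value[:MAX_FIELD_SAMPLE_SIZE] + "..."
    if isinstance(value, (bytes, bytearray)):
        # Binary blobs carry little useful signal for the model; use a placeholder.
        return "<binary_data>"
    return value  # short strings, numbers, None, etc. pass through unchanged


def sanitize_rows(rows):
    return [[sanitize_sample_value(v) for v in row] for row in rows]


rows = [["a" * 600, b"\x00\x01", bytearray(b"x" * 600), 42, None]]
clean = sanitize_rows(rows)
```

Replacing bytes outright, rather than truncating them, also means the sample rows contain only text and plain values; the old `b"..."` branch left `bytes` objects in the rows that were later rendered into the prompt.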

