Commit e4da802

Merge pull request #2 from teaguesterling/feature/parse-functions
Add Function Extraction
2 parents 1d06e5c + 2c9e3d3 commit e4da802

8 files changed: +818 -10 lines changed


CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@ set(EXTENSION_SOURCES
 src/parser_tools_extension.cpp
 src/parse_tables.cpp
 src/parse_where.cpp
+src/parse_functions.cpp
 )

 build_static_extension(${TARGET_NAME} ${EXTENSION_SOURCES})

README.md

Lines changed: 118 additions & 10 deletions
@@ -4,20 +4,23 @@ An experimental DuckDB extension that exposes functionality from DuckDB's native

 ## Overview

-`parser_tools` is a DuckDB extension designed to provide SQL parsing capabilities within the database. It allows you to analyze SQL queries and extract structural information directly in SQL. This extension provides one table function and two scalar functions for parsing SQL and extracting referenced tables: `parse_tables` (table function and scalar function), and `parse_table_names` (see [Functions](#functions) below). Future versions may expose additional aspects of the parsed query structure.
+`parser_tools` is a DuckDB extension designed to provide SQL parsing capabilities within the database. It allows you to analyze SQL queries and extract structural information directly in SQL. This extension provides parsing functions for tables, WHERE clauses, and function calls (see [Functions](#functions) below).

 ## Features

-- Extract table references from a SQL query
-- See the **context** in which each table is used (e.g. `FROM`, `JOIN`, etc.)
-- Includes **schema**, **table**, and **context** information
+- **Extract table references** from a SQL query with context information (e.g. `FROM`, `JOIN`, etc.)
+- **Extract function calls** from a SQL query with context information (e.g. `SELECT`, `WHERE`, `HAVING`, etc.)
+- **Parse WHERE clauses** to extract conditions and operators
+- Support for **window functions**, **nested functions**, and **CTEs**
+- Includes **schema**, **name**, and **context** information for all extractions
 - Built on DuckDB's native SQL parser
 - Simple SQL interface — no external tooling required


 ## Known Limitations
-- Only `SELECT` statements are supported
-- Only returns table references (the full parse tree is not exposed)
+- Only `SELECT` statements are supported for table and function parsing
+- WHERE clause parsing supports additional statement types
+- Full parse tree is not exposed (only specific structural elements)

 ## Installation

@@ -70,21 +73,126 @@ This tells us a few things:
 * `EarlyAdopters` was referenced in a from clause (but it's a cte, not a table).

 ## Context
-Context helps give context of where the table was used in the query:
+
+Context helps identify where elements are used in the query.
+
+### Table Context
 - `from`: table in the main `FROM` clause
 - `join_left`: left side of a `JOIN`
 - `join_right`: right side of a `JOIN`
 - `cte`: a Common Table Expression being defined
 - `from_cte`: usage of a CTE as if it were a table
 - `subquery`: table reference inside a subquery

+### Function Context
+- `select`: function in a `SELECT` clause
+- `where`: function in a `WHERE` clause
+- `having`: function in a `HAVING` clause
+- `order_by`: function in an `ORDER BY` clause
+- `group_by`: function in a `GROUP BY` clause
+- `nested`: function call nested within another function
+
 ## Functions

-This extension provides one table function and three scalar functions for parsing SQL and extracting referenced tables.
+This extension provides parsing functions for tables, functions, and WHERE clauses. Each category includes both table functions (for detailed results) and scalar functions (for programmatic use).
+
+In general, errors (e.g. Parse Exception) will not be exposed to the user, but instead will result in an empty result. This simplifies batch processing. When validity is needed, [is_parsable](#is_parsablesql_query--scalar-function) can be used.
+
+### Function Parsing Functions
+
+These functions extract function calls from SQL queries, including window functions and nested function calls.
+
+#### `parse_functions(sql_query)` – Table Function
+
+Parses a SQL `SELECT` query and returns all function calls along with their context of use (e.g. `select`, `where`, `having`, `order_by`, etc.).
+
+##### Usage
+```sql
+SELECT * FROM parse_functions('SELECT upper(name), count(*) FROM users WHERE length(email) > 0;');
+```
+
+##### Returns
+A table with:
+- `function_name`: the name of the function
+- `schema`: schema name (default `"main"` if unspecified)
+- `context`: where the function appears in the query
+
+##### Example
+```sql
+SELECT * FROM parse_functions($$
+SELECT upper(name), count(*)
+FROM users
+WHERE length(email) > 0
+GROUP BY substr(department, 1, 3)
+HAVING sum(salary) > 100000
+ORDER BY lower(name)
+$$);
+```
+
+| function_name | schema | context |
+|---------------|--------|------------|
+| upper | main | select |
+| count_star | main | select |
+| length | main | where |
+| substr | main | group_by |
+| sum | main | having |
+| lower | main | order_by |
+
+---
+
+#### `parse_function_names(sql_query)` – Scalar Function
+
+Returns a list of function names (strings) referenced in the SQL query.
+
+##### Usage
+```sql
+SELECT parse_function_names('SELECT upper(name), lower(email) FROM users;');
+----
+['upper', 'lower']
+```
+
+##### Returns
+A list of strings, each being a function name.
+
+##### Example
+```sql
+SELECT parse_function_names('SELECT rank() OVER (ORDER BY salary) FROM users;');
+----
+['rank']
+```
+
+---
+
+#### `parse_functions(sql_query)` – Scalar Function (Structured)
+
+Similar to the table function, but returns a **list of structs** instead of a result table. Each struct contains:
+
+- `function_name` (VARCHAR)
+- `schema` (VARCHAR)
+- `context` (VARCHAR)
+
+##### Usage
+```sql
+SELECT parse_functions('SELECT upper(name), count(*) FROM users;');
+----
+[{'function_name': upper, 'schema': main, 'context': select}, {'function_name': count_star, 'schema': main, 'context': select}]
+```
+
+##### Returns
+A list of STRUCTs with function name, schema, and context.
+
+##### Example with filtering
+```sql
+SELECT list_filter(parse_functions('SELECT upper(name) FROM users WHERE lower(email) LIKE "%@example.com"'), f -> f.context = 'where') AS where_functions;
+----
+[{'function_name': lower, 'schema': main, 'context': where}]
+```
+
+---

-In general, errors (e.g. Parse Exception) will not be exposed to the user, but instead will result in an empty result. This simplifies batch processing. When validity is needed, [is_parsable](#is_parsablesql_query--scalar-function) can be used.
+### Table Parsing Functions

-### `parse_tables(sql_query)` – Table Function
+#### `parse_tables(sql_query)` – Table Function

 Parses a SQL `SELECT` query and returns all referenced tables along with their context of use (e.g. `from`, `join_left`, `cte`, etc.).

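Review note: the README text in this hunk states that parse errors are not surfaced and instead yield an empty result. The code that does this is not part of the diff shown here, so the following is only a minimal sketch of that pattern. `ExtractOrEmpty` and `FakeExtract` are hypothetical names introduced for illustration; they are not functions from this extension or from DuckDB.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Same three fields as FunctionResult in src/include/parse_functions.hpp.
struct FunctionResult {
    std::string function_name;
    std::string schema;
    std::string context;
};

// Hypothetical wrapper: runs `extract` on `sql` and turns any exception
// into an empty result, so one unparsable query does not abort a batch.
std::vector<FunctionResult> ExtractOrEmpty(
    const std::string &sql,
    const std::function<std::vector<FunctionResult>(const std::string &)> &extract) {
    try {
        return extract(sql);
    } catch (const std::exception &) {
        return {};
    }
}

// Stand-in extractor for illustration only: throws on empty input,
// otherwise returns a single fixed row.
std::vector<FunctionResult> FakeExtract(const std::string &sql) {
    if (sql.empty()) {
        throw std::runtime_error("Parser Error: empty statement");
    }
    return {FunctionResult{"upper", "main", "select"}};
}
```

With this shape, a caller cannot distinguish "no functions found" from "query did not parse", which is why the README points to `is_parsable` when validity matters.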
src/include/parse_functions.hpp

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+#pragma once
+
+#include "duckdb.hpp"
+#include <string>
+#include <vector>
+
+namespace duckdb {
+
+// Forward declarations
+class DatabaseInstance;
+
+struct FunctionResult {
+    std::string function_name;
+    std::string schema;
+    std::string context; // The context where this function appears (SELECT, WHERE, etc.)
+};
+
+void RegisterParseFunctionsFunction(DatabaseInstance &db);
+void RegisterParseFunctionScalarFunction(DatabaseInstance &db);
+
+} // namespace duckdb
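
Review note: to make the role of the `context` field concrete, here is a small standalone sketch that filters a list of `FunctionResult` values by context, analogous to the README's `list_filter(..., f -> f.context = 'where')` example. The struct is copied from the header so the sketch compiles without DuckDB. `FilterByContext` is a hypothetical helper, not part of this commit, and it assumes the lowercase context strings used in the README (`select`, `where`, ...) rather than the uppercase ones in the header comment.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Copy of the struct declared in src/include/parse_functions.hpp.
struct FunctionResult {
    std::string function_name;
    std::string schema;
    std::string context;
};

// Hypothetical helper: keeps only the results whose context matches exactly.
// Assumes lowercase context values such as "select" or "where".
std::vector<FunctionResult> FilterByContext(const std::vector<FunctionResult> &results,
                                            const std::string &context) {
    std::vector<FunctionResult> filtered;
    std::copy_if(results.begin(), results.end(), std::back_inserter(filtered),
                 [&context](const FunctionResult &r) { return r.context == context; });
    return filtered;
}
```

For example, given results for `upper` (context `select`) and `lower` (context `where`), `FilterByContext(results, "where")` keeps only the `lower` entry.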
