-
Notifications
You must be signed in to change notification settings - Fork 5
Add the --merge option to merge input and output lines #426
Description
When executing SQL queries with JSONL files as input, it can be useful to merge the input and output lines to facilitate analysis and processing of results, especially in one-to-many relationships.
Need: Add a --merge or -M option to the lino query command to enable merging of input and output lines.
Proposed Solution:
--merge Option:
- Add a --merge or -M option to the
lino querycommand to enable merging of input and output lines. - When this option is enabled, each output line will be merged with the corresponding input line.
Example Usage:
lino query -F input.jsonl --binding v1=param1 --merge 'select * from table WHERE column=:v1'
In this example, for each line in the input.jsonl file, the value of param1 is used to execute the SQL query. The query results are merged with the corresponding input line.
Data Processing:
- The command would read each line of the JSONL file.
- For each line, it would extract the value of
param1and execute the SQL query with this value. - If the SQL query returns multiple lines, the input line would be duplicated for each output line.
- Each output line would be merged with the corresponding input line.
Example JSONL File: Suppose the input.jsonl file contains the following lines:
{"param1": "value1", "param2": "valueA"}
{"param1": "value2", "param2": "valueB"}
{"param1": "value3", "param2": "valueC"}
Example Output with --merge: Suppose the SQL query returns the following results for each value of param1:
For param1: value1:
{"result": "result1"}
{"result": "result2"}
For param1: value2:
{"result": "result3"}
For param1: value3:
{"result": "result4"}
{"result": "result5"}
The output with the --merge option might look like this:
{"param1": "value1", "param2": "valueA", "result": "result1"}
{"param1": "value1", "param2": "valueA", "result": "result2"}
{"param1": "value2", "param2": "valueB", "result": "result3"}
{"param1": "value3", "param2": "valueC", "result": "result4"}
{"param1": "value3", "param2": "valueC", "result": "result5"}
Benefits:
- Easy merging of input and output lines, facilitating analysis and processing of results.
- Automatic handling of input line duplication when multiple output lines are generated.
- Increased clarity of results, making them more readable and easier to use for subsequent data processing steps.