Skip to content

Add the --merge option to merge input and output lines #426

@youen

Description

@youen

When executing SQL queries with JSONL files as input, it can be useful to merge the input and output lines to facilitate analysis and processing of results, especially in one-to-many relationships.

Need: Add a --merge or -M option to the lino query command to enable merging of input and output lines.

Proposed Solution:

--merge Option:

  • Add a --merge or -M option to the lino query command to enable merging of input and output lines.
  • When this option is enabled, each output line will be merged with the corresponding input line.

Example Usage:

lino query -F input.jsonl --binding v1=param1 --merge 'select * from table WHERE column=:v1'

In this example, for each line in the input.jsonl file, the value of param1 is used to execute the SQL query. The query results are merged with the corresponding input line.

Data Processing:

  • The command would read each line of the JSONL file.
  • For each line, it would extract the value of param1 and execute the SQL query with this value.
  • If the SQL query returns multiple lines, the input line would be duplicated for each output line.
  • Each output line would be merged with the corresponding input line.

Example JSONL File: Suppose the input.jsonl file contains the following lines:

{"param1": "value1", "param2": "valueA"}
{"param1": "value2", "param2": "valueB"}
{"param1": "value3", "param2": "valueC"}

Example Output with --merge: Suppose the SQL query returns the following results for each value of param1:

For param1: value1:

{"result": "result1"}
{"result": "result2"}

For param1: value2:

{"result": "result3"}

For param1: value3:

{"result": "result4"}
{"result": "result5"}

The output with the --merge option might look like this:

{"param1": "value1", "param2": "valueA", "result": "result1"}
{"param1": "value1", "param2": "valueA", "result": "result2"}
{"param1": "value2", "param2": "valueB", "result": "result3"}
{"param1": "value3", "param2": "valueC", "result": "result4"}
{"param1": "value3", "param2": "valueC", "result": "result5"}

Benefits:

  • Easy merging of input and output lines, facilitating analysis and processing of results.
  • Automatic handling of input line duplication when multiple output lines are generated.
  • Increased clarity of results, making them more readable and easier to use for subsequent data processing steps.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions