feature: Read input from an argv argument instead of stdin using --input flag #3293


Open

etosan opened this issue Mar 18, 2025 · 8 comments

@etosan

etosan commented Mar 18, 2025

The problem: No way to read input from argument

I am again dealing with a lot of jq-ing, and I keep running into this issue:

I want to send data directly into jq as an argv[] argument rather than via the standard input descriptor. But so far, and please correct me if I am wrong, this seems to be impossible.

While some more regular/traditional jq users might oppose the idea, it would be an extremely powerful feature that would come in handy in many specialty situations: a subshell, parallel, xargs, execline, or for example socat, or in countless other such specialty cases.

More than five years ago I requested something like this and was redirected to --arg and --argjson. While I was thankful, and while those are infinitely useful, they are not the same thing! I haven't been jq-ing much since then, but now I am once again, and the inability to do this is literally killing me.

For example: more modern GNU xargs versions have grown support for -o / --open-tty. This lets the xargs process consume data from its own /dev/stdin, while still, for each record/line handling "execution" ([command], -exec {} ;), rebinding the child's /dev/stdin to its original controlling terminal. This allows you to process entries from a file/pipe as usual, yet each record "handler" can still communicate with the user on the tty (for example, for password entry). It is nigh impossible to use jq efficiently in this setup without mucking around with subshell idioms like X="$(echo "${json_data}" | jq -r '.somefield')". This also requires spawning sh -c for each xargs "record", just to be able to do the subshelling.

For example: when using JSON as a "binary safer" (and structured) string-processing format, especially in shell scripts, which is a very convenient and powerful ability, one often ends up mucking around with VAR="$(echo "${JSON_DATA}" | jq -r '.somefield')" again, just to extract the value of .somefield from a specific ${JSON_DATA}. Despite various modern shell optimizations, this can sometimes (and in certain setups) spawn three sub-processes: a subshell, echo, and jq(!) (and also constructs a pipeline). All just to "lift" a single field (or field chain) out of the input JSON.
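To make the cost concrete, here is that extraction idiom spelled out (the field name and value are made up for illustration); a here-document can at least drop the echo process:

```shell
#!/bin/sh
# The classic idiom: the command substitution forks a subshell, which runs
# echo and jq connected by a pipe, just to pull out one field.
JSON_DATA='{"somefield":"value"}'
VAR="$(echo "${JSON_DATA}" | jq -r '.somefield')"
printf '%s\n' "$VAR"

# A here-document saves the echo process (still a subshell plus jq):
VAR="$(jq -r '.somefield' <<EOF
${JSON_DATA}
EOF
)"
printf '%s\n' "$VAR"
```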

For example: in the execline language (which is very similar to the socat case) one would benefit greatly from the ability to access fields of structured input data directly, making these tools much more powerful. But because jq cannot read its input from an argument, one has to wrap the "input sending part" in a pipeline command (in the case of execline) or in sh -c (in the case of socat) to get access to the fields, again.

In a nutshell, this feature would come in incredibly handy in ad-hoc API explorations and quick one-off jobs that iterate over larger datasets using any OS-level iterators or executors that fork a child, while /dev/stdin is left untouched or kept open for other uses.

While some might argue that for such jobs one should use something like Python, that language is not concise enough to cut through large swaths of data being pumped through command lines and pipelines, especially ad hoc. The jq language, on the other hand, is sufficiently terse and syntax-efficient for exactly that kind of work.

Suggested solution

Thus I propose introducing an --input / -i option that takes the next string argument as input: jq would consume it verbatim as its input buffer, preferably ignoring /dev/stdin entirely. Whether --input should exist in argv[] as a singleton, similarly to the "jq program" argument, is probably best left to the jq maintainers to decide. But to maintain parity with the "jq program" argument handling, and to decrease implementation complexity, I suggest the singleton approach: exactly one --input allowed, i.e. jq reads either from stdin or from the --input argument.

Usage example

This is a little bit contrived, but I hope it illustrates the point well, so please bear with me.

Let's say one needs to perform some specific ad-hoc action for each container managed by cri-o on a k8s node. With --input I can get the .name field of each record directly (as if I were using the Unix-native cut(1)):

crictl ps -o json \
  | jq --raw-output0 -c '[.containers[]|{name:.metadata.name,id:.id}]|.[]' \
    | xargs -0 -I'%j' jq --input '%j' -r '"name:" + .name'

Annotation (careful, invalid shell code!):

  # gets JSON data from some data producer
  crictl ps -o json

  # "slice" and massage the dataset for our needs, ie select specific fields
  | jq --raw-output0 -c '[.containers[]|{name:.metadata.name,id:.id}]|.[]'

  # now apply the resulting fields as "named columns" in the "subcommand"
  # - we can "reference" fields directly from argv
  | xargs -0 -I'%j' jq --input '%j' -r '"name:" + .name' 

Without --input, this has to be done instead:

crictl ps -o json \
  | jq --raw-output0 -c '[.containers[]|{name:.metadata.name,id:.id}]|.[]' \
    | xargs -0 -I'%j' \
       sh -c "printf 'name:%s\n' \$(echo '%j' | jq -r '.name')"

Observe that in the first case, the %j "variable" is just a raw string. The "expansion" is handled by xargs implicitly: it searches for the literal string '%j' in its own argument vector and then copy-pastes the replacement into its child's argument vector, jq --input '%j' -r '"name:" + .name', i.e. the jq subprocess literally goes:

From:

['jq', '--input', '%j', '-r', '"name:" + .name' ]

to

['jq', '--input', '{"name":"kube-proxy","id":"c31ef8zzssddrrtyt"}', '-r', '"name:" + .name' ]

after each "line expansion", at the execve level.

When combined with -0, this makes such executions very safe, with no worry that an in-between shell will somehow mangle them. And we are not even talking about the reduction in the number of sub-forks, pipes, file descriptors, etc.

Because the maximum length of each argv element is quite large these days, this allows one to do expansions like these:

crictl ps -o json \
  | jq --raw-output0 -c '[.containers[]|{name:.metadata.name,id:.id}]|.[]' \
    | xargs -0 -I'%j' \
       sh -c "cmd-do-something-cmd-somewhere --name \$(jq -i '%j' -r '.name') --id \$(jq -i '%j' -r '.id')"

Here we save one echo and two fds per JSON field dereference. If we want to be extra explicit:

crictl ps -o json \
  | jq --raw-output0 -c '[.containers[]|{name:.metadata.name,id:.id}]|.[]' \
    | xargs -0 -I'%j' \
       sh -c "exec cmd-do-something-cmd-somewhere --name \$(exec jq -i '%j' -r '.name') --id \$(exec jq -i '%j' -r '.id')"

But without the --input provision, the most concise form I have arrived at is this (by abusing inline shell functions, which removes a lot of safety):

crictl ps -o json \
  | jq --raw-output0 -c '[.containers[]|{name:.metadata.name,id:.id}]|.[]' \
    | xargs -0 -I'$j' \
       sh -c "j(){ jq -r \$@;};d(){ echo '\$j';}; cmd-do-something-cmd-somewhere --name \$(d|j '.name') --id \$(d|j '.id')"

While this might seem more compact (because of the shell "hacks"), it is a lot worse in terms of both execution complexity and string safety.

I believe you can infer much more advanced and even nested usage from here, especially when taking into account more complex jq programs (loaded from files) for the initial jq "selector" part of the pipeline (the jq --raw-output0 -c '[.contain... part).

I hope what I wrote makes sense and will make you consider this feature.

@wader
Member

wader commented Mar 19, 2025

Not sure I follow, isn't --argjson what you're looking for? But in this particular case it feels like you should be able to do something like this without going through the xargs hoops:

$ echo '{"containers": [{"name": "a"}, {"name": "b"}]}' | jq -r '.containers[] | "name: \(.name)"'
name: a
name: b

and if you want to exec something per output, I usually pipe to a shell, something like this (using @sh to escape things):

$ echo '{"containers": [{"name": "a"}, {"name": "b"}]}' | jq -r '.containers[] | @sh "echo some subcommand \(.name)"' | sh
some subcommand a
some subcommand b

and if you want to use --argjson, it could be used like this:

$ jq -rn --argjson i '{"name":"kube-proxy","id":"c31ef8zzssddrrtyt"}' '"name: \($i.name)"'
name: kube-proxy

@etosan
Author

etosan commented Mar 19, 2025

If I understand --argjson properly, it registers a special variable in the jq interpreter's "executor" state. But jq will still also process its /dev/stdin, if available. And as expected, and as you pointed out, without --null / -n, jq will hang in read("/dev/stdin"). What I have in mind would be equivalent to this, but without all the overrides, with jq understanding everything implicitly:

jq -n --argjson i '[{"name":"kube-proxy","id":"c31"},{"name":"kube-proxy","id":"c32"}]' '$i|.'

# would become:

jq -i '[{"name":"kube-proxy","id":"c31"},{"name":"kube-proxy","id":"c32"}]' '.' 

# thus something following would be possible:

jq -i '[{"name":"kube-proxy","id":"c31"},{"name":"kube-proxy","id":"c32"}]' --argjson v '{"vname":"vval"}' '.' # <-- you could do something with both "input" and argjson here, and `/dev/stdin` ie FD0 would remain untouched

I also studied @sh, but again, if I understand @sh properly, it just handles shell escaping. I.e. it takes a string and ensures that output values coming from within jq JSON objects are properly escaped and shell-safe. They are then safe to pipe into shells as commands, but it does not deal with standalone "binary safe" "values".
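For reference, a quick sketch of what @sh actually does: it only single-quotes values for safe splicing into a shell command line, and has no effect on where jq reads its input from.

```shell
#!/bin/sh
# @sh wraps each value in single quotes suitable for the shell; the input
# here still arrives via -n / a pipe, not via an argument.
quoted="$(jq -rn '"a b" | @sh')"
printf '%s\n' "$quoted"
```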

Maybe the example was not the best one around, but the whole point of this feature request is exactly to be able to "jump through" the hoops of xargs (or any other such tool).

Ie to "decouple" jq from shell processing if needed.

@etosan etosan closed this as completed Mar 19, 2025
@wader
Member

wader commented Mar 19, 2025

👍 So --argjson, I think, is like your --input suggestion, but it binds the value to a name; e.g. you can have multiple:

$ jq -n --argjson a 1 --argjson b 2 '$a + $b'
3

@etosan
Author

etosan commented Mar 19, 2025

Sorry, I had a work call and wrongly clicked on something; I think I unintentionally closed the issue and a comment midway through. @wader, can you fix this please?

@etosan
Author

etosan commented Mar 19, 2025

I understand that unless you are a heavy user of executable chaining (akin to nice(1), chroot(1)), the feature might not make much sense to you, but I assure you there are use cases.

The concept of chaining is lost on many these days, so some introductory text:

In essence, in such setups the data often "flows" through argv[] arguments and environ[] instead of shells. FDs like FD 0 (/dev/stdin) are often left untouched. It's a kind of input inversion. jq, I believe, would benefit enormously from this.

Chainers are extremely composable.
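A rough sketch of the style with common chainers (env(1) and nice(1) stand in here for a longer execline-style chain; the variable name is made up):

```shell
#!/bin/sh
# Each chainer tweaks the process state (environment, scheduling priority)
# and then execs the next program in its argv; data rides in
# environ[]/argv[] and fd 0 is never consumed along the way.
env GREETING=hello nice -n 10 sh -c 'printf "%s\n" "$GREETING"'
```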

While we can emulate some structured data passing using cut(1), in my experience passing (and filtering) JSON through is much more future- and error-proof. This would make using jq in such cases much easier.

@etosan etosan reopened this Mar 19, 2025
@etosan
Author

etosan commented Mar 19, 2025

Seems I can fix the closure myself.

Okay, maybe another example: let's say we have a hypothetical tool toucase, which consumes a string and returns all its letters UPPERCASE.

There are two classic approaches. Either read data from stdin:

$ echo "lower" | toucase
LOWER

or from argument:

$ toucase "lower"
LOWER

However, the most useful tool would support both at the same time, detecting the input either automatically or by a hint:

$ echo "lower" | toucase
LOWER
$ toucase "lower"
LOWER
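Such a dual-mode toucase can be sketched in a few lines of portable shell (the tool is hypothetical; this just shows the detection logic):

```shell
#!/bin/sh
# toucase: uppercase the argument if one was given, otherwise read stdin.
toucase() {
  if [ "$#" -gt 0 ]; then
    printf '%s\n' "$*"
  else
    cat
  fi | tr '[:lower:]' '[:upper:]'
}

toucase "lower"          # argument mode
echo "lower" | toucase   # stdin mode
```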

This is a similar kind of situation. While --argjson is powerful and can be coerced into this mode of operation, it has slightly different semantics.

@wader
Member

wader commented Mar 19, 2025

I see. I'm not sure if adding yet another way to provide input is a good idea; it's quite confusing as it is :) Maybe some other maintainer has opinions?

As the jq language is a superset of JSON, you can do something like this if the input is "trusted" JSON: jq -n '<JSON> | <query>', or, if not trusted, maybe something like jq -n '"<escaped JSON>" | fromjson | <query>' ... but I would probably prefer to use --argjson, or possibly env or $ENV, to pass things.
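Spelled out, the variants suggested above look roughly like this (using the sample data from earlier in the thread):

```shell
#!/bin/sh
# 1. Trusted JSON spliced directly into the program text
#    (every JSON value is also a valid jq expression):
jq -rn '{"name":"kube-proxy"} | .name'

# 2. Untrusted JSON passed as a string and parsed explicitly:
jq -rn --arg j '{"name":"kube-proxy"}' '$j | fromjson | .name'

# 3. Passed through the environment and read back via $ENV:
j='{"name":"kube-proxy"}' jq -rn '$ENV.j | fromjson | .name'
```

All three print kube-proxy without jq ever touching fd 0.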

@etosan
Author

etosan commented Mar 19, 2025

Roger! I understand the input-confusion issue; could this perhaps be addressed by nudging the documentation?

Anyway, should I ever find time to look at it and maybe (just maybe) produce a patch, what are the chances of it getting accepted?
