I'm trying to set rope_scaling through its environment variable. The documentation lists this as supported, but the assignment in engine_args.py takes the environment variable's value as a plain string rather than parsing it into a dict. The same will be true for any other engine argument that expects a dict.
Because a handler proxies requests to the serverless vLLM instances by default, simply overriding the command with vllm serve is not sufficient in this case.
# Example for Qwen/Qwen3-30B-A3B with extended context window
vllm serve Qwen/Qwen3-30B-A3B --rope-scaling '{"rope_type": "yarn", "factor": 4.0, "original_max_position_embeddings": 32768}'
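
For illustration, here is a minimal sketch of the kind of parsing I would expect for dict-typed arguments. It is not the actual code in engine_args.py: the helper name read_engine_arg, the DICT_ARGS set, and the upper-cased variable name (ROPE_SCALING) are all my assumptions.

```python
import json
import os

# Hypothetical: names of engine arguments whose values must be dicts.
DICT_ARGS = {"rope_scaling"}


def read_engine_arg(name, environ=os.environ):
    """Read an engine argument from the environment.

    Dict-typed arguments are JSON-decoded; everything else is
    returned as the raw string. Returns None if the variable is unset
    or empty.
    """
    raw = environ.get(name.upper())  # assumed naming: rope_scaling -> ROPE_SCALING
    if not raw:
        return None
    if name in DICT_ARGS:
        value = json.loads(raw)  # raises ValueError on malformed JSON
        if not isinstance(value, dict):
            raise ValueError(f"{name.upper()} must be a JSON object, got {type(value).__name__}")
        return value
    return raw


if __name__ == "__main__":
    env = {"ROPE_SCALING": '{"rope_type": "yarn", "factor": 4.0, "original_max_position_embeddings": 32768}'}
    print(read_engine_arg("rope_scaling", env))
```

With something along these lines, the value reaching the engine would be a dict instead of the raw JSON string, and a malformed value would fail early with a clear error.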