
Add ARGS_RAW, ARGS_GET_RAW, ARGS_POST_RAW, and ARGS_NAMES_RAW Collections #1491

@fzipi


Summary

Coraza, like ModSecurity v2 and v3, automatically URL-decodes argument values before storing them in ARGS, ARGS_GET, ARGS_POST, and ARGS_NAMES. These are "cooked" collections: the values are pre-processed by the engine (or the underlying Go HTTP library) before rules ever see them.

This means applying t:urlDecodeUni to ARGS results in double URL decoding, a silent semantic error that causes both false positives and security detection gaps depending on how the rule was written.

This issue requests the addition of ARGS_RAW, ARGS_GET_RAW, ARGS_POST_RAW, and ARGS_NAMES_RAW as complementary "raw" collections that hold the original, wire-format argument values before any URL decoding. This is a parallel feature request to the one filed against owasp-modsecurity/ModSecurity and is intentionally coordinated to ensure consistent semantics across engines.


Background

How argument parsing works today

When Coraza processes a request with a URL-encoded query string or application/x-www-form-urlencoded body, Go's net/url package (or the integration's HTTP parsing layer) decodes percent-encoded sequences before Coraza populates its collections. For example:

POST /login HTTP/1.1
Content-Type: application/x-www-form-urlencoded

username=admin&password=Secret%2500

The value stored in ARGS_POST:password is Secret%00, not Secret%2500. The host layer decoded %25 to %, leaving a bare %00 sequence in the value.

When a rule then applies t:urlDecodeUni, Coraza decodes %00 to \x00 (a null byte). A rule checking for null bytes would fire, even though the user submitted a legitimately encoded percent sign followed by the characters 00. This is the double URL decoding false positive.
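A minimal Go sketch of the two decoding passes, assuming url.ParseQuery is a fair stand-in for the host layer's pre-decoding and url.QueryUnescape is a stand-in for t:urlDecodeUni (the real transformation also handles %uXXXX sequences, which QueryUnescape does not). The helper name cookedThenDecoded is hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// cookedThenDecoded is a hypothetical helper that reproduces the two passes:
// url.ParseQuery approximates host-layer pre-decoding (what lands in ARGS_POST),
// and url.QueryUnescape approximates the rule's t:urlDecodeUni transformation.
func cookedThenDecoded(body, field string) (cooked, decodedTwice string, err error) {
	values, err := url.ParseQuery(body)
	if err != nil {
		return "", "", err
	}
	cooked = values.Get(field)
	decodedTwice, err = url.QueryUnescape(cooked)
	return cooked, decodedTwice, err
}

func main() {
	cooked, twice, err := cookedThenDecoded("username=admin&password=Secret%2500", "password")
	if err != nil {
		panic(err)
	}
	fmt.Printf("cooked: %q\n", cooked) // "Secret%00"
	fmt.Printf("twice:  %q\n", twice)  // "Secret\x00"
}
```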

The opposite problem: detection gaps

A sophisticated rule author who is aware of pre-decoding may deliberately omit t:urlDecodeUni from their rules to avoid the double-decoding false positive. This is correct for cooked variables when the payload is single-encoded: %3Cscript%3E is decoded to <script> by the engine before the rule runs, so a pattern matching <script> still works. It introduces a detection gap for double-encoded payloads, however. %253Cscript%253E reaches the rule as %3Cscript%3E, and without the transformation the rule is blind to that vector.

The portability problem

The amount of pre-decoding Coraza performs depends on the integration:

| Integration | ARGS pre-decoded? |
| --- | --- |
| coraza-caddy | ✅ Yes (via Go net/http) |
| coraza-proxy-wasm (Envoy) | ✅ Yes |
| coraza-spoa (HAProxy) | ✅ Yes |
| Direct API use | ⚠️ Depends on caller |

A rule written to work correctly on one integration may behave differently on another. The ARGS_RAW family of variables would eliminate this ambiguity by always providing the literal wire-format value, regardless of the integration layer.
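The pre-decoding in net/http-based integrations can be observed directly: net/http keeps the undecoded query in URL.RawQuery, while URL.Query() returns decoded values. The sketch assumes an integration that has access to the *http.Request; rawAndCooked is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/http/httptest"
)

// rawAndCooked is a hypothetical helper: it parses target the way a net/http
// server would and returns the undecoded query string plus the decoded value of key.
func rawAndCooked(target, key string) (raw, cooked string) {
	req := httptest.NewRequest("GET", target, nil)
	return req.URL.RawQuery, req.URL.Query().Get(key)
}

func main() {
	raw, cooked := rawAndCooked("/search?q=%3Cscript%3E&pct=100%25", "q")
	fmt.Println(raw)    // q=%3Cscript%3E&pct=100%25 (wire format)
	fmt.Println(cooked) // <script> (cooked)
}
```

Both forms are available at this layer, so an engine can only populate the raw collections consistently if the integration passes RawQuery (or the raw body) through rather than the decoded map.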


Proposed New Collections

| New Collection | Existing Counterpart | Semantics |
| --- | --- | --- |
| ARGS_GET_RAW | ARGS_GET | Query string argument values, not URL-decoded |
| ARGS_POST_RAW | ARGS_POST | Form body argument values, not URL-decoded |
| ARGS_RAW | ARGS | Union of ARGS_GET_RAW and ARGS_POST_RAW |
| ARGS_NAMES_RAW | ARGS_NAMES | Argument names, not URL-decoded |

Semantic requirements

  1. Wire-format preservation: Values must be stored exactly as they arrive in the HTTP request, before any percent-decoding. The %3C in q=%3Cscript must be stored as the three characters %3C, not as <.
  2. Full collection API compatibility: Support key-based access (ARGS_RAW:fieldname), key exclusions (!ARGS_RAW:safe_field), and regex-based key selectors (ARGS_RAW:/^utm_/), consistent with the existing collections.
  3. Phase availability: Availability matches the cooked counterparts, so ARGS_GET_RAW is available in phase 1, while ARGS_POST_RAW and the POST portion of ARGS_RAW become available in phase 2.
  4. No implicit decoding, ever: The engine must never URL-decode values before populating these collections, regardless of what the integration layer does.

Implementation Notes for Coraza (Go)

Where to intercept

Coraza populates argument collections in the transaction's request body and URI processing. The raw values must be captured before the call to url.QueryUnescape or url.ParseQuery. In practice this means:

  • For ARGS_GET_RAW: parse the raw query string (req.URL.RawQuery) using a percent-preserving parser that splits on & and = without decoding.
  • For ARGS_POST_RAW: read the raw application/x-www-form-urlencoded body bytes before passing them to url.ParseQuery.

Suggested Go implementation sketch

import "strings"

// parseRawFormValues splits a raw application/x-www-form-urlencoded string
// into key/value pairs WITHOUT percent-decoding, preserving the original encoding.
func parseRawFormValues(raw string) [][2]string {
    var pairs [][2]string
    for _, pair := range strings.Split(raw, "&") {
        if pair == "" {
            continue // skip empty segments such as "a=1&&b=2"
        }
        // Split on the first '='; a pair without '=' yields an empty value.
        k, v, _ := strings.Cut(pair, "=")
        // Do NOT call url.QueryUnescape here: store k and v verbatim.
        pairs = append(pairs, [2]string{k, v})
    }
    return pairs
}

The cooked counterparts (ARGS_GET, ARGS_POST) continue to use url.ParseQuery (which percent-decodes). The raw counterparts use the function above.

Coraza collection registration

The new collections would be registered in coraza/types/variables alongside the existing ones, and populated in the same transaction processing paths that populate their cooked counterparts.


Usage Examples

Detecting URL-encoded XSS without double-decoding risk

# Match the URL-encoded form exactly as it arrived on the wire.
# If a rule applies t:urlDecodeUni to ARGS_RAW, the value is decoded exactly once, regardless of integration.
SecRule ARGS_RAW "@rx (?i)%3[cC]script" \
    "id:20001,phase:2,block,capture,t:none,\
    msg:'URL-encoded XSS in raw argument value',\
    logdata:'Matched Data: %{TX.0} found within %{MATCHED_VAR_NAME}: %{MATCHED_VAR}'"
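As an illustration of why rule 20001 targets ARGS_RAW, the same pattern can be run through Go's regexp package (its RE2 syntax accepts the pattern unchanged) against a raw value and its cooked equivalent. The sample values are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as rule 20001.
var encodedScript = regexp.MustCompile(`(?i)%3[cC]script`)

func main() {
	raw := "%3Cscript%3Ealert(1)%3C/script%3E" // what ARGS_RAW:q would hold
	cooked := "<script>alert(1)</script>"      // what ARGS:q holds after host-layer decoding

	fmt.Println(encodedScript.MatchString(raw))    // true
	fmt.Println(encodedScript.MatchString(cooked)) // false
}
```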

Detecting double URL encoding

# Look for %25xx in the raw value, which indicates double encoding.
# This cannot be written correctly against ARGS because %25 has already been decoded to %.
SecRule ARGS_RAW "@rx %25[0-9a-fA-F]{2}" \
    "id:20002,phase:2,block,capture,t:none,\
    msg:'Double URL Encoding Detected',\
    logdata:'Matched Data: %{TX.0} found within %{MATCHED_VAR_NAME}: %{MATCHED_VAR}'"
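The same comparison for rule 20002, using url.QueryUnescape as an approximation of the host layer's single decoding pass:

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// Same pattern as rule 20002.
var doubleEncoded = regexp.MustCompile(`%25[0-9a-fA-F]{2}`)

func main() {
	raw := "%253Cscript%253E"             // wire format, as ARGS_RAW would hold it
	cooked, err := url.QueryUnescape(raw) // approximates what ARGS holds
	if err != nil {
		panic(err)
	}
	fmt.Println(doubleEncoded.MatchString(raw))    // true
	fmt.Println(doubleEncoded.MatchString(cooked)) // false: cooked is %3Cscript%3E
}
```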

CRS integration opportunity

The OWASP CRS applies t:urlDecodeUni in many 941 (XSS), 942 (SQLi), and other rule files. With ARGS_RAW available, CRS could update those rules to use ARGS_RAW as the base variable and apply t:urlDecodeUni with well-defined, engine-agnostic semantics. This would resolve a longstanding portability problem in CRS and fix the double-decoding false positive class entirely.


Relationship to @validateUrlEncoding

When applied to ARGS, the @validateUrlEncoding operator sees cooked values, so it cannot distinguish a legitimately encoded percent sign (%25, which the host layer has already decoded to %) from a malformed sequence. With ARGS_RAW, the operator could be applied to the raw value and produce correct results:

# Flag raw argument values that contain invalid URL encoding.
SecRule ARGS_RAW "@validateUrlEncoding" \
    "id:20003,phase:2,block,t:none,\
    msg:'Invalid URL encoding in argument'"
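The false-positive mechanism can be sketched with a simplified check in the spirit of @validateUrlEncoding: it reports invalid encoding when a % is not followed by two hexadecimal digits. hasInvalidURLEncoding is a hypothetical re-implementation for illustration, not Coraza's operator code.

```go
package main

import "fmt"

// isHex reports whether c is an ASCII hexadecimal digit.
func isHex(c byte) bool {
	return ('0' <= c && c <= '9') || ('a' <= c && c <= 'f') || ('A' <= c && c <= 'F')
}

// hasInvalidURLEncoding (hypothetical) returns true when some '%' is not
// followed by two hex digits, i.e. the input is not valid percent-encoding.
func hasInvalidURLEncoding(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] != '%' {
			continue
		}
		if i+2 >= len(s) || !isHex(s[i+1]) || !isHex(s[i+2]) {
			return true
		}
		i += 2 // skip the two hex digits just validated
	}
	return false
}

func main() {
	raw := "100%25"  // ARGS_RAW: a correctly encoded percent sign
	cooked := "100%" // ARGS: the same value after host-layer decoding
	fmt.Println(hasInvalidURLEncoding(raw))     // false
	fmt.Println(hasInvalidURLEncoding(cooked))  // true: a false positive on cooked data
	fmt.Println(hasInvalidURLEncoding("a%zzb")) // true: genuinely malformed
}
```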

Backward Compatibility

This is a purely additive change. No existing rules, no existing behavior, and no existing collection semantics are modified. All existing rules using ARGS, ARGS_GET, ARGS_POST, and ARGS_NAMES continue to work exactly as before.


Impact on ModSecurity Compatibility

Coraza aims for ModSecurity SecLang compatibility. If ModSecurity v3 implements ARGS_RAW (tracked in owasp-modsecurity/ModSecurity#2118), Coraza should implement the same variables with identical semantics to ensure rules written against one engine work correctly on the other.

The OWASP CRS team is coordinating both feature requests to encourage a consistent, cross-engine implementation.


Related Issues and References


Requested Action

  1. Implement ARGS_RAW, ARGS_GET_RAW, ARGS_POST_RAW, and ARGS_NAMES_RAW in Coraza with the semantics described above.
  2. Ensure consistency with the ModSecurity v3 implementation once it is merged, so that rules relying on these variables are portable across engines.
  3. Update Coraza documentation and the SecLang compatibility notes to reflect the new collections.
  4. Notify the OWASP CRS team when the feature is available so that CRS rules can be updated.
