Summary
Coraza, like ModSecurity v2 and v3, automatically URL-decodes argument values before storing them in ARGS, ARGS_GET, ARGS_POST, and ARGS_NAMES. These are "cooked" collections: the values are pre-processed by the engine (or the underlying Go HTTP library) before rules ever see them.
This means applying t:urlDecodeUni to ARGS results in double URL decoding, a silent semantic error that causes both false positives and security detection gaps depending on how the rule was written.
This issue requests the addition of ARGS_RAW, ARGS_GET_RAW, ARGS_POST_RAW, and ARGS_NAMES_RAW as complementary "raw" collections that hold the original, wire-format argument values before any URL decoding. This is a parallel feature request to the one filed against owasp-modsecurity/ModSecurity and is intentionally coordinated to ensure consistent semantics across engines.
Background
How argument parsing works today
When Coraza processes a request with a URL-encoded query string or application/x-www-form-urlencoded body, Go's net/url package (or the integration's HTTP parsing layer) decodes percent-encoded sequences before Coraza populates its collections. For example:
POST /login HTTP/1.1
Content-Type: application/x-www-form-urlencoded

username=admin&password=Secret%2500
The value stored in ARGS_POST:password is Secret%00, not Secret%2500. The host layer decoded %25 → %, leaving a bare %00 sequence in the value.
When a rule then applies t:urlDecodeUni, Coraza decodes %00 → \x00 (null byte). A rule checking for null bytes would fire — even though the user submitted a legitimately encoded percent sign. This is the double URL decoding false positive.
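The two passes can be reproduced outside the engine. The following is a minimal sketch, not Coraza code: it assumes url.QueryUnescape is an acceptable stand-in both for the host layer's decoding and for t:urlDecodeUni (the real transformation is more lenient with malformed sequences), and decodeSteps is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/url"
)

// decodeSteps mimics the two decoding passes described above. The first
// pass stands in for the host layer; the second stands in for a
// t:urlDecodeUni-style transformation applied by a rule.
// Assumption: url.QueryUnescape approximates both steps.
func decodeSteps(wire string) (cooked, transformed string, err error) {
	cooked, err = url.QueryUnescape(wire)
	if err != nil {
		return "", "", err
	}
	transformed, err = url.QueryUnescape(cooked)
	if err != nil {
		return "", "", err
	}
	return cooked, transformed, nil
}

func main() {
	cooked, transformed, err := decodeSteps("Secret%2500")
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("cooked:      %q\n", cooked)      // "Secret%00"
	fmt.Printf("transformed: %q\n", transformed) // "Secret\x00"
}
```

The null byte only exists after the second pass; the user never sent one.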
The opposite problem: detection gaps
A rule author who is aware of pre-decoding may deliberately omit t:urlDecodeUni to avoid the double-decoding false positive. This is correct for cooked variables, and single-encoded payloads are still caught: %3Cscript%3E is decoded to <script> by the engine before the rule runs, so a pattern for <script> matches. The gap appears when a rule needs to see the encoded form itself, for example to detect double encoding. A payload sent as %253Cscript%253E reaches the rule as %3Cscript%3E, which a <script> pattern without t:urlDecodeUni does not match, and the cooked value gives the rule no way to tell that the original was double-encoded.
The portability problem
The amount of pre-decoding Coraza performs depends on the integration:
| Integration | ARGS pre-decoded? |
|---|---|
| coraza-caddy | ✅ Yes (via Go net/http) |
| coraza-proxy-wasm (Envoy) | ✅ Yes |
| coraza-spoa (HAProxy) | ✅ Yes |
| Direct API use |
A rule written to work correctly on one integration may behave differently on another. The ARGS_RAW family of variables would eliminate this ambiguity by always providing the literal wire-format value, regardless of the integration layer.
Proposed New Collections
| New Collection | Existing Counterpart | Semantics |
|---|---|---|
| ARGS_GET_RAW | ARGS_GET | Query string argument values, not URL-decoded |
| ARGS_POST_RAW | ARGS_POST | Form body argument values, not URL-decoded |
| ARGS_RAW | ARGS | Union of ARGS_GET_RAW and ARGS_POST_RAW |
| ARGS_NAMES_RAW | ARGS_NAMES | Argument names, not URL-decoded |
Semantic requirements
- Wire-format preservation: Values must be stored exactly as they arrive in the HTTP request, before any percent-decoding. The %3C in q=%3Cscript must be stored as the three characters %3C, not as <.
- Full collection API compatibility: Support key-based access (ARGS_RAW:fieldname), wildcard exclusions (!ARGS_RAW:safe_field), and regex-based key selectors (ARGS_RAW:/^utm_/), consistent with the existing collections.
- Phase availability: Match the availability of their cooked counterparts: ARGS_GET_RAW available in phase 1, ARGS_POST_RAW and ARGS_RAW (POST portion) available in phase 2.
- No implicit decoding, ever: The engine must never URL-decode values before populating these collections, regardless of what the integration layer does.
Implementation Notes for Coraza (Go)
Where to intercept
Coraza populates argument collections in the transaction's request body and URI processing. The raw values must be captured before the call to url.QueryUnescape or url.ParseQuery. In practice this means:
- For ARGS_GET_RAW: parse the raw query string (req.URL.RawQuery) using a percent-preserving parser that splits on & and = without decoding.
- For ARGS_POST_RAW: read the raw application/x-www-form-urlencoded body bytes before passing them to url.ParseQuery.
Suggested Go implementation sketch
// parseRawFormValues splits a raw application/x-www-form-urlencoded string
// into key/value pairs WITHOUT percent-decoding, preserving the original encoding.
// Requires the standard "strings" package; strings.Cut is available from Go 1.18.
func parseRawFormValues(raw string) [][2]string {
	var pairs [][2]string
	for _, pair := range strings.Split(raw, "&") {
		if pair == "" {
			continue // skip empty segments such as the one in "a=1&&b=2"
		}
		// A pair without "=" yields the whole segment as the key and an empty value.
		k, v, _ := strings.Cut(pair, "=")
		// Do NOT call url.QueryUnescape here: store k and v verbatim.
		pairs = append(pairs, [2]string{k, v})
	}
	return pairs
}

The cooked counterparts (ARGS_GET, ARGS_POST) continue to use url.ParseQuery (which percent-decodes). The raw counterparts use the function above.
Coraza collection registration
The new collections would be registered in coraza/types/variables alongside the existing ones, and populated in the same transaction processing paths that populate their cooked counterparts.
Usage Examples
Detecting URL-encoded XSS without double-decoding risk
# Match the percent-encoded form directly in the raw value. With t:none nothing is
# decoded, so the result is the same regardless of integration. A rule that instead
# applies t:urlDecodeUni to ARGS_RAW decodes exactly once.
SecRule ARGS_RAW "@rx (?i)%3[cC]script" \
    "id:20001,phase:2,block,t:none,\
    msg:'URL-encoded XSS in raw argument value',\
    logdata:'Matched Data: %{TX.0} found within %{MATCHED_VAR_NAME}: %{MATCHED_VAR}'"

Detecting double URL encoding
# Look for %25xx in the raw value — indicates double encoding.
# This is impossible to write correctly using ARGS because %25 is already decoded to %.
SecRule ARGS_RAW "@rx %25[0-9a-fA-F]{2}" \
"id:20002,phase:2,block,t:none,\
msg:'Double URL Encoding Detected',\
logdata:'Matched Data: %{TX.0} found within %{MATCHED_VAR_NAME}: %{MATCHED_VAR}'"

CRS integration opportunity
The OWASP CRS applies t:urlDecodeUni in many 941 (XSS), 942 (SQLi), and other rule files. With ARGS_RAW available, CRS could update those rules to use ARGS_RAW as the base variable and apply t:urlDecodeUni with well-defined, engine-agnostic semantics. This would resolve a longstanding portability problem in CRS and fix the double-decoding false positive class entirely.
Relationship to @validateUrlEncoding
The @validateUrlEncoding operator currently operates on cooked variables, which means it cannot distinguish between a legitimately encoded percent sign (%25 → %) and a malformed sequence. With ARGS_RAW, the operator could be applied to the raw value and produce correct results:
# Flag raw argument values whose percent-encoding is malformed.
# @validateUrlEncoding matches when the encoding is invalid, so it is used without negation.
SecRule ARGS_RAW "@validateUrlEncoding" \
    "id:20003,phase:2,block,t:none,\
    msg:'Invalid URL encoding in argument'"

Backward Compatibility
This is a purely additive change. No existing rules, no existing behavior, and no existing collection semantics are modified. All existing rules using ARGS, ARGS_GET, ARGS_POST, and ARGS_NAMES continue to work exactly as before.
Impact on ModSecurity Compatibility
Coraza aims for ModSecurity SecLang compatibility. If ModSecurity v3 implements ARGS_RAW (tracked in owasp-modsecurity/ModSecurity#2118), Coraza should implement the same variables with identical semantics to ensure rules written against one engine work correctly on the other.
The OWASP CRS team is coordinating both feature requests to encourage a consistent, cross-engine implementation.
Related Issues and References
- ModSecurity issue #2118: Original 2019 report, CRS team proposal, and airween's v2 prototype patch: 'urlDecode|urlDecodeUni' transformations replaces the decoded strings (owasp-modsecurity/ModSecurity#2118)
- airween's v2 prototype patch: https://github.com/SpiderLabs/ModSecurity/compare/v2/master...airween:v2/args_raw?expand=1
- CRS issue #807: Concrete false positive from double URL decoding of a user password: Using a URL Encoded Percent sign, followed by hex digits other than 20-7e produces a false positive (SpiderLabs/owasp-modsecurity-crs#807)
- CRS PR #578: Historical discussion of urlDecodeUni on ARGS collections: Add urlDecodeUni() operation to ARG/ARGS_NAMES (SpiderLabs/owasp-modsecurity-crs#578)
- ModSecurity parallel feature request: Feature Request: Add ARGS_RAW, ARGS_GET_RAW, ARGS_POST_RAW, and ARGS_NAMES_RAW Variables (owasp-modsecurity/ModSecurity#3501)
Requested Action
- Implement ARGS_RAW, ARGS_GET_RAW, ARGS_POST_RAW, and ARGS_NAMES_RAW in Coraza with the semantics described above.
- Ensure consistency with the ModSecurity v3 implementation once it is merged, so that rules relying on these variables are portable across engines.
- Update Coraza documentation and the SecLang compatibility notes to reflect the new collections.
- Notify the OWASP CRS team when the feature is available so that CRS rules can be updated.