Fix regex pattern that parses extractedValues #7
sseifert-akamai wants to merge 2 commits into akamai-contrib:main
ynohat left a comment:
Hi @sseifert-akamai, thanks a lot again for this contribution. While reviewing the change I realized that it also includes some auto-formatting, probably from your IDE. This is noisy and makes it hard to identify the actual fix. Could you please reduce the change to just what is relevant?
    let pat = /^name=([^;]*); value=([^;]*).*$/;
    return getResponseHeaderValues(response, "x-akamai-session-info")
      .reduce(function (vars, value) {
        let pat = /^name=(.*); value=([^\s]*)(;.*)?$/;
The regex in this line assumes that the value is followed by at most one ";" plus something that needs to be ignored, but:
- there can be 2 when the variable value is extracted from qsp or from a cookie. In this case the variable value is followed not only by full_location_id but also by separator, and the part that includes full_location_id ends up in the value that the assert function matches against.
- there can be 0 when the value is not extracted. In this case, if the variable value contains a ";", the part of the value after the last ";" unexpectedly matches the last group.
Looking closer at the groups in this regex:
- The 1st one, (.*), is capturing but may not need to be. More importantly, it can match any character, including those that don't belong in a variable name. It could be more specific.
- The 2nd one, ([^\s]*), is capturing, and this is probably needed. More importantly, it excludes whitespace but it shouldn't.
- The 3rd one, (;.*)?, is capturing but may not need to be. More importantly, it could be more specific.
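The remarks about groups 1 and 2 can be checked with a small probe. This is only a sketch: the two input strings are hypothetical and were chosen to exercise those groups; they are not taken from real x-akamai-session-info headers.

```javascript
// Pattern from the diff under review.
const current = /^name=(.*); value=([^\s]*)(;.*)?$/;

// Hypothetical inputs, chosen only to exercise groups 1 and 2.
const oddName = "name=a b; c; value=v";
const spacedValue = "name=X; value=hello world";

// Group 1 is `.*`, so it accepts spaces and ";" inside the "name".
const m1 = current.exec(oddName);
console.log(m1 && m1[1]); // a b; c

// Group 2 is `[^\s]*`, so a value that contains a space
// makes the whole pattern fail to match.
const m2 = current.exec(spacedValue);
console.log(m2); // null
```

A name class such as [^\s]* rejects the first input, and a value group that allows whitespace accepts the second.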
With this regex proposed instead, ^name=([^\s]*); value=(.*)(; full_location_id=[^;]*(; separator=[^;]*)?)?$:
- the question of capturing or not is left aside
- name and value are captured in groups 1 and 2, as before
- group 3, which previously captured only one of full_location_id or separator, now captures both if present
- capture group 4 is added, and it captures separator only
However, it requires the match to be ungreedy.
The attached screenshots show test results using PCRE2 with the U flag for ungreediness, with the sample values below:

1. name=NOT_EXTRACTED; value= key1=value1; key2=value2
2. name=NOT_EXTRACTED; value= key3=value3

3. name=EXTRACTED_FROM_COOKIE_OR_QSP; value=key1=value1; key2=value2; full_location_id=cookieName3; separator=%3d
4. name=EXTRACTED_FROM_COOKIE_OR_QSP; value=key3=value3; full_location_id=cookieName3; separator=%3d

5. name=EXTRACTED_FROM_HEADER; value=key1=value1; key2=value2; full_location_id=cookieName3
6. name=EXTRACTED_FROM_HEADER; value=key3=value3; full_location_id=cookieName3
These sample values are meant to cover:
- variable value contains ";" or not
- variable is not extracted
- variable is extracted from cookie (or qsp) or is extracted from header
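JavaScript regular expressions have no equivalent of the PCRE2 U flag, so one way to try the proposed pattern there is to make only the value group lazy with `.*?`. The sketch below does that against the six sample values. The lazy quantifier is an assumption standing in for the U flag, and `parseSample` is a hypothetical helper, not a function from this project.

```javascript
// Proposed pattern, with the value group made lazy (`.*?`).
// Assumption: this stands in for running the pattern under PCRE2's U flag.
const pat = /^name=([^\s]*); value=(.*?)(; full_location_id=[^;]*(; separator=[^;]*)?)?$/;

// Sample header values copied from the review comment.
const samples = [
  "name=NOT_EXTRACTED; value= key1=value1; key2=value2",
  "name=NOT_EXTRACTED; value= key3=value3",
  "name=EXTRACTED_FROM_COOKIE_OR_QSP; value=key1=value1; key2=value2; full_location_id=cookieName3; separator=%3d",
  "name=EXTRACTED_FROM_COOKIE_OR_QSP; value=key3=value3; full_location_id=cookieName3; separator=%3d",
  "name=EXTRACTED_FROM_HEADER; value=key1=value1; key2=value2; full_location_id=cookieName3",
  "name=EXTRACTED_FROM_HEADER; value=key3=value3; full_location_id=cookieName3",
];

// Hypothetical helper: returns the name and value, or null if there is no match.
function parseSample(headerValue) {
  const m = pat.exec(headerValue);
  return m === null ? null : { name: m[1], value: m[2] };
}

for (const s of samples) {
  console.log(JSON.stringify(parseSample(s)));
}
```

With these inputs the value keeps its embedded ";" and stops before full_location_id. Samples 1 and 2 keep the leading space that follows "value=" in the sample text.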
If greediness can't be disabled, another regex similar to the one below may be used.
It is built as an alternation whose 1st option captures full_location_id (and possibly separator) when it is present. The 2nd option should match only when full_location_id and separator are absent.
^name=([^\s]*); value=(.*)(; full_location_id=.*)$|^name=([^\s]*); value=(.*)$
A problem with this approach is that name and value are captured in groups 1 and 2 when full_location_id is present, and in groups 4 and 5 otherwise. So this needs to be dealt with if the captured groups are actually used.
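One possible way to deal with the split groups in JavaScript is to fall back from groups 1 and 2 to groups 4 and 5 with the `??` operator, since groups in the alternative that did not match come back as `undefined`. This is a sketch only, and `parseEither` is a hypothetical name rather than existing project code.

```javascript
// Greedy alternation from the comment above.
const alt = /^name=([^\s]*); value=(.*)(; full_location_id=.*)$|^name=([^\s]*); value=(.*)$/;

// Hypothetical helper: normalizes the two alternatives into one shape.
function parseEither(headerValue) {
  const m = alt.exec(headerValue);
  if (m === null) return null;
  // `??` (rather than `||`) keeps an empty-string value from the 1st option.
  return { name: m[1] ?? m[4], value: m[2] ?? m[5] };
}

console.log(parseEither(
  "name=EXTRACTED_FROM_HEADER; value=key1=value1; key2=value2; full_location_id=cookieName3"
)); // { name: 'EXTRACTED_FROM_HEADER', value: 'key1=value1; key2=value2' }

console.log(parseEither("name=NOT_EXTRACTED; value= key3=value3"));
// { name: 'NOT_EXTRACTED', value: ' key3=value3' }
```

Callers then see a single name/value pair regardless of which option matched.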
It would also be nice to add tests that cover the different kinds of values that may be found in the header.





This change fixes an issue with extractedValues when the value contains semicolons.