binary/hex integer literals with separators lead to parse error #75

bloerwald · 2022-09-26T10:18:32Z

Integer literals may contain ' digit separators since C++14, as per [lex.icon]. The implementation handles them correctly for decimal and octal literals, but fails on hex and binary ones when the number of ' separators is odd:

# for n in 0 "0'0"; do for p in 0b 0x 0 ''; do echo "int i = ${p}0'${n};" > test.cpp; cat test.cpp; flawfinder test.cpp  |& head -n4 | tail -n1; done; done
int i = 0b0'0;
Error: File ended while in string.
int i = 0x0'0;
Error: File ended while in string.
int i = 00'0;

int i = 0'0;

int i = 0b0'0'0;

int i = 0x0'0'0;

int i = 00'0'0;

int i = 0'0'0;

# flawfinder --version
2.0.19

It appears that line 1902 of flawfinder/flawfinder.py (at commit 614801f),

elif p_digits.match(c):

needs to also handle the 0b and 0x prefixes.
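A toy reproduction of what seems to be going on, as a minimal sketch rather than flawfinder's actual code: the number loop only accepts [0-9] and ', so it stops at the b or x of a prefix. The remainder (e.g. b0) is then read as an identifier, and the next ' is taken as the start of a character literal. With an odd number of separators that literal is never closed. The names P_DIGITS, P_WORD and ends_inside_quote are hypothetical, and escape sequences, comments and string literals are deliberately ignored.

```python
import re

P_DIGITS = re.compile(r'[0-9]')
P_WORD = re.compile(r'[A-Za-z_][A-Za-z_0-9$]*')


def ends_inside_quote(text, cpplanguage=True):
    """Toy scanner: True if text ends inside an unterminated '...' literal."""
    i = 0
    in_char = False
    while i < len(text):
        c = text[i]
        i += 1  # From here on, text[i] is the next character.
        if in_char:
            if c == "'":
                in_char = False
        elif c == "'":
            in_char = True
        elif P_DIGITS.match(c):
            # Digits-only loop: stops at the 'b' or 'x' of a prefix.
            while i < len(text) and (
                    P_DIGITS.match(text[i])
                    or (cpplanguage and text[i] == "'")):
                i += 1
        else:
            # Skip a whole identifier such as "b0" or "x0" in one step.
            m = P_WORD.match(text, i - 1)
            if m:
                i = m.end()
    return in_char


for src in ["int i = 0b0'0;", "int i = 0x0'0;", "int i = 00'0;",
            "int i = 0b0'0'0;"]:
    print(src, ends_inside_quote(src))
```

Under these assumptions the first two inputs end inside a quote and the last two do not, which matches the pattern in the transcript above.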


david-a-wheeler · 2022-09-26T14:27:35Z

Makes sense. Have a proposed code change?

bloerwald · 2022-09-27T21:24:54Z

From 0a246afa77ca2cbc2f55fc3221f2c857fb871a3c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bernd=20L=C3=B6rwald?= <[email protected]>
Date: Tue, 27 Sep 2022 23:22:14 +0200
Subject: [PATCH] fix: correctly parse integer constants

---
 flawfinder.py | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/flawfinder.py b/flawfinder.py
index 1e7eb04..ad59bd4 100755
--- a/flawfinder.py
+++ b/flawfinder.py
@@ -1697,6 +1697,9 @@ numberset = string.hexdigits + "_x.Ee"
 p_whitespace = re.compile(r'[ \t\v\f]+')
 p_include = re.compile(r'#\s*include\s+(<.*?>|".*?")')
 p_digits = re.compile(r'[0-9]')
+p_digits_binary = re.compile(r'[01]')
+p_digits_octal = re.compile(r'[0-7]')
+p_digits_hex = re.compile(r'[0-9a-fA-F]')
 p_alphaunder = re.compile(r'[A-Za-z_]')  # Alpha chars and underline.
 # A "word" in C.  Note that "$" is permitted -- it's not permitted by the
 # C standard in identifiers, but gcc supports it as an extension.
@@ -1900,12 +1903,30 @@ def process_c_file(f, patch_infos):
                                                      startpos + max_lookahead]
                             hit.hook(hit)
                 elif p_digits.match(c):
-                    while i < len(text): # Process a number.
-                        # C does not have digit separator
-                        if p_digits.match(text[i]) or (cpplanguage and text[i] == "'"):
+                    # Found a literal integer constant. Exact grammar depends on language and version:
+                    # C2x and C++14 add digit separators as well as binary constants. Before either, only
+                    # hex or octal or decimal constants are allowed, and ' digit separators are a parse
+                    # error. C2x is not yet passed, so only C++ mode allows for the new features right now.
+
+                    digitpattern = p_digits
+                    if i < len(text) and c == "0":
+                        if cpplanguage and text[i] in ["b", "B"]:
                             i += 1
+                            digitpattern = p_digits_binary
+                        elif text[i] in ["x", "X"]:
+                            i += 1
+                            digitpattern = p_digits_hex
                         else:
-                            break
+                            digitpattern = p_digits_octal
+
+                    # First char shall *not* be a digit separator.
+                    if i < len(text) and digitpattern.match(text[i]):
+                        i += 1
+
+                    # Remaining chars are either a valid digit or (if c++) a separator.
+                    while i < len(text) and (digitpattern.match(text[i]) or (cpplanguage and text[i] == "'")):
+                        i += 1
+
                 # else some other character, which we ignore.
                 # End of loop through text. Wrap up.
     if codeinline:
-- 
2.30.1 (Apple Git-130)
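For trying the logic outside of flawfinder, the patched branch can be lifted into a standalone function. This is only a sketch under assumptions: skip_integer_literal and the P_* names are hypothetical, the function starts at the first digit instead of relying on process_c_file's surrounding state, and suffixes (u, L, ...) and floating-point forms are not handled.

```python
import re

P_DEC = re.compile(r'[0-9]')
P_BIN = re.compile(r'[01]')
P_OCT = re.compile(r'[0-7]')
P_HEX = re.compile(r'[0-9a-fA-F]')


def skip_integer_literal(text, i, cpplanguage=True):
    """Return the index just past the integer literal starting at text[i].

    text[i] must be a decimal digit, as in the elif branch of the patch.
    """
    c = text[i]
    i += 1
    digitpattern = P_DEC
    if i < len(text) and c == "0":
        if cpplanguage and text[i] in ("b", "B"):
            i += 1
            digitpattern = P_BIN
        elif text[i] in ("x", "X"):
            i += 1
            digitpattern = P_HEX
        else:
            digitpattern = P_OCT

    # The first char after a prefix shall not be a digit separator.
    if i < len(text) and digitpattern.match(text[i]):
        i += 1

    # Remaining chars are either a valid digit or (if C++) a separator.
    while i < len(text) and (
            digitpattern.match(text[i])
            or (cpplanguage and text[i] == "'")):
        i += 1
    return i


for src in ["0b0'0;", "0x0'0;", "00'0;", "0'0;", "1'000;"]:
    end = skip_integer_literal(src, 0)
    print(repr(src[:end]))
```

In each of these inputs the whole literal up to the semicolon is consumed, so no stray ' is left behind for the character-literal handling.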
