binary/hex integer literals with separators lead to parse error #75

Open
bloerwald opened this issue Sep 26, 2022 · 2 comments

Comments

@bloerwald

bloerwald commented Sep 26, 2022

Since C++14, integer literals may contain ' digit separators, as per [lex.icon]. flawfinder handles them correctly in decimal and octal literals, but fails on hex and binary ones if the number of ' is odd:

# for n in 0 "0'0"; do for p in 0b 0x 0 ''; do echo "int i = ${p}0'${n};" > test.cpp; cat test.cpp; flawfinder test.cpp  |& head -n4 | tail -n1; done; done
int i = 0b0'0;
Error: File ended while in string.
int i = 0x0'0;
Error: File ended while in string.
int i = 00'0;

int i = 0'0;

int i = 0b0'0'0;

int i = 0x0'0'0;

int i = 00'0'0;

int i = 0'0'0;

# flawfinder --version
2.0.19

It appears that the branch

elif p_digits.match(c):

needs to also handle the 0x and 0b prefixes.
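
A minimal sketch of why the prefixes matter, assuming the number branch only consumes decimal digits and (in C++ mode) ' separators. The helper name old_scan_number is made up for illustration and is not part of flawfinder:

```python
import re

p_digits = re.compile(r'[0-9]')


def old_scan_number(text, i, cpplanguage=True):
    """Return the index just past the number that starts at text[i].

    Hypothetical stand-in for the number branch: it accepts only
    decimal digits and, in C++ mode, ' separators.
    """
    while i < len(text):
        if p_digits.match(text[i]) or (cpplanguage and text[i] == "'"):
            i += 1
        else:
            break
    return i


# Show how much of each literal the scan consumes and what is left over.
for literal in ["0'0", "00'0", "0x0'0", "0b0'0"]:
    end = old_scan_number(literal, 0)
    print(repr(literal[:end]), "rest:", repr(literal[end:]))
```

For the hex and binary literals the scan stops at the x or b, so the remainder is tokenized separately. My guess is that the ' then gets read as the start of a character literal, which would explain why an odd number of separators ends in "File ended while in string" while an even number pairs up silently.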

@david-a-wheeler
Owner

Makes sense. Have a proposed code change?

@bloerwald
Author

From 0a246afa77ca2cbc2f55fc3221f2c857fb871a3c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bernd=20L=C3=B6rwald?= <[email protected]>
Date: Tue, 27 Sep 2022 23:22:14 +0200
Subject: [PATCH] fix: correctly parse integer constants

---
 flawfinder.py | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/flawfinder.py b/flawfinder.py
index 1e7eb04..ad59bd4 100755
--- a/flawfinder.py
+++ b/flawfinder.py
@@ -1697,6 +1697,9 @@ numberset = string.hexdigits + "_x.Ee"
 p_whitespace = re.compile(r'[ \t\v\f]+')
 p_include = re.compile(r'#\s*include\s+(<.*?>|".*?")')
 p_digits = re.compile(r'[0-9]')
+p_digits_binary = re.compile(r'[01]')
+p_digits_octal = re.compile(r'[0-7]')
+p_digits_hex = re.compile(r'[0-9a-fA-F]')
 p_alphaunder = re.compile(r'[A-Za-z_]')  # Alpha chars and underline.
 # A "word" in C.  Note that "$" is permitted -- it's not permitted by the
 # C standard in identifiers, but gcc supports it as an extension.
@@ -1900,12 +1903,30 @@ def process_c_file(f, patch_infos):
                                                      startpos + max_lookahead]
                             hit.hook(hit)
                 elif p_digits.match(c):
-                    while i < len(text): # Process a number.
-                        # C does not have digit separator
-                        if p_digits.match(text[i]) or (cpplanguage and text[i] == "'"):
+                    # Found a literal integer constant. Exact grammar depends on language and version:
+                    # C2x and C++14 add digit separators as well as binary constants. Before either, only
+                    # hex or octal or decimal constants are allowed, and ' digit separators are a parse
+                    # error. C2x is not yet passed, so only C++ mode allows for the new features right now.
+
+                    digitpattern = p_digits
+                    if i < len(text) and c == "0":
+                        if cpplanguage and text[i] in ["b", "B"]:
                             i += 1
+                            digitpattern = p_digits_binary
+                        elif text[i] in ["x", "X"]:
+                            i += 1
+                            digitpattern = p_digits_hex
                         else:
-                            break
+                            digitpattern = p_digits_octal
+
+                    # First char shall *not* be a digit separator.
+                    if i < len(text) and digitpattern.match(text[i]):
+                        i += 1
+
+                    # Remaining chars are either a valid digit or (if c++) a separator.
+                    while i < len(text) and (digitpattern.match(text[i]) or (cpplanguage and text[i] == "'")):
+                        i += 1
+
                 # else some other character, which we ignore.
                 # End of loop through text. Wrap up.
     if codeinline:
-- 
2.30.1 (Apple Git-130)
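
For trying the logic outside of flawfinder, here is a rough standalone sketch of the patched branch. The function name scan_integer and its signature are made up for illustration. It assumes that c is the digit at text[start] and that scanning continues at the following character, which is how I read the surrounding loop:

```python
import re

p_digits = re.compile(r'[0-9]')
p_digits_binary = re.compile(r'[01]')
p_digits_octal = re.compile(r'[0-7]')
p_digits_hex = re.compile(r'[0-9a-fA-F]')


def scan_integer(text, start, cpplanguage=True):
    """Return the index just past the integer constant at text[start].

    Hypothetical standalone version of the patched branch; assumes
    text[start] is a decimal digit.
    """
    c = text[start]
    i = start + 1
    digitpattern = p_digits
    if i < len(text) and c == "0":
        if cpplanguage and text[i] in ("b", "B"):
            i += 1
            digitpattern = p_digits_binary
        elif text[i] in ("x", "X"):
            i += 1
            digitpattern = p_digits_hex
        else:
            digitpattern = p_digits_octal

    # The first char after the prefix shall not be a digit separator.
    if i < len(text) and digitpattern.match(text[i]):
        i += 1

    # Remaining chars are either a valid digit or (if C++) a separator.
    while i < len(text) and (digitpattern.match(text[i])
                             or (cpplanguage and text[i] == "'")):
        i += 1
    return i


# The literals from the report, each followed by the terminating ';'.
for literal in ["0b0'0;", "0x0'0;", "00'0;", "0'0;",
                "0b0'0'0;", "0x0'0'0;", "00'0'0;", "0'0'0;"]:
    end = scan_integer(literal, 0)
    print(repr(literal[:end]), "rest:", repr(literal[end:]))
```

With this, each literal from the report should be consumed in full, leaving only the ';' behind. In C mode (cpplanguage=False) the 0b prefix and the ' separators are not accepted, so the scan stops early there, same as before the patch.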
