-
Hi everyone, this is probably a silly question, but I couldn't find much about it. I need to track data flow in Python scripts. The goal is to track the expressions that flow into an open() call. For now I'm using this sample query:
And this sample testcase:

    import sys

    def get_input_from_stdin():
        # User-controlled value (read from the command line)
        return sys.argv[1]

    def get_hardcoded_filename():
        return "output.txt"

    def combine_input_and_hardcoded():
        user_input = get_input_from_stdin()
        return f"{user_input}_log.txt"

    def intermediate_variable():
        part1 = get_input_from_stdin()
        part2 = "_data"
        filename = part1 + part2 + ".txt"
        return filename

    def filename_from_multiple_functions():
        prefix = get_input_from_stdin()
        suffix = get_hardcoded_filename()
        return prefix + "_" + suffix

    def nested_function_call():
        def inner():
            return get_input_from_stdin()
        filename = inner() + "_nested.txt"
        return filename

    def custom_open(filepath):
        # The open() call here is the sink being tracked
        with open(filepath, 'w') as f:
            f.write("This is a test file.\n")

    def main():
        custom_open(get_hardcoded_filename())
        custom_open(combine_input_and_hardcoded())
        custom_open(intermediate_variable())
        custom_open(filename_from_multiple_functions())
        custom_open(nested_function_call())

    if __name__ == "__main__":
        main()

As expected, CodeQL returns one path for each source Expr. However, I would like to keep only the most recent one. For instance, for the function

Obviously, I could solve this by checking the line number, but that's not going to fly for slightly more complex scenarios. Another solution I was considering is a second DataFlow configuration that checks intra-procedural flows among local expressions, but before doing that I'm curious whether there's a more "CodeQL"-like way to do this. Thanks.
Replies: 2 comments 3 replies
-
👋 @elManto As far as I can tell, what you are seeing is a result of how you defined Once you restrict For example, if you restrict
-
Try adding this to
What this means is that source nodes also block any taint from flowing into them. So if you used to have a path "source1 -> source2 -> source3 -> sink", with this change it will only give you "source3 -> sink".
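To make the effect concrete, here is a toy model in plain Python (not CodeQL) of the "source1 -> source2 -> source3 -> sink" chain. The graph, the node names, and the `paths_to_sink` helper are all made up for illustration; the only thing it tries to capture is the rule that a source may start a path but may not receive flow from upstream. In CodeQL's `DataFlow::ConfigSig`, in-barriers of this kind are declared with the `isBarrierIn` predicate.

```python
from typing import Dict, List, Set

# Hypothetical flow graph: each node lists the nodes its value flows into.
EDGES: Dict[str, List[str]] = {
    "source1": ["source2"],
    "source2": ["source3"],
    "source3": ["sink"],
}
SOURCES: Set[str] = {"source1", "source2", "source3"}
SINK = "sink"


def paths_to_sink(sources_block_incoming: bool) -> List[List[str]]:
    """Enumerate every source-to-sink path in the toy graph.

    With sources_block_incoming=True, flow may start at a source but may
    never enter one, which mimics treating sources as in-barriers.
    """
    found: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == SINK:
            found.append(path)
            return
        for nxt in EDGES.get(node, []):
            if sources_block_incoming and nxt in SOURCES:
                continue  # a source refuses flow coming in from upstream
            walk(nxt, path + [nxt])

    for src in sorted(SOURCES):
        walk(src, [src])
    return found


if __name__ == "__main__":
    for blocked in (False, True):
        print(blocked, paths_to_sink(blocked))
```

Without the barrier, the sketch reports three paths, one per source. With it, only the path starting at the last source before the sink remains.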
Another option is to use `DataFlow::Global` instead of `TaintTracking::Global`. Then you will only get expressions that actually flow to the sink, without being modified in some way first.
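The difference between the two modes can be sketched with another toy model in plain Python (again, not CodeQL). The step list below is a hand-written assumption about two calls from the test case: a "value" step passes the value along unchanged, while a "taint" step derives a new value, here through the f-string. The `reaches` helper is hypothetical and only shows why a value-only analysis drops `sys.argv[1]` as a source for that call.

```python
from typing import List, Set, Tuple

# Hypothetical steps for two calls in the test case. "value" means the value
# is passed along unchanged; "taint" means a new value is derived from it.
STEPS: List[Tuple[str, str, str]] = [
    ('"output.txt"', "get_hardcoded_filename()", "value"),
    ("get_hardcoded_filename()", "open(filepath)", "value"),
    ("sys.argv[1]", "user_input", "value"),
    ("user_input", 'f"{user_input}_log.txt"', "taint"),
    ('f"{user_input}_log.txt"', "open(filepath)", "value"),
]


def reaches(start: str, goal: str, allow_taint: bool) -> bool:
    """Return True if start can reach goal using the permitted step kinds."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        for src, dst, kind in STEPS:
            if src == node and (kind == "value" or allow_taint):
                stack.append(dst)
    return False


if __name__ == "__main__":
    sink = "open(filepath)"
    for expr in ('"output.txt"', "sys.argv[1]", 'f"{user_input}_log.txt"'):
        print(expr, reaches(expr, sink, False), reaches(expr, sink, True))
```

In this model the hardcoded string and the f-string expression reach the sink in both modes, while `sys.argv[1]` reaches it only when taint steps are allowed.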