Fix ArgumentException when analysis target URI contains characters that are illegal in the file system#2860
Fix ArgumentException when analysis target URI contains characters that are illegal in the file system#2860joscol wants to merge 15 commits into
Conversation
…at are illegal in the file system.
| * BUG: Fix error `ERR997.NoValidAnalysisTargets` when scanning symbolic link files. | ||
| * BUG: Fix `ERR999.UnhandledEngineException: System.IO.FileNotFoundException: Could not find file` when a file name or directory path contains URL-encoded characters. | ||
| * BUG: Fix error `ERR997.NoValidAnalysisTargets` when ambiguous file/directory references are provided to `OrderedFileSpecifier`. Previously, the code required an explicit directory separator to be added to the end of a directory path. Now, the code inspects the file system and assumes that a reference to an existing directory was intended by the user (even without a trailing separator). | ||
| * BUG: Fix `ArgumentException` when analysis target URI contains characters that are illegal in the file system. |
| { | ||
| extension = Path.GetExtension(filePath); | ||
| } | ||
| catch (ArgumentException) |
There was a problem hiding this comment.
Since the exception is rare, I thought the performance cost of always evaluating the string for illegal characters would outweigh the cost of processing the exception. I will change it if you think it is better.
|
|
||
| namespace Test.UnitTests.Sarif.Driver.Net48 | ||
| { | ||
| public class AnalyzeTestCommand : MultithreadedAnalyzeCommandBase<AnalyzeTestContext, AnalyzeTestOptions> |
There was a problem hiding this comment.
The normal testing runs on .NET 6 and Path.GetExtension silently ignores illegal characters in later versions. It only throws the exception in older versions of .NET.
There was a problem hiding this comment.
This is good information but doesn't describe your rationale for creating a new test binary for this specific case. Maybe you didn't want to take on the performance costs for running tests on multiple frameworks?
I think those costs might be worth taking on. The SARIF binary in particular has a lot of fairly low level API (mostly in the IO namespace, the URI handling, etc.) where there are some troublesome compat issues like this one.
So I suggest you add net482 to the target frameworks for the core unit test dll and move the new tests over to it.
net6.0;
There was a problem hiding this comment.
Attempting to build Test.UnitTests.Sarif.Driver against net48 results in several instances of `error CS8831: Target runtime doesn't support covariant types in overrides.'
|
|
||
| string extension = Path.GetExtension(filePath); | ||
| string extension; | ||
| int lastDotIndex = filePath.LastIndexOf('.'); |
There was a problem hiding this comment.
you have added lots of code and you didn't add unit tests for this.
if you're going to inline Path.GetExtension, perhaps just grab the actual code (and get rid of the invalid path chars check).
can you please create a helper for this? also, did you look for other occurrences of this issue? i see other uses of Path.GetExtension in the code.
https://referencesource.microsoft.com/#mscorlib/system/io/path.cs,f424e433705aeb09
There was a problem hiding this comment.
None of the other uses of GetExtension looked like they were problematic to me and none of the telemetry points to any other code.
There was a problem hiding this comment.
I'm not yet convinced. The export rules and export config commands look like they could problematic to me as well. By inlining this change here and here only, you have left a non-obvious fix which compromises maintainability (i.e., it's very likely that if this problem is encountered again, a dev might not be aware of what you've done, add a duplicate helper or another inlined fix, etc.).
What I'd suggest is to create a static helper for this rewrite and please add unit tests. Then, add a new entry to IFileSystem for GetExtension and call this helper through IFileSystem everywhere in the code base.
|
Can you please update the PR description with a more thorough analysis of the problem and solution? If a path contains illegal characters, presumably it will fail some later code condition, so what's the qualitative improvement you're looking for? In some cases, perhaps, our code would short-circuit on a code path where the GetExtension check seamlessly shuts things down (rather than crashing the app). The change creates consistency in behavior across platforms. Etc. Can you please confirm the behavior of GetExtension on more recent versions? I assume this API is simply entirely tolerant of illegal chars and will search for the final dot, the remaining chars, etc.? In reply to: 2808155094 |
| strategy: | ||
| matrix: | ||
| os: [windows-latest, ubuntu-20.04, macos-13] | ||
| os: [windows-latest, ubuntu-22.04, macos-13] |
There was a problem hiding this comment.
The latest Ubuntu image (version 24) doesn't include nuget - actions/runner-images#12086
There was a problem hiding this comment.
Version 20 isn't supported any more - actions/runner-images#11101
There was a problem hiding this comment.
Version 20 is still a required policy:
build-and-test (ubuntu-20.04)Expected — Waiting for status to be reported
I don't see how to change that. I probably don't have permission.
@michaelcfanning , do you know what I need to do?
| * BUG: Fix error `ERR997.NoValidAnalysisTargets` when scanning symbolic link files. | ||
| * BUG: Fix `ERR999.UnhandledEngineException: System.IO.FileNotFoundException: Could not find file` when a file name or directory path contains URL-encoded characters. | ||
| * BUG: Fix error `ERR997.NoValidAnalysisTargets` when ambiguous file/directory references are provided to `OrderedFileSpecifier`. Previously, the code required an explicit directory separator to be added to the end of a directory path. Now, the code inspects the file system and assumes that a reference to an existing directory was intended by the user (even without a trailing separator). | ||
| * BUG: Fix `ArgumentException: Illegal characters in path` when analysis target URI contains characters that are illegal in the file system. |
There was a problem hiding this comment.
You haven't described the method that raises the exception. This means that customers who have a stack frame in an exception might not easily find your note. MultithreadedAnalyzeCommandBase.Run would provide this information. Although you've documented this specific fix, you haven't described the other code locations you changed, which you could add. e.g. ExportRulesMetadataCommandBase.Run and ExportConfigurationCommandBase.Run
| return string.Empty; | ||
| } | ||
|
|
||
| } |
| [Fact] | ||
| public void PathGetExtension_ShouldExtractExtension() | ||
| { | ||
| FileSystem.Instance.PathGetExtension("test.txt").Should().Be(".txt"); |
There was a problem hiding this comment.
Add at least one test with multiple non-sequential dots in the file name. Add at least one test with a period in the directory name. Add a file name that begins with a period (this is legal).Add a file name that ends with a period (not sure what happens here) Add a file name that starts and ends with a period (ditto). Add a file name with sequential dots in the file name test...txt.
One thing about your change is that it hasn't really told us whether the inlined implementation matches our goal of consistent behavior across different versions of .NET (absent the throw behavior).
So can you also supplement these tests to compare these behaviors to actual Path.GetExtension? you will need to do some #defines to cover the exception raising cases. ideally, you would build the unit tests to demonstrates the exception raising vs. non exception raising conditions (but we didn't figure out how to run on net482 is that right?)
| mock.Setup(fs => fs.PathGetFullPath(It.IsAny<string>())).Returns((string path) => path); | ||
| mock.Setup(fs => fs.FileReadAllText(fileName)).Returns(fileText ?? string.Join(Environment.NewLine, fileLines)); | ||
| mock.Setup(fs => fs.FileReadAllLines(fileName)).Returns(fileLines ?? fileText.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)); | ||
| mock.Setup(x => x.PathGetExtension(It.IsAny<string>())).Returns((string path) => SarifUtilities.PathGetExtension(path)); |
Did you consider using the mock factory creator here? I'm not sure why we have one of those if in other cases we precisely describe all setups. :) And are all these setups required? I might have tried to call the default factory, then only explicitly provide additional setups (related to streaming) that were required. Or I might have considered adding the stream helpers to the mock factory. in general, our test code could be improved by strengthening behavior of the factory, for example, by requiring all mocked things to be called, etc., that's a bridge to far for this change, but please do look at this code for tightening opportunities since you touched it. Refers to: src/Test.UnitTests.Sarif.Driver/Sdk/MultithreadedAnalyzeCommandBaseTests.cs:877 in c42794f. [](commit_id = c42794f, deletion_comment = False) |
During processing of analysis targets, we check if the file path has a file extension matching a known ZIP compatible format. This check is done using
Path.GetExtension. In older, but still in use, versions of .NET, this method will throw an ArgumentException if the path contains any illegal characters anywhere in the path. Since this path is taken from a URL it can contain such characters as the URL and file system illegal character sets are not the same. This PR replaces the call toPath.GetExtensionwith a new helper method that behaves the same way as newer versions of .NET*Subsequent processing of the file name does not show any risk of future exceptions. If the file appears to be a ZIP compatible format, the file will be read from disk. But as the file could only have been created if it contains no illegal characters, there is no risk here. Regardless of the extension, the file name is only used in logging which does not have character restrictions.
*See: https://learn.microsoft.com/en-us/dotnet/api/system.io.path.getextension?view=net-9.0 -