Skip to content

Fix ArgumentException when analysis target URI contains characters that are illegal in the file system#2860

Open
joscol wants to merge 15 commits into
microsoft:mainfrom
joscol:illegal_path_chars_url
Open

Fix ArgumentException when analysis target URI contains characters that are illegal in the file system#2860
joscol wants to merge 15 commits into
microsoft:mainfrom
joscol:illegal_path_chars_url

Conversation

@joscol
Copy link
Copy Markdown
Contributor

@joscol joscol commented Apr 30, 2025

During processing of analysis targets, we check if the file path has a file extension matching a known ZIP compatible format. This check is done using Path.GetExtension. In older, but still in use, versions of .NET, this method will throw an ArgumentException if the path contains any illegal characters anywhere in the path. Since this path is taken from a URL it can contain such characters as the URL and file system illegal character sets are not the same. This PR replaces the call to Path.GetExtension with a new helper method that behaves the same way as newer versions of .NET*

Subsequent processing of the file name does not show any risk of future exceptions. If the file appears to be a ZIP compatible format, the file will be read from disk. But as the file could only have been created if it contains no illegal characters, there is no risk here. Regardless of the extension, the file name is only used in logging which does not have character restrictions.

*See: https://learn.microsoft.com/en-us/dotnet/api/system.io.path.getextension?view=net-9.0 -

Exceptions
[ArgumentException](https://learn.microsoft.com/en-us/dotnet/api/system.argumentexception?view=net-9.0)
.NET Framework and .NET Core versions older than 2.1: path contains one or more of the invalid characters defined in [GetInvalidPathChars()](https://learn.microsoft.com/en-us/dotnet/api/system.io.path.getinvalidpathchars?view=net-9.0#system-io-path-getinvalidpathchars).

Comment thread ReleaseHistory.md Outdated
* BUG: Fix error `ERR997.NoValidAnalysisTargets` when scanning symbolic link files.
* BUG: Fix `ERR999.UnhandledEngineException: System.IO.FileNotFoundException: Could not find file` when a file name or directory path contains URL-encoded characters.
* BUG: Fix error `ERR997.NoValidAnalysisTargets` when ambiguous file/directory references are provided to `OrderedFileSpecifier`. Previously, the code required an explicit directory separator to be added to the end of a directory path. Now, the code inspects the file system and assumes that a reference to an existing directory was intended by the user (even without a trailing separator).
* BUG: Fix `ArgumentException` when analysis target URI contains characters that are illegal in the file system.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ArgumentException

this release note is insufficient.

{
extension = Path.GetExtension(filePath);
}
catch (ArgumentException)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ArgumentException

please find a fix that doesn't require catching an exception. i.e., scan the file path to see whether it contains invalid chars.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the exception is rare, I thought the performance cost of always evaluating the string for illegal characters would outweigh the cost of processing the exception. I will change it if you think it is better.


namespace Test.UnitTests.Sarif.Driver.Net48
{
public class AnalyzeTestCommand : MultithreadedAnalyzeCommandBase<AnalyzeTestContext, AnalyzeTestOptions>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AnalyzeTestCommand

why isn't our standard testing catching this problem? it seems like configuring some tests to run in downlevel CLR would be preferable than adding a specialized test assembly.

what is your thinking here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The normal testing runs on .NET 6 and Path.GetExtension silently ignores illegal characters in later versions. It only throws the exception in older versions of .NET.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good information but doesn't describe your rationale for creating a new test binary for this specific case. Maybe you didn't want to take on the performance costs for running tests on multiple frameworks?

I think those costs might be worth taking on. The SARIF binary in particular has a lot of fairly low level API (mostly in the IO namespace, the URI handling, etc.) where there are some troublesome compat issues like this one.

So I suggest you add net482 to the target frameworks for the core unit test dll and move the new tests over to it.

net6.0;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attempting to build Test.UnitTests.Sarif.Driver against net48 results in several instances of `error CS8831: Target runtime doesn't support covariant types in overrides.'

Copy link
Copy Markdown
Member

@michaelcfanning michaelcfanning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🕐


string extension = Path.GetExtension(filePath);
string extension;
int lastDotIndex = filePath.LastIndexOf('.');
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filePath

you have added lots of code and you didn't add unit tests for this.

if you're going to inline Path.GetExtension, perhaps just grab the actual code (and get rid of the invalid path chars check).

can you please create a helper for this? also, did you look for other occurrences of this issue? i see other uses of Path.GetExtension in the code.

https://referencesource.microsoft.com/#mscorlib/system/io/path.cs,f424e433705aeb09

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the other uses of GetExtension looked like they were problematic to me and none of the telemetry points to any other code.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not yet convinced. The export rules and export config commands look like they could problematic to me as well. By inlining this change here and here only, you have left a non-obvious fix which compromises maintainability (i.e., it's very likely that if this problem is encountered again, a dev might not be aware of what you've done, add a duplicate helper or another inlined fix, etc.).

What I'd suggest is to create a static helper for this rewrite and please add unit tests. Then, add a new entry to IFileSystem for GetExtension and call this helper through IFileSystem everywhere in the code base.

@michaelcfanning
Copy link
Copy Markdown
Member

Can you please update the PR description with a more thorough analysis of the problem and solution? If a path contains illegal characters, presumably it will fail some later code condition, so what's the qualitative improvement you're looking for? In some cases, perhaps, our code would short-circuit on a code path where the GetExtension check seamlessly shuts things down (rather than crashing the app). The change creates consistency in behavior across platforms. Etc. Can you please confirm the behavior of GetExtension on more recent versions? I assume this API is simply entirely tolerant of illegal chars and will search for the final dot, the remaining chars, etc.?


In reply to: 2808155094

strategy:
matrix:
os: [windows-latest, ubuntu-20.04, macos-13]
os: [windows-latest, ubuntu-22.04, macos-13]
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest Ubuntu image (version 24) doesn't include nuget - actions/runner-images#12086

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Version 20 isn't supported any more - actions/runner-images#11101

Copy link
Copy Markdown
Contributor Author

@joscol joscol May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Version 20 is still a required policy:

build-and-test (ubuntu-20.04)Expected — Waiting for status to be reported

I don't see how to change that. I probably don't have permission.

@michaelcfanning , do you know what I need to do?

Comment thread ReleaseHistory.md Outdated
* BUG: Fix error `ERR997.NoValidAnalysisTargets` when scanning symbolic link files.
* BUG: Fix `ERR999.UnhandledEngineException: System.IO.FileNotFoundException: Could not find file` when a file name or directory path contains URL-encoded characters.
* BUG: Fix error `ERR997.NoValidAnalysisTargets` when ambiguous file/directory references are provided to `OrderedFileSpecifier`. Previously, the code required an explicit directory separator to be added to the end of a directory path. Now, the code inspects the file system and assumes that a reference to an existing directory was intended by the user (even without a trailing separator).
* BUG: Fix `ArgumentException: Illegal characters in path` when analysis target URI contains characters that are illegal in the file system.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ArgumentException

This release note doesn't describe the code location where the fix was made.

Copy link
Copy Markdown
Member

@michaelcfanning michaelcfanning May 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You haven't described the method that raises the exception. This means that customers who have a stack frame in an exception might not easily find your note. MultithreadedAnalyzeCommandBase.Run would provide this information. Although you've documented this specific fix, you haven't described the other code locations you changed, which you could add. e.g. ExportRulesMetadataCommandBase.Run and ExportConfigurationCommandBase.Run

return string.Empty;
}

}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

NIT: you introduced a blank line here

[Fact]
public void PathGetExtension_ShouldExtractExtension()
{
FileSystem.Instance.PathGetExtension("test.txt").Should().Be(".txt");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PathGetExtension

Add at least one test with multiple non-sequential dots in the file name. Add at least one test with a period in the directory name. Add a file name that begins with a period (this is legal).Add a file name that ends with a period (not sure what happens here) Add a file name that starts and ends with a period (ditto). Add a file name with sequential dots in the file name test...txt.

One thing about your change is that it hasn't really told us whether the inlined implementation matches our goal of consistent behavior across different versions of .NET (absent the throw behavior).

So can you also supplement these tests to compare these behaviors to actual Path.GetExtension? you will need to do some #defines to cover the exception raising cases. ideally, you would build the unit tests to demonstrates the exception raising vs. non exception raising conditions (but we didn't figure out how to run on net482 is that right?)

Comment thread src/Test.UnitTests.Sarif/MockFactory.cs Outdated
mock.Setup(fs => fs.PathGetFullPath(It.IsAny<string>())).Returns((string path) => path);
mock.Setup(fs => fs.FileReadAllText(fileName)).Returns(fileText ?? string.Join(Environment.NewLine, fileLines));
mock.Setup(fs => fs.FileReadAllLines(fileName)).Returns(fileLines ?? fileText.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries));
mock.Setup(x => x.PathGetExtension(It.IsAny<string>())).Returns((string path) => SarifUtilities.PathGetExtension(path));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PathGetExtension

this mocking code is a little troubling in that it sets some default behaviors (input agnostic) and others that are exclusive to the file name.

can we at least separate these setups cleanly in the code?

@michaelcfanning
Copy link
Copy Markdown
Member

        mockFileSystem.Setup(x => x.IsSymbolicLink(path)).Returns(false);

Did you consider using the mock factory creator here?

I'm not sure why we have one of those if in other cases we precisely describe all setups. :)

And are all these setups required? I might have tried to call the default factory, then only explicitly provide additional setups (related to streaming) that were required.

Or I might have considered adding the stream helpers to the mock factory.

in general, our test code could be improved by strengthening behavior of the factory, for example, by requiring all mocked things to be called, etc., that's a bridge to far for this change, but please do look at this code for tightening opportunities since you touched it.


Refers to: src/Test.UnitTests.Sarif.Driver/Sdk/MultithreadedAnalyzeCommandBaseTests.cs:877 in c42794f. [](commit_id = c42794f, deletion_comment = False)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants