Find in Files #413

Open
ProgerXP opened this issue Jun 14, 2022 · 2 comments

Comments

@ProgerXP
Owner

ProgerXP commented Jun 14, 2022

This describes the implementation of a search operation carried out across multiple files (inspired by grep, vim's quickfix cc list and Notepad++).

The operation is built around a special "FindInFiles" INI file. There may be multiple such INIs. The program determines whether the current buffer is a special search file or a regular file by checking its first line: if the current buffer is not Unnamed, uses UTF-8 encoding (no BOM) and starts with ASCII [FindInFiles] followed by EOF, \r or \n, then it is special.
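
A minimal sketch of this check in Python, operating on the raw bytes of the file. The function name `is_special_search_file` and the `is_unnamed` flag are hypothetical, and the sketch only looks at the bytes on disk; inside the editor the encoding test would presumably use the buffer's own encoding setting instead.

```python
MARKER = b"[FindInFiles]"
UTF8_BOM = b"\xef\xbb\xbf"


def is_special_search_file(data: bytes, is_unnamed: bool = False) -> bool:
    """Return True if the raw file contents look like a special search INI."""
    if is_unnamed:
        # Unnamed buffers are never special.
        return False
    if data.startswith(UTF8_BOM):
        # A BOM means the marker is not at the very start of the file.
        return False
    if not data.startswith(MARKER):
        return False
    # The marker must be followed by EOF, \r or \n (nothing else).
    following = data[len(MARKER):len(MARKER) + 1]
    return following in (b"", b"\r", b"\n")
```

Note that trailing spaces after the header make the file a regular one under this rule.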

Default contents for this file is the [FindInFiles] section header followed by regular Find's settings: string, case, word/beginning, RE/backslashes, comments, as well as FIF-specific settings, some in their own sections (see below). After the search new sections are added after [FindInFiles] as described below.

New command

Add new command Find in Files to the end of Edit > Find, with the Ctrl+Shift+F hotkey (reassign). It is disabled if current buffer is Unnamed.

The command works as follows:

  • If current selection is not empty:
    • If it starts with ASCII a-zA-Z then ASCII : or if starts with two ASCII \, assume it is a file path. Else assume no path.
  • Else, if current selection is empty, do left-scan from the current caret¹ position searching for absolute file path + zero or more : with numbers (see below). The scan is done this way:
    • Upon hitting line break or document start - restart the scan but this time do right-scan from this position (of line break or BOF). However, if it is the restarted scan that hit line break or EOF - stop, assuming no path.
    • Upon hitting : - examine up to two preceding characters depending on buffer length (XY:). If : is at BOF then stop (assuming no path), else if Y is ASCII a-zA-Z and if X is {either BOF or not one of ASCII a-zA-Z0-9} then carry on to next point, else continue scanning.
    • Determined Y is the drive letter. Now do right-scan (from the position of Y, not from the original caret position¹) to find where the file path ends. Do this by locating the first line break or EOF or any of ASCII ? * | " < >. Then stop.
      • Special case²: if hit line break or EOF then check if last symbol before it (i.e. last symbol in the path) is ]; if so, remove ] from the path (i.e. back-track by one symbol to the left).
  • If a path was determined, try opening it (treating trailing :line:col according to Open dialog - allow ":line:column" #325; ignoring until that is implemented). The opening function is a combination of Ctrl+O plus command-line processing: it may show message boxes like Ctrl+O does (if current buffer is unsaved or if the path doesn't exist or is not a file) but it must also respect Single File Instance (Ctrl+O doesn't respect it, only an attempt to run Notepad2e.exe file.txt respects it). In other words, first check if another process/window has the same file opened; if it does, focus it (command processing stops), otherwise work like Ctrl+O in the current window.
  • Else, if no path was determined:
    • ① If current file is not a special INI file, try to open standard search INI: determine its path (read from Notepad2's INI, section [Notepad2e], key FindInFilesINI, default = %TEMP%\FindInFiles.ini; expand environment variables in the value), create default file if that path does not exist, open that file as if by Ctrl+O.
    • ② Else, if current file is a special INI file, save it (stop command processing upon saving error) and perform the search.

Examples:

|[C:\Foo\Bar.txt]
Does left-scan, hits BOF, restarts with right-scan, finds "[C:" (XY:), does
right-scan from "C", hits line break. Path = "C:\Foo\Bar.txt]" but special case²
instructs to remove "]" because it's located before EOF (or line end).

:| C:\Foo\Bar.txt:123:4567
Does left-scan, finds ":" but there's no "Y" before it so continues, hits BOF,
restarts with right-scan, finds " C:", does right-scan from "C", hits line
break. Path = "C:\Foo\Bar.txt:123:4567".

C:\F|oo\Bar.txt>NUL
Does left-scan, finds "C:" ("Y:"), does right-scan from "C", hits ">". Path =
"C:\Foo\Bar.txt".

;[C:\Foo\Bar.txt]:123:4567
Here [ ] represent selection. Since selection starts with "C:", it is treated as
a path. Path = "C:\Foo\Bar.txt".

[;C:\Foo\Bar.txt]:123:4567
Here [ ] represent selection. Since selection starts with ";C", it is treated as
having no path.
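
The detection rules and the examples above can be sketched roughly as follows in Python. The names `selection_is_path` and `find_path_at_caret` are hypothetical, and the caret is assumed to be a 0-based offset into the text. One assumption is worth stating: the sketch treats a : located at BOF as "not a drive prefix" and keeps scanning, which is what the second example shows.

```python
import string

ASCII_LETTERS = set(string.ascii_letters)
ASCII_ALNUM = ASCII_LETTERS | set(string.digits)
LINE_BREAKS = "\r\n"
PATH_TERMINATORS = set('?*|"<>' + LINE_BREAKS)


def selection_is_path(selection):
    """Non-empty selection: a path if it starts with 'X:' or with two backslashes."""
    starts_with_drive = (len(selection) >= 2 and selection[0] in ASCII_LETTERS
                         and selection[1] == ":")
    return starts_with_drive or selection.startswith("\\\\")


def _drive_letter_index(text, colon):
    """Index of Y if text[colon] ends a valid 'XY:' prefix, else None."""
    if colon == 0:
        return None  # ':' at BOF has no Y (assumption: keep scanning)
    y = colon - 1
    if text[y] not in ASCII_LETTERS:
        return None
    if y > 0 and text[y - 1] in ASCII_ALNUM:
        return None  # X must be BOF or a non-alphanumeric character
    return y


def find_path_at_caret(text, caret):
    """Empty selection: left-scan, then restart as right-scan, then cut the path."""
    start = None
    i = caret - 1
    while i >= 0 and text[i] not in LINE_BREAKS:  # left-scan
        if text[i] == ":":
            start = _drive_letter_index(text, i)
            if start is not None:
                break
        i -= 1
    if start is None:
        j = i + 1  # restart from the line break or BOF that stopped the left-scan
        while j < len(text) and text[j] not in LINE_BREAKS:  # right-scan
            if text[j] == ":":
                start = _drive_letter_index(text, j)
                if start is not None:
                    break
            j += 1
    if start is None:
        return None  # no path
    end = start
    while end < len(text) and text[end] not in PATH_TERMINATORS:
        end += 1
    path = text[start:end]
    hit_line_end = end == len(text) or text[end] in LINE_BREAKS
    if hit_line_end and path.endswith("]"):
        path = path[:-1]  # special case: drop a trailing ']'
    return path
```

For the first three examples this yields `C:\Foo\Bar.txt`, `C:\Foo\Bar.txt:123:4567` and `C:\Foo\Bar.txt` respectively; handling of the trailing :line:col is left to the opening function.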

New button

Add new button In Files above Close in Ctrl+F with hint: Ctrl+Shift+F. It is disabled if current buffer is Unnamed (it's okay to set the disabled state once when Find is shown instead of monitoring this condition and updating the state while Find is visible).

The button (clicking the button or pressing the hotkey while Find is focused) works as follows:

  • If current file is not a special INI file, work as ① with an addition: before opening the search file, update keys in [FindInFiles] according to current Find dialog's controls including Search String, and save the file.
  • If current file is a special INI file, work as ② with an addition: use state of Find dialog's controls in the new search (this state will be written to [FindInFiles] once the search finishes). Values for keys that are not represented by Find's controls are taken from current INI file's values (or program defaults if the file doesn't exist or has no such key). Technically the result is the same as the user manually changing [FindInFiles] before starting the search.

Search settings

[FindInFiles] keys:

  • Regular Find's: String, Case, Word (0, 1, 2 = beginning), Expression (0, 1 = PCRE, 2 = backslashes), Comments (0, 1, 2 = any). Defaults for these are Find's defaults.
  • Depth - descending into subdirectories (0+), defaults to 20; 0 = search the directory only, 1 = search immediate subdirectories, etc.
  • MaxSize - do not start processing a single file if its size is greater than this number of bytes (0+), defaults to 100 000 000 (100 MB, ~95 MiB)
  • MaxPerFile - stop processing a single file after this many matches (1+), defaults to 1000
  • LinesAbove, LinesBelow - number of lines output before and after the matched line (0+), default to 0
  • CacheNames - causes results of FindFirstFile/etc. calls to be remembered and avoided in future searches, defaults to 1

[FileName] - every non-empty non-comment line is a regular expression (as accepted by Find/PCRE). Since WinAPI doesn't allow reading raw section contents (?), we should parse it by hand and in this case treat lines starting with ; as comments. RE is matched against base file names (not paths), case-insensitively. In the default contents this section has a single ^ line (^ = start of line, always matches).

[Path] - every non-empty non-comment line is an absolute or relative path to directory or file. Paths are resolved relative to the special INI file (this can be achieved by changing CWD of the process before starting the search). Empty contents by default.
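
A minimal sketch of the hand-written parsing mentioned above, in Python. The function name `parse_raw_sections` is hypothetical. The sketch treats any line of the form [name] as a section header, so a [FileName] expression that is itself a complete bracket expression (such as [abc]) would be misread; how to resolve that is left open here.

```python
def parse_raw_sections(text):
    """Map each section name to its non-empty, non-comment lines, in order."""
    sections = {}
    current = None
    for raw in text.splitlines():
        stripped = raw.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Section header: remember it even if it turns out to be empty.
            current = stripped[1:-1]
            sections.setdefault(current, [])
        elif current is not None and stripped and not stripped.startswith(";"):
            # Keep the line as written; only blank lines and ; comments are dropped.
            sections[current].append(raw)
    return sections
```

With the default contents described above, `parse_raw_sections(...)["FileName"]` would be `["^"]` and `["Path"]` would be an empty list.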

[C:\Cached\File\List\] - when CacheNames is 0, these sections are neither read from nor written to the INI. If 1, the search operation performs the lookup before descending into a subdirectory to read its list of files: if a section for this subdirectory (path) exists, skip FindFirstFile calls; if it doesn't exist, do the calls and write this section when finished. This mechanism speeds up future searches in the same root directory if user knows there were no new files added. Removed files don't hurt since the search operation ignores file open errors, including errors when opening non-existing files. Trailing \ in the name of these sections prevents conflicts with matching files' sections ([C:\Foo\Bar.txt]).

CacheNames sections can be manually populated by the user, allowing flexible file selection if simple [FileName] and [Path] rules are not enough (for example, different file patterns for different paths). It guarantees non-listed files are not accessed by the search.
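
The lookup order can be sketched as below in Python. Everything named here is hypothetical: `list_directory`, the `cache` dictionary (section name mapped to a list of entries) and the injectable `lister` that stands in for the FindFirstFile cycle. What exactly is stored in a cached section (base names or full paths) is not specified above, so the sketch simply stores whatever the lister returns.

```python
import os


def list_directory(dir_path, cache, cache_names, lister=os.listdir):
    """Return the entries of dir_path, consulting cached sections when enabled."""
    # Cached sections are named after the directory with a trailing backslash.
    key = dir_path if dir_path.endswith("\\") else dir_path + "\\"
    if cache_names and key in cache:
        return cache[key]  # section exists: skip the directory listing calls
    names = lister(dir_path)
    if cache_names:
        cache[key] = names  # to be written to the INI when the search finishes
    return names
```

A user-populated section works the same way: if the key is already present, the directory itself is never listed.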

Search

The search operation is an automated execution of existing Notepad2e operations, similar to #250. It is up to the implementation to decide what happens with the program's window while the search is running. At minimum, the status bar should be in the "busy" mode (as done by commands like Base64 Encode). The window can show the INI file, be blank or be loading files as the search progresses. If the INI file remains open, File Change Notification must ignore changes made by the running search operation and not produce "File changed, reload?" message boxes.

Find settings, search strings, recent files (History), etc. must not be updated regardless of Save Settings. In other words, Notepad2's INI file should not be changed during the search operation. Currently, #250 does change History but this can be seen as a bug and addressed together with this task.

  • Check the search INI file: if it doesn't exist, create the default file and open it, adding [Path] and focusing the line (as described two points below). If the file exists but is not "special", open it (unless already opened) and do nothing.
  • Read all settings from the INI.
  • If [Path] is empty, open the INI file (unless already opened) and move the caret to be after the [Path] line. If there is no such section, add one blank line and the [Path] line to the end of the file and move the caret. If the section exists but is followed by EOF then move to EOF or add a blank line and move to it, whichever is simpler. Stop the operation.
  • For every path in [Path], do the usual FindFirstFileW() cycle. For every found directory, test Depth and possibly descend (yielding a nested FindFirstFileW() cycle). Take into account CacheNames and write full file list even for names mismatching [FileName]. For every found file:
    • Test MaxSize.
    • Test every RE in [FileName]. If this section is not empty and no REs match, skip the file.
    • Open the file as if by Ctrl+O. If there was an error, skip the file.
    • Find the first match from file start as if by Ctrl+F (using the settings from the search INI file). If not found, close the file and process the next one.
    • Find next match. Now there are two matches (A and B, where B may be not-found). Record A in the list of matches with these properties: matched-line = contents of the line where A was located; lines-above = array of contents of up to LinesAbove lines starting with A-1; lines-below = same but up to LinesBelow lines starting with A+1 and stopping before the line of B (if B = not-found then up to LinesBelow). In other words, if LinesBelow is 1 and A and B are on adjacent lines, lines-below for A is recorded as an empty array.
    • Test MaxPerFile. If the number of recorded matches is smaller, repeat the previous point for the new match B (match A is now discarded, and former match B becomes match A).
  • Write new contents for the search INI file (the previous contents are erased).
  • Open the INI file (if not already opened) or Reload it (as if by F5). If there was at least one match, move the caret to the first file's section (before [).
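
The directory walk with the Depth, MaxSize and [FileName] tests can be sketched as follows in Python. This is only an illustration under several assumptions: `iter_candidate_files` is a hypothetical name, `os.scandir` stands in for the FindFirstFileW() cycle, Python's `re` stands in for the Find/PCRE engine (the two differ in details), and CacheNames handling is left out.

```python
import os
import re


def iter_candidate_files(root, depth, max_size, name_patterns):
    """Yield paths of files under root that pass the MaxSize and [FileName] tests."""
    regexes = [re.compile(p, re.IGNORECASE) for p in name_patterns]
    try:
        entries = sorted(os.scandir(root), key=lambda e: e.name)
    except OSError:
        return  # unreadable directory: skip it silently
    for entry in entries:
        if entry.is_dir():
            # Depth 0 = this directory only, 1 = immediate subdirectories, etc.
            if depth > 0:
                yield from iter_candidate_files(
                    entry.path, depth - 1, max_size, name_patterns)
        elif entry.is_file():
            if entry.stat().st_size > max_size:
                continue  # MaxSize: do not start processing this file
            # REs are matched against the base name, case-insensitively.
            if regexes and not any(r.search(entry.name) for r in regexes):
                continue
            yield entry.path
```

Each yielded path would then go through the open, find and record steps described in the list above.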

The new contents start with the default contents template, except that the values are not defaults but this operation's parameters:

[FindInFiles]
<regular Find's settings and FIF-specific settings, in any order, with values = this search operation's parameters>
[FileName]
<this search operation's FileName's>
[Path]
<this search operation's Path's>
[C:\Cached\File\List\]
<zero or more sections>

File sections follow: one per file name that had at least one match (file names go in any order, matches go in order of position in the file). A blank line precedes each section. Template:


[C:\Foo\Bar.txt]
; C:\Foo\Bar.txt:123:4567
First match
; C:\Foo\Bar.txt:456:7890
Second match

In case LinesAbove and/or LinesBelow is 1+ (1 and 2 in this example):

[C:\Foo\Bar.txt]
; C:\Foo\Bar.txt:123:4567
Line before the match
Match
After it
Another, after
; C:\Foo\Bar.txt:456:7890
...
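
A rough Python sketch of how one file section could be produced from recorded matches, including the LinesAbove/LinesBelow context. The names `context_for` and `format_file_section` are hypothetical. Matches are assumed to be (0-based line index, column) pairs sorted by position, the line number is written 1-based, the column is written as given, and context lines are emitted in file order.

```python
def context_for(lines, a, b, lines_above, lines_below):
    """Context of match line a; b is the line of the next match or None."""
    above = lines[max(0, a - lines_above):a]
    # Lines below stop before the line of the next match B, if there is one.
    room = len(lines) if b is None else b - a - 1
    below = lines[a + 1:a + 1 + max(0, min(lines_below, room))]
    return above, below


def format_file_section(path, lines, matches,
                        lines_above=0, lines_below=0, max_per_file=1000):
    """Build the text of one [path] section, preceded by a blank line."""
    out = ["", "[" + path + "]"]
    for k, (a, col) in enumerate(matches[:max_per_file]):
        # B is looked up in the full list: it is known even if it is not recorded.
        b = matches[k + 1][0] if k + 1 < len(matches) else None
        above, below = context_for(lines, a, b, lines_above, lines_below)
        out.append("; %s:%d:%s" % (path, a + 1, col))
        out.extend(above)
        out.append(lines[a])
        out.extend(below)
    return "\n".join(out) + "\n"
```

With LinesBelow = 1 and two matches on adjacent lines, the first match gets no lines below, in line with the recording rule in the Search section.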

Implementation note: instead of writing full new INI contents after the search has finished, it is possible to update the INI in process of searching (i.e. first write the header with [FindInFiles] and other sections and then append new [C:\...] sections when a match is found). It may be assumed that the INI file is never accessed outside of the running process.

This format allows quickly navigating whole files with Ctrl+[/], going in/out with Ctrl+Shift+F (especially with #411 enabled), searching within file names using Find (inside the INI file) with Search comments enabled or searching within matches only with Search comments disabled (and optionally Ungrep'ped ^\[, i.e. removed section headers).

@babanga

babanga commented Jan 31, 2024

Hi, I'm using notepad2e with AstroGrep and there's one thing that really bothers me. It has an option to pass the highlighted string, but it fails with quoted strings. Can this be fixed on n2e's side?

Example:
Notepad2e.exe /g 33,10 /m "class="red blue"" c:\ex.txt

@ProgerXP
Owner Author

@babanga See #476.
