# Test Task Implementation Notes

## Solution progress

Completing the task took much longer than I originally estimated.

It was quite humbling when the first version turned out to be 5-6 times slower than `grep`,
despite heavy speed optimization with statically allocated arrays everywhere and no memory re-allocations.
I expected the result to already beat the analogues. Moreover, I had a specialized algorithm,
while `grep` supports a different syntax with a universal algorithm inside, so I assumed `grep` would be slower by definition.

Originally the line match algorithm used dynamic programming with `O(P*N)` memory.
Then I replaced it with a different one, faster and with constant memory.
And... it became only 2 times slower than `grep`. It was almost a success `:)`
At the same time, according to the profiler, 80+% of the time was spent precisely in the line match algorithm.

I had no doubt that a more efficient algorithm must exist for this task.
The only way to defeat `grep` was to add a few optimizations that simply make the algorithm
scroll forward faster in the most popular scenarios. This change accelerated the matching algorithm by about 6 times,
and the total running time of the program improved by 3 times.
The improved algorithm beat `grep` by only 25-35%.

After that, according to the profiler, almost half of the time was spent
in the line match algorithm, and nearly 40% of the time in the synchronous `ReadFile()` call.

I decided that this was the finest hour of asynchronous file reading!
The operating system could read the next block of the file while the previous one is being parsed and processed.
I implemented it and... nothing. The total running time did not change.
But the profiler showed a redistribution of time towards the line match algorithm.
It was very strange that the line matching algorithm slowed down, and I still don't understand why.
I am convinced that the 40% spent in `ReadFile()` could be compressed to at most 5%
by overlapping the reading of data with its processing.
Perhaps this is somehow related to the fact that the data is in the system file cache,
so it is not really read from the disk (a shorter IRP path).
Perhaps this plain copying of memory in kernel mode simply does not parallelize well.
Maybe it was worth moving the disk reads into a dedicated thread instead...

In the next iteration I tried mapping the file into memory.
This solution does not meet the requirements, because it may throw SEH exceptions on disk read errors,
and I had doubts about how quickly new pages would be faulted in.
The result was slightly worse: the total running time of the program increased by 20%.
That is also strange, given a warmed-up disk cache.
Theoretically, if the data is in the disk cache, it could be mapped
into the process's read-only virtual memory in `O(1)`,
saving both the kernel-mode transitions during the memory scan and the memory copies.

## Testing and Notes

I tested on a web server log: 2 GB, 5.5 million lines,
average line length 380 bytes, no line longer than 1024 bytes.
1600 lines out of 5.5M matched the pattern. I chose the pattern `*string*` as the most common in everyday use.

An SSD drive was used, but I warmed it up first so that all the data landed in the system file cache.

CPU: `Intel Core i5 8th Gen`, laptop edition.

The application built for the `x64` architecture worked faster than the `x86` build.

The `FILE_FLAG_SEQUENTIAL_SCAN` flag gave no performance boost on a warmed cache;
with a cold cache it would have to be measured separately.

Sometimes the application execution time stays elevated by about 25% for a long stretch.
Most likely this is because I am on a laptop and the CPU cores have power-saving modes.

The latest application version takes 1.6 seconds to process the test data, while `grep` takes 2.5 seconds.

**ADDED:**
I also implemented reading the file in a separate thread. The file operation itself is synchronous;
synchronization between the threads is done with a lock-free loop (a spinlock).
It gave a total gain of 25% over both the synchronous and asynchronous API solutions (total running time is 1.2 seconds).
This is 2 times faster than `grep`.

## Implementation features

I kept all four implementations of file reading. You can switch between them in code:

```cpp
#if 0
#if 0
    CSyncLineReader _lineReader;
#else
    CMappingLineReader _lineReader;
#endif
#else
#if 0
    CAsyncLineReader _lineReader;
#else
    CLockFreeLineReader _lineReader;
#endif
#endif
```

The solution contains unit tests in a separate project based on the `gtest` framework.

As required by the challenge, the main console application is built with C++ exceptions disabled and no RTTI.

I have used some parts of the STL at my own risk.
These parts do not use exceptions and work without unnecessary overhead.
I see no reason not to use cheap abstractions that allow writing cleaner and less error-prone code:
`std::unique_ptr`, `std::string_view`, `std::optional`, etc.

---