Investigate ways to speed up CSV reading/writing AND/OR alternate file formats #34
Open
1 of 6 tasks
Labels: benchmarking, enhancement, performance
Problem
While profiling a recent release build of ERIN, it looks like time being spent is roughly:
Although these ratios may change on larger problems, the time spent in reading / writing is still rather large. The objective of this task would be to:
string->double conversion
Reference: #40 #41
A study of .csv read speed, performed using the hyperfine benchmark utility, indicates a marked speed improvement when a given amount of data is packed into a single file rather than distributed across multiple files. Here a single "entry" is defined as two columns. Two files were used for the comparison, both with 8760 rows. In "repeat" mode, a file with one entry was opened, read, and then closed 1024 times. In "mixed" mode, a file with 1024 entries was opened, read, and then closed only once. A third mode, "multi", which reads from a list of 128 files, each with only one entry, gave results comparable to "repeat" mode, as expected. (The same single-entry file was used for "repeat" and "multi" modes.)
p: # of files to read
q: # of entries (8760 rows each)
r: # of trials
# of entries to read = p x q x r (= 1024)
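As a quick check on the bookkeeping, the sketch below confirms that every mode reads the same number of entries and rows. The (p, q, r) values are the ones reported for each mode in this issue; the dictionary layout and variable names are only illustrative.

```python
# (p, q, r) = (files to read, entries per file, trials) for each mode,
# as reported in this issue. The dict layout is illustrative only.
ROWS_PER_ENTRY = 8760

modes = {
    "repeat": (1, 1, 1024),
    "mixed": (1, 1024, 1),
    "multi": (128, 1, 8),
}

# Total entries read per benchmark run: p x q x r.
entries_read = {name: p * q * r for name, (p, q, r) in modes.items()}
# Total rows parsed per benchmark run.
rows_read = {name: n * ROWS_PER_ENTRY for name, n in entries_read.items()}

for name in modes:
    print(f"{name}: {entries_read[name]} entries, {rows_read[name]} rows")
```

Since the amount of data parsed is identical across modes, any difference in wall-clock time comes from how the data is distributed across files and open/read/close cycles.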
The hyperfine results are below:
"repeat": p = 1, q = 1, r = 1024
Benchmark 1: ../../build/bin/erin read test_files.toml repeat -v
Time (mean ± σ): 4.412 s ± 0.238 s [User: 4.331 s, System: 0.067 s]
Range (min … max): 4.040 s … 4.887 s 10 runs
"mixed": p = 1, q = 1024, r = 1
Benchmark 1: ../../build/bin/erin read test_files.toml mixed -v
Time (mean ± σ): 1.993 s ± 0.078 s [User: 1.925 s, System: 0.061 s]
Range (min … max): 1.833 s … 2.101 s 10 runs
"multi": p = 128, q = 1, r = 8
Benchmark 1: ../../build/bin/erin read test_files.toml multi -v
Time (mean ± σ): 4.397 s ± 0.201 s [User: 4.303 s, System: 0.070 s]
Range (min … max): 4.049 s … 4.734 s 10 runs
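To put the three results on a common footing, the following sketch computes the time per entry and the ratio of each mode's mean time to the "mixed" mean. The mean times are copied from the hyperfine output above; the variable names are illustrative.

```python
# Mean wall-clock times (seconds) copied from the hyperfine output above.
mean_s = {"repeat": 4.412, "mixed": 1.993, "multi": 4.397}
ENTRIES = 1024  # entries read per run in every mode

# Milliseconds spent per entry in each mode.
per_entry_ms = {name: 1000.0 * t / ENTRIES for name, t in mean_s.items()}
# How many times longer each mode takes than "mixed" mode.
ratio_to_mixed = {name: t / mean_s["mixed"] for name, t in mean_s.items()}

for name in mean_s:
    print(
        f"{name}: {per_entry_ms[name]:.2f} ms/entry, "
        f"{ratio_to_mixed[name]:.2f}x the mixed-mode time"
    )
```

The reported standard deviations (between about 0.08 s and 0.24 s) are small next to the roughly 2.4 s gap between "mixed" mode and the other two modes.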
These tests indicate that packing data into a single .csv file can reduce read times: for the same 1024 entries, "mixed" mode took about 2.0 s, versus about 4.4 s for "repeat" and "multi" modes.
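The difference between the two access patterns can be sketched as follows. This is a minimal illustration, not ERIN's reader: it assumes a packed file stores its entries side by side as (time, value) column pairs with no header row, which this issue does not specify, and it uses small stand-in sizes instead of 8760 rows and 1024 entries. The helper names `write_csv` and `read_entries` are hypothetical.

```python
import csv
import tempfile
from pathlib import Path

ROWS = 24     # small stand-in for the 8760 rows used in the issue
ENTRIES = 4   # small stand-in for the 1024 entries


def write_csv(path, n_entries, n_rows):
    """Write n_entries (time, value) column pairs side by side (assumed layout)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for i in range(n_rows):
            row = []
            for e in range(n_entries):
                row += [i, float(i * (e + 1))]
            writer.writerow(row)


def read_entries(path):
    """Open a file once and return its entries as lists of (time, value)."""
    with open(path, newline="") as f:
        rows = [[float(x) for x in row] for row in csv.reader(f)]
    n_entries = len(rows[0]) // 2
    return [[(r[2 * e], r[2 * e + 1]) for r in rows] for e in range(n_entries)]


with tempfile.TemporaryDirectory() as tmp:
    single = Path(tmp) / "single.csv"
    packed = Path(tmp) / "packed.csv"
    write_csv(single, 1, ROWS)
    write_csv(packed, ENTRIES, ROWS)

    # "repeat"-style: one open/read/close cycle per entry.
    repeat_entries = [read_entries(single)[0] for _ in range(ENTRIES)]
    # "mixed"-style: a single open/read/close cycle for all entries.
    mixed_entries = read_entries(packed)
```

Both patterns yield the same number of entries and rows; only the number of open/read/close cycles differs, which is the variable the benchmarks above isolate.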