Problem: Packed Loads Format Could (Potentially) be even faster #84

Open
michael-okeefe opened this issue Aug 23, 2024 · 0 comments
Labels: benchmarking (Related to benchmarking performance), enhancement (New feature or request), low-priority ("Nice to have" but not necessary; prioritize lower), performance (A task related to assessing/enhancing performance)
Problem Description

The current packed loads format is something like this:

load-tag0,4,load-tag1,4,load-tag2,2
hours,kW,hours,kW,hours,kW
0,100,0,150,0,300
1,80,1,120,3,220
2,90,2,111,,
3,95,3,105,,

128 characters

However, we can incorporate a few simple ideas that should reduce the number of bytes required to write:

  1. use abbreviations for units such as h instead of hours (note: this may already work)
  2. transpose the file to have data run by row instead of column; this will help when load profiles have different lengths; it will also make it unnecessary to print the number of elements since the file is read row-wise
  3. make it so the time-series tag can be shared by all subsequent rows until re-defined

Note that item 2 only saves the bytes that the column-wise format spends on "blank cell" commas, and those are fairly cheap. In the example above they are not even required, since the longer loads are on the left and the trailing commas could be dropped. However, if load-tag2 had been written first, the "blank cell" commas would have been necessary. With many loads of differing lengths, the number of "blank cell" commas could become significant.
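To put a number on the "blank cell" overhead, here is a rough sketch. The name count_blank_cells is hypothetical (not an existing function in the code base), and it assumes each load takes two columns (time, value), that every blank cell costs one comma, and that blanks to the right of the last populated cell in a row are simply not written.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts the empty cells (one comma each) that a
// column-wise packed-loads writer cannot avoid. `lengths` holds the number of
// rows of each load, in the order the loads are written left to right.
// Assumption: trailing blanks in a row are trimmed, so only blanks that sit
// to the left of a populated cell are counted.
std::size_t count_blank_cells(const std::vector<std::size_t>& lengths)
{
    std::size_t max_len = 0;
    for (std::size_t n : lengths) {
        max_len = std::max(max_len, n);
    }
    std::size_t blanks = 0;
    for (std::size_t row = 0; row < max_len; ++row) {
        std::size_t pending = 0;  // blanks seen since the last populated load
        for (std::size_t n : lengths) {
            if (n > row) {
                // This load has data on this row, so every blank cell to its
                // left must be written to keep the columns aligned.
                blanks += pending;
                pending = 0;
            } else {
                pending += 2;  // empty time cell + empty value cell
            }
        }
    }
    return blanks;
}

// Usage sketch with the two orderings discussed above.
void blank_cells_example()
{
    assert(count_blank_cells({4, 4, 2}) == 0);  // longer loads on the left
    assert(count_blank_cells({2, 4, 4}) == 4);  // short load written first
}
```

The count grows with both the number of short loads on the left and the difference in lengths, which is the case where the row-wise layout would pay off.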

Here is the above format using these 3 principles:

time,h,0,1,2,3
load-tag0,kW,100,80,90,95
load-tag1,kW,150,120,111,105
time,h,0,3
load-tag2,kW,300,220

105 characters
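A minimal sketch of how a writer for this layout might look. LoadSeries and write_transposed_loads are hypothetical names for illustration only, and the sketch assumes a time row is re-emitted only when the time unit or time values differ from the last time row written.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of one packed load.
struct LoadSeries {
    std::string tag;
    std::string time_unit;
    std::string rate_unit;
    std::vector<double> times;
    std::vector<double> values;
};

// Writes loads row-wise. A "time" row is shared by all following load rows
// until a load with a different time vector (or time unit) comes along.
std::string write_transposed_loads(const std::vector<LoadSeries>& loads)
{
    std::ostringstream out;
    out.precision(15);  // assumption: 15 significant digits is enough here
    const LoadSeries* last = nullptr;  // load whose time row was written last
    for (const LoadSeries& load : loads) {
        assert(load.times.size() == load.values.size());
        const bool same_times = last != nullptr
            && last->time_unit == load.time_unit
            && last->times == load.times;
        if (!same_times) {
            out << "time," << load.time_unit;
            for (double t : load.times) {
                out << ',' << t;
            }
            out << '\n';
            last = &load;
        }
        out << load.tag << ',' << load.rate_unit;
        for (double v : load.values) {
            out << ',' << v;
        }
        out << '\n';
    }
    return out.str();
}
```

Loads that share a time vector only benefit if they are adjacent in the input, so a real writer might want to group them first; that step is left out of this sketch.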

Strings can be read one line at a time. The last-read time vector is assumed to be "active" until redefined. To reserve space for a std::vector, we can read a single line and count the number of commas; since the first two fields are the tag and the unit, a row with n commas holds n - 1 values. The length of the active time vector is the length of every load vector that follows it, until the next time row.
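The reading side could be sketched roughly as follows. Everything named here (LoadSeries, read_transposed_loads) is hypothetical, and the sketch assumes that the literal tag "time" is reserved for time rows, that values parse with std::stod, and that no field contains a quoted comma.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory form of one packed load.
struct LoadSeries {
    std::string tag;
    std::string time_unit;
    std::string rate_unit;
    std::vector<double> times;
    std::vector<double> values;
};

std::vector<LoadSeries> read_transposed_loads(std::istream& in)
{
    std::vector<LoadSeries> loads;
    std::string time_unit;
    std::vector<double> active_times;  // last time row read
    bool have_times = false;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // CRLF
        if (line.empty()) continue;
        // tag and unit take the first two fields, so n commas => n - 1 values.
        const auto n_commas = std::count(line.begin(), line.end(), ',');
        if (n_commas < 1) {
            throw std::runtime_error("expected tag,unit[,values...]");
        }
        std::vector<double> data;
        data.reserve(static_cast<std::size_t>(n_commas - 1));
        std::istringstream fields(line);
        std::string tag, unit, cell;
        std::getline(fields, tag, ',');
        std::getline(fields, unit, ',');
        while (std::getline(fields, cell, ',')) {
            data.push_back(std::stod(cell));
        }
        if (tag == "time") {
            // A new time row becomes the active one for the rows that follow.
            time_unit = unit;
            active_times = std::move(data);
            have_times = true;
        } else {
            if (!have_times) {
                throw std::runtime_error("load row before any time row");
            }
            if (data.size() != active_times.size()) {
                throw std::runtime_error("load length differs from time row");
            }
            loads.push_back({tag, time_unit, unit, active_times, std::move(data)});
        }
    }
    return loads;
}

// Usage sketch: the five-row example from this issue.
void read_example()
{
    std::istringstream in(
        "time,h,0,1,2,3\n"
        "load-tag0,kW,100,80,90,95\n"
        "load-tag1,kW,150,120,111,105\n"
        "time,h,0,3\n"
        "load-tag2,kW,300,220\n");
    const std::vector<LoadSeries> loads = read_transposed_loads(in);
    assert(loads.size() == 3);
    assert(loads[2].times.size() == 2);  // uses the second time row
}
```

Copying the active time vector into every load keeps the sketch simple; sharing it instead would save memory when many loads use the same time row.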

NOTE: I did check and Microsoft Excel is able to display over 8,760 columns so a transposed packed load file could still be loaded to view in MS Excel.

Second Note: this is extremely unlikely to make much of a difference in terms of time or space (especially if the file is zipped). Also, two of the suggestions (unit abbreviations and a shared time series) could be implemented without transposing the file. However, I just wanted to jot down the idea while thinking of it.

michael-okeefe added the enhancement, low-priority, and performance labels on Aug 23, 2024
michael-okeefe added this to the 2024 (Year End) milestone on Aug 23, 2024
michael-okeefe added the benchmarking label on Aug 23, 2024
michael-okeefe modified the milestones: 2024 (Year End), 2025+ on Jan 13, 2025