Fix PineAPPL to APPLgrid conversion problems #358
|
@jamspandex: I'm not sure how to contact you properly to report a problem with APPLgrid, so I'm doing it here: two of the exported grids export properly but then can't be read by APPLgrid, which exits with an exception:
cannot create std::vector larger than max_size()
I tracked this down to a problem in `src/appl_file.cxx`, which uses `int` for file sizes, but the uncompressed APPLgrid size is larger than `2^31 - 1`, which is the largest value `int` can hold on my system. The fix is quite simple:
--- src/appl_file.cxx 2025-07-20 12:44:59.505362251 +0200
+++ src/appl_file.cxx.new 2025-07-20 12:46:00.353096239 +0200
@@ -27,8 +27,8 @@
// std::cout << "appl::file::opening file: " << filename() << "\toptions: " << mopt << std::endl;
/// file sizes - actual filesize and uncopmpressed size
- int filesize = 0;
- int usize = 0;
+ off_t filesize = 0;
+ off_t usize = 0;
if ( mopt.find("r")!=std::string::npos ) {
@@ -50,7 +50,7 @@
if ( (zip_signature&0xffffff)==0x88b1f ) {
/// if it is read the file size, and the uncompressed filesize from the file ...
/// uncompressed size
- int offset = filesize - 4;
+ off_t offset = filesize - 4;
fseek( tmp_file, offset, SEEK_SET );
fread( &usize, sizeof(int), 1, tmp_file );
}
@@ -117,11 +117,11 @@
gzseek( mfile, usize-sizeof(SB::TYPE)*3, SEEK_SET );
gzread( mfile, (void*)vtrailer, sizeof(double)*3 );
- int index_size = (vtrailer[1]-vtrailer[0]-sizeof(SB::TYPE)*3); /// in bytes
+ off_t index_size = (vtrailer[1]-vtrailer[0]-sizeof(SB::TYPE)*3); /// in bytes
std::vector<SB::TYPE> vindex(index_size/sizeof(SB::TYPE));
- int indexptr = vtrailer[0];
+ off_t indexptr = vtrailer[0];
/// std::cout << "vtrailer[0]: " << vtrailer[0] << std::endl;
/// std::cout << "vtrailer[1]: " << vtrailer[1] << std::endl;
I changed `int` to `off_t`, which is the type of `stat_buf.st_size`. Let me know if you need to know more! |
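The failure mode reported in this comment can be reproduced in isolation. What follows is a minimal sketch, not APPLgrid code: `as_int32`, `can_allocate_index` and `demonstrate_overflow` are hypothetical names, the 3 GB size is an invented number, and `double` stands in for `SB::TYPE`, which is an assumption. It illustrates how a byte count above `2^31 - 1` turns negative in a 32-bit signed integer, and how dividing that value by `sizeof(...)` (an unsigned type) yields an element count that `std::vector` rejects. Which variable overflows first in APPLgrid itself has not been checked here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: squeeze a byte count into 32 signed bits, which is
// what storing it in `int` amounts to on common platforms. The value wraps
// modulo 2^32 (guaranteed since C++20, usual compiler behaviour before).
std::int32_t as_int32(std::int64_t bytes) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(bytes));
}

// Mimic the shape of the `vindex` line from the diff: the element count is
// the size in bytes divided by sizeof(element). If `index_bytes` is
// negative, the division converts it to a huge unsigned number first.
template <typename Size>
bool can_allocate_index(Size index_bytes) {
    try {
        std::vector<double> vindex(index_bytes / sizeof(double));
        return true;
    } catch (const std::length_error&) {
        // libstdc++ words this as
        // "cannot create std::vector larger than max_size()"
        return false;
    }
}

// Usage with invented numbers: a 3 GB size does not survive 32 bits.
void demonstrate_overflow() {
    const std::int64_t three_gb = 3000000000LL;

    // 3000000000 - 2^32 = -1294967296
    assert(as_int32(three_gb) == -1294967296);

    // The negative size leads to a request beyond max_size().
    assert(!can_allocate_index(as_int32(three_gb)));

    // A small size kept in a 64-bit type allocates without trouble.
    assert(can_allocate_index(std::int64_t{800}));
}
```

Keeping the sizes in a 64-bit type such as `off_t` avoids the wrap-around in the first place, which is what the patch does.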
|
OK thanks, although are you really creating grids which are larger than 2 GB? I would question the efficacy of such large grids.
In addition, I recently had to modify the grid code because of the unfortunate tendency for all lumipdfs in PineAPPL-converted grids to be given the same name,
PineAPPL-Lumi.config
which will cause problems if you try to load grids for different processes with different parton-parton luminosity combinations.
I added code to circumvent that: if it finds lumipdfs with the same name but different parton-parton combinations, it automatically renames the grids internally. I don't really like doing that, though, since you lose the correspondence between the actual name encoded in the grid and what the grid is using.
So really it would be better practice to encode the name of the process in the lumipdf name itself, such as
wmjets-PineAPPL-Lumi.config
for W- + jets, and so on. In principle, when running a calculation, the lumipdf config can also be read from a file, so clearly they would all need to have process-specific names.
Thanks
Mark
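The naming problem described in this reply can be illustrated with a small lookup table. This is a sketch under stated assumptions, not the APPLgrid implementation: `LumiRegistry`, `Registration`, `PartonPairs` and `demonstrate_name_clash` are hypothetical names, a luminosity is reduced to a list of parton-parton pairs, the parton ids are invented examples, and `zjets-PineAPPL-Lumi.config` is a made-up file name following the suggested pattern.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Assumption: a luminosity is just a list of (parton, parton) id pairs.
using PartonPairs = std::vector<std::pair<int, int>>;

enum class Registration { Added, AlreadyKnown, NameClash };

// Hypothetical registry keyed by the lumipdf name stored in a grid.
class LumiRegistry {
public:
    Registration add(const std::string& name, const PartonPairs& pairs) {
        const auto result = table_.emplace(name, pairs);
        if (result.second) {
            return Registration::Added;  // first time this name is seen
        }
        // Same name again: harmless only if the content is identical.
        return result.first->second == pairs ? Registration::AlreadyKnown
                                             : Registration::NameClash;
    }

private:
    std::map<std::string, PartonPairs> table_;
};

// Usage with invented parton combinations for two different processes.
void demonstrate_name_clash() {
    const PartonPairs w_minus = {{1, -2}, {3, -4}};
    const PartonPairs z_boson = {{1, -1}, {2, -2}};

    // Generic name shared by both processes: the second grid clashes.
    LumiRegistry generic;
    assert(generic.add("PineAPPL-Lumi.config", w_minus) == Registration::Added);
    assert(generic.add("PineAPPL-Lumi.config", z_boson) == Registration::NameClash);

    // Process-specific names: both register cleanly.
    LumiRegistry specific;
    assert(specific.add("wmjets-PineAPPL-Lumi.config", w_minus) == Registration::Added);
    assert(specific.add("zjets-PineAPPL-Lumi.config", z_boson) == Registration::Added);

    // Loading the same process twice is not a clash.
    assert(specific.add("wmjets-PineAPPL-Lumi.config", w_minus) == Registration::AlreadyKnown);
}
```

Reporting the clash rather than renaming silently keeps the name in the grid and the name in use identical, which is the correspondence the reply wants to preserve.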
On 20/07/2025 12:18, Christopher Schwan wrote:
*cschwan* left a comment (NNPDF/pineappl#358)
@jamspandex <https://github.com/jamspandex>: I'm not sure how to
contact you properly to report a problem with APPLgrid, so I'm doing
it here: at least one of the exported grids,
|https://ploughshare.web.cern.ch/ploughshare/db/pinejet/pinejet-cdf-z0-arxiv-0908.3914/pinejet-cdf-z0-arxiv-0908.3914.tgz
|
exports properly but then can't be read by APPLgrid, which exits with
an exception:
|cannot create std::vector larger than max_size() |
I tracked this down to a problem in |src/appl_file.cxx|, which uses
|int| for file sizes, but the uncompressed APPLgrid size is larger
than |2^31 - 1|, which in turn is larger than |int| can hold on my
system. This fix is quite simple:
--- src/appl_file.cxx 2025-07-20 12:44:59.505362251 +0200
+++ src/appl_file.cxx.new 2025-07-20 12:46:00.353096239 +0200
@@ -27,8 +27,8 @@
// std::cout << "appl::file::opening file: " << filename() << "\toptions: " << mopt << std::endl;
/// file sizes - actual filesize and uncopmpressed size
- int filesize = 0;
- int usize = 0;
+ off_t filesize = 0;
+ off_t usize = 0;
if ( mopt.find("r")!=std::string::npos ) {
@@ -50,7 +50,7 @@
if ( (zip_signature&0xffffff)==0x88b1f ) {
/// if it is read the file size, and the uncompressed filesize from the file ...
/// uncompressed size
- int offset = filesize - 4;
+ off_t offset = filesize - 4;
fseek( tmp_file, offset, SEEK_SET );
fread( &usize, sizeof(int), 1, tmp_file );
}
@@ -117,11 +117,11 @@
gzseek( mfile, usize-sizeof(SB::TYPE)*3, SEEK_SET );
gzread( mfile, (void*)vtrailer, sizeof(double)*3 );
- int index_size = (vtrailer[1]-vtrailer[0]-sizeof(SB::TYPE)*3); /// in bytes
+ off_t index_size = (vtrailer[1]-vtrailer[0]-sizeof(SB::TYPE)*3); /// in bytes
std::vector<SB::TYPE> vindex(index_size/sizeof(SB::TYPE));
- int indexptr = vtrailer[0];
+ off_t indexptr = vtrailer[0];
/// std::cout << "vtrailer[0]: " << vtrailer[0] << std::endl;
/// std::cout << "vtrailer[1]: " << vtrailer[1] << std::endl;
I changed |int| to |off_t|, which is the type of |stat_buf.st_size|.
Let me know if you need to know more!
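
The `fread` call in the second hunk reads the last four bytes of the gzip file. In the gzip format (RFC 1952) these hold ISIZE, the uncompressed size modulo 2^32, stored least-significant byte first. The sketch below is an illustration under the assumption of a single-member gzip stream, with hypothetical helper names `decode_isize`, `decode_isize_as_int` and `demonstrate_isize`; it is not taken from APPLgrid. It decodes the field byte by byte, so the result does not depend on host byte order, and contrasts the unsigned reading with the signed 32-bit one.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Decode the four ISIZE bytes (least-significant byte first) into a
// 64-bit value. The field itself is only 32 bits wide, so uncompressed
// sizes of 4 GiB or more cannot be recovered from it alone.
std::uint64_t decode_isize(const std::array<unsigned char, 4>& trailer) {
    std::uint32_t isize = 0;
    for (int i = 3; i >= 0; --i) {
        isize = (isize << 8) | trailer[i];
    }
    return isize;  // widened without sign extension
}

// The same bytes interpreted as a signed 32-bit integer, which is what
// storing the field in an `int` amounts to on common platforms.
std::int32_t decode_isize_as_int(const std::array<unsigned char, 4>& trailer) {
    return static_cast<std::int32_t>(
        static_cast<std::uint32_t>(decode_isize(trailer)));
}

// Usage with an invented size of 3000000000 bytes = 0xB2D05E00.
void demonstrate_isize() {
    const std::array<unsigned char, 4> trailer = {0x00, 0x5E, 0xD0, 0xB2};

    assert(decode_isize(trailer) == 3000000000ULL);  // unsigned: correct
    assert(decode_isize_as_int(trailer) < 0);        // signed: negative
}
```

The patched code reads `sizeof(int)` bytes into an `off_t` that was initialised to zero; on a little-endian host that gives the same value as `decode_isize`.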
|
Yes and no: before the conversion to APPLgrid I'm rewriting the luminosities, which probably replicates some subgrids very inefficiently, but the LZ4 and GZIP compression algorithms compensate for that. I think the compressed grids (both APPLgrid and PineAPPL) are about 70 MB each, but uncompressed they are much larger.
I didn't know that the luminosity naming is a problem; I will change it. |
|
I opened a new Issue for the last remaining item: #361. |
TODO:
Grid::optimize wrongly detects as a static scale