Fix PineAPPL to APPLgrid conversion problems #358

cschwan · 2025-07-13T10:05:50Z

TODO:

Issue when converting grids into APPLgrid #357; fixed by commit 2d72820
proton-anti-proton grids should successfully convert, but the convolution test fails, because we pass a proton PDF instead of an anti-proton PDF; fixed by commit d337717
one grid fails, because it contains a channel that was integrated with a single phase-space point which Grid::optimize wrongly detects as a static scale
fix wrong coupling-order permutation; fixed by commit 335e8c5
change internal APPLgrid lumipdf name; fixed by commit 6f1fed1
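
The proton-anti-proton item comes down to charge conjugation: the density of a parton in an anti-proton equals the density of its anti-parton in a proton. A minimal C++ sketch of a wrapper, assuming a callback with a `(pid, x, q2)` signature; `ProtonPdf`, `conjugate_pid` and `antiproton_xfx` are hypothetical names and not part of PineAPPL or APPLgrid:

```cpp
#include <cassert>
#include <functional>

// Hypothetical callback type: returns x*f(x, Q^2) of the proton for a PDG
// particle ID.
using ProtonPdf = std::function<double(int pid, double x, double q2)>;

// Charge-conjugate a PDG ID. This sketch only treats quarks (1..6) and
// anti-quarks (-6..-1), which flip sign; everything else, such as the gluon
// (21) and the photon (22), is returned unchanged.
int conjugate_pid(int pid) {
    const bool is_quark = (pid >= -6 && pid <= 6 && pid != 0);
    return is_quark ? -pid : pid;
}

// Anti-proton PDF built from a proton PDF via charge conjugation.
double antiproton_xfx(const ProtonPdf& proton, int pid, double x, double q2) {
    assert(x > 0.0 && x <= 1.0);
    return proton(conjugate_pid(pid), x, q2);
}
```

A convolution test for a proton-anti-proton grid would then pass the plain proton callback for one hadron and the wrapped one for the other.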

cschwan · 2025-07-13T12:10:23Z

Commit 2d72820 fixes the problem described in #357, but it uncovered another one: the coupling orders are apparently permuted incorrectly. LO subgrids seem to be saved as NNLO subgrids and vice versa, while NLO subgrids are correct.
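
The symptom (LO and NNLO swapped, NLO untouched) is the pattern a reversed index over three orders produces. The following C++ sketch only illustrates that pattern; `reversed_slot` is a hypothetical helper, and the actual cause fixed in 335e8c5 may be different:

```cpp
#include <cassert>
#include <cstddef>

// Map order index i (0 = LO, 1 = NLO, 2 = NNLO) to the slot it would land in
// if the ordering were accidentally reversed.
std::size_t reversed_slot(std::size_t i, std::size_t n_orders) {
    assert(i < n_orders);
    return n_orders - 1 - i;
}
```

With three orders, indices 0 and 2 trade places while index 1 maps to itself, which is why a check that looks only at NLO would not notice the problem.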

cschwan · 2025-07-20T11:18:05Z

@jamspandex: I'm not sure how to contact you properly to report a problem with APPLgrid, so I'm doing it here: two of the exported grids,

https://ploughshare.web.cern.ch/ploughshare/db/pinejet/pinejet-cdf-z0-arxiv-0908.3914/pinejet-cdf-z0-arxiv-0908.3914.tgz
https://ploughshare.web.cern.ch/ploughshare/db/pinejet/pinejet-d0-z0-arxiv-0702025/pinejet-d0-z0-arxiv-0702025.tgz

export properly but then can't be read by APPLgrid, which exits with an exception:

cannot create std::vector larger than max_size()

I tracked this down to a problem in src/appl_file.cxx, which uses int for file sizes: the uncompressed APPLgrid size is larger than 2^31 - 1, the largest value int can hold on my system. The fix is quite simple:

--- src/appl_file.cxx	2025-07-20 12:44:59.505362251 +0200
+++ src/appl_file.cxx.new	2025-07-20 12:46:00.353096239 +0200
@@ -27,8 +27,8 @@
     //    std::cout << "appl::file::opening file: " << filename() << "\toptions: " << mopt << std::endl; 
 
     /// file sizes - actual filesize and uncopmpressed size
-    int filesize = 0;
-    int usize    = 0;
+    off_t filesize = 0;
+    off_t usize    = 0;
     
     if ( mopt.find("r")!=std::string::npos ) { 
 
@@ -50,7 +50,7 @@
       if ( (zip_signature&0xffffff)==0x88b1f ) {   
 	/// if it is read the file size, and the uncompressed filesize from the file ...
 	/// uncompressed size
-	int offset = filesize - 4;
+	off_t offset = filesize - 4;
 	fseek( tmp_file, offset, SEEK_SET );
 	fread( &usize, sizeof(int), 1, tmp_file );
       }
@@ -117,11 +117,11 @@
       gzseek( mfile, usize-sizeof(SB::TYPE)*3, SEEK_SET );
       gzread( mfile, (void*)vtrailer, sizeof(double)*3 );
 
-      int index_size = (vtrailer[1]-vtrailer[0]-sizeof(SB::TYPE)*3); /// in bytes
+      off_t index_size = (vtrailer[1]-vtrailer[0]-sizeof(SB::TYPE)*3); /// in bytes
       
       std::vector<SB::TYPE> vindex(index_size/sizeof(SB::TYPE)); 
            
-      int indexptr = vtrailer[0];
+      off_t indexptr = vtrailer[0];
 
       ///      std::cout << "vtrailer[0]: " << vtrailer[0] << std::endl;
       ///      std::cout << "vtrailer[1]: " << vtrailer[1] << std::endl;

I changed int to off_t, which is the type of stat_buf.st_size. Let me know if you need to know more!
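
To make the failure mode concrete, here is a small C++ sketch, written under the assumption that the exception comes from a size that no longer fits into int and then ends up as an element count. The helper names `fits_in_int` and `as_element_count` are hypothetical and do not exist in APPLgrid:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Largest byte count a plain int can represent (2^31 - 1 where int is 32 bits).
constexpr std::int64_t kIntMax = std::numeric_limits<int>::max();

// True if a byte count can be stored in an int without overflow.
bool fits_in_int(std::int64_t bytes) {
    return bytes >= 0 && bytes <= kIntMax;
}

// What a negative (wrapped-around) size turns into when it is used as an
// element count: the conversion to std::size_t is modular, so the result is
// enormous rather than negative.
std::size_t as_element_count(std::int64_t bytes, std::size_t element_size) {
    assert(element_size > 0);
    return static_cast<std::size_t>(bytes) / element_size;
}
```

A count of that magnitude exceeds what std::vector can allocate, which matches the reported message. Separately, the gzip trailer stores the uncompressed size modulo 2^32 (RFC 1952), so sizes of 4 GiB or more cannot be recovered from that field alone, whatever type receives it.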

jamspandex · 2025-07-20T13:14:08Z

OK thanks, although are you really creating grids which are larger than 2 GB? I would question the efficacy of such large grids. In addition, I recently had to modify the grid code because of the unfortunate tendency for all lumipdfs in PineAPPL-converted grids to be given the same PineAPPL-Lumi.config name, which will cause problems if you try to load grids for different processes with different parton-parton luminosity combinations. I added code to circumvent that, such that if it finds lumipdfs with the same name but different parton-parton combinations, it automatically renames the grids internally, but I don't really like doing that, since you lose the correspondence between the actual name encoded in the grid and what the grid is using. So really it would be better practice to encode the name of the process in the lumipdf name itself, such as wmjets-PineAPPL-Lumi.config for W- + jets and so on. In principle, when running a calculation, the lumipdf config can also be read from a file, so clearly they would all need to have process-specific names. Thanks, Mark
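
The name clash described here can be sketched in a few lines of C++ (C++17 for structured bindings). This is an illustration only: `LumiRegistry`, `Combinations` and `process_lumi_name` are hypothetical and do not reflect how APPLgrid actually stores its lumipdfs:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One luminosity is modelled here as a list of parton-parton (PDG ID) pairs.
using Combinations = std::vector<std::pair<int, int>>;

// Hypothetical registry keyed by lumipdf name, standing in for a global lookup.
class LumiRegistry {
public:
    // Returns false if `name` is already taken by a different set of
    // combinations, i.e. two processes sharing one generic name.
    bool add(const std::string& name, const Combinations& combos) {
        const auto [it, inserted] = entries_.emplace(name, combos);
        return inserted || it->second == combos;
    }

private:
    std::map<std::string, Combinations> entries_;
};

// Build a process-specific name such as "wmjets-PineAPPL-Lumi.config".
std::string process_lumi_name(const std::string& process) {
    assert(!process.empty());
    return process + "-PineAPPL-Lumi.config";
}
```

Registering two different luminosities under the generic name fails on the second call, while process-prefixed names keep them apart and preserve the link between the stored name and the process.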

cschwan · 2025-07-20T14:03:20Z

> OK thanks, although are you really creating grids which are larger than 2 GB? I would question the efficacy of such large grids.

Yes and no. Before the conversion to APPLgrid I'm rewriting the luminosities, which probably replicates some subgrids very inefficiently, but the LZ4 and GZIP compression algorithms compensate for that. I think the compressed grids (both APPLgrid and PineAPPL) are about 70 MB, but uncompressed they are much larger.

> In addition, I recently had to modify the grid code because of the unfortunate tendency for all lumipdfs in PineAPPL-converted grids to be given the same PineAPPL-Lumi.config name, which will cause problems if you try to load grids for different processes with different parton-parton luminosity combinations. I added code to circumvent that, such that if it finds lumipdfs with the same name but different parton-parton combinations, it automatically renames the grids internally, but I don't really like doing that, since you lose the correspondence between the actual name encoded in the grid and what the grid is using. So really it would be better practice to encode the name of the process in the lumipdf name itself, such as wmjets-PineAPPL-Lumi.config for W- + jets and so on. In principle, when running a calculation, the lumipdf config can also be read from a file, so clearly they would all need to have process-specific names. Thanks, Mark

I didn't know that this was a problem; I will change it.

cschwan · 2025-08-12T12:00:48Z

I opened a new Issue for the last remaining item: #361.

cschwan added 3 commits July 13, 2025 12:01

Add test script for PineAPPL -> APPLgrid conversion

7dd0fb4

Add tentative fix for #357

2d72820

Remove dead code

16d4417

cschwan self-assigned this Jul 13, 2025

cschwan added the bug Something isn't working label Jul 13, 2025

cschwan added this to the v1.2 milestone Jul 13, 2025

cschwan added 3 commits July 13, 2025 15:01

Fix coupling order setting in APPLgrid exporter

335e8c5

Fix export of proton-anti-proton grids to APPLgrid

d337717

Update CHANGELOG.md

0c819cd

Make APPLgrid lumipdf name more unique

6f1fed1

This was referenced Aug 12, 2025

Repair zero-scaled grids #360

Closed

Add CLI fixers for incorrectly static-scale-/node-optimized grids #361

Open

cschwan merged commit 181d1e5 into master Aug 12, 2025
10 checks passed

cschwan deleted the fix-export-applgrid branch August 12, 2025 12:00
