Add more information to several output files #1579

Open
erhart1 wants to merge 5 commits into master from enh/add-more-information-to-several-output-files

erhart1 commented Jun 30, 2026

Collaborator

This PR adds headers to several output files, analogous to #1552. Specifically, it covers:

  • thermo.out (dump_thermo)
  • msd.out (compute_msd)
  • viscosity.out (compute_viscosity)
  • sdc.out (compute_sdc)

This makes parsing these files simpler and much less error-prone, since information such as the number of atoms, the cell metric, and the time step or dump frequency is included in the file and does not have to be provided externally.

Each header records the originating command and its call signature, format_version, num_atoms, cell metric (where not already in every data row), the time spacing between output rows, and column names.
The headers are compatible with numpy/pandas comment-skipping parsers.
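
As a rough illustration of what a self-describing file allows, the sketch below reads a `#`-prefixed header and the numeric rows using only the Python standard library. The `key: value` layout, the key names, and all sample values are assumptions made for illustration; the PR text does not show the exact header syntax. With NumPy, `numpy.loadtxt` skips `#` lines by default, and `pandas.read_csv` accepts `comment="#"`.

```python
import io

# Hypothetical sample; the real header layout written by GPUMD may differ.
SAMPLE = """\
# command: compute_msd 10 100
# format_version: 1
# num_atoms: 64
# columns: time msdx msdy msdz
0.0 0.0 0.0 0.0
0.1 0.5 0.4 0.6
"""


def read_with_header(stream):
    """Split a text stream into header metadata and numeric data rows."""
    header = {}
    rows = []
    for line in stream:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Assumed "key: value" form; lines without a colon are ignored.
            key, sep, value = line.lstrip("#").partition(":")
            if sep:
                header[key.strip()] = value.strip()
        else:
            rows.append([float(x) for x in line.split()])
    return header, rows


header, rows = read_with_header(io.StringIO(SAMPLE))
columns = header["columns"].split()
num_atoms = int(header["num_atoms"])
```

With the metadata in the file itself, a script no longer needs the atom count or column layout passed in separately.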

erhart1 added 4 commits June 30, 2026 12:04
Write a #-comment header block at the start of msd.out (and any
msd_step*.out intermediate files) to make them self-describing.
The header records the originating command with its full argument
list, format version, atom count, group count, atoms per group,
cell matrix, time spacing between rows (in ps), and column names.
Column names are generated dynamically: plain names for single-group
runs, indexed names (msdx_0, msdx_1, ...) when all_groups is used.
The cell matrix is cached in a new cpu_h_[9] member during preprocess
since write() has no direct access to Box.
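
The dynamic column naming described in this commit message could look roughly like the Python sketch below. Only the `msdx_0`, `msdx_1` pattern comes from the commit message; the other base names, the `time` column, and the ordering of groups are assumptions, and the actual implementation lives in the C++/CUDA code.

```python
# Assumed base names; only "msdx" with an index suffix appears in the commit message.
BASE_NAMES = ["msdx", "msdy", "msdz", "sdcx", "sdcy", "sdcz"]


def msd_column_names(num_groups=1, all_groups=False):
    """Return plain names for one group, indexed names when all_groups is used."""
    if not all_groups:
        return ["time"] + BASE_NAMES
    names = ["time"]
    for group in range(num_groups):
        # Assumed ordering: all columns of group 0, then group 1, and so on.
        names.extend(f"{name}_{group}" for name in BASE_NAMES)
    return names
```

For example, `msd_column_names(2, all_groups=True)` yields one `time` column followed by six indexed columns per group.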

Write a #-comment header block at the start of thermo.out recording
the originating command, format version, atom count, time spacing
between rows (fs), and column names. The columns line is generated
dynamically: classical ensembles label the first two columns T and KE;
RPMD/TRPMD/PIMD ensembles (integrate.type >= 31) label them T_target
and KE_quantum to reflect that the temperature is the target value and
the kinetic energy is the quantum estimator. The cell is omitted from
the header because thermo.out already writes the full box matrix in
every data row.
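
The label selection for the first two thermo.out columns can be sketched in Python as follows. The threshold of 31 and the four labels are taken from the commit message; the function name is hypothetical, and the remaining columns are left out because the commit message does not list them.

```python
# From the commit message: integrate.type >= 31 marks RPMD/TRPMD/PIMD ensembles.
PATH_INTEGRAL_TYPE_MIN = 31


def leading_thermo_labels(integrate_type):
    """Return the labels of the first two thermo.out columns (hypothetical helper)."""
    if integrate_type >= PATH_INTEGRAL_TYPE_MIN:
        # Temperature is the target value, kinetic energy the quantum estimator.
        return ("T_target", "KE_quantum")
    return ("T", "KE")
```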

Write a #-comment header block at the start of viscosity.out recording
the originating command, format version, atom count, initial cell
matrix, time spacing between correlation lags (ps), and column names.
The 19 fixed columns are: correlation lag time, stress autocorrelation
function (SACF) and running viscosity integral for each of the 9 stress
tensor components in the order xx yy zz xy xz yz yx zx zy. The header
is written in postprocess where box and atom are directly available, so
no member variable caching is needed.
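
A possible way to spell out the 19 viscosity.out columns is sketched below in Python. The component order comes from the commit message. The name prefixes and the layout (nine SACF columns followed by nine running-integral columns, rather than interleaved pairs) are assumptions, since the commit message does not state them.

```python
# Component order as given in the commit message.
COMPONENTS = ["xx", "yy", "zz", "xy", "xz", "yz", "yx", "zx", "zy"]


def viscosity_column_names():
    """Build 19 column names: lag time, 9 SACF columns, 9 viscosity columns."""
    names = ["time"]  # correlation lag time
    # Assumed layout: all SACF columns first, then all running integrals.
    names += [f"sacf_{c}" for c in COMPONENTS]
    names += [f"eta_{c}" for c in COMPONENTS]
    return names
```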

erhart1 requested a review from brucefan1983 June 30, 2026 10:27
@brucefan1983

Owner

Thanks, I will let some users know about this first and see if there are any questions.

erhart1 commented Jun 30, 2026

Collaborator Author

> Thanks, I will let some users know about this first and see if there are any questions.

Sounds good.
I added shc.out in a separate PR (#1581) since that file format is a bit special.
