This page explains the different data formatting options available in otoole. Firstly, the format of the user configuration file is explained. Following this, the different input data formats are explained.
.. SEEALSO:: See the Simplicity_ repository for a full example of these formats
.. versionadded:: v1.0.0 The user configuration file is now required for data conversion
Most commands in ``otoole`` require the user to specify a configuration file that describes
the ``parameters``, ``sets``, and ``results`` in the model. This configuration file is
written in ``yaml`` and is typically saved as ``config.yaml``. This section will cover how to
format the user configuration file. If the user incorrectly enters data, validation checks in
``otoole`` should catch this.
The table below highlights what information is required for each ``Set``,
``Parameter`` and ``Result`` definition in the configuration file. Required values are
given by X, while optional values are given by (X).
Field | Set | Parameter | Result |
---|---|---|---|
name | X | X | X |
short_name | (X) | (X) | (X) |
dtype | X | X | X |
type | X | X | X |
default | | X | X |
.. deprecated:: v1.0.3 The ``Calculated`` keyword is no longer needed for Result definitions
.. WARNING:: Names longer than 31 characters require a ``short_name`` field. This is due
   to character limits on Excel sheet names. ``otoole`` will raise an error if a
   ``short_name`` is not provided in these instances.
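The 31-character rule can be checked before running a conversion. The sketch below is illustrative only: ``find_missing_short_names`` is a hypothetical helper, not part of ``otoole``, and it assumes the configuration file has already been loaded into a plain dictionary keyed by name.

```python
# Hypothetical helper; not part of the otoole API.
MAX_SHEET_NAME_LENGTH = 31  # character limit on Excel sheet names


def find_missing_short_names(config):
    """Return names longer than 31 characters that lack a short_name."""
    return [
        name
        for name, definition in config.items()
        if len(name) > MAX_SHEET_NAME_LENGTH and "short_name" not in definition
    ]


# Assumed example configuration, as it might look after parsing config.yaml.
config = {
    "TECHNOLOGY": {"dtype": "str", "type": "set"},
    "TotalTechnologyAnnualActivityLowerLimit": {
        "indices": ["REGION", "TECHNOLOGY", "YEAR"],
        "type": "param",
        "dtype": "float",
        "default": 0,
    },
}

print(find_missing_short_names(config))
```

Any name the helper returns would need a ``short_name`` entry added to its definition.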
Sets are defined as follows:

.. code-block:: yaml

    SETNAME:
        short_name: SET (Optional)
        dtype: "int" or "string"
        type: set

.. NOTE:: It's convention in OSeMOSYS to capitalize set names
Parameters are defined as follows. When referencing set indices use the full
name, not the ``short_name``:

.. code-block:: yaml

    ParameterName:
        short_name: ParamName (Optional)
        indices: [SETNAME, SETNAME, ...]
        type: param
        dtype: "int" or "float"
        default: 0

.. NOTE:: It's convention in OSeMOSYS to use Pascal case for parameter names
Results are defined as follows. When referencing set indices use the full
name, not the ``short_name``:

.. code-block:: yaml

    AnnualEmissions:
        short_name: ParamName (Optional)
        indices: [SETNAME, SETNAME, ...]
        type: result
        dtype: "int" or "float"
        default: 0

.. NOTE:: It's convention in OSeMOSYS to use Pascal case for result names
Below are examples of correctly formatted configuration file values. See the Simplicity repository for a complete example.
Set definition of ``TECHNOLOGY``:

.. code-block:: yaml

    TECHNOLOGY:
        dtype: str
        type: set

Parameter definition of ``AccumulatedAnnualDemand``:

.. code-block:: yaml

    AccumulatedAnnualDemand:
        short_name: AccAnnualDemand
        indices: [REGION,FUEL,YEAR]
        type: param
        dtype: float
        default: 0

Result definition of ``AnnualEmissions``:

.. code-block:: yaml

    AnnualEmissions:
        indices: [REGION,EMISSION,YEAR]
        type: result
        dtype: float
        default: 0
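The required fields listed in the table on this page can be expressed as a small check. This is a sketch under assumptions, not a copy of ``otoole``'s own validation: ``missing_fields`` is a hypothetical helper, and the mapping reads the table as requiring ``default`` for parameters and results but not for sets.

```python
# Required fields per definition type, as read from the table on this page.
# The name is the top-level YAML key, so it is not checked here.
REQUIRED_FIELDS = {
    "set": {"dtype", "type"},
    "param": {"dtype", "type", "default"},
    "result": {"dtype", "type", "default"},
}


def missing_fields(definition):
    """Return required fields absent from one definition (hypothetical helper)."""
    kind = definition.get("type")
    if kind not in REQUIRED_FIELDS:
        # Without a recognised type, no other requirement can be looked up.
        return ["type"]
    return sorted(REQUIRED_FIELDS[kind] - set(definition))


annual_emissions = {
    "indices": ["REGION", "EMISSION", "YEAR"],
    "type": "result",
    "dtype": "float",
    "default": 0,
}
incomplete_parameter = {"indices": ["REGION", "FUEL", "YEAR"], "type": "param"}

print(missing_fields(annual_emissions))
print(missing_fields(incomplete_parameter))
```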
.. TIP:: See the :ref:`examples` page to create a template configuration file
.. deprecated:: v1.0.0 The ``datapackage`` format is no longer supported
This section will describe how to format data for ``excel``, ``csv``, and ``datafile``
formats.
Interfacing with ``otoole`` through Excel is a very user-friendly method to handle OSeMOSYS
input data. In the Excel workbook (an ``*.xlsx`` file), each sheet corresponds to a
single parameter or set. Parameters that are indexed over years are pivoted on the ``YEAR``
index. This creates a wide-format dataset, where each year is a column header and
the first columns hold the remaining indices.
For example, referencing the Simplicity model, the ``AccumulatedAnnualDemand`` parameter
data will be under the ``AccumulatedAnnualDemand`` sheet and contain the data:
REGION | TECHNOLOGY | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 |
---|---|---|---|---|---|---|---|---|
SIMPLICITY | BACKSTOP1 | 999999 | 999999 | 999999 | 999999 | 999999 | 999999 | 999999 |
SIMPLICITY | BACKSTOP2 | 999999 | 999999 | 999999 | 999999 | 999999 | 999999 | 999999 |
SIMPLICITY | ETHPLANT | 25 | 25 | 25 | 25 | 25 | 25 | 25 |
SIMPLICITY | GRID_EXP | 4000 | 4000 | 4000 | 4000 | 4000 | 4000 | 4000 |
SIMPLICITY | HYD1 | 4500 | 4500 | 4500 | 4500 | 4500 | 4500 | 4500 |
SIMPLICITY | HYD2 | 3500 | 3500 | 3500 | 3500 | 3500 | 3500 | 3500 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
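The pivot on ``YEAR`` described above can be illustrated in plain Python. This is a minimal sketch, not how ``otoole`` performs the conversion internally: ``pivot_on_year`` is a hypothetical helper, and the sample rows reuse two technologies and values from the table above.

```python
from collections import defaultdict


def pivot_on_year(rows):
    """Pivot long rows (indices..., YEAR, VALUE) into one wide row per index tuple."""
    wide = defaultdict(dict)
    for *indices, year, value in rows:
        # All indices except YEAR identify the wide row; YEAR becomes a column.
        wide[tuple(indices)][year] = value
    return dict(wide)


long_rows = [
    ("SIMPLICITY", "HYD1", 2014, 4500),
    ("SIMPLICITY", "HYD1", 2015, 4500),
    ("SIMPLICITY", "HYD2", 2014, 3500),
    ("SIMPLICITY", "HYD2", 2015, 3500),
]

wide = pivot_on_year(long_rows)
for indices, by_year in wide.items():
    print(*indices, *(by_year[year] for year in sorted(by_year)))
```

Each printed line corresponds to one row of the sheet: the non-year indices first, followed by one value per year.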
Parameters that are not indexed over years will have an extra column titled ``VALUE``.
This column will hold the input value for that parameter. For example, the
``OperationalLife`` parameter in the Simplicity example will be formatted
as shown:
REGION | TECHNOLOGY | VALUE |
---|---|---|
SIMPLICITY | BACKSTOP1 | 1 |
SIMPLICITY | BACKSTOP2 | 1 |
SIMPLICITY | ETHPLANT | 30 |
SIMPLICITY | GAS_EXTRACTION | 1 |
SIMPLICITY | GAS_IMPORT | 1 |
SIMPLICITY | GRID_EXP | 50 |
SIMPLICITY | HYD1 | 80 |
SIMPLICITY | HYD2 | 80 |
... | ... | ... |
Set definitions will have a single column, titled ``VALUE``. For example, the set
``TECHNOLOGY`` will be formatted as shown:
VALUE |
---|
BACKSTOP1 |
BACKSTOP2 |
ETHPLANT |
GAS_EXTRACTION |
GAS_IMPORT |
GRID_EXP |
HYD1 |
HYD2 |
... |
Interfacing with ``otoole`` through a folder of CSV files is the most "computer friendly"
way to handle input data. This is because CSV files are easy to read and write, and are
independent of the program, programming language, and operating system. This allows
``otoole`` to integrate easily into workflows.
When working with CSV data, all parameters and sets are saved under the name given in the
configuration file, and nested in a single directory. CSV data follows the long format,
where each index has its own column, named after the index, and the final column is titled
``VALUE``.
For example, the following data for ``AccumulatedAnnualDemand`` will be under
the file ``data/AccumulatedAnnualDemand.csv``:
REGION | FUEL | YEAR | VALUE |
---|---|---|---|
SIMPLICITY | ETH | 2014 | 1 |
SIMPLICITY | RAWSUG | 2014 | 0.5 |
SIMPLICITY | ETH | 2015 | 1.03 |
SIMPLICITY | RAWSUG | 2015 | 0.51 |
SIMPLICITY | ETH | 2016 | 1.061 |
SIMPLICITY | RAWSUG | 2016 | 0.519 |
SIMPLICITY | ETH | 2017 | 1.093 |
SIMPLICITY | RAWSUG | 2017 | 0.529 |
SIMPLICITY | ETH | 2018 | 1.126 |
... | ... | ... | ... |
The ``TECHNOLOGY`` set data will be under the file ``data/TECHNOLOGY.csv`` and
formatted as shown, with a single ``VALUE`` column:
VALUE |
---|
BACKSTOP1 |
BACKSTOP2 |
ETHPLANT |
GAS_EXTRACTION |
GAS_IMPORT |
GRID_EXP |
HYD1 |
HYD2 |
... |
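Because the long format is plain CSV, it can be read with the Python standard library alone. The sketch below assumes comma-separated files with a header row; ``read_long_csv`` is a hypothetical helper rather than an ``otoole`` function, and the sample text mirrors the first rows of ``data/AccumulatedAnnualDemand.csv`` shown above.

```python
import csv
import io

# Sample content mirroring the first rows of the AccumulatedAnnualDemand table.
csv_text = """REGION,FUEL,YEAR,VALUE
SIMPLICITY,ETH,2014,1
SIMPLICITY,RAWSUG,2014,0.5
SIMPLICITY,ETH,2015,1.03
"""


def read_long_csv(handle):
    """Map each tuple of index values (as strings) to its VALUE, parsed as a float."""
    reader = csv.DictReader(handle)
    index_names = [name for name in reader.fieldnames if name != "VALUE"]
    return {
        tuple(row[name] for name in index_names): float(row["VALUE"])
        for row in reader
    }


demand = read_long_csv(io.StringIO(csv_text))
print(demand[("SIMPLICITY", "ETH", "2015")])
```

The same helper works for a set file, where the only column is ``VALUE`` and every key is therefore an empty tuple, so a set is better read as a simple list of values.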
Datafiles are the least user-friendly method of handling data; however, they are required for the GNU MathProg implementation of OSeMOSYS. Datafiles are written in MathProg, which shares syntax with the AMPL programming language.
Datafiles contain all model data in one file (often a ``*.txt`` file), and will follow
a similar data standard to long formatted CSV data. However, the default value for the
parameter is included in its declaration statement.
For example, in the file ``data.txt``, the parameter ``AccumulatedAnnualDemand`` will
be defined as follows:

.. code-block:: text

    param default 0.0 : AccumulatedAnnualDemand :=
    SIMPLICITY ETH 2014 1
    SIMPLICITY RAWSUG 2014 0.5
    SIMPLICITY ETH 2015 1.03
    SIMPLICITY RAWSUG 2015 0.51
    SIMPLICITY ETH 2016 1.061
    SIMPLICITY RAWSUG 2016 0.519
    SIMPLICITY ETH 2017 1.093
    SIMPLICITY RAWSUG 2017 0.529
    SIMPLICITY ETH 2018 1.126
    SIMPLICITY RAWSUG 2018 0.538
    SIMPLICITY ETH 2019 1.159
    SIMPLICITY RAWSUG 2019 0.548
    SIMPLICITY ETH 2020 1.194
    SIMPLICITY RAWSUG 2020 0.558
    ...
And in the same ``data.txt`` file, the set ``TECHNOLOGY`` will be defined as follows:

.. code-block:: text

    set TECHNOLOGY := BACKSTOP1 BACKSTOP2 ETHPLANT GAS_EXTRACTION GAS_IMPORT GRID_EXP HYD1 HYD2 ...
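To show how long-format rows map onto a datafile statement, the sketch below renders a parameter in the layout shown above. It is an illustration only: ``to_mathprog_param`` is a hypothetical helper, not the writer ``otoole`` uses, and it closes the statement with the semicolon that MathProg data statements require.

```python
def to_mathprog_param(name, default, rows):
    """Render long-format rows as a MathProg ``param`` data statement."""
    lines = [f"param default {default} : {name} :="]
    # One line per record: the index values followed by the parameter value.
    lines.extend(" ".join(str(item) for item in row) for row in rows)
    lines.append(";")  # MathProg data statements end with a semicolon
    return "\n".join(lines)


rows = [
    ("SIMPLICITY", "ETH", 2014, 1),
    ("SIMPLICITY", "RAWSUG", 2014, 0.5),
]

statement = to_mathprog_param("AccumulatedAnnualDemand", 0.0, rows)
print(statement)
```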
.. SEEALSO:: For reading and writing between Python and AMPL_, see the amply_ Python package.