Module mpp_io_mod

mpp_io_mod is a set of simple calls for parallel I/O on distributed systems. It is geared toward the writing of data in netCDF format. It requires the modules mpp_domains_mod and mpp_mod, upon which it is built.

In massively parallel environments, reading and writing data to files on disk is often a difficult problem. MPI-IO and MPI-2 IO are moving toward providing this capability, but are currently not widely implemented, and their API is rather abstruse. mpp_io_mod is an attempt at a simple API encompassing a certain variety of the I/O tasks that will be required. It does not attempt to be an all-encompassing standard such as MPI; however, it can be implemented in MPI if so desired. It is equally simple to add parallel I/O capability to mpp_io_mod based on vendor-specific APIs while providing a layer of insulation for user codes.

The mpp_io_mod parallel I/O API is built on top of the mpp_domains_mod and mpp_mod API for domain decomposition and message passing. Features of mpp_io_mod include:

1) Simple, minimal API, with free access to the underlying API for more complicated tasks.
2) Self-describing files: comprehensive header information (metadata) in the file itself.
3) Strong focus on performance of parallel write: the climate models for which it is designed read a minimal amount of data, usually at the beginning of the run, but tend to write copious amounts of data during the run. An interface for reading is also supplied, but its performance has not yet been optimized.
4) Integrated netCDF capability: netCDF is a data format widely used in the climate/weather modeling community. netCDF is considered the principal medium of data storage for mpp_io_mod, but I provide a raw unformatted Fortran I/O capability in case netCDF is not an option, whether due to unavailability, inappropriateness, or poor performance.
5) May require off-line post-processing: a tool for this purpose, mppnccombine, is available. GFDL users may use ~hnv/pub/mppnccombine. Outside users may obtain the source here. It can be compiled with any C compiler and linked with the netCDF library. The program is free and is covered by the GPL.

The internal representation of the data being written out is assumed to be the default real type, which can be 4 or 8 bytes. Time data is always written as 8 bytes to avoid overflow on climatic time scales in units of seconds.
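
The need for 8-byte time values can be illustrated with a small standalone sketch. It does not use mpp_io_mod; the module and function names are hypothetical. It shows a related limit rather than overflow in the strict sense: a 4-byte real carries a 24-bit significand, so beyond 2**24 seconds (about 194 days) it can no longer represent every whole second, whereas an 8-byte real still can after 100 years.

```fortran
module time_precision_demo
  use, intrinsic :: iso_fortran_env, only: real32, real64
  implicit none
contains
  ! True if adding one second to t yields a different 4-byte value.
  pure logical function second_resolved_r4(t) result(ok)
    real(real32), intent(in) :: t
    real(real32) :: next
    next = t + 1.0_real32
    ok = next /= t
  end function second_resolved_r4

  ! Same check for an 8-byte time value.
  pure logical function second_resolved_r8(t) result(ok)
    real(real64), intent(in) :: t
    real(real64) :: next
    next = t + 1.0_real64
    ok = next /= t
  end function second_resolved_r8
end module time_precision_demo

program show_time_precision
  use time_precision_demo
  implicit none
  ! 2**24 s is about 194 days; 3.1536e9 s is 100 years of 365 days.
  print *, '4-byte resolves 1 s at 2**24 s:   ', second_resolved_r4(16777216.0_real32)
  print *, '8-byte resolves 1 s at 100 years: ', second_resolved_r8(3.1536e9_real64)
end program show_time_precision
```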

I/O modes in mpp_io_mod

The I/O activity critical to performance in the models for which mpp_io_mod is designed is typically the writing of large datasets on a model grid volume produced at intervals during a run. Consider a 3D grid volume, where model arrays are stored as (i,j,k). The domain decomposition is typically along i or j: thus to store data to disk as a global volume, the distributed chunks of data have to be seen as non-contiguous. If we attempt to have all PEs write this data into a single file, performance can be seriously compromised because of the data reordering that will be required. Possible options are to have one PE acquire all the data and write it out, or to have all the PEs write independent files, which are recombined offline. These three modes of operation are described in the mpp_io_mod terminology in terms of two parameters, threading and fileset, as follows:

Single-threaded I/O: a single PE acquires all the data and writes it out.
Multi-threaded, single-fileset I/O: many PEs write to a single file.
Multi-threaded, multi-fileset I/O: many PEs write to independent files. This is also called distributed I/O.

The middle option is the one for which good performance is most difficult to achieve. The choice of one of these modes is made when a file is opened for I/O, in mpp_open.
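
The non-contiguity that makes the single-file mode expensive can be made concrete with a standalone sketch. It does not call mpp_io_mod: the function contiguous_runs and the grid sizes are hypothetical, and the global array is assumed to be stored in Fortran order with i varying fastest. The function counts how many separate contiguous pieces one PE's block occupies in the global volume.

```fortran
module chunk_layout_demo
  implicit none
contains
  ! Number of separate contiguous runs that the block (is:ie, js:je, 1:nz)
  ! occupies inside a global (nx, ny, nz) array stored in Fortran order.
  pure integer function contiguous_runs(nx, ny, nz, is, ie, js, je) result(nruns)
    integer, intent(in) :: nx, ny, nz, is, ie, js, je
    if (is == 1 .and. ie == nx) then
      if (js == 1 .and. je == ny) then
        nruns = 1                  ! the block is the whole array
      else
        nruns = nz                 ! full i-lines: one run per k level
      end if
    else
      nruns = (je - js + 1)*nz     ! partial i-lines: one run per (j,k) pair
    end if
  end function contiguous_runs
end module chunk_layout_demo

program show_chunk_layout
  use chunk_layout_demo
  implicit none
  ! A 100 x 100 x 10 grid split four ways; this PE owns the second quarter.
  print *, 'runs, decomposed along j: ', contiguous_runs(100, 100, 10, 1, 100, 26, 50)
  print *, 'runs, decomposed along i: ', contiguous_runs(100, 100, 10, 26, 50, 1, 100)
end program show_chunk_layout
```

In this sketch a decomposition along j leaves each PE with one run per vertical level, while a decomposition along i splits its data into one short run per (j,k) pair, which is the reordering cost referred to above.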

Metadata in mpp_io_mod

A requirement of the design of mpp_io_mod is that the file must be entirely self-describing: comprehensive header information describing its contents is present in the header of every file. The header information follows the model of netCDF. Variables in the file are divided into axes and fields. An axis describes a co-ordinate variable, e.g. x, y, z, t. A field consists of data in the space described by the axes. An axis is described in mpp_io_mod using the defined type axistype:

   type, public :: axistype
      character(len=128) :: name
      character(len=128) :: units
      character(len=256) :: longname
      character(len=8) :: cartesian
      integer :: len
      integer :: sense           !+/-1, depth or height?
      type(domain1D), pointer :: domain
      real, dimension(:), pointer :: data
      integer :: id, did
      integer :: type  ! external NetCDF type format for axis data
      integer :: natt
      type(atttype), pointer :: Att(:) ! axis attributes
   end type axistype

A field is described using the type fieldtype:

   type, public :: fieldtype
      character(len=128) :: name
      character(len=128) :: units
      character(len=256) :: longname
      real :: min, max, missing, fill, scale, add
      integer :: pack
      type(axistype), dimension(:), pointer :: axes
      integer, dimension(:), pointer :: size
      integer :: time_axis_index
      integer :: id
      integer :: type ! external NetCDF format for field data
      integer :: natt, ndim
      type(atttype), pointer :: Att(:) ! field metadata
   end type fieldtype

An attribute (global, field or axis) is described using the type atttype:

   type, public :: atttype
      integer :: type, len
      character(len=128) :: name
      character(len=256)  :: catt
      real(FLOAT_KIND), pointer :: fatt(:)
   end type atttype

This default set of field attributes corresponds closely to various conventions established for netCDF files. The pack attribute of a field defines whether or not a field is to be packed on output. Allowed values of pack are 1, 2, 4 and 8. The value of pack is the number of variables written into 8 bytes. In typical use, we write 4-byte reals to netCDF output; thus the default value of pack is 2. For pack = 4 or 8, packing uses a simple-minded linear scaling scheme using the scale and add attributes. There is thus likely to be a significant loss of dynamic range with packing. When a field is declared to be packed, the missing and fill attributes, if supplied, are packed also.

Please note that the pack values have the same meaning even if the default real is 4 bytes, i.e. pack=1 still follows the definition above and writes out 8 bytes per value.
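
The rule that pack is the number of values written into 8 bytes can be turned into a per-value size with a few lines of standalone Fortran. The function bytes_per_value is a hypothetical helper for illustration and is not part of mpp_io_mod.

```fortran
module pack_size_demo
  implicit none
contains
  ! Bytes written per value, given that `pack` values share 8 bytes.
  pure integer function bytes_per_value(pack) result(nbytes)
    integer, intent(in) :: pack
    select case (pack)
    case (1, 2, 4, 8)
      nbytes = 8/pack
    case default
      nbytes = -1      ! not an allowed value of pack
    end select
  end function bytes_per_value
end module pack_size_demo

program show_pack_sizes
  use pack_size_demo
  implicit none
  integer, parameter :: packs(4) = [1, 2, 4, 8]
  integer :: i
  do i = 1, size(packs)
    print '(a,i0,a,i0,a)', 'pack=', packs(i), ' -> ', &
         bytes_per_value(packs(i)), ' byte(s) per value'
  end do
end program show_pack_sizes
```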

A set of attributes for each variable is also available. The variable definitions and attribute information are written or read by calling mpp_write_meta or mpp_read_meta. A typical calling sequence for writing data might be:

     ! unit is assumed to have been returned by an earlier call to mpp_open
     integer :: unit, i
     type(domain2D), dimension(:), allocatable, target :: domain
     type(fieldtype) :: field
     type(axistype) :: x, y, z, t
     real, allocatable, dimension(:,:,:) :: a
     call mpp_define_domains( (/1,nx,1,ny/), domain )
     allocate( a(domain(pe)%x%data%start_index:domain(pe)%x%data%end_index, &
                 domain(pe)%y%data%start_index:domain(pe)%y%data%end_index,nz) )
     ! define the axes: x and y are distributed, z is not, t is the record axis
     call mpp_write_meta( unit, x, 'X', 'km', 'X distance', &
          domain=domain(pe)%x, data=(/(float(i),i=1,nx)/) )
     call mpp_write_meta( unit, y, 'Y', 'km', 'Y distance', &
          domain=domain(pe)%y, data=(/(float(i),i=1,ny)/) )
     call mpp_write_meta( unit, z, 'Z', 'km', 'Z distance', &
          data=(/(float(i),i=1,nz)/) )
     call mpp_write_meta( unit, t, 'Time', 'second', 'Time' )
     ! define the field a as a function of (x,y,z,t)
     call mpp_write_meta( unit, field, (/x,y,z,t/), 'a', '(m/s)', 'AAA', &
          missing=-1e36 )
     ! write the axis data; the metadata specification is now complete
     call mpp_write( unit, x )
     call mpp_write( unit, y )
     call mpp_write( unit, z )

In this example, x and y have been declared as distributed axes, since a domain decomposition has been associated with them. z and t are undistributed axes. t is known to be a record axis (netCDF terminology) since we do not allocate the data element of the axistype. Only one record axis may be associated with a file. The calls to mpp_write_meta initialize the axes and associate a unique variable ID with each axis. The call to mpp_write_meta with argument field declares field to be a 4D variable that is a function of (x,y,z,t), and a unique variable ID is associated with it. A 3D field will be written at each call to mpp_write(field).

The data for any variable, including axes, are written by mpp_write.

Any additional attributes of variables can be added through subsequent mpp_write_meta calls, using the variable ID as a handle. Global attributes, associated with the dataset as a whole, can also be written thus. See the mpp_write_meta call syntax below for further details.

You cannot interleave calls to mpp_write and mpp_write_meta: the first call to mpp_write implies that metadata specification is complete.

A typical calling sequence for reading data might be:

     integer :: unit, natt, nvar, ntime
     integer :: i, tindex   ! tindex: time level to read (assumed name)
     type(domain2D), dimension(:), allocatable, target :: domain
     type(fieldtype), allocatable, dimension(:) :: fields
     type(atttype), allocatable, dimension(:) :: global_atts
     real, allocatable, dimension(:) :: times
     real, allocatable, dimension(:,:,:) :: a
     call mpp_define_domains( (/1,nx,1,ny/), domain )
     call mpp_read_meta(unit)
     call mpp_get_info(unit,natt,nvar,ntime)
     ! size the receiving arrays from the counts returned by mpp_get_info
     allocate( global_atts(natt), fields(nvar), times(ntime) )
     call mpp_get_atts(unit,global_atts)
     call mpp_get_vars(unit, fields)
     call mpp_get_times(unit, times)
     allocate( a(domain(pe)%x%data%start_index:domain(pe)%x%data%end_index, &
                 domain(pe)%y%data%start_index:domain(pe)%y%data%end_index,nz) )
     do i=1, nvar
       if (fields(i)%name == 'a')  call mpp_read(unit,fields(i),domain(pe), a, &
                                                 tindex)
     end do

In this example, the data are distributed as in the previous example. The call to mpp_read_meta initializes all of the metadata associated with the file, including global attributes, variable attributes and non-record dimension data. The call to mpp_get_info returns the number of global attributes (natt), variables (nvar) and time levels (ntime) associated with the file identified by a unique ID (unit). mpp_get_atts returns all global attributes for the file in the derived type atttype(natt). mpp_get_vars returns variable types (fieldtype(nvar)). Since the record dimension data are not allocated for calls to mpp_write, a separate call to mpp_get_times is required to access record dimension data. Subsequent calls to mpp_read return the field data arrays corresponding to the fieldtype. The domain type is an optional argument. If domain is omitted, the incoming field array should be dimensioned for the global domain; otherwise, the field data are assigned to the computational domain of a local array.

Multi-fileset reads are not supported with mpp_read.




mpp_get_atts: Get file global metadata.
mpp_read: Read from an open file.
mpp_write_meta: Write metadata.
mpp_write: Write to an open file.




  1. mpp_get_atts

    call mpp_get_atts ( unit, global_atts)
    Get file global metadata.


  2. mpp_read

    call mpp_read ( unit, field, data, time_index )
    call mpp_read ( unit, field, domain, data, time_index )
    mpp_read is used to read data from the file on an I/O unit using the file parameters supplied by mpp_open. There are two forms of mpp_read, one to read distributed field data, and one to read non-distributed field data. Distributed data refer to arrays whose two fastest-varying indices are domain-decomposed. Distributed data must be 2D or 3D (in space). Non-distributed data can be 0-3D.

    The data argument for distributed data is expected by mpp_read to be dimensioned on the data domain. mpp_read reads the data belonging to the compute domain, fetching data as required by the parallel I/O mode specified in the mpp_open call. This is consistent with our definition of domains, where all arrays are expected to be dimensioned on the data domain, and all operations performed on the compute domain.

    time_index    time_index is an optional argument. It is to be omitted if the field was defined not to be a function of time. Results are unpredictable if the argument is supplied for a time-independent field, or omitted for a time-dependent field.


    The type of read performed by mpp_read depends on the file characteristics on the I/O unit specified at the mpp_open call. Specifically, the format of the input data (e.g. netCDF or IEEE), the threading flags, etc., can be changed there, and require no changes to the mpp_read calls. (fileset = MPP_MULTI is not supported by mpp_read; IEEE is currently not supported.)

    Packed variables are unpacked using the scale and add attributes.

    mpp_read_meta must be called prior to calling mpp_read.

  3. mpp_write_meta

    call mpp_write_meta ( unit, axis, name, units, longname, cartesian, sense, domain, data )
    call mpp_write_meta ( unit, field, axes, name, units, longname, min, max, missing, fill, scale, add, pack )
    call mpp_write_meta ( unit, id, name, rval=rval, pack=pack )
    call mpp_write_meta ( unit, id, name, ival=ival )
    call mpp_write_meta ( unit, id, name, cval=cval )
    call mpp_write_meta ( unit, name, rval=rval, pack=pack )
    call mpp_write_meta ( unit, name, ival=ival )
    call mpp_write_meta ( unit, name, cval=cval )
    This routine is used to write the metadata describing the contents of a file being written. Each file can contain any number of fields, which are functions of 0-3 space axes and 0-1 time axes. (Only one time axis can be defined per file.) The basic metadata defined above for axistype and fieldtype are written in the first two forms of the call shown above. These calls will associate a unique variable ID with each variable (axis or field). These IDs can be used to attach any other real, integer or character attribute to a variable. The last form is used to define a global real, integer or character attribute that applies to the dataset as a whole.



    The first form defines a time or space axis. Metadata corresponding to the type above are written to the file on <unit>. A unique ID for subsequent references to this axis is returned in axis%id. If the <domain> element is present, this is recognized as a distributed data axis and domain decomposition information is also written if required (the domain decomposition info is required for multi-fileset multi-threaded I/O). If the <data> element is allocated, it is considered to be a space axis; otherwise it is a time axis with an unlimited dimension. Only one time axis is allowed per file.

    The second form defines a field. Metadata corresponding to the type above are written to the file on <unit>. A unique ID for subsequent references to this field is returned in field%id. At least one axis must be associated; 0D variables are not considered. mpp_write_meta must previously have been called on all axes associated with this field.

    The third form (calls 3-5 above) defines metadata associated with a previously defined axis or field, identified to mpp_write_meta by its unique ID <id>. The attribute is named <name> and can take on a real, integer or character value. <rval> and <ival> can be scalar or 1D arrays. This need not be called for attributes already contained in the type.

    The last form (calls 6-8 above) defines global metadata associated with the file as a whole. The attribute is named <name> and can take on a real, integer or character value. <rval> and <ival> can be scalar or 1D arrays.

    Note that mpp_write_meta is expecting axis data on the global domain even if it is a domain-decomposed axis.

    You cannot interleave calls to mpp_write and mpp_write_meta: the first call to mpp_write implies that metadata specification is complete.

  4. mpp_write

    call mpp_write ( unit, axis )
    call mpp_write ( unit, field, data, tstamp )
    call mpp_write ( unit, field, domain, data, tstamp )
    mpp_write is used to write data to the file on an I/O unit using the file parameters supplied by mpp_open. Axis and field definitions must have previously been written to the file using mpp_write_meta. There are three forms of mpp_write, one to write axis data, one to write distributed field data, and one to write non-distributed field data. Distributed data refer to arrays whose two fastest-varying indices are domain-decomposed. Distributed data must be 2D or 3D (in space). Non-distributed data can be 0-3D.

    The data argument for distributed data is expected by mpp_write to contain data specified on the data domain. mpp_write writes the data belonging to the compute domain, fetching or sending data as required by the parallel I/O mode specified in the mpp_open call. This is consistent with our definition of domains, where all arrays are expected to be dimensioned on the data domain, and all operations performed on the compute domain.
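
    The distinction between the data domain and the compute domain can be sketched in standalone Fortran. The example does not use mpp_domains_mod: the index values, the halo width of 1 and the helper compute_section are all assumptions chosen for illustration, and a 1D array is used for brevity even though distributed data in mpp_io_mod must be 2D or 3D. The array is dimensioned on the data domain (compute domain plus halo), and only the compute-domain section is selected, as it would be for output.

```fortran
module domain_demo
  implicit none
contains
  ! Return the compute-domain section (is:ie) of a 1D array that is
  ! dimensioned on the data domain, whose lower bound is isd.
  pure function compute_section(a, isd, is, ie) result(c)
    integer, intent(in) :: isd, is, ie
    real, intent(in) :: a(isd:)
    real :: c(ie - is + 1)
    c = a(is:ie)
  end function compute_section
end module domain_demo

program show_domains
  use domain_demo
  implicit none
  integer, parameter :: halo = 1                          ! assumed halo width
  integer, parameter :: is = 5, ie = 8                    ! compute domain of a hypothetical PE
  integer, parameter :: isd = is - halo, ied = ie + halo  ! data domain
  real :: a(isd:ied)
  integer :: i

  a = -999.0            ! halo points keep this marker value
  do i = is, ie         ! operate on the compute domain only
    a(i) = real(i)
  end do
  print *, 'data domain size:      ', size(a)
  print *, 'compute domain size:   ', ie - is + 1
  print *, 'compute-domain values: ', compute_section(a, isd, is, ie)
end program show_domains
```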

    The type of the data argument must be a default real, which can be 4 or 8 byte.

    tstamp    tstamp is an optional argument. It is to be omitted if the field was defined not to be a function of time. Results are unpredictable if the argument is supplied for a time-independent field, or omitted for a time-dependent field. Repeated writes of a time-independent field are also not recommended. One time level of one field is written per call. tstamp must be an 8-byte real, even if the default real type is 4-byte.

    The type of write performed by mpp_write depends on the file characteristics on the I/O unit specified at the mpp_open call. Specifically, the format of the output data (e.g. netCDF or IEEE), the threading and fileset flags, etc., can be changed there, and require no changes to the mpp_write calls.

    Packing is currently not implemented for non-netCDF files, and the pack attribute is ignored. On netCDF files, NF_DOUBLEs (8-byte IEEE floating point numbers) are written for pack=1 and NF_FLOATs for pack=2. (pack=2 gives the customary and default behaviour). We write NF_SHORTs (2-byte integers) for pack=4, or NF_BYTEs (1-byte integers) for pack=8. Integer scaling is done using the scale and add attributes at pack=4 or 8, satisfying the relation

        data = packed_data*scale + add

    NOTE: mpp_write does not check to see if the scaled data in fact fits into the dynamic range implied by the specified packing. It is incumbent on the user to supply correct scaling attributes.
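
    Since the user must supply the scaling attributes, one possible way of choosing them for pack=4 is sketched here in standalone Fortran. None of these routines belong to mpp_io_mod: choose_scaling, pack_short and unpack_short are hypothetical helpers, and mapping the field range onto -32767..32767 is only one reasonable convention. The unpacking function applies the relation data = packed_data*scale + add given above, and the packing function is its inverse with rounding to the nearest integer.

```fortran
module packing_demo
  use, intrinsic :: iso_fortran_env, only: int16
  implicit none
contains
  ! Choose scale and add so that [vmin, vmax] maps onto -32767..32767.
  pure subroutine choose_scaling(vmin, vmax, scale, add)
    real, intent(in)  :: vmin, vmax
    real, intent(out) :: scale, add
    scale = (vmax - vmin)/65534.0    ! 65534 steps between -32767 and 32767
    add   = 0.5*(vmax + vmin)        ! the midpoint maps to packed value 0
  end subroutine choose_scaling

  ! Inverse of data = packed_data*scale + add, rounded to the nearest
  ! 2-byte integer. Like mpp_write, it performs no range check.
  elemental function pack_short(x, scale, add) result(p)
    real, intent(in) :: x, scale, add
    integer(int16) :: p
    p = nint((x - add)/scale, kind=int16)
  end function pack_short

  ! The documented relation: data = packed_data*scale + add.
  elemental function unpack_short(p, scale, add) result(x)
    integer(int16), intent(in) :: p
    real, intent(in) :: scale, add
    real :: x
    x = real(p)*scale + add
  end function unpack_short
end module packing_demo

program show_packing
  use packing_demo
  implicit none
  real, parameter :: field(4) = [-40.0, 0.0, 21.5, 60.0]
  real :: scale, add
  integer(int16) :: packed(4)

  call choose_scaling(minval(field), maxval(field), scale, add)
  packed = pack_short(field, scale, add)
  print *, 'scale, add:    ', scale, add
  print *, 'packed values: ', packed
  print *, 'max abs error: ', maxval(abs(unpack_short(packed, scale, add) - field))
  print *, 'scale/2:       ', 0.5*scale
end program show_packing
```

    With this choice the round-trip error of any value inside [vmin, vmax] is about scale/2 at most, which makes the loss of dynamic range explicit: the wider the range a field spans, the coarser the packed resolution.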

    You cannot interleave calls to mpp_write and mpp_write_meta: the first call to mpp_write implies that metadata specification is complete.




