Code Description

This program computes a 3-D hydrodynamics problem on a uniform mesh, using a simplified version of the PPM (Piecewise Parabolic Method) code.

Short problem description
The coordinates are -1:1 in x and y, and 0:zmax in z, where zmax depends on the overall aspect ratio prescribed.

A plane shock traveling up the +z axis encounters a density discontinuity, at which the gas becomes denser. The shock is carefully designed to be simple, though strong, about Mach 5. The gas initially has density 0.1 ahead of the shock; over 5dz at the discontinuity, it changes to 1.0. This discontinuity surface is also initially perturbed in +z the following way.

There are 5 components to the perturbation:

cos( 3pi/2 * sqrt(x^2+y^2) )
cos( 3pi/2 * x*10 )
cos( 3pi/2 * y*12 )
ppmimage
teximage

A scale factor is applied to each. See the subroutine 'setup', in 'main.m4'. The two image perturbations depend on the presence of byte image files named 'ppmimage' and 'teximage' being present in the run directory. If they are not present, only the sinusoidal variations are done.

Implementation details
This version of sPPM is implemented primarily in Fortran, with system- dependent calls in C. All message passing is done with MPI. Only a small set of MPI routines are actually required:

MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Finalize
MPI_Wtime, MPI_IRecv, MPI_Isend
MPI_Wait, MPI_Allreduce, MPI_Request_free

The parallelization strategy within an SMP is based on spawning extra copies of the parent MPI process. This is done with the 'sproc' routine, which is similar but not identical to the standard Unix 'fork' call. It differs in that 'sproc' causes all static (i.e. not stack based) storage to be implicitly shared between processes on a single SMP node. This storage will always be declared in Fortran with either SAVE or COMMON.

Synchronization is accomplished by examining shared variables, and by use of Unix System V semaphore calls. (See twiddle.m4 and cstuff.c.) In addition, a fast vendor supplied implementation of an atomic fetch&add (test_and_add) was used.

This threaded version of sPPM was developed on an SGI system. However an attempt has been made to minimize the places where the thread interface is exposed. There is also a Makefile switch THREADED to generate a version of sPPM that does not spawn any threads - but uses only MPI. Use of this MPI-only option for sPPM is discouraged for SMPs, because that would not be as efficient an implementation. It may, however, be useful in porting it initially. See further notes in the next section.


Building Issues

Porting sPPM
Under control of the Makefile switch THREADED, either an SMP version of sPPM or an MPI-only version of sPPM can be generated. There is no formal standard for SMP interfaces. This version is based on an SGI implementation. As a general guide to where the thread-specific parts of the source are, you can grep on THREADED in the .m4 and .h files.

In addition to obvious changes in the Makefile, there are three basic places to look in the source files for the SMP control:

  1. The thread initialization and spawning. These are located in main.m4 and in the routine spawn which is in runhyd3.m4. The THREADED switch will guide you to these locations. The MPI version simply comments these areas out.
  2. Actual thread control is done with 4 macros in the cliches.h file. PLOOP and PLOOPEND generate code for a fetch-and-add based do-across loop. These revert to a simple WHILE loop in the MPI only version. In addition there are macros for two barrier routines. These are no-ops in the MPI version. These four macros should be replaced by the appropriate equivalent for the new platform.
  3. The code for the routines referenced in the cliches.h files is in the file cstuff.c. This should be replaced with the appropriate routines, if any, that your macros need.
The routines for I/O - related calls are in C and have been grouped together in the file c_io.c. It is possible that there may also be system dependencies in this part of the code.

Single or double precision versions of the code can be generated by control of the switch REAL in the Makefile. Note that DOUBLE assumes you are also turning on a Fortran compiler switch that will cause all REAL storage (implicit or explicitly declared) to be taken as if it were declared "DOUBLE PRECISION".

Porting the sPPM Makefile
To generate a Makefile for a new platform:

Problem size - control at compile time
Before you make a version of sPPM, you must decide what the size of the data block on each node will be. If you are using a single SMP node - this will determine the size of the entire problem. If you are running on multiple SMP nodes (under control of the file run/inputdeck ... see About the Data), the problem size also depends on inputdeck.

The file 'iq.h' defines the size of the brick resident on each node. By node, I mean a node on the communications network. This size, along with the number and layout of nodes, determines the total problem size. For example, if the input contains:

	nodelayout 1 2 2

and iq.h contains:

	define(IQ,128)
	define(IQX,128)
	define(IQY,128)
	define(IQZ,128)
... then each node updates a 128^3, and the total problem size is (x,y,z) 128 by 256 by 256.

sPPM executable
When you "make" sppm, note that the executable is actually put in the directory called run. There are two different lines in Makefile: run/sppm has a link-line that expects cstuff.o (for a threaded version). run/sppm1 has a link-line that does not use cstuff.o (for MPI-only version).


Files in this distribution

sppm.tar    -   This file contains all the
                files listed below.

README		This file.
README.html	HTML version of this file.
Makefile	Contains compile/link commands.

arrays.h	State variables & coord. mesh 
bdrys.m4	Boundary setting/communications
buffers.h	Memory used by calchyd
c_io.c		I/O related routines.
cliches.h	Common constants/macros
cstuff.c	System dependent calls

iq.h		Definition of problem tile size.
main.m4		Most initialization/io code.
msgcom.h	Common shared variables - an assortment.
params.m4	Constants, most derived automatically from iq.h
runhyd3.m4	Contains runhyd/calchyd control logic to call sppm.
sppm.m4		The hydro. kernel sppm itself.
twiddle.m4	Trivial routine which spins on a shared variable.

ppmimage128	128x128x1-byte image used for the sample run.
ppmimage256     Same thing, though 256x256
ppmimage512     "" though 512x512
ppmimage64      "" though 64x64

run/		Runtime files
  inputdeck     Sample control file   
  output        Sample program output
   0/		directory for storing node-0 files 
   1/		directory for storing node-1 files


Optimization Constraints

Performance optimizations to the SPPM source are encouraged.


Execution Issues

sPPM is executed from a directory where inputdeck is - and where there is a directory named for each of the SMP nodes numbers in use: 0, 1, ... n. Once the input is read (see About the Data), the program attempts to change the current directory to one named as the MPI node id. For example, a 2-node run would execute 'cd 0' and 'cd 1', respectively. This is handy for keeping output from different nodes separate, and on disks physically attached to the node. (These directories should really be symbolic links to node-specific disk partitions.) Note these subdirectories are required for multi-node runs, as output files such as 'd0000' and 'restart' will be written by each node; there is no other provision for keeping them separate.

SPPM has two optional files that may be used to perturb the initial state. These are named ppmimage and teximage. These files are not necessary for the benchmark runs for SPPM - the program will print out a warning message stating that the files could not be located. Ignore this. These files are simple byte raster image files, which should match to the overall x-y size of the problem. For example, ppmimage64 is appropriate for any problem with 64 x and 64 y zones. If they are to be used, place a copy of the file of the appropriate size in the run directory.


Timing Issues

Elapsed time (wallclock) time should be used to evaluate performance. These times are printed out by the program for the run; the timer starts just before the initial state is defined.

A separate timing is printed for I/O; the first time includes I/O time. I/O time is the time devoted to writing bricks of bytes, (including reformatting & rescaling), and writing or reading restart dumps. Since each node has independent I/O, the time is the maximum of all the nodes I/O times, and the number of bytes written is the sum of all the nodes I/O number of bytes.


About the data

A small text file named 'inputdeck' is read at the start of the program. It contains lines of keyword & value pairs, the order of which is unimportant.

In addition to this file (described following), the iq.h file must also be set to control the problem size. See Problem Size under Building Issues.

Keyword      Value(s)		    Description
----------------------------------------------------------------
nodelayout   npx npy npz            Node geometry

nthreads     nthreads               # threads on each node

dtime        dtime                  Initial timestep

dtdump       dtdump                 Time between bob dumps

checkpoint   ndump                  Timesteps between restart
                                    dumps.  Since timesteps are
				    done in pairs, choose even
				    numbers.

stoptime     nstop tstop            Stop at simulation time tstop,
                                    or timestep nstop, whichever
                                    occurs first.

timings      idetailtm	            Level of timing info. printed.
  			            0 none, 1 timestep, 2 each sweep.

restart                             Mere presence will cause 
                                    the reloading of a checkpoint file,
				    aptly named 'restart'.
                                    In this case, dtime is ignored if
                                    it occurs in the inputdeck.


Expected Results

Two example runs are provided below, for your reference in determining correctness of answers. One was done with single precision, the other double precision (real*8 in this case.) The timestamp values such as '815256106', (etc.) are of course time and date dependent; do not expect to duplicate them. The critical numbers you should duplicate are the Courant numbers for each timestep. The energies may vary slightly due to rounding errors caused by summing these quantities in different orders.

For quantitative results, skip ahead to single precision output, or double precision output. For a qualitative idea of what is being computed, here are some volume renderings made of a 128 cube simulation.

d0000.jpg Ln(density), time = 0.0 This initial state has been perturbed by 'ppmimage128', and by a random noise texture image. The color mapping puts Ln(Density)=0 at the center of the scale, which is true of the other two Ln(Density) images as well.

d0005.jpg Ln(Density), time = 0.5 Here a cutting plane has cut away part of the departing shock wave, and exposes the center of the cube.

d0009.jpg Ln(Density), time = 0.9

v0006.jpg Vorticity magnitude, time = 0.6, center 1/3 of the cube.

s0006.jpg Density Laplacian, time = 0.6 This is an image of del^2(Density). The grayscale is centered, with zero at midscale.



Program output, SINGLE PRECISION.
******* sPPM benchmark *************************** 1 Communication nodes, in a 1 1 1 array. 6 threads update a 128 128 128 tile. Total problem size = 128 128 128 Restart file size= 48754540. (bytes) Brick-of-bytes sz= 6291456. (bytes) ************************************************** timestamp: 815337295 elapsed: 0.00 Timer initialized Problem parameters: Perturbation amplitudes: 0.050 0.002 0.002 0.035 Interface z: 0.60, Shock z: 0.50 Density ratio: 10.00 Warning, no image perturbation. 'ppmimage':: No such file or directory Warning, no image perturbation. 'teximage':: No such file or directory nstep Time Dtime Courant Kinetic Heat Total Energy _____ ___________ ___________ ___________ ___________ ___________ ____________ 0 0.00000E+00 3.00000E-04 0.00000E+00 2.03857E+01 3.93347E+01 5.97204E+01 Writing bricks-of-bytes 0 2 6.00000E-04 3.00000E-04 1.35669E-01 2.04296E+01 3.94500E+01 5.98796E+01 4 1.20000E-03 3.00000E-04 1.46746E-01 2.04743E+01 3.95645E+01 6.00388E+01 6 1.80000E-03 3.00000E-04 1.23499E-01 2.05154E+01 3.96826E+01 6.01980E+01 8 2.40000E-03 3.00000E-04 1.31502E-01 2.05547E+01 3.98025E+01 6.03572E+01 10 3.00000E-03 3.00000E-04 1.38890E-01 2.05965E+01 3.99198E+01 6.05164E+01 12 3.60000E-03 3.00000E-04 1.45549E-01 2.06408E+01 4.00348E+01 6.06755E+01 14 6.87097E-03 1.63548E-03 7.93437E-01 2.08686E+01 4.06747E+01 6.15434E+01 16 1.01688E-02 1.64893E-03 8.12054E-01 2.10586E+01 4.13598E+01 6.24183E+01 18 1.34177E-02 1.62446E-03 8.25161E-01 2.11763E+01 4.21039E+01 6.32803E+01 20 1.65676E-02 1.57492E-03 8.07598E-01 2.12321E+01 4.28838E+01 6.41160E+01 Writing checkpoint file. Timestamp: 815337491 Total Elapsed time: 196 Seconds. 5.2496E+01 Mbytes written in 2.9 Seconds.

Program output, DOUBLE PRECISION

******* sPPM benchmark *************************** 1 Communication nodes, in a 1 1 1 array. 6 threads update a 128 128 128 tile. Total problem size = 128 128 128 Restart file size= 97509080. (bytes) Brick-of-bytes sz= 6291456. (bytes) ************************************************** timestamp: 815336423 elapsed: 0.00 Timer initialized Problem parameters: Perturbation amplitudes: 0.050 0.002 0.002 0.035 Interface z: 0.60, Shock z: 0.50 Density ratio: 10.00 Warning, no image perturbation. 'ppmimage':: No such file or directory Warning, no image perturbation. 'teximage':: No such file or directory nstep Time Dtime Courant Kinetic Heat Total Energy _____ ___________ ___________ ___________ ___________ ___________ ____________ 0 0.00000E+00 3.00000E-04 0.00000E+00 2.03857E+01 3.93347E+01 5.97204E+01 Writing bricks-of-bytes 0 2 6.00000E-04 3.00000E-04 1.35669E-01 2.04296E+01 3.94500E+01 5.98796E+01 4 1.20000E-03 3.00000E-04 1.46745E-01 2.04743E+01 3.95645E+01 6.00388E+01 6 1.80000E-03 3.00000E-04 1.23444E-01 2.05154E+01 3.96825E+01 6.01980E+01 8 2.40000E-03 3.00000E-04 1.31478E-01 2.05547E+01 3.98025E+01 6.03571E+01 10 3.00000E-03 3.00000E-04 1.39018E-01 2.05965E+01 3.99198E+01 6.05163E+01 12 3.60000E-03 3.00000E-04 1.45561E-01 2.06408E+01 4.00347E+01 6.06755E+01 14 6.87097E-03 1.63549E-03 7.93443E-01 2.08686E+01 4.06747E+01 6.15433E+01 16 1.01686E-02 1.64879E-03 8.11972E-01 2.10585E+01 4.13597E+01 6.24182E+01 18 1.34175E-02 1.62448E-03 8.25185E-01 2.11763E+01 4.21039E+01 6.32802E+01 20 1.65673E-02 1.57490E-03 8.07589E-01 2.12321E+01 4.28838E+01 6.41159E+01 Writing checkpoint file. Timestamp: 815336674 Total Elapsed time: 251 Seconds. 9.8992E+01 Mbytes written in 5.0 Seconds.