Short problem description
The coordinates are -1:1 in x and y, and 0:zmax in z, where zmax depends on the overall aspect ratio prescribed.
A plane shock traveling up the +z axis encounters a density discontinuity, at which the gas becomes denser. The shock is carefully designed to be simple, though strong, about Mach 5. The gas initially has density 0.1 ahead of the shock; over 5dz at the discontinuity, it changes to 1.0. This discontinuity surface is also initially perturbed in +z the following way.
There are 5 components to the perturbation:
cos( 3pi/2 * sqrt(x^2+y^2) )
cos( 3pi/2 * x*10 )
cos( 3pi/2 * y*12 )
A scale factor is applied to each. See the subroutine 'setup', in 'main.m4'. The two image perturbations depend on the presence of byte image files named 'ppmimage' and 'teximage' being present in the run directory. If they are not present, only the sinusoidal variations are done.
This version of sPPM is implemented primarily in Fortran, with system- dependent calls in C. All message passing is done with MPI. Only a small set of MPI routines are actually required:
MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Finalize
MPI_Wtime, MPI_IRecv, MPI_Isend
MPI_Wait, MPI_Allreduce, MPI_Request_free
The parallelization strategy within an SMP is based on spawning extra copies of the parent MPI process. This is done with the 'sproc' routine, which is similar but not identical to the standard Unix 'fork' call. It differs in that 'sproc' causes all static (i.e. not stack based) storage to be implicitly shared between processes on a single SMP node. This storage will always be declared in Fortran with either SAVE or COMMON.
Synchronization is accomplished by examining shared variables, and by use of Unix System V semaphore calls. (See twiddle.m4 and cstuff.c.) In addition, a fast vendor supplied implementation of an atomic fetch&add (test_and_add) was used.
This threaded version of sPPM was developed on an SGI system. However an attempt has been made to minimize the places where the thread interface is exposed. There is also a Makefile switch THREADED to generate a version of sPPM that does not spawn any threads - but uses only MPI. Use of this MPI-only option for sPPM is discouraged for SMPs, because that would not be as efficient an implementation. It may, however, be useful in porting it initially. See further notes in the next section.
Under control of the Makefile switch THREADED, either an SMP version of sPPM or an MPI-only version of sPPM can be generated. There is no formal standard for SMP interfaces. This version is based on an SGI implementation. As a general guide to where the thread-specific parts of the source are, you can grep on THREADED in the .m4 and .h files.
In addition to obvious changes in the Makefile, there are three basic places to look in the source files for the SMP control:
Single or double precision versions of the code can be generated by control of the switch REAL in the Makefile. Note that DOUBLE assumes you are also turning on a Fortran compiler switch that will cause all REAL storage (implicit or explicitly declared) to be taken as if it were declared "DOUBLE PRECISION".
Porting the sPPM Makefile
To generate a Makefile for a new platform:
Problem size - control at compile time
Before you make a version of sPPM, you must decide what the size of the data block on each node will be. If you are using a single SMP node - this will determine the size of the entire problem. If you are running on multiple SMP nodes (under control of the file run/inputdeck ... see About the Data), the problem size also depends on inputdeck.
The file 'iq.h' defines the size of the brick resident on each node.
By node, I mean a node on the communications network. This size,
along with the number and layout of nodes, determines the total
problem size. For example, if the input contains:
nodelayout 1 2 2
and iq.h contains:
... then each node updates a 128^3, and the total problem size is
(x,y,z) 128 by 256 by 256.
When you "make" sppm, note that the executable is actually put in the directory called run. There are two different lines in Makefile: run/sppm has a link-line that expects cstuff.o (for a threaded version). run/sppm1 has a link-line that does not use cstuff.o (for MPI-only version).
sppm.tar - This file contains all the files listed below. README This file. README.html HTML version of this file. Makefile Contains compile/link commands. arrays.h State variables & coord. mesh bdrys.m4 Boundary setting/communications buffers.h Memory used by calchyd c_io.c I/O related routines. cliches.h Common constants/macros cstuff.c System dependent calls iq.h Definition of problem tile size. main.m4 Most initialization/io code. msgcom.h Common shared variables - an assortment. params.m4 Constants, most derived automatically from iq.h runhyd3.m4 Contains runhyd/calchyd control logic to call sppm. sppm.m4 The hydro. kernel sppm itself. twiddle.m4 Trivial routine which spins on a shared variable. ppmimage128 128x128x1-byte image used for the sample run. ppmimage256 Same thing, though 256x256 ppmimage512 "" though 512x512 ppmimage64 "" though 64x64 run/ Runtime files inputdeck Sample control file output Sample program output 0/ directory for storing node-0 files 1/ directory for storing node-1 files
SPPM has two optional files that may be used to perturb the initial state. These are named ppmimage and teximage. These files are not necessary for the benchmark runs for SPPM - the program will print out a warning message stating that the files could not be located. Ignore this. These files are simple byte raster image files, which should match to the overall x-y size of the problem. For example, ppmimage64 is appropriate for any problem with 64 x and 64 y zones. If they are to be used, place a copy of the file of the appropriate size in the run directory.
A separate timing is printed for I/O; the first time includes I/O time. I/O time is the time devoted to writing bricks of bytes, (including reformatting & rescaling), and writing or reading restart dumps. Since each node has independent I/O, the time is the maximum of all the nodes I/O times, and the number of bytes written is the sum of all the nodes I/O number of bytes.
In addition to this file (described following), the iq.h file must also be set to control the problem size. See Problem Size under Building Issues.
Keyword Value(s) Description ---------------------------------------------------------------- nodelayout npx npy npz Node geometry nthreads nthreads # threads on each node dtime dtime Initial timestep dtdump dtdump Time between bob dumps checkpoint ndump Timesteps between restart dumps. Since timesteps are done in pairs, choose even numbers. stoptime nstop tstop Stop at simulation time tstop, or timestep nstop, whichever occurs first. timings idetailtm Level of timing info. printed. 0 none, 1 timestep, 2 each sweep. restart Mere presence will cause the reloading of a checkpoint file, aptly named 'restart'. In this case, dtime is ignored if it occurs in the inputdeck.
For quantitative results, skip ahead to single precision output, or double precision output. For a qualitative idea of what is being computed, here are some volume renderings made of a 128 cube simulation.
Ln(density), time = 0.0
This initial state has been perturbed by 'ppmimage128', and by a random noise
texture image. The color mapping puts Ln(Density)=0 at the center of the
scale, which is true of the other two Ln(Density) images as well.
Ln(Density), time = 0.5 Here a cutting plane has cut away part of the departing shock wave, and exposes the center of the cube.
Ln(Density), time = 0.9
Vorticity magnitude, time = 0.6, center 1/3 of the cube.
Density Laplacian, time = 0.6 This is an image of del^2(Density). The grayscale is centered, with zero at midscale.
Program output, SINGLE PRECISION.
******* sPPM benchmark *************************** 1 Communication nodes, in a 1 1 1 array. 6 threads update a 128 128 128 tile. Total problem size = 128 128 128 Restart file size= 48754540. (bytes) Brick-of-bytes sz= 6291456. (bytes) ************************************************** timestamp: 815337295 elapsed: 0.00 Timer initialized Problem parameters: Perturbation amplitudes: 0.050 0.002 0.002 0.035 Interface z: 0.60, Shock z: 0.50 Density ratio: 10.00 Warning, no image perturbation. 'ppmimage':: No such file or directory Warning, no image perturbation. 'teximage':: No such file or directory nstep Time Dtime Courant Kinetic Heat Total Energy _____ ___________ ___________ ___________ ___________ ___________ ____________ 0 0.00000E+00 3.00000E-04 0.00000E+00 2.03857E+01 3.93347E+01 5.97204E+01 Writing bricks-of-bytes 0 2 6.00000E-04 3.00000E-04 1.35669E-01 2.04296E+01 3.94500E+01 5.98796E+01 4 1.20000E-03 3.00000E-04 1.46746E-01 2.04743E+01 3.95645E+01 6.00388E+01 6 1.80000E-03 3.00000E-04 1.23499E-01 2.05154E+01 3.96826E+01 6.01980E+01 8 2.40000E-03 3.00000E-04 1.31502E-01 2.05547E+01 3.98025E+01 6.03572E+01 10 3.00000E-03 3.00000E-04 1.38890E-01 2.05965E+01 3.99198E+01 6.05164E+01 12 3.60000E-03 3.00000E-04 1.45549E-01 2.06408E+01 4.00348E+01 6.06755E+01 14 6.87097E-03 1.63548E-03 7.93437E-01 2.08686E+01 4.06747E+01 6.15434E+01 16 1.01688E-02 1.64893E-03 8.12054E-01 2.10586E+01 4.13598E+01 6.24183E+01 18 1.34177E-02 1.62446E-03 8.25161E-01 2.11763E+01 4.21039E+01 6.32803E+01 20 1.65676E-02 1.57492E-03 8.07598E-01 2.12321E+01 4.28838E+01 6.41160E+01 Writing checkpoint file. Timestamp: 815337491 Total Elapsed time: 196 Seconds. 5.2496E+01 Mbytes written in 2.9 Seconds.
Program output, DOUBLE PRECISION
******* sPPM benchmark *************************** 1 Communication nodes, in a 1 1 1 array. 6 threads update a 128 128 128 tile. Total problem size = 128 128 128 Restart file size= 97509080. (bytes) Brick-of-bytes sz= 6291456. (bytes) ************************************************** timestamp: 815336423 elapsed: 0.00 Timer initialized Problem parameters: Perturbation amplitudes: 0.050 0.002 0.002 0.035 Interface z: 0.60, Shock z: 0.50 Density ratio: 10.00 Warning, no image perturbation. 'ppmimage':: No such file or directory Warning, no image perturbation. 'teximage':: No such file or directory nstep Time Dtime Courant Kinetic Heat Total Energy _____ ___________ ___________ ___________ ___________ ___________ ____________ 0 0.00000E+00 3.00000E-04 0.00000E+00 2.03857E+01 3.93347E+01 5.97204E+01 Writing bricks-of-bytes 0 2 6.00000E-04 3.00000E-04 1.35669E-01 2.04296E+01 3.94500E+01 5.98796E+01 4 1.20000E-03 3.00000E-04 1.46745E-01 2.04743E+01 3.95645E+01 6.00388E+01 6 1.80000E-03 3.00000E-04 1.23444E-01 2.05154E+01 3.96825E+01 6.01980E+01 8 2.40000E-03 3.00000E-04 1.31478E-01 2.05547E+01 3.98025E+01 6.03571E+01 10 3.00000E-03 3.00000E-04 1.39018E-01 2.05965E+01 3.99198E+01 6.05163E+01 12 3.60000E-03 3.00000E-04 1.45561E-01 2.06408E+01 4.00347E+01 6.06755E+01 14 6.87097E-03 1.63549E-03 7.93443E-01 2.08686E+01 4.06747E+01 6.15433E+01 16 1.01686E-02 1.64879E-03 8.11972E-01 2.10585E+01 4.13597E+01 6.24182E+01 18 1.34175E-02 1.62448E-03 8.25185E-01 2.11763E+01 4.21039E+01 6.32802E+01 20 1.65673E-02 1.57490E-03 8.07589E-01 2.12321E+01 4.28838E+01 6.41159E+01 Writing checkpoint file. Timestamp: 815336674 Total Elapsed time: 251 Seconds. 9.8992E+01 Mbytes written in 5.0 Seconds.