<html> <head> <title>The sPPM benchmark code README file</title> </head> <body>
<H1>The sPPM benchmark code README file</H1>
<H2>Table of contents </H2>
<UL> <LI><A HREF="#Code">Code Description</A> <LI><A HREF="#Files">Files in this distribution</A> <LI><A HREF="#Building">Building Issues</A> <LI><A HREF="#Timing">Timing Issues</A> <LI><A HREF="#Execution">Execution Issues</A> <LI><A HREF="#About">About the data</A> <LI><A HREF="#Expected">Expected Results</A> </UL>
<a name="Code"></A>
<H2>Code Description</H2>
This program computes a 3-D hydrodynamics problem on a uniform mesh, using a simplified version of the PPM (Piecewise Parabolic Method) code. <p>
<b>Short problem description</b><br>
The coordinates are -1:1 in x and y, and 0:zmax in z, where zmax depends on the overall aspect ratio prescribed. <p>
A plane shock traveling up the +z axis encounters a density discontinuity, at which the gas becomes denser. The shock is carefully designed to be simple, though strong (about Mach 5). The gas initially has density 0.1 ahead of the shock; over 5dz at the discontinuity, it changes to 1.0. This discontinuity surface is also initially perturbed in +z in the following way. <p>
There are 5 components to the perturbation: <p>
cos( 3pi/2 * sqrt(x^2+y^2) )<br> cos( 3pi/2 * x*10 )<br> cos( 3pi/2 * y*12 )<br> ppmimage<br> teximage<br> <p>
A scale factor is applied to each. See the subroutine 'setup' in 'main.m4'. The two image perturbations depend on byte image files named 'ppmimage' and 'teximage' being present in the run directory. If they are not present, only the sinusoidal variations are applied. <p>
<b>Implementation details</b><br>
This version of sPPM is implemented primarily in Fortran, with system-dependent calls in C. All message passing is done with MPI.
Only a small set of MPI routines are actually required: <p><samp> MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Finalize <br> MPI_Wtime, MPI_Irecv, MPI_Isend<br> MPI_Wait, MPI_Allreduce, MPI_Request_free<br> </samp><p>
The parallelization strategy within an SMP is based on spawning extra copies of the parent MPI process. This is done with the 'sproc' routine, which is similar, but not identical, to the standard Unix 'fork' call. It differs in that 'sproc' causes all static (i.e. not stack-based) storage to be implicitly shared between processes on a single SMP node. This storage will always be declared in Fortran with either SAVE or COMMON. <p>
Synchronization is accomplished by examining shared variables and by use of Unix System V semaphore calls (see twiddle.m4 and cstuff.c). In addition, a fast vendor-supplied implementation of an atomic fetch&add (test_and_add) is used. <p>
This threaded version of sPPM was developed on an SGI system. However, an attempt has been made to minimize the places where the thread interface is exposed. There is also a Makefile switch, THREADED, to generate a version of sPPM that does not spawn any threads but uses only MPI. Use of this MPI-only option is discouraged for SMPs, because it is not as efficient an implementation. It may, however, be useful in porting the code initially. See further notes in the next section. <p>
<hr> <A NAME="Building"></A>
<H2>Building Issues</H2> <p>
<b>Porting sPPM</b><br>
Under control of the Makefile switch THREADED, either an SMP version of sPPM or an MPI-only version can be generated. There is no formal standard for SMP interfaces; this version is based on an SGI implementation. As a general guide to where the thread-specific parts of the source are, you can grep on THREADED in the .m4 and .h files. <p>
In addition to obvious changes in the Makefile, there are three basic places to look in the source files for the SMP control: <p>
<OL> <LI> The thread initialization and spawning.
These are located in main.m4 and in the routine 'spawn', which is in runhyd3.m4. The THREADED switch will guide you to these locations. The MPI version simply comments these areas out. <LI> Actual thread control is done with 4 macros in the cliches.h file. PLOOP and PLOOPEND generate code for a fetch-and-add based do-across loop; these revert to a simple WHILE loop in the MPI-only version. In addition there are macros for two barrier routines, which are no-ops in the MPI version. These four macros should be replaced by the appropriate equivalents for the new platform. <LI> The code for the routines referenced in cliches.h is in the file cstuff.c. This should be replaced with the appropriate routines, if any, that your macros need. </OL>
The I/O-related routines are in C and have been grouped together in the file c_io.c. It is possible that there may also be system dependencies in this part of the code. <p>
Single or double precision versions of the code can be generated by control of the switch REAL in the Makefile. Note that DOUBLE assumes you are also turning on a Fortran compiler switch that will cause all REAL storage (implicitly or explicitly declared) to be treated as if it were declared "DOUBLE PRECISION". <p>
<b>Porting the sPPM Makefile</b><br>
To generate a Makefile for a new platform: <ul> <LI> check the compiler, m4, and cpp names at the beginning of Makefile <LI> change the 3 lines that specify the location of MPI <LI> check all the compiler flags for appropriate values for your system <LI> set the values of REAL to float or double and THREADED to 1 or 0. </ul> <p>
<b>Problem size - control at compile time</b><br>
Before you make a version of sPPM, you must decide what the size of the data block on each node will be. If you are using a single SMP node, this will determine the size of the entire problem. If you are running on multiple SMP nodes (under control of the file <I>run/inputdeck</I> ...
see About the Data), the problem size also depends on <I>inputdeck</I>. <p>
The file 'iq.h' defines the size of the brick resident on each node. By node, I mean a node on the communications network. This size, along with the number and layout of nodes, determines the total problem size. For example, if the input contains: <samp><pre>
nodelayout 1 2 2
</pre></samp> and iq.h contains: <samp><pre>
define(IQ,128)
define(IQX,128)
define(IQY,128)
define(IQZ,128)
</pre></samp> ... then each node updates a 128^3 brick, and the total problem size is (x,y,z) 128 by 256 by 256. <p>
<b>sPPM executable</b><br>
When you "make" sppm, note that the executable is actually put in the directory called run. There are two different lines in Makefile: run/sppm has a link line that expects cstuff.o (for a threaded version); run/sppm1 has a link line that does not use cstuff.o (for the MPI-only version). <p>
<hr> <A NAME="Files"></A>
<H2>Files in this distribution</H2>
<PRE>
<A HREF="sppm.tar">sppm.tar</A> - This file contains all the files listed below.

README       This file.
README.html  HTML version of this file.
Makefile     Contains compile/link commands.
arrays.h     State variables & coord. mesh
bdrys.m4     Boundary setting/communications
buffers.h    Memory used by calchyd
c_io.c       I/O related routines.
cliches.h    Common constants/macros
cstuff.c     System dependent calls
iq.h         Definition of problem tile size.
main.m4      Most initialization/io code.
msgcom.h     Common shared variables - an assortment.
params.m4    Constants, most derived automatically from iq.h
runhyd3.m4   Contains runhyd/calchyd control logic to call sppm.
sppm.m4      The hydro kernel sppm itself.
twiddle.m4   Trivial routine which spins on a shared variable.
ppmimage128  128x128x1-byte image used for the sample run.
ppmimage256  Same thing, though 256x256
ppmimage512  ""          though 512x512
ppmimage64   ""          though 64x64
run/         Runtime files
  inputdeck  Sample control file
  output     Sample program output
  0/         directory for storing node-0 files
  1/         directory for storing node-1 files
</PRE> <p>
<hr> <A NAME="Optimization"></A>
<H2>Optimization Constraints</H2>
Performance optimizations to the sPPM source are encouraged. <p>
<hr> <A NAME="Execution"></A>
<H2>Execution Issues</H2>
sPPM is executed from a directory where <I>inputdeck</I> is, and where there is a directory named for each of the SMP node numbers in use: 0, 1, ... n. Once the input is read (see About the Data), the program attempts to change the current directory to one named as the MPI node id. For example, a 2-node run would execute 'cd 0' and 'cd 1', respectively. This is handy for keeping output from different nodes separate, and on disks physically attached to the node. (These directories should really be symbolic links to node-specific disk partitions.) Note that these subdirectories are <I>required</I> for multi-node runs, as output files such as 'd0000' and 'restart' will be written by each node; there is no other provision for keeping them separate. <p>
sPPM has two optional files that may be used to perturb the initial state. These are named ppmimage and teximage. These files are <I>not</I> necessary for the benchmark runs; the program will print a warning message stating that the files could not be located. Ignore this. These files are simple byte raster image files, which should match the overall x-y size of the problem. For example, <I>ppmimage64</I> is appropriate for any problem with 64 x and 64 y zones. If they are to be used, place a copy of the file of the appropriate size in the run directory. <p>
<hr> <A NAME="Timing"></A>
<H2>Timing Issues</H2>
Elapsed (wallclock) time should be used to evaluate performance.
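For a sense of what is measured, elapsed wallclock time amounts to bracketing the run with a timer. A minimal self-contained C sketch follows; note that sPPM itself uses MPI_Wtime, and clock_gettime(CLOCK_MONOTONIC) is used here only so the example needs no MPI installation.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Current wallclock time in seconds. sPPM uses MPI_Wtime for this;
   clock_gettime(CLOCK_MONOTONIC) is a portable stand-in here. */
double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}
```

Usage is simply `t0 = wall_seconds();` before the timed region and `elapsed = wall_seconds() - t0;` after it.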
These times are printed out by the program for the run; the timer starts just before the initial state is defined.<p>
A separate timing is printed for I/O; the first time includes I/O time. I/O time is the time devoted to writing bricks of bytes (including reformatting & rescaling) and to writing or reading restart dumps. Since each node has independent I/O, the reported time is the maximum of all nodes' I/O times, and the number of bytes written is the sum over all nodes. <p>
<hr> <A NAME="About"></A>
<H2>About the data</H2>
A small text file named 'inputdeck' is read at the start of the program. It contains lines of keyword & value pairs, the order of which is unimportant. <p>
In addition to this file (described below), the iq.h file must also be set to control the problem size. See Problem Size under Building Issues. <p>
<pre>
Keyword     Value(s)      Description
----------------------------------------------------------------
nodelayout  npx npy npz   Node geometry
nthreads    nthreads      # threads on each node
dtime       dtime         Initial timestep
dtdump      dtdump        Time between bob dumps
checkpoint  ndump         Timesteps between restart dumps.
                          Since timesteps are done in pairs,
                          choose even numbers.
stoptime    nstop tstop   Stop at simulation time tstop, or
                          timestep nstop, whichever occurs first.
timings     idetailtm     Level of timing info. printed.
                          0 none, 1 timestep, 2 each sweep.
restart                   Mere presence will cause the reloading
                          of a checkpoint file, aptly named
                          'restart'. In this case, dtime is
                          ignored if it occurs in the inputdeck.
</pre> <p>
<hr> <A NAME="Expected"></A>
<H2>Expected Results</H2>
Two example runs are provided below, for your reference in determining correctness of answers. One was done with single precision, the other with double precision (REAL*8 in this case). The timestamp values such as '815256106' are of course time- and date-dependent; do not expect to duplicate them. The critical numbers you should duplicate are the Courant numbers for each timestep.
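Returning to the inputdeck format above: a line-by-line keyword dispatch is enough to read it, since the order of lines is unimportant. The following C sketch is hypothetical (the struct and field names are invented here; it is not sPPM's actual parser):

```c
#include <stdio.h>
#include <string.h>

/* Holds a few of the inputdeck settings. Illustrative only: the
   struct and field names are invented for this sketch. */
struct deck {
    int npx, npy, npz;   /* nodelayout */
    int nthreads;        /* nthreads   */
    double dtime;        /* dtime      */
    int restart;         /* nonzero if the 'restart' keyword appears */
};

/* Dispatch one "keyword value(s)" line into the struct. */
void parse_line(struct deck *d, const char *line)
{
    char key[32];
    if (sscanf(line, "%31s", key) != 1) return;   /* blank line */
    if (strcmp(key, "nodelayout") == 0)
        sscanf(line, "%*s %d %d %d", &d->npx, &d->npy, &d->npz);
    else if (strcmp(key, "nthreads") == 0)
        sscanf(line, "%*s %d", &d->nthreads);
    else if (strcmp(key, "dtime") == 0)
        sscanf(line, "%*s %lf", &d->dtime);
    else if (strcmp(key, "restart") == 0)
        d->restart = 1;   /* mere presence triggers a restart */
}
```

The 'restart' case shows the one keyword that carries no value: its presence alone changes behavior, which is why dtime is ignored when a restart is in effect.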
The energies <em>may</em> vary slightly due to rounding errors caused by summing these quantities in different orders. <p>
For quantitative results, skip ahead to <a href="#SINGLE">single precision output</a> or <a href="#DOUBLE">double precision output</a>. For a qualitative idea of what is being computed, here are some volume renderings made of a 128-cube simulation.<p><tt>
<a href=d0000.jpg><img align=bottom src="d0000s.jpg" width=140 height=140 alt="d0000.jpg"></a> Ln(Density), time = 0.0. This initial state has been perturbed by 'ppmimage128' and by a random noise texture image. The color mapping puts Ln(Density)=0 at the center of the scale, which is true of the other two Ln(Density) images as well.<br>
<a href=d0005.jpg><img align=bottom src="d0005s.jpg" width=140 height=140 alt="d0005.jpg"></a> Ln(Density), time = 0.5. Here a cutting plane has cut away part of the departing shock wave and exposes the center of the cube.<br>
<a href=d0009.jpg><img align=bottom src="d0009s.jpg" width=140 height=140 alt="d0009.jpg"></a> Ln(Density), time = 0.9<br>
<a href=v0006.jpg><img align=bottom src="v0006s.jpg" width=140 height=140 alt="v0006.jpg"></a> Vorticity magnitude, time = 0.6, center 1/3 of the cube.<br>
<a href=s0006.jpg><img align=bottom src="s0006s.jpg" width=140 height=140 alt="s0006.jpg"></a> Density Laplacian, time = 0.6. This is an image of del^2(Density). The grayscale is centered, with zero at midscale. </tt>
<p><samp><pre>
<hr> <a name="SINGLE"> <em>Program output, SINGLE PRECISION.</em><br>
******* sPPM benchmark ***************************
    1 Communication nodes, in a  1  1  1 array.
    6 threads update a  128 128 128 tile.
 Total problem size =  128 128 128
 Restart file size=   48754540. (bytes)
 Brick-of-bytes sz=    6291456.
(bytes)
 **************************************************
 timestamp:  815337295   elapsed:   0.00   Timer initialized
 Problem parameters:
  Perturbation amplitudes: 0.050 0.002 0.002 0.035
  Interface z: 0.60,  Shock z: 0.50
  Density ratio: 10.00
 Warning, no image perturbation.
 'ppmimage':: No such file or directory
 Warning, no image perturbation.
 'teximage':: No such file or directory
 nstep    Time        Dtime      Courant     Kinetic      Heat     Total Energy
 _____ ___________ ___________ ___________ ___________ ___________ ____________
     0 0.00000E+00 3.00000E-04 0.00000E+00 2.03857E+01 3.93347E+01 5.97204E+01
 Writing bricks-of-bytes    0
     2 6.00000E-04 3.00000E-04 1.35669E-01 2.04296E+01 3.94500E+01 5.98796E+01
     4 1.20000E-03 3.00000E-04 1.46746E-01 2.04743E+01 3.95645E+01 6.00388E+01
     6 1.80000E-03 3.00000E-04 1.23499E-01 2.05154E+01 3.96826E+01 6.01980E+01
     8 2.40000E-03 3.00000E-04 1.31502E-01 2.05547E+01 3.98025E+01 6.03572E+01
    10 3.00000E-03 3.00000E-04 1.38890E-01 2.05965E+01 3.99198E+01 6.05164E+01
    12 3.60000E-03 3.00000E-04 1.45549E-01 2.06408E+01 4.00348E+01 6.06755E+01
    14 6.87097E-03 1.63548E-03 7.93437E-01 2.08686E+01 4.06747E+01 6.15434E+01
    16 1.01688E-02 1.64893E-03 8.12054E-01 2.10586E+01 4.13598E+01 6.24183E+01
    18 1.34177E-02 1.62446E-03 8.25161E-01 2.11763E+01 4.21039E+01 6.32803E+01
    20 1.65676E-02 1.57492E-03 8.07598E-01 2.12321E+01 4.28838E+01 6.41160E+01
 Writing checkpoint file.
 Timestamp:  815337491   Total Elapsed time:    196 Seconds.
 5.2496E+01 Mbytes written in   2.9 Seconds.
<br><hr> <a name="DOUBLE"> <em>Program output, DOUBLE PRECISION</em><br><br>
******* sPPM benchmark ***************************
    1 Communication nodes, in a  1  1  1 array.
    6 threads update a  128 128 128 tile.
 Total problem size =  128 128 128
 Restart file size=   97509080. (bytes)
 Brick-of-bytes sz=    6291456.
(bytes)
 **************************************************
 timestamp:  815336423   elapsed:   0.00   Timer initialized
 Problem parameters:
  Perturbation amplitudes: 0.050 0.002 0.002 0.035
  Interface z: 0.60,  Shock z: 0.50
  Density ratio: 10.00
 Warning, no image perturbation.
 'ppmimage':: No such file or directory
 Warning, no image perturbation.
 'teximage':: No such file or directory
 nstep    Time        Dtime      Courant     Kinetic      Heat     Total Energy
 _____ ___________ ___________ ___________ ___________ ___________ ____________
     0 0.00000E+00 3.00000E-04 0.00000E+00 2.03857E+01 3.93347E+01 5.97204E+01
 Writing bricks-of-bytes    0
     2 6.00000E-04 3.00000E-04 1.35669E-01 2.04296E+01 3.94500E+01 5.98796E+01
     4 1.20000E-03 3.00000E-04 1.46745E-01 2.04743E+01 3.95645E+01 6.00388E+01
     6 1.80000E-03 3.00000E-04 1.23444E-01 2.05154E+01 3.96825E+01 6.01980E+01
     8 2.40000E-03 3.00000E-04 1.31478E-01 2.05547E+01 3.98025E+01 6.03571E+01
    10 3.00000E-03 3.00000E-04 1.39018E-01 2.05965E+01 3.99198E+01 6.05163E+01
    12 3.60000E-03 3.00000E-04 1.45561E-01 2.06408E+01 4.00347E+01 6.06755E+01
    14 6.87097E-03 1.63549E-03 7.93443E-01 2.08686E+01 4.06747E+01 6.15433E+01
    16 1.01686E-02 1.64879E-03 8.11972E-01 2.10585E+01 4.13597E+01 6.24182E+01
    18 1.34175E-02 1.62448E-03 8.25185E-01 2.11763E+01 4.21039E+01 6.32802E+01
    20 1.65673E-02 1.57490E-03 8.07589E-01 2.12321E+01 4.28838E+01 6.41159E+01
 Writing checkpoint file.
 Timestamp:  815336674   Total Elapsed time:    251 Seconds.
 9.8992E+01 Mbytes written in   5.0 Seconds.
</pre></samp>
<hr>
<address> <a href="mailto:webmaster@lcse.umn.edu">webmaster@lcse.umn.edu</a> ~ Last Updated November 2, 1995 </address>
</body> </html>