[CIG-MC] Citcom runtime on Lonestar vs Ranger (Teragrid): Insights/experience?

Walter Landry walter at geodynamics.org
Sun Feb 21 18:04:37 PST 2010

Erin Burkett <erburkett at ucdavis.edu> wrote:
> Hi all,
> For those of you who have experience running versions of Citcom on the Teragrid
> (specifically Lonestar and Ranger), I am seeking some insight/advice to help
> understand and minimize/eliminate (if possible) the longer runtimes I'm
> currently encountering on Ranger (vs. Lonestar).  
> I'm attaching a table summarizing timing results of a small test I've run using
> various combinations of the different compilers, compiler options, mpicc
> versions.  This follows an earlier run on 480 processors resulting in a 20%
> longer runtime on Ranger vs Lonestar that first caused me to be aware of the
> difference.
> I have gathered that because Lonestar has 4 cores/machine and Ranger has 16
> cores/machine, the interconnect speed (Citcom uses interconnect heavily/is
> communications intensive?) may be limiting the speed despite Ranger being more
> powerful.

The individual cores on Ranger are actually slower than those on
Lonestar.  According to the documentation, Ranger's cores have a peak
speed of 9.2 GFlops each, while Lonestar's have a peak speed of 10.6
GFlops.
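
As a rough back-of-the-envelope check, the sketch below turns those two
peak numbers into an expected runtime ratio.  It assumes the code is
purely compute-bound and runs at peak speed on both machines, which
real codes do not, so treat the result as a ballpark only.  The
function name is made up for illustration.

```python
# Per-core peak speeds quoted above, in GFlops.
RANGER_PEAK = 9.2
LONESTAR_PEAK = 10.6

def expected_slowdown(slow_peak, fast_peak):
    """Runtime ratio slow/fast, ASSUMING a purely compute-bound code at peak speed."""
    return fast_peak / slow_peak

ratio = expected_slowdown(RANGER_PEAK, LONESTAR_PEAK)
print(f"Ranger/Lonestar runtime ratio: {ratio:.3f}")        # 1.152
print(f"Extra runtime on Ranger: {100 * (ratio - 1):.1f}%")  # 15.2%
```

Under that assumption, core speed alone would account for roughly 15%
longer runtimes on Ranger.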

Also, the cores on Ranger are AMDs, and Intel has been known to
deliberately introduce logic into their compiler so that code runs
slower on AMDs.  That is why the default compiler on Ranger is the
PGI compiler.

Moreover, the Intel compiler may do a better job optimizing for Intel
machines than the PGI compiler does for AMD machines.  A 5% difference
would not be out of the question.

The interconnects may have an additional effect, but I would have to
see scaling results for your problem for different numbers of cores
(e.g. the same problem with 16, 64, and 256 processors).
Theoretically, Ranger has 1 Infiniband connection per 16 cores, while
Lonestar has 1 Infiniband connection per 4 cores.
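
A scaling test like that can be reduced to speedup and parallel
efficiency relative to the smallest run.  The following is a minimal
sketch: the wall-clock times are made-up placeholders, not
measurements from either machine, and the function name is
hypothetical.

```python
def strong_scaling(timings):
    """timings: {n_cores: wall_clock_seconds} for the same fixed-size problem.

    Returns a list of (n_cores, speedup, efficiency) tuples, measured
    relative to the smallest core count in the table.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for n in sorted(timings):
        speedup = base_time / timings[n]
        # Ideal speedup is n / base_cores; efficiency is the fraction achieved.
        efficiency = speedup / (n / base_cores)
        rows.append((n, speedup, efficiency))
    return rows

# PLACEHOLDER numbers for illustration only; substitute real measurements.
example = {16: 1000.0, 64: 300.0, 256: 125.0}

for n, speedup, eff in strong_scaling(example):
    print(f"{n:4d} cores: speedup {speedup:5.2f}, efficiency {eff:4.0%}")
```

If efficiency drops off faster on Ranger than on Lonestar as the core
count grows, that would suggest the interconnect.  If the two curves
have the same shape and differ only by a constant factor, the cores or
the compiler are the more likely cause.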

Finally, memory pressure may have an effect.  AMDs generally have
better memory performance, but there is more contention since there
are 16 cores accessing the same memory, rather than 4 cores accessing
4 separate banks of memory.

Walter Landry
walter at geodynamics.org
