Thursday, December 16, 2010

Build successful: how to run?

This is new:


Thu Dec 16 13:27:36 CST 2010 /work/00671/tobis/CAM_3/run/ccsm.bldlog.101216-130725
- Locking file env_build.xml
- Locking file Macros.prototype_ranger
CCSM BUILDEXE SCRIPT HAS FINISHED SUCCESSFULLY


and has to be considered good news.

Now the "quick start" seems to have me in the scripts directory issuing

qsub $CASE.$MACH.run

but how could that work? CCSM doesn't know my account number. All of the hash (#$) directives for the batch scheduler are missing from $CASE.$MACH.run. I will try just splicing them in manually.
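A minimal sketch of that manual splice, in Python rather than by hand in an editor. The helper name `splice_directives` and the two-line sample script are hypothetical; the real $CASE.$MACH.run is not reproduced here, and the account and queue values are just the ones I use further down.

```python
def splice_directives(script_text, directives):
    """Insert scheduler directive lines right after the shebang line, if there is one."""
    lines = script_text.splitlines()
    # Keep "#!..." as the first line; otherwise put the directives at the very top.
    insert_at = 1 if lines and lines[0].startswith("#!") else 0
    new_lines = lines[:insert_at] + list(directives) + lines[insert_at:]
    return "\n".join(new_lines) + "\n"


# Hypothetical stand-in for the generated run script.
sample_script = "#!/bin/csh -f\necho run the model here\n"

directives = [
    "#$ -V",
    "#$ -cwd",
    "#$ -j y",
    "#$ -A A-ig2",
    "#$ -q normal",
]

print(splice_directives(sample_script, directives), end="")
```

The directives land between the shebang and the first command, which is where the scheduler looks for them before the script body starts.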

How many PEs?

env_mach_pes.xml says (open angle brackets elided):


!-- -->
!-- These variables CANNOT be modified once configure -case has been -->
!-- invoked without first invoking configure -cleanmach. -->
!-- -->
!-- See README/readme_env and README/readme_general for details -->


...

entry id="TOTALPES" value="32" />
entry id="PES_LEVEL" value="1r" />
entry id="MAX_TASKS_PER_NODE" value="4" />
entry id="PES_PER_NODE" value="$MAX_TASKS_PER_NODE" />


but my prior qsub script says


#$ -pe 16way 64


Second attempt, then: leave the -pe out and see if it compensates somehow.

so:


#$ -V
#$ -cwd
#$ -j y
#$ -A A-ig2
#$ -l h_rt=00:30:00
#$ -q normal
#$ -N spinup-CCSM
#$ -o ./$JOB_NAME.out


Not sure about the -cwd either...


------------> Rejecting job <------------
Please specify a parallel environment.
Syntax: -pe
Example: #$ -pe 16way 48
To see a list of defined pes: qconf -spl
-----------------------------------------


Should I go for 4way 32 or 16way 32?

I thought they had gotten somewhere on Ranger.

Trying 4way 32, which will ask for 8 nodes when 2 would do, I think.
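A rough way to check the node arithmetic. This assumes, without confirmation here, that in `-pe <N>way <M>` the N is MPI tasks per node and the M is the total number of cores reserved, in multiples of 16 cores per node. The function name `pe_request` is made up for illustration.

```python
CORES_PER_NODE = 16  # assumption: cores per node on this machine


def pe_request(tasks_per_node, total_cores, cores_per_node=CORES_PER_NODE):
    """Return (nodes, mpi_tasks) for '-pe <N>way <M>' under the assumption stated above."""
    if total_cores % cores_per_node:
        raise ValueError("total cores must be a multiple of cores per node")
    nodes = total_cores // cores_per_node
    return nodes, nodes * tasks_per_node


for way, cores in [(4, 32), (16, 32), (16, 64)]:
    nodes, tasks = pe_request(way, cores)
    print(f"-pe {way}way {cores}: {nodes} nodes, {tasks} MPI tasks")
```

If that reading is right, 4way 32 reserves 2 nodes and starts only 8 tasks, while 16way 32 starts 32 tasks on the same 2 nodes. If the second number counts something else, the picture changes, so this is worth checking against the site documentation.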

OK, it is in the queue now.

Found Juli's script example:


#$ -V
# {inherit submission environment}
#$ -cwd
# {use submission directory}
#$ -N myCCSM
# {jobname (myCCSM)}
#$ -j y
# {join stderr and stdout}
#$ -o $JOB_NAME.o$JOB_ID
# {output name jobname.ojobid}
#$ -pe 16way 1024
# {use 16 cores/node, 1024 cores total}
#$ -q normal
# {queue name}
#$ -l h_rt=05:30:00
# {request 5.5 hours}
#$ -M juliana@ucar.edu
# {UNCOMMENT & insert Email address}
#$ -m be
# {UNCOMMENT email at Begin/End of job}
set echo #{echo cmds, use "set echo" in csh}
# {account number}
#$ -A TG-CCR090010

# ----------------------------------------
# PE LAYOUT:
# total number of tasks = 1024
# maximum threads per task = 1
# cpl ntasks=128 nthreads=1 rootpe=0
# cam ntasks=1024 nthreads=1 rootpe=0
# clm ntasks=128 nthreads=1 rootpe=0
# cice ntasks=160 nthreads=1 rootpe=0
# pop2 ntasks=32 nthreads=1 rootpe=0
#
# total number of hw pes = 1024
# cpl hw pe range ~ from 0 to 127
# cam hw pe range ~ from 0 to 1023
# clm hw pe range ~ from 0 to 127
# cice hw pe range ~ from 0 to 159
# pop2 hw pe range ~ from 0 to 31
# ----------------------------------------
#-----------------------------------------------------------------------
# Determine necessary environment variables
#-----------------------------------------------------------------------



her env_mach_pes:



setenv NTASKS_ATM 1024; setenv NTHRDS_ATM 1; setenv ROOTPE_ATM 0;
setenv NTASKS_LND 128; setenv NTHRDS_LND 1; setenv ROOTPE_LND 0;
setenv NTASKS_ICE 160; setenv NTHRDS_ICE 1; setenv ROOTPE_ICE 0;
setenv NTASKS_OCN 32; setenv NTHRDS_OCN 1; setenv ROOTPE_OCN 0;
setenv NTASKS_CPL 128; setenv NTHRDS_CPL 1; setenv ROOTPE_CPL 0;



alas, a different file format.

OK, I was looking in the wrong place.


!-- -->
!-- The following values should not be set by the user since they'll be -->
!-- overwritten by scripts. -->
!-- TOTALPES -->
!-- CCSM_PCOST -->
!-- CCSM_ESTCOST -->
!-- PES_LEVEL -->
!-- MAX_TASKS_PER_NODE -->
!-- PES_PER_NODE -->
!-- CCSM_TCOST -->
!-- CCSM_ESTCOST -->
!--


Looks like we should be going after


entry id="NTASKS_ATM" value="32" />
entry id="NTHRDS_ATM" value="1" />
entry id="ROOTPE_ATM" value="0" />

entry id="NTASKS_LND" value="32" />
entry id="NTHRDS_LND" value="1" />
entry id="ROOTPE_LND" value="0" />

entry id="NTASKS_ICE" value="32" />
entry id="NTHRDS_ICE" value="1" />
entry id="ROOTPE_ICE" value="0" />

entry id="NTASKS_OCN" value="32" />
entry id="NTHRDS_OCN" value="1" />
entry id="ROOTPE_OCN" value="0" />

entry id="NTASKS_CPL" value="32" />
entry id="NTHRDS_CPL" value="1" />
entry id="ROOTPE_CPL" value="0" />



and NTASKS is really the variable we control. Unlike with older CAM, these apparently need to be set at build time.
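A sketch of how the per-component entries might add up to TOTALPES. The rule used here (the highest `ROOTPE + NTASKS * NTHRDS` over all components) is my assumption about what the scripts do, not something taken from the CCSM source; it does agree with both layouts in this post. The `<config>` wrapper element is hypothetical, since the real file's root element is not shown above.

```python
import xml.etree.ElementTree as ET

COMPONENTS = ("ATM", "LND", "ICE", "OCN", "CPL")
KEYS = ("NTASKS", "NTHRDS", "ROOTPE")

# Rebuild the entries listed above (32 tasks, 1 thread, root PE 0 for every component).
entries = "\n".join(
    f'<entry id="{key}_{comp}" value="{value}" />'
    for comp in COMPONENTS
    for key, value in zip(KEYS, (32, 1, 0))
)
SAMPLE = f"<config>\n{entries}\n</config>"  # <config> is a hypothetical root element


def read_layout(xml_text):
    """Collect NTASKS/NTHRDS/ROOTPE per component from <entry id=... value=...> elements."""
    values = {e.get("id"): e.get("value") for e in ET.fromstring(xml_text).iter("entry")}
    return {
        comp: {key: int(values[f"{key}_{comp}"]) for key in KEYS}
        for comp in COMPONENTS
    }


def total_pes(layout):
    """Assumed rule: the highest hardware PE touched by any component, plus one."""
    return max(c["ROOTPE"] + c["NTASKS"] * c["NTHRDS"] for c in layout.values())


print(total_pes(read_layout(SAMPLE)))
```

With every component at 32 tasks and root PE 0, this gives 32, matching the TOTALPES entry above. Feeding in Juli's numbers (1024, 128, 160, 32, 128 tasks, all at root PE 0) gives 1024, matching the header of her script.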

I think I'll also submit a 16way 32 job, as try3.

Priority is very low right now, so I won't find out for a while.

More tomorrow I guess.
