Gail suggests going to 64 pes
Suspicion attaches to "MAX_TASKS_PER_NODE" value="4"; I changed it to 16. It should probably be either 16 or 1.
Also the telltale failure to run "module": the missing piece is a module load netcdf/3.6.2.
I did this manually.
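If the change sticks, presumably the right home for it is the machine file rather than my fingers. A sketch, on the assumption that env_machopts.prototype_ranger is sourced when the job starts up, as the env_machopts files seem to be on supported machines:
# in scripts/ccsm_utils/Machines/env_machopts.prototype_ranger (assumption: sourced at job startup)
module load netcdf/3.6.2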
This adds three variables. So if it works there are three things to unwind.
bah - tried configure -cleanmach, but it remembered the old CASEROOT. Now I've clobbered both runs!
Monday, December 20, 2010
Sure enough, you can't set the shell in a script called by the qsub script; you have to specify it in the qsubscript.
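For the record, the kind of header I mean, using $CASE and $MACH as shorthand the way these notes do elsewhere; -S is standard SGE, but whether it is sufficient here is a guess:
#!/bin/csh
#$ -S /bin/csh             # set the job shell here, in the script handed to qsub
#$ -A A-ig2
#$ -q normal
#$ -l h_rt=00:30:00
source $CASE.$MACH.run     # or splice these directives into the run script and submit it directly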
I now have sort of got it running:
CCSM PRESTAGE SCRIPT STARTING
- CCSM input data directory, DIN_LOC_ROOT_CSMDATA, is /work/00671/tobis/inputdata
- Case input data directory, DIN_LOC_ROOT, is /work/00671/tobis/inputdata
- Checking the existence of input datasets in DIN_LOC_ROOT
CCSM PRESTAGE SCRIPT HAS FINISHED SUCCESSFULLY
rm: No match.
Fri Dec 17 17:41:16 CST 2010 -- CSM EXECUTION BEGINS HERE
Fri Dec 17 17:41:20 CST 2010 -- CSM EXECUTION HAS FINISHED
ls: No match.
Model did not complete - no cpl.log file present - exiting
TACC: Cleaning up after job: 1731370
TACC: Done.
To be clear, I have now loaded the executable, which promptly died without leaving a clue as to why anywhere that is obvious. Of course, who knows where it thinks it ought to leave the clue. I have set "find" the task of finding files created over the weekend. It is amazingly slow, though.
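Something along these lines, at any rate; the three-day window is my guess at "over the weekend":
find $WORK -type f -mtime -3 -ls     # list files modified in the last three days, slowly, over all of $WORK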
This still amounts to progress: after a week I have actually got the thing to lurch to life and die.
Life in the fast lane.
...
AHA!
$WORK/CAM_3/run/ccsm.log.101217-174114
it says, 32 times,
MPI_Group_range_incl(170).........: MPI_Group_range_incl(group=0x88000000, n=1, ranges=0x48edec0, new_group=0x7fffc8173f2c) failed
MPIR_Group_check_valid_ranges(302): The 0th element of a range array ends at 31 but must be nonnegative and less than 1
MPI process terminated unexpectedly
OK, before I go whining around, I will try to redo everything.
Friday, December 17, 2010
Pretty weird
You didn't expect it to actually run, did you?
But the failure is damned peculiar
#!/bin/csh -f
...
foreach i (env_case.xml env_run.xml env_conf.xml env_build.xml env_mach_pes.xml)
...
fails with
TACC: Done.
./Tools/ccsm_getenv: line 9: syntax error near unexpected token `('
./Tools/ccsm_getenv: line 9: `foreach i (env_case.xml env_run.xml env_conf.xml env_build.xml env_mach_pes.xml)'
TACC: Cleaning up after job: 1729536
TACC: Done.
The thing is, it's perfectly valid csh; the error message is the one bash would issue!
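One way to confirm that suspicion would be to have the run script report what is actually interpreting it, before it reaches any csh-only syntax. A sketch, not something I have tried yet:
# near the top of the run script; valid under either csh or bash
echo "this script is being interpreted by:"
ps -p $$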
Thursday, December 16, 2010
Build successful: how to run?
This is new:
Thu Dec 16 13:27:36 CST 2010 /work/00671/tobis/CAM_3/run/ccsm.bldlog.101216-130725
- Locking file env_build.xml
- Locking file Macros.prototype_ranger
CCSM BUILDEXE SCRIPT HAS FINISHED SUCCESSFULLY
and has to be considered good news.
Now the "quick start" seems to have me in the scripts directory issuing
qsub $CASE.$MACH.run
but how could that work? CCSM doesn't know my account number. All of the #$ directives for the runtime environment are missing from $CASE.$MACH.run. I will try just splicing them in manually.
How many PE's?
env_mach_pes.xml says:
<!--                                                                       -->
<!-- These variables CANNOT be modified once configure -case has been     -->
<!-- invoked without first invoking configure -cleanmach.                  -->
<!--                                                                       -->
<!-- See README/readme_env and README/readme_general for details           -->
...
<entry id="TOTALPES" value="32" />
<entry id="PES_LEVEL" value="1r" />
<entry id="MAX_TASKS_PER_NODE" value="4" />
<entry id="PES_PER_NODE" value="$MAX_TASKS_PER_NODE" />
but my prior qsubscript says
#$ -pe 16way 64
Second attempt, then, leave the -pe out; see if it compensates somehow.
so:
#$ -V
#$ -cwd
#$ -j y
#$ -A A-ig2
#$ -l h_rt=00:30:00
#$ -q normal
#$ -N spinup-CCSM
#$ -o ./$JOB_NAME.out
Not sure about the -cwd either...
------------> Rejecting job <------------
Please specify a parallel environment.
Syntax: -pe
Example: #$ -pe 16way 48
To see a list of defined pes: qconf -spl
-----------------------------------------
should I go for 4way 32 or 16way 32 ?
I thought they had gotten somewhere on ranger.
Trying 4way 32 which will ask for 8 nodes when 2 would do, I think.
OK, it is in the queue now
find Juli's script example:
#$ -V                        # {inherit submission environment}
#$ -cwd                      # {use submission directory}
#$ -N myCCSM                 # {jobname (myCCSM)}
#$ -j y                      # {join stderr and stdout}
#$ -o $JOB_NAME.o$JOB_ID     # {output name jobname.ojobid}
#$ -pe 16way 1024            # {use 16 cores/node, 1024 cores total}
#$ -q normal                 # {queue name}
#$ -l h_rt=05:30:00          # {request 4 hours}
#$ -M juliana@ucar.edu       # {UNCOMMENT & insert Email address}
#$ -m be                     # {UNCOMMENT email at Begin/End of job}
set echo                     # {echo cmds, use "set echo" in csh}
#$ -A TG-CCR090010           # {account number}
# ----------------------------------------
# PE LAYOUT:
# total number of tasks = 1024
# maximum threads per task = 1
# cpl ntasks=128 nthreads=1 rootpe=0
# cam ntasks=1024 nthreads=1 rootpe=0
# clm ntasks=128 nthreads=1 rootpe=0
# cice ntasks=160 nthreads=1 rootpe=0
# pop2 ntasks=32 nthreads=1 rootpe=0
#
# total number of hw pes = 1024
# cpl hw pe range ~ from 0 to 127
# cam hw pe range ~ from 0 to 1023
# clm hw pe range ~ from 0 to 127
# cice hw pe range ~ from 0 to 159
# pop2 hw pe range ~ from 0 to 31
# ----------------------------------------
#-----------------------------------------------------------------------
# Determine necessary environment variables
#-----------------------------------------------------------------------
her env_mach_pes:
setenv NTASKS_ATM 1024; setenv NTHRDS_ATM 1; setenv ROOTPE_ATM 0;
setenv NTASKS_LND 128; setenv NTHRDS_LND 1; setenv ROOTPE_LND 0;
setenv NTASKS_ICE 160; setenv NTHRDS_ICE 1; setenv ROOTPE_ICE 0;
setenv NTASKS_OCN 32; setenv NTHRDS_OCN 1; setenv ROOTPE_OCN 0;
setenv NTASKS_CPL 128; setenv NTHRDS_CPL 1; setenv ROOTPE_CPL 0;
alas, a different file format.
OK, looking in the wrong place.
<!--                                                                       -->
<!-- The following values should not be set by the user since they'll be  -->
<!-- overwritten by scripts.                                               -->
<!-- TOTALPES                                                              -->
<!-- CCSM_PCOST                                                            -->
<!-- CCSM_ESTCOST                                                          -->
<!-- PES_LEVEL                                                             -->
<!-- MAX_TASKS_PER_NODE                                                    -->
<!-- PES_PER_NODE                                                          -->
<!-- CCSM_TCOST                                                            -->
<!-- CCSM_ESTCOST                                                          -->
<!--                                                                       -->
Looks like we should be going after
entry id="NTASKS_ATM" value="32" />
entry id="NTHRDS_ATM" value="1" />
entry id="ROOTPE_ATM" value="0" />
entry id="NTASKS_LND" value="32" />
entry id="NTHRDS_LND" value="1" />
entry id="ROOTPE_LND" value="0" />
entry id="NTASKS_ICE" value="32" />
entry id="NTHRDS_ICE" value="1" />
entry id="ROOTPE_ICE" value="0" />
entry id="NTASKS_OCN" value="32" />
entry id="NTHRDS_OCN" value="1" />
entry id="ROOTPE_OCN" value="0" />
entry id="NTASKS_CPL" value="32" />
entry id="NTHRDS_CPL" value="1" />
entry id="ROOTPE_CPL" value="0" />
and NTASKS is really the variable we control. Unlike older CAM, these apparently need to be set at build time.
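If so, the loop is presumably: clean the machine settings (per the warning in the file), edit the NTASKS entries, reconfigure, rebuild. A rough sketch of what I have in mind; the sed edit and the choice of 16 tasks per component are mine, not anything the docs prescribe:
cd $CASEROOT
./configure -cleanmach
# crude in-place edit: set every component to 16 tasks (assumes GNU sed, as on ranger)
foreach comp (ATM LND ICE OCN CPL)
    sed -i 's/"NTASKS_'$comp'" value="[0-9]*"/"NTASKS_'$comp'" value="16"/' env_mach_pes.xml
end
./configure -case
./$CASE.$MACH.build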
I think I'll submit a 16way 32 as well as try3
priority is very low right now so won't find out for a while.
More tomorrow I guess.
Wednesday, December 15, 2010
Two changes
Two changes in Macros.prototype_ranger will probably correspond to leaping the latest hurdle. Whether that yields a useful result in the end remains to be seen.
115c115
< INCLDIR := -I./usr/include
---
> INCLDIR := -I. /usr/include
152c152
< FFLAGS := $(CPPDEFS) -i4 -gopt -Mlist -time -Mextend -byteswapio
---
> FFLAGS := $(CPPDEFS) -i4 -target=linux -gopt -Mlist -time -Mextend -byteswapio
Isn't this intellectually satisfying work? Far better than being at AGU.
Note that the blank WAS IN THE DISTRIBUTION.
Nope
CCSM BUILDEXE SCRIPT STARTING
- Build Libraries: mct pio csm_share
Wed Dec 15 17:03:34 CST 2010 /work/00671/tobis/CAM_A2/mct/mct.bldlog.101215-170331
Wed Dec 15 17:05:36 CST 2010 /work/00671/tobis/CAM_A2/pio/pio.bldlog.101215-170331
Wed Dec 15 17:06:51 CST 2010 /work/00671/tobis/CAM_A2/csm_share/csm_share.bldlog.101215-170331
Wed Dec 15 17:07:52 CST 2010 /work/00671/tobis/CAM_A2/run/cpl.bldlog.101215-170331
Wed Dec 15 17:07:52 CST 2010 /work/00671/tobis/CAM_A2/run/atm.bldlog.101215-170331
ERROR: cam.buildexe.csh failed, see /work/00671/tobis/CAM_A2/run/atm.bldlog.101215-170331
ERROR: cat /work/00671/tobis/CAM_A2/run/atm.bldlog.101215-170331
login4% cat /work/00671/tobis/CAM_A2/run/atm.bldlog.101215-170331
Wed Dec 15 17:07:52 CST 2010 /work/00671/tobis/CAM_A2/run/atm.bldlog.101215-170331
cat: Srcfiles: No such file or directory
/work/00671/tobis/CESM_SRC/ccsm4_0/scripts/CAM_A2/Tools/mkSrcfiles > /work/00671/tobis/CAM_A2/atm/obj/Srcfiles
cp -f /work/00671/tobis/CAM_A2/atm/obj/Filepath /work/00671/tobis/CAM_A2/atm/obj/Deppath
/work/00671/tobis/CESM_SRC/ccsm4_0/scripts/CAM_A2/Tools/mkDepends Deppath Srcfiles > /work/00671/tobis/CAM_A2/atm/obj/Depends
mpif90 -c -I. /usr/include -I/opt/apps/pgi7_1/netcdf/3.6.2/include -I/opt/apps/pgi7_1/netcdf/3.6.2/include -I/opt/apps/pgi7_1/mvapich2/1.0/include -I. -I/work/00671/tobis/CESM_SRC/ccsm4_0/scripts/CAM_A2/SourceMods/src.cam -I/work/00671/tobis/CESM_SRC/ccsm4_0/models/atm/cam/src/chemistry/bulk_aero -I/work/00671/tobis/CESM_SRC/ccsm4_0/models/atm/cam/src/chemistry/utils -I/work/00671/tobis/CESM_SRC/ccsm4_0/models/atm/cam/src/physics/cam -I/work/00671/tobis/CESM_SRC/ccsm4_0/models/atm/cam/src/dynamics/eul -I/work/00671/tobis/CESM_SRC/ccsm4_0/models/atm/cam/src/cpl_mct -I/work/00671/tobis/CESM_SRC/ccsm4_0/models/atm/cam/src/control -I/work/00671/tobis/CESM_SRC/ccsm4_0/models/atm/cam/src/utils -I/work/00671/tobis/CESM_SRC/ccsm4_0/models/atm/cam/src/advection/slt -I/work/00671/tobis/CAM_A2/lib/include -DCO2A -DMAXPATCH_PFT=numpft+1 -DLSMLAT=1 -DLSMLON=1 -DPLON=128 -DPLAT=64 -DPLEV=26 -DPCNST=3 -DPCOLS=16 -DPTRM=42 -DPTRN=42 -DPTRK=42 -DSPMD -DMCT_INTERFACE -DHAVE_MPI -DCO2A -DLINUX -DSEQ_ -DFORTRANUNDERSCORE -DNO_SHR_VMATH -DNO_R16 -i4 -target=linux -gopt -Mlist -time -Mextend -byteswapio -O2 -Mvect=nosse -Kieee -O2 -Mvect=nosse -Kieee -Mfree /work/00671/tobis/CESM_SRC/ccsm4_0/models/atm/cam/src/control/cam_logfile.F90
pgf90-Error-Unknown switch: -target=linux
gmake: *** [cam_logfile.o] Error 1
Taking out the "-linux" and removing the space in "-I. /usr/include" does seem to create a .o file with no objections.
How this got to be in the distribution I don't know.
Now, apparently have to hack the Makefile...
But NCAR does this in some bizarre way too... Suppose I should look for FORTRANUNDERSCORE
Update
setenv DIN_LOC_ROOT_CSMDATA $WORK/inputdata # put it where it wants it
setenv DIN_LOC_ROOT $WORK/inputdata # have it both ways
setenv CCSMROOT `pwd`
setenv MACH prototype_ranger
setenv CASEROOT `pwd`/CAM_Alone
setenv CASE CAM_Alone # not mentioned in instructions
setenv RES T42_T42
setenv COMPSET F_2000
cd ccsm4_0/scripts
create_newcase -case $CASEROOT -mach $MACH -compset $COMPSET -res $RES
cd $CASEROOT # not mentioned in instructions
./configure -case
$CASE.$MACH.build # you may need to prepend a dot and a slash
OK, I have all the files, I guess, but the build still fails on the convenient auto-download.
Oops, looks like I just missed one for some reason.
Haha, building at last. MCT done, PIO in progress.
Presumably ESMF will kill me, right?
Tuesday, December 14, 2010
After much moaning
OK, the slab model seems to be running. I'll give a complete play-by-play.
Now to take on CCSM, a product which may be easier to use given that I have an account on an official target platform.
First, I need to find the NAME of the machine. I saw it once. It was prototype_ranger or something. Grep may take forever.
Yep; I guess I still have some brain cells left.
> find . -name "*ranger*"
...
./scripts/ccsm_utils/Machines/mkbatch.prototype_ranger
./scripts/ccsm_utils/Machines/Macros.prototype_ranger
./scripts/ccsm_utils/Machines/env_machopts.prototype_ranger
so
setenv DIN_LOC_ROOT_CSMDATA $WORK/ccsmin
cd ccsm4_0
setenv CCSMROOT `pwd`
setenv MACH prototype_ranger
# mkdir CAM_Alone = Do NOT do this !!! => Caseroot directory /work/00671/tobis/CESM_SRC/ccsm4_0/CAM_Alone already exists
setenv CASEROOT `pwd`/CAM_Alone
setenv RES T42_T42
setenv COMPSET F_2000
create_newcase -case $CASEROOT -mach $MACH -compset $COMPSET -res $RES
Now snagged on auto-download of initial conditions files. Authentication needed. As I recall it was wide open, but I don't remember what it was.
Found it in email. It appears to be the same for every user; but I'm not going to be the one to post it on a web page.
transcript has the following ugly appearance:
export https://svn-ccsm-inputdata.cgd.ucar.edu/trunk/inputdata/atm/cam/physprops/dust2_camrt_c080918.nc /work/00671/tobis/inputdata/atm/cam/physprops/dust2_camrt_c080918.nc ..... svn: REPORT request failed on '/!svn/vcc/default'
svn:
Cannot replace a directory from within
export https://svn-ccsm-inputdata.cgd.ucar.edu/trunk/inputdata/atm/cam/physprops/dust3_camrt_c080918.nc /work/00671/tobis/inputdata/atm/cam/physprops/dust3_camrt_c080918.nc ..... svn: REPORT request failed on '/!svn/vcc/default'
svn:
Cannot replace a directory from within
etc. etc. many times over. Is this fetching? Who knows?
OK, no. In fact, unbelievably bad. It assumed (despite my name choice) that I wanted it in $WORK/inputdata. I do not know how this happened!
I have NO IDEA where it got $WORK/inputdata . I told it NOTHING about $WORK or inputdata !
There seems to be some confusion between $DIN_LOC_ROOT in the files and $DIN_LOC_ROOT_CSMDATA in the docs.
"For supported machines this variable is preset". Does that include "prototype_ranger"?
Anyway, I try the alternative, using check_input_data (which really should be called checkin_input_data; it is not checking anything!). Same error.
Googling for the error message yields something about merges. So I try wget on one of the files.
d'oh
ERROR: certificate common name `localhost.localdomain' doesn't match requested host name `svn-ccsm-inputdata.cgd.ucar.edu'.
To connect to svn-ccsm-inputdata.cgd.ucar.edu insecurely, use `--no-check-certificate'.
Somebody tell me I am dealing with grownups here!
Eventually I succeed with
wget https://svn-ccsm-inputdata.cgd.ucar.edu/trunk/inputdata/ocn/docn7/SSTDATA/sst_HadOIBl_bc_64x128_clim_c050526.nc --no-check-certificate --http-user=EASY_TO_GUESS --http-password=ALMOST_AS_EASY
To my surprise nobody squawks.
So the next thing to do is to build a script to download all the stuff that check_input_data was supposed to get (a sketch follows the listing below).
At least it handily reports:
/work/00671/tobis/CESM_SRC/ccsm4_0/CAM_Alone/Buildconf/cam.input_data_list
/work/00671/tobis/CESM_SRC/ccsm4_0/CAM_Alone/Buildconf/clm.input_data_list
/work/00671/tobis/CESM_SRC/ccsm4_0/CAM_Alone/Buildconf/cice.input_data_list
/work/00671/tobis/CESM_SRC/ccsm4_0/CAM_Alone/Buildconf/cpl.input_data_list
/work/00671/tobis/CESM_SRC/ccsm4_0/CAM_Alone/Buildconf/docn.input_data_list
File is missing: /work/00671/tobis/inputdata/atm/cam/chem/trop_mozart_aero/aero/aero_1.9x2.5_L26_2000clim_c090803.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/inic/gaus/cami_0000-01-01_64x128_T42_L26_c031110.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/topo/USGS-gtopo30_64x128_c050520.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/ozone/ozone_1.9x2.5_L26_2000clim_c090803.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/chem/trop_mozart/ub/clim_p_trop.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/physprops/sulfate_camrt_c080918.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/physprops/dust1_camrt_c080918.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/physprops/dust2_camrt_c080918.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/physprops/dust3_camrt_c080918.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/physprops/dust4_camrt_c080918.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/physprops/bcpho_camrt_c080918.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/physprops/bcphi_camrt_c080918.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/physprops/ocpho_camrt_c080918.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/physprops/ocphi_camrt_c080918.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/physprops/ssam_camrt_c080918.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/physprops/sscm_camrt_c080918.nc
File is missing: /work/00671/tobis/inputdata/lnd/clm2/pftdata/pft-physiology.c100226
File is missing: /work/00671/tobis/inputdata/lnd/clm2/snicardata/snicar_drdt_bst_fit_60_c070416.nc
File is missing: /work/00671/tobis/inputdata/lnd/clm2/snicardata/snicar_optics_5bnd_c090915.nc
File is missing: /work/00671/tobis/inputdata/lnd/clm2/surfdata/surfdata_64x128_simyr2000_c090928.nc
File is missing: /work/00671/tobis/inputdata/lnd/clm2/griddata/griddata_64x128_060829.nc
File is missing: /work/00671/tobis/inputdata/lnd/clm2/snicardata/aerosoldep_monthly_1990s_mean_64x128_c080410.nc
File is missing: /work/00671/tobis/inputdata/lnd/clm2/griddata/fracdata_64x128_USGS_070110.nc
File is missing: /work/00671/tobis/inputdata/lnd/clm2/rtmdata/rdirc.05.061026
File is missing: /work/00671/tobis/inputdata/ice/cice/aerosoldep_monthly_2000_mean_1.9x2.5_c090421.nc
File is missing: /work/00671/tobis/inputdata/ice/cice/aerosoldep_monthly_2000_mean_1.9x2.5_c090421.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/ocnfrac/domain.camocn.64x128_USGS_070807.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/ocnfrac/domain.camocn.64x128_USGS_070807.nc
File is missing: /work/00671/tobis/inputdata/ocn/docn7/SSTDATA/sst_HadOIBl_bc_64x128_clim_c050526.nc
File is missing: /work/00671/tobis/inputdata/ocn/docn7/SSTDATA/sst_HadOIBl_bc_64x128_clim_c050526.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/ocnfrac/domain.camocn.64x128_USGS_070807.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/ocnfrac/domain.camocn.64x128_USGS_070807.nc
File is missing: /work/00671/tobis/inputdata/ocn/docn7/SSTDATA/sst_HadOIBl_bc_64x128_clim_c050526.nc
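Roughly what I have in mind for that downloader; missing.txt is assumed to hold the "File is missing" lines above, and the layout under $WORK/inputdata is assumed to mirror trunk/inputdata on the server, which the transcript above suggests it does:
#!/bin/csh
set server = https://svn-ccsm-inputdata.cgd.ucar.edu/trunk/inputdata
foreach f (`grep 'File is missing' missing.txt | awk '{print $NF}'`)
    set rel = `echo $f | sed "s|$WORK/inputdata/||"`   # path relative to the input root
    mkdir -p $WORK/inputdata/$rel:h                    # create the target directory
    wget $server/$rel -O $WORK/inputdata/$rel \
        --no-check-certificate --http-user=EASY_TO_GUESS --http-password=ALMOST_AS_EASY
end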
Tuesday, December 7, 2010
prawn build
Am I gaining tolerance for this garbage?
Well, yesterday I couldn't face it at all. I just sort of cowered and avoided work.
Today, however, I managed the infamous prawn build on a new platform in only eight or nine tries.
First, find the files. Then type make. Fails per expectations. Set up missing environment variables for netcdf.
Fails cryptically. Discover that while pgf90 is obviously portland fortran, cc is not pgcc. Hack the makefile.
Fails, unable to include netcdf.inc. Mysterious, as the include path is correctly set from the first step. Find netcdf.inc and copy it to the working directory.
Success!
Your tax dollars at work. FML.
Thursday, December 2, 2010
Notes
It is also necessary to edit cam1/models/atm/cam/src/control
to replace
read (5,camexp,iostat=ierr)
with
open(16,file='PATH_TO_FILE')
read(16, camexp)
close(16)
because we can't read from stdin on ranger nodes (before doing the make of course).
And similarly for the land model.
Also, restart files turn out NOT to be portable.
Now moving onto the slab
Hey, that worked!
I'm actually running!
Things worth checking out:
- where is the land model getting its initialization, since I didn't change that in the namelist.
- if I do change the namelist, does the result change?
- what is the proper way to configure the makefile, as opposed to going into line 190
- do I have restarts under control
- job names and all that
OK, now need to go back and change to the slab run. This is the old prawn fiasco:
here and here
Recapitulating CAM3.1 on Ranger
Not sure why this works, or whether getting to the bottom of it is useful.
=========
HOW TO BUILD CAM UNDER PGF90 on RANGER:
use the default Portland Group settings for MPI
cd to the root of the CAM tree, then issue the following
unsetenv USER_FC
module load netcdf
setenv INC_NETCDF /opt/apps/pgi7_2/netcdf/3.6.2/include
setenv LIB_NETCDF /opt/apps/pgi7_2/netcdf/3.6.2/lib/
setenv INC_MPI /opt/apps/pgi7_2/mvapich/1.0.1/include
setenv LIB_MPI /opt/apps/pgi7_2/mvapich/1.0.1/lib
mkdir buildpar
cd buildpar
../cam1/models/atm/cam/bld/configure -spmd
then edit the Makefile, line 190, replacing $(FC) with mpif90 .
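Or, to make that step repeatable instead of hand-editing; a one-liner, assuming line 190 is still where the link rule uses $(FC):
sed -i '190s/$(FC)/mpif90/' Makefile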
then type
make
NOTE: Build takes about 7 minutes.
===
RUNNING ON RANGER NODES
You'd think they'd provide an example.
Before running CAM, I try to establish how to run something. I got a random MPI source off the net; stupidly, it has interactive I/O, so my first run was inconclusive.
Anyway, after several bashes at it, I got this script
#$ -V
#$ -cwd
#$ -j y
#$ -A A-ig2
#$ -l h_rt=00:10:00
#$ -q development
#$ -N test
#$ -o ./$JOB_NAME.out
#$ -pe 16way 16
ibrun ./a.out
which is passed to the queue submission command "qsub".
As far as I can figure
#$ -pe 16way 16
is the smallest possible allocation on ranger. And I'm only asking for ten minutes on the dev queue (also tried the normal queue). Yet it takes forever to get loaded.
qstat shows a job number but the queue is marked as empty. Should I worry about this?
SETTING UP THE RUN:
this namelist works for an initial run, based on a single CPU experiment:
&camexp
absems_data = '/work/00671/tobis/inputdata/atm/cam/rad/abs_ems_factors_fastvx.c030508.nc'
aeroptics = '/work/00671/tobis/inputdata/atm/cam/rad/AerosolOptics_c040105.nc'
bnd_topo = '/work/00671/tobis/inputdata/atm/cam/topo/topo-from-cami_0000-09-01_64x128_L26_c030918.nc'
bndtvaer = '/work/00671/tobis/inputdata/atm/cam/rad/AerosolMass_V_64x128_clim_c031022.nc'
bndtvo = '/work/00671/tobis/inputdata/atm/cam/ozone/pcmdio3.r8.64x1_L60_clim_c970515.nc'
bndtvs = '/work/00671/tobis/inputdata/atm/cam/sst/sst_HadOIBl_bc_64x128_clim_c020411.nc'
caseid = 'camrun.bsi'
iyear_ad = 1950
mss_irt = 0
nrevsn = '/work/00671/tobis/camrun/restart/camrun.bsi.cam2.r.0021-01-01-00000'
rest_pfile = './cam2.camrun.bsi.rpointer'
ncdata = '../inputdata/init/control_initial.cam2.i.0013-01-01-00000.nc'
nestep = 586943
nsrest = 0
/
&clmexp
nrevsn = '/work/00671/tobis/camrun/restart/camrun.bsi.clm2.r.0021-01-01-00000'
rpntpath = './lnd.camrun.bsi.rpointer'
fpftcon = '/work/00671/tobis/inputdata/lnd/clm2/pftdata/pft-physiology'
fsurdat = '/work/00671/tobis/inputdata/lnd/clm2/srfdata/cam/clms_64x128_USGS_c030605.nc'
/
and using qsubscript
#$ -V
#$ -cwd
#$ -j y
#$ -A A-ig2
#$ -l h_rt=03:10:00
#$ -q normal
#$ -N testCAM3
#$ -o ./$JOB_NAME.out
#$ -pe 16way 64
ibrun ./cam
issue
qsub < qsubscript
obviously the input data set is in $WORK/inputdata
Wednesday, December 1, 2010
Many distractions today
But the single cpu version did actually run.
The trick is to find the .nc files in inputdata and set ncdata to point there. Then set the restart mode (nsrest) to zero.
Will now need to save some restart files, and try to run in parallel.
For some reason the land model component didn't need the parallel fix to namelist. ???
To try: update the namelist for land model initialization; see if it makes any difference.