Thursday, December 16, 2010

Build successful: how to run?

This is new:


Thu Dec 16 13:27:36 CST 2010 /work/00671/tobis/CAM_3/run/ccsm.bldlog.101216-130725
- Locking file env_build.xml
- Locking file Macros.prototype_ranger
CCSM BUILDEXE SCRIPT HAS FINISHED SUCCESSFULLY


and has to be considered good news.

Now the "quick start" seems to have me in the scripts directory issuing

qsub $CASE.$MACH.run

but how could that work? CCSM doesn't know my account number. All of the hash ("#$") commands to the runtime environment are missing from $CASE.$MACH.run. I will try just splicing them in manually.
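A minimal sketch of that splicing, in Python rather than by hand. It assumes the directives can simply sit directly under the shebang line. The script body and the account name `MY_ACCOUNT` are placeholders for illustration, not the real contents of $CASE.$MACH.run.

```python
def splice_directives(script_text, directives):
    """Insert '#$ ...' lines right after the shebang line (or at the top if none)."""
    lines = script_text.splitlines()
    header = ["#$ " + d for d in directives]
    if lines and lines[0].startswith("#!"):
        return "\n".join([lines[0]] + header + lines[1:]) + "\n"
    return "\n".join(header + lines) + "\n"


if __name__ == "__main__":
    # Stand-in for the generated run script; the real one is much longer.
    run_script = "#!/bin/csh -f\necho placeholder body\n"
    # MY_ACCOUNT is a hypothetical allocation name.
    directives = ["-V", "-cwd", "-j y", "-A MY_ACCOUNT", "-q normal"]
    print(splice_directives(run_script, directives), end="")
```

Keeping the directives in a list makes it easy to add or drop one between submission attempts.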

How many PEs?

env_mach_pes.xml says (open angle brackets elided):


!-- -->
!-- These variables CANNOT be modified once configure -case has been -->
!-- invoked without first invoking configure -cleanmach. -->
!-- -->
!-- See README/readme_env and README/readme_general for details -->


...

entry id="TOTALPES" value="32" />
entry id="PES_LEVEL" value="1r" />
entry id="MAX_TASKS_PER_NODE" value="4" />
entry id="PES_PER_NODE" value="$MAX_TASKS_PER_NODE" />


but my prior qsubscript says


#$ -pe 16way 64


Second attempt, then, leave the -pe out; see if it compensates somehow.

so:


#$ -V
#$ -cwd
#$ -j y
#$ -A A-ig2
#$ -l h_rt=00:30:00
#$ -q normal
#$ -N spinup-CCSM
#$ -o ./$JOB_NAME.out


Not sure about the -cwd either...


------------> Rejecting job <------------
Please specify a parallel environment.
Syntax: -pe
Example: #$ -pe 16way 48
To see a list of defined pes: qconf -spl
-----------------------------------------


should I go for 4way 32 or 16way 32 ?

I thought they had gotten somewhere on ranger.

Trying 4way 32 which will ask for 8 nodes when 2 would do, I think.
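A quick check on the node arithmetic, under an assumption I have not confirmed here: that in `-pe Nway M`, M is the total number of cores reserved (16 per Ranger node) and N is the number of MPI tasks started per node. That reading fits the scheduler's own example (`16way 48`), but treat it as a guess. If it holds, `4way 32` reserves 2 nodes and starts only 8 tasks, rather than 8 nodes.

```python
CORES_PER_NODE = 16  # assumption: every Ranger node has 16 cores


def pe_request(tasks_per_node, total_cores):
    """Return (nodes, mpi_tasks) for '#$ -pe <tasks_per_node>way <total_cores>'."""
    if total_cores % CORES_PER_NODE != 0:
        raise ValueError("total_cores must be a multiple of %d" % CORES_PER_NODE)
    nodes = total_cores // CORES_PER_NODE
    return nodes, nodes * tasks_per_node


if __name__ == "__main__":
    for way, cores in [(4, 32), (16, 32), (4, 128)]:
        nodes, tasks = pe_request(way, cores)
        print("%dway %d: %d nodes, %d MPI tasks" % (way, cores, nodes, tasks))
```

Under the same assumption, 32 tasks at 4 per node would need `4way 128`, while `16way 32` gives 32 tasks on 2 nodes.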

OK, it is in the queue now

find Juli's script example:


#$ -V
# {inherit submission environment}
#$ -cwd
# {use submission directory}
#$ -N myCCSM
# {jobname (myCCSM)}
#$ -j y
# {join stderr and stdout}
#$ -o $JOB_NAME.o$JOB_ID
# {output name jobname.ojobid}
#$ -pe 16way 1024
# {use 16 cores/node, 1024 cores total}
#$ -q normal
# {queue name}
#$ -l h_rt=05:30:00
# {request 5.5 hours}
#$ -M juliana@ucar.edu
# {UNCOMMENT & insert Email address}
#$ -m be
# {UNCOMMENT email at Begin/End of job}
set echo #{echo cmds, use "set echo" in csh}
# {account number}
#$ -A TG-CCR090010

# ----------------------------------------
# PE LAYOUT:
# total number of tasks = 1024
# maximum threads per task = 1
# cpl ntasks=128 nthreads=1 rootpe=0
# cam ntasks=1024 nthreads=1 rootpe=0
# clm ntasks=128 nthreads=1 rootpe=0
# cice ntasks=160 nthreads=1 rootpe=0
# pop2 ntasks=32 nthreads=1 rootpe=0
#
# total number of hw pes = 1024
# cpl hw pe range ~ from 0 to 127
# cam hw pe range ~ from 0 to 1023
# clm hw pe range ~ from 0 to 127
# cice hw pe range ~ from 0 to 159
# pop2 hw pe range ~ from 0 to 31
# ----------------------------------------
#-----------------------------------------------------------------------
# Determine necessary environment variables
#-----------------------------------------------------------------------



her env_mach_pes:



setenv NTASKS_ATM 1024; setenv NTHRDS_ATM 1; setenv ROOTPE_ATM 0;
setenv NTASKS_LND 128; setenv NTHRDS_LND 1; setenv ROOTPE_LND 0;
setenv NTASKS_ICE 160; setenv NTHRDS_ICE 1; setenv ROOTPE_ICE 0;
setenv NTASKS_OCN 32; setenv NTHRDS_OCN 1; setenv ROOTPE_OCN 0;
setenv NTASKS_CPL 128; setenv NTHRDS_CPL 1; setenv ROOTPE_CPL 0;



alas, a different file format.

OK, looking in the wrong place.


!-- -->
!-- The following values should not be set by the user since they'll be -->
!-- overwritten by scripts. -->
!-- TOTALPES -->
!-- CCSM_PCOST -->
!-- CCSM_ESTCOST -->
!-- PES_LEVEL -->
!-- MAX_TASKS_PER_NODE -->
!-- PES_PER_NODE -->
!-- CCSM_TCOST -->
!-- CCSM_ESTCOST -->
!--


Looks like we should be going after


entry id="NTASKS_ATM" value="32" />
entry id="NTHRDS_ATM" value="1" />
entry id="ROOTPE_ATM" value="0" />

entry id="NTASKS_LND" value="32" />
entry id="NTHRDS_LND" value="1" />
entry id="ROOTPE_LND" value="0" />

entry id="NTASKS_ICE" value="32" />
entry id="NTHRDS_ICE" value="1" />
entry id="ROOTPE_ICE" value="0" />

entry id="NTASKS_OCN" value="32" />
entry id="NTHRDS_OCN" value="1" />
entry id="ROOTPE_OCN" value="0" />

entry id="NTASKS_CPL" value="32" />
entry id="NTHRDS_CPL" value="1" />
entry id="ROOTPE_CPL" value="0" />



and the NTASKS is really the variable we control. Unlike older CAM, we need to set these at build time, apparently.
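To see what a given env_mach_pes.xml adds up to, here is a rough sketch that pulls the NTASKS/NTHRDS/ROOTPE entries out of the text and estimates how many hardware PEs the layout spans. The formula (highest ROOTPE + NTASKS * NTHRDS over all components) is my assumption about how the total is derived, not something taken from the CCSM scripts. The regex works on entry lines with or without the leading angle bracket.

```python
import re

ENTRY = re.compile(r'entry id="(NTASKS|NTHRDS|ROOTPE)_([A-Z]+)"\s+value="(\d+)"')


def pe_layout(xml_text):
    """Collect {component: {'NTASKS': n, 'NTHRDS': n, 'ROOTPE': n}} from entry lines."""
    layout = {}
    for kind, comp, value in ENTRY.findall(xml_text):
        layout.setdefault(comp, {})[kind] = int(value)
    return layout


def pes_needed(layout):
    """Assumed rule: one past the highest hardware PE any component touches."""
    return max(c["ROOTPE"] + c["NTASKS"] * c["NTHRDS"] for c in layout.values())


if __name__ == "__main__":
    # Rebuild the entries listed above: every component at 32 tasks, 1 thread, root 0.
    lines = []
    for comp in ("ATM", "LND", "ICE", "OCN", "CPL"):
        for kind, value in (("NTASKS", 32), ("NTHRDS", 1), ("ROOTPE", 0)):
            lines.append('entry id="%s_%s" value="%d" />' % (kind, comp, value))
    layout = pe_layout("\n".join(lines))
    print(sorted(layout), pes_needed(layout))
```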

I think I'll submit a 16way 32 as well as try3

priority is very low right now so won't find out for a while.

More tomorrow I guess.

Wednesday, December 15, 2010

Two changes

Two changes in Macros.prototype_ranger will probably correspond to leaping the latest hurdle. Whether that yields a useful result in the end remains to be seen.


115c115
< INCLDIR := -I./usr/include
---
> INCLDIR := -I. /usr/include
152c152
< FFLAGS := $(CPPDEFS) -i4 -gopt -Mlist -time -Mextend -byteswapio
---
> FFLAGS := $(CPPDEFS) -i4 -target=linux -gopt -Mlist -time -Mextend -byteswapio


Isn't this intellectually satisfying work? Far better than being at AGU.

Note that blank WAS IN THE DISTRIBUTION.

Nope


CCSM BUILDEXE SCRIPT STARTING
- Build Libraries: mct pio csm_share
Wed Dec 15 17:03:34 CST 2010 /work/00671/tobis/CAM_A2/mct/mct.bldlog.101215-170331
Wed Dec 15 17:05:36 CST 2010 /work/00671/tobis/CAM_A2/pio/pio.bldlog.101215-170331
Wed Dec 15 17:06:51 CST 2010 /work/00671/tobis/CAM_A2/csm_share/csm_share.bldlog.101215-170331
Wed Dec 15 17:07:52 CST 2010 /work/00671/tobis/CAM_A2/run/cpl.bldlog.101215-170331
Wed Dec 15 17:07:52 CST 2010 /work/00671/tobis/CAM_A2/run/atm.bldlog.101215-170331
ERROR: cam.buildexe.csh failed, see /work/00671/tobis/CAM_A2/run/atm.bldlog.101215-170331
ERROR: cat /work/00671/tobis/CAM_A2/run/atm.bldlog.101215-170331
login4% cat /work/00671/tobis/CAM_A2/run/atm.bldlog.101215-170331
Wed Dec 15 17:07:52 CST 2010 /work/00671/tobis/CAM_A2/run/atm.bldlog.101215-170331
cat: Srcfiles: No such file or directory
/work/00671/tobis/CESM_SRC/ccsm4_0/scripts/CAM_A2/Tools/mkSrcfiles > /work/00671/tobis/CAM_A2/atm/obj/Srcfiles
cp -f /work/00671/tobis/CAM_A2/atm/obj/Filepath /work/00671/tobis/CAM_A2/atm/obj/Deppath
/work/00671/tobis/CESM_SRC/ccsm4_0/scripts/CAM_A2/Tools/mkDepends Deppath Srcfiles > /work/00671/tobis/CAM_A2/atm/obj/Depends
mpif90 -c -I. /usr/include -I/opt/apps/pgi7_1/netcdf/3.6.2/include -I/opt/apps/pgi7_1/netcdf/3.6.2/include -I/opt/apps/pgi7_1/mvapich2/1.0/include -I. -I/work/00671/tobis/CESM_SRC/ccsm4_0/scripts/CAM_A2/SourceMods/src.cam -I/work/00671/tobis/CESM_SRC/ccsm4_0/models/atm/cam/src/chemistry/bulk_aero -I/work/00671/tobis/CESM_SRC/ccsm4_0/models/atm/cam/src/chemistry/utils -I/work/00671/tobis/CESM_SRC/ccsm4_0/models/atm/cam/src/physics/cam -I/work/00671/tobis/CESM_SRC/ccsm4_0/models/atm/cam/src/dynamics/eul -I/work/00671/tobis/CESM_SRC/ccsm4_0/models/atm/cam/src/cpl_mct -I/work/00671/tobis/CESM_SRC/ccsm4_0/models/atm/cam/src/control -I/work/00671/tobis/CESM_SRC/ccsm4_0/models/atm/cam/src/utils -I/work/00671/tobis/CESM_SRC/ccsm4_0/models/atm/cam/src/advection/slt -I/work/00671/tobis/CAM_A2/lib/include -DCO2A -DMAXPATCH_PFT=numpft+1 -DLSMLAT=1 -DLSMLON=1 -DPLON=128 -DPLAT=64 -DPLEV=26 -DPCNST=3 -DPCOLS=16 -DPTRM=42 -DPTRN=42 -DPTRK=42 -DSPMD -DMCT_INTERFACE -DHAVE_MPI -DCO2A -DLINUX -DSEQ_ -DFORTRANUNDERSCORE -DNO_SHR_VMATH -DNO_R16 -i4 -target=linux -gopt -Mlist -time -Mextend -byteswapio -O2 -Mvect=nosse -Kieee -O2 -Mvect=nosse -Kieee -Mfree /work/00671/tobis/CESM_SRC/ccsm4_0/models/atm/cam/src/control/cam_logfile.F90
pgf90-Error-Unknown switch: -target=linux
gmake: *** [cam_logfile.o] Error 1


Taking out the "-target=linux" and removing the space in "-I. /usr/include" does seem to create a .o file with no objections.

How this got to be in the distribution I don't know.
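The two hand edits to Macros.prototype_ranger can be written down as a small text patch, so they are repeatable on a fresh checkout. This is only a sketch: it works on the file contents as a string and applies the same two substitutions shown in the diff, nothing more.

```python
def patch_macros(text):
    """Apply the two hand edits to the contents of Macros.prototype_ranger."""
    # Rejoin the include flag that the stray blank had split in two.
    text = text.replace("-I. /usr/include", "-I./usr/include")
    # Drop the switch that pgf90 rejects, together with its trailing space.
    text = text.replace("-target=linux ", "")
    return text


if __name__ == "__main__":
    distributed = (
        "INCLDIR := -I. /usr/include\n"
        "FFLAGS := $(CPPDEFS) -i4 -target=linux -gopt -Mlist -time -Mextend -byteswapio\n"
    )
    print(patch_macros(distributed), end="")
```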

Now, apparently have to hack the Makefile...

But NCAR does this in some bizarre way too... Suppose I should look for FORTRANUNDERSCORE

Update

setenv DIN_LOC_ROOT_CSMDATA $WORK/inputdata # put it where it wants it
setenv DIN_LOC_ROOT $WORK/inputdata # have it both ways
setenv CCSMROOT `pwd`
setenv MACH prototype_ranger
setenv CASEROOT `pwd`/CAM_Alone
setenv CASE CAM_Alone # not mentioned in instructions
setenv RES T42_T42
setenv COMPSET F_2000
cd ccsm4_0/scripts
create_newcase -case $CASEROOT -mach $MACH -compset $COMPSET -res $RES
cd $CASEROOT # not mentioned in instructions
./configure -case
$CASE.$MACH.build # you may need to prepend a dot and a slash

OK, I have all the files I guess but the build still fails on the convenient auto-download.

Oops, looks like I just missed one for some reason.

Haha, building at last. MCT done, PIO in progress.

Presumably ESMF will kill me, right?

Hell

Hell is other people's code.

Tuesday, December 14, 2010

After much moaning

OK the slab model seems to be running. I'll give a complete play-by-play.

Now to take on CCSM, a product which may be easier to use given that I have an account on an official target platform.

First, I need to find the NAME of the machine. I saw it once. It was prototype_ranger or something. Grep may take forever.

Yep; I guess I still have some brain cells left.



> find . -name "*ranger*"
...
./scripts/ccsm_utils/Machines/mkbatch.prototype_ranger
./scripts/ccsm_utils/Machines/Macros.prototype_ranger
./scripts/ccsm_utils/Machines/env_machopts.prototype_ranger


so

setenv DIN_LOC_ROOT_CSMDATA $WORK/ccsmin
cd ccsm4_0
setenv CCSMROOT `pwd`
setenv MACH prototype_ranger

# mkdir CAM_Alone = Do NOT do this !!! => Caseroot directory /work/00671/tobis/CESM_SRC/ccsm4_0/CAM_Alone already exists

setenv CASEROOT `pwd`/CAM_Alone
setenv RES T42_T42
setenv COMPSET F_2000
create_newcase -case $CASEROOT -mach $MACH -compset $COMPSET -res $RES


Now snagged on auto-download of initial conditions files. Authentication needed. As I recall it was wide open, but I don't remember what it was.

Found it in email. It appears to be the same for every user; but I'm not going to be the one to post it on a web page.

transcript has the following ugly appearance:


export https://svn-ccsm-inputdata.cgd.ucar.edu/trunk/inputdata/atm/cam/physprops/dust2_camrt_c080918.nc /work/00671/tobis/inputdata/atm/cam/physprops/dust2_camrt_c080918.nc ..... svn: REPORT request failed on '/!svn/vcc/default'
svn:
Cannot replace a directory from within

export https://svn-ccsm-inputdata.cgd.ucar.edu/trunk/inputdata/atm/cam/physprops/dust3_camrt_c080918.nc /work/00671/tobis/inputdata/atm/cam/physprops/dust3_camrt_c080918.nc ..... svn: REPORT request failed on '/!svn/vcc/default'
svn:
Cannot replace a directory from within

etc. etc. many times over. Is this fetching? Who knows?

OK, no. In fact, unbelievably bad. It assumed (despite my name choice) that I wanted it in $WORK/inputdata. I do not know how this happened!

I have NO IDEA where it got $WORK/inputdata. I told it NOTHING about $WORK or inputdata!

There seems to be some confusion in the docs with $DIN_LOC_ROOT in the files and $DIN_LOC_ROOT_CSMDATA in the docs.

"For supported machines this variable is preset". Does that include "prototype_ranger"?

Anyway, I try the alternative, using check_input_data (which really should be called checkin_input_data; it is not checking anything!). Same error.

Googling for the error message yields something about merges. So I try wget on one of the files.

d'oh


ERROR: certificate common name `localhost.localdomain' doesn't match requested host name `svn-ccsm-inputdata.cgd.ucar.edu'.
To connect to svn-ccsm-inputdata.cgd.ucar.edu insecurely, use `--no-check-certificate'.


Somebody tell me I am dealing with grownups here!

Eventually I succeed with

wget https://svn-ccsm-inputdata.cgd.ucar.edu/trunk/inputdata/ocn/docn7/SSTDATA/sst_HadOIBl_bc_64x128_clim_c050526.nc --no-check-certificate --http-user=EASY_TO_GUESS --http-password=ALMOST_AS_EASY


To my surprise nobody squawks.

So the next thing to do is to build a script to download all the stuff that check_input_data was supposed to get:

At least it handily reports:


/work/00671/tobis/CESM_SRC/ccsm4_0/CAM_Alone/Buildconf/cam.input_data_list
/work/00671/tobis/CESM_SRC/ccsm4_0/CAM_Alone/Buildconf/clm.input_data_list
/work/00671/tobis/CESM_SRC/ccsm4_0/CAM_Alone/Buildconf/cice.input_data_list
/work/00671/tobis/CESM_SRC/ccsm4_0/CAM_Alone/Buildconf/cpl.input_data_list
/work/00671/tobis/CESM_SRC/ccsm4_0/CAM_Alone/Buildconf/docn.input_data_list
File is missing: /work/00671/tobis/inputdata/atm/cam/chem/trop_mozart_aero/aero/aero_1.9x2.5_L26_2000clim_c090803.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/inic/gaus/cami_0000-01-01_64x128_T42_L26_c031110.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/topo/USGS-gtopo30_64x128_c050520.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/ozone/ozone_1.9x2.5_L26_2000clim_c090803.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/chem/trop_mozart/ub/clim_p_trop.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/physprops/sulfate_camrt_c080918.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/physprops/dust1_camrt_c080918.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/physprops/dust2_camrt_c080918.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/physprops/dust3_camrt_c080918.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/physprops/dust4_camrt_c080918.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/physprops/bcpho_camrt_c080918.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/physprops/bcphi_camrt_c080918.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/physprops/ocpho_camrt_c080918.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/physprops/ocphi_camrt_c080918.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/physprops/ssam_camrt_c080918.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/physprops/sscm_camrt_c080918.nc
File is missing: /work/00671/tobis/inputdata/lnd/clm2/pftdata/pft-physiology.c100226
File is missing: /work/00671/tobis/inputdata/lnd/clm2/snicardata/snicar_drdt_bst_fit_60_c070416.nc
File is missing: /work/00671/tobis/inputdata/lnd/clm2/snicardata/snicar_optics_5bnd_c090915.nc
File is missing: /work/00671/tobis/inputdata/lnd/clm2/surfdata/surfdata_64x128_simyr2000_c090928.nc
File is missing: /work/00671/tobis/inputdata/lnd/clm2/griddata/griddata_64x128_060829.nc
File is missing: /work/00671/tobis/inputdata/lnd/clm2/snicardata/aerosoldep_monthly_1990s_mean_64x128_c080410.nc
File is missing: /work/00671/tobis/inputdata/lnd/clm2/griddata/fracdata_64x128_USGS_070110.nc
File is missing: /work/00671/tobis/inputdata/lnd/clm2/rtmdata/rdirc.05.061026
File is missing: /work/00671/tobis/inputdata/ice/cice/aerosoldep_monthly_2000_mean_1.9x2.5_c090421.nc
File is missing: /work/00671/tobis/inputdata/ice/cice/aerosoldep_monthly_2000_mean_1.9x2.5_c090421.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/ocnfrac/domain.camocn.64x128_USGS_070807.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/ocnfrac/domain.camocn.64x128_USGS_070807.nc
File is missing: /work/00671/tobis/inputdata/ocn/docn7/SSTDATA/sst_HadOIBl_bc_64x128_clim_c050526.nc
File is missing: /work/00671/tobis/inputdata/ocn/docn7/SSTDATA/sst_HadOIBl_bc_64x128_clim_c050526.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/ocnfrac/domain.camocn.64x128_USGS_070807.nc
File is missing: /work/00671/tobis/inputdata/atm/cam/ocnfrac/domain.camocn.64x128_USGS_070807.nc
File is missing: /work/00671/tobis/inputdata/ocn/docn7/SSTDATA/sst_HadOIBl_bc_64x128_clim_c050526.nc
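A sketch of how that download script might be generated from the report. It assumes the server tree mirrors the local inputdata tree, which is what the svn export lines and the working wget command suggest. `USER` and `PASSWORD` are placeholders. The code only prints commands and fetches nothing itself.

```python
import posixpath
import shlex

LOCAL_ROOT = "/work/00671/tobis/inputdata"
REMOTE_ROOT = "https://svn-ccsm-inputdata.cgd.ucar.edu/trunk/inputdata"
PREFIX = "File is missing: "


def missing_files(report):
    """Unique local paths from 'File is missing:' lines, in first-seen order."""
    seen, paths = set(), []
    for line in report.splitlines():
        line = line.strip()
        if not line.startswith(PREFIX):
            continue
        path = line[len(PREFIX):]
        if path.startswith(LOCAL_ROOT + "/") and path not in seen:
            seen.add(path)
            paths.append(path)
    return paths


def wget_commands(report, user="USER", password="PASSWORD"):
    """One 'mkdir -p ... && wget ...' shell command per missing file."""
    cmds = []
    for path in missing_files(report):
        url = REMOTE_ROOT + "/" + path[len(LOCAL_ROOT) + 1:]
        cmds.append(
            "mkdir -p %s && wget --no-check-certificate "
            "--http-user=%s --http-password=%s -O %s %s"
            % (
                shlex.quote(posixpath.dirname(path)),
                shlex.quote(user),
                shlex.quote(password),
                shlex.quote(path),
                shlex.quote(url),
            )
        )
    return cmds


if __name__ == "__main__":
    report = (
        "File is missing: /work/00671/tobis/inputdata/atm/cam/topo/USGS-gtopo30_64x128_c050520.nc\n"
        "File is missing: /work/00671/tobis/inputdata/lnd/clm2/rtmdata/rdirc.05.061026\n"
    )
    for cmd in wget_commands(report):
        print(cmd)
```

Deduplicating matters here because the report lists several files more than once.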

Tuesday, December 7, 2010

prawn build

Am I gaining tolerance for this garbage?

Well, yesterday I couldn't face it at all. I just sort of cowered and avoided work.

Today, however, I managed the infamous prawn build on a new platform in only eight or nine tries.

First, find the files. Then type make. Fails per expectations. Set up missing environment variables for netcdf.

Fails cryptically. Discover that while pgf90 is obviously Portland Fortran, cc is not pgcc. Hack the makefile.

Fails, unable to include netcdf.inc. Mysterious, as the include path is correctly set from the first step. Find netcdf.inc and copy it to the working directory.

Success!

Your tax dollars at work. FML.