Tuesday, November 30, 2010

CESM Download

Download to home directory failed: file permissions

Download to $WORK ok.

Guessing about $COMPSET, using /work/00671/tobis/CESM_SRC/ccsm4_0/

machinefile looks dubious:
prototype_ranger (TACC Linux Cluster, Linux (pgi), 1 pes/node, batch system is SGE)

I thought it was 16 pes/node.


Meanwhile serial CAM3 reads namelist OK, fails on restart read. Need to change to initialization run.

Monday, November 22, 2010

Building CAM again

Trying to build CAM w/o reference to notes.

Haven't looked at this for two years.

unzipped and untarred

Here is the directory structure out of the box:


Configuration: here

Instructions say to run configure, but it's not immediately obvious where that is. There are four files called configure, but you have to guess that


is right.

So, of the options to configure, the first one to cause any difficulty would be

-cc name
name specifies the C compiler. This allows the user to override the default setting in the Makefile (Linux only). The C compiler can also be specified by setting the environment variable USER_CC. Default: pgcc if using pgf90, otherwise cc.


-fc name
name specifies the Fortran compiler. This allows the user to override the default setting in the Makefile. The Fortran compiler can also be specified by setting the environment variable USER_FC. Default: OS dependent.

OK, let's see what f90 we have available. Hmm.

login4% man f90
No manual entry for f90
login4% which f90
f90: Command not found.
login4% which fortran
fortran: Command not found.
login4% which f77
login4% man f77
No manual entry for f77

umm? Here

we get ifort with icc, or pgf95 with pgcc or sunf90 or sunf95 with sun_cc

I haven't heard of NCAR components running under sun, so let's try the other two.
So we can set USER_CC and USER_FC
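
Rather than poking at which and man one name at a time, the search can be done in one pass. This is a minimal Python sketch, not anything from the CAM or TACC documentation: the candidate list is just the compiler names mentioned in these notes, and find_compilers is a made-up helper name.

```python
import shutil

# Compiler names taken from the notes above; extend the list as needed.
CANDIDATES = ["f90", "f77", "ifort", "pgf90", "pgf95", "sunf90", "sunf95", "mpif90"]


def find_compilers(candidates=CANDIDATES, path=None):
    """Return {name: full_path} for each candidate found on the search path.

    path=None means "use the PATH of the current environment".
    """
    found = {}
    for name in candidates:
        location = shutil.which(name, path=path)
        if location is not None:
            found[name] = location
    return found


if __name__ == "__main__":
    for name, location in find_compilers().items():
        print(f"{name:8s} {location}")
```

Whatever this prints reflects only the modules loaded at the time, so it should be rerun after any module swap.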

The next problem is the MPI version, always a head-scratcher.

-mpi_inc dir
dir is the directory that contains the MPI library include files. Only SPMD versions of CAM require MPI. The MPI include directory can also be specified by setting the environment variable INC_MPI. Default: /usr/local/include except on IBM systems. The IBM Fortran compilers mpxlf90 and mpxlf90_r have the MPI include file location built in.
module avail yields
--------------------------------------- /opt/apps/pgi7_2/modulefiles ---------------------------------------
acml/4.1.0 gotoblas2/1.05 (default) mvapich2-debug/1.2
autodock/4.0.1 hdf5/1.6.5 mvapich2-new/1.2
fftw3/3.1.2 hecura-debug/1.5rc2 mvapich2/1.2
glpk/4.40 hecura/1.5.1 nco/3.9.5
gotoblas/1.26 (default) metis/4.0 netcdf/3.6.2
gotoblas/1.30 mvapich-old/1.0.1 openmpi/1.3
gotoblas2/1.00 mvapich/1.0.1

not helpful. But see "Compiling Parallel Programs with MPI" here. This also recommends intel or pgi, but seems inconsistent about which is installed.

login4% which mpif90
login4% which mpicc

So we get pgi by default. OK, good enough for me, though we've been running intel10 on lonestar. "The compiler and MVAPICH library are selected according to the modules that have been loaded." But I have not loaded any modules myself. Trying module load intel gives useful info:

Error: You can only have one compiler module loaded at time.
You already have pgi loaded.
To correct the situation, please enter the following command:

module swap pgi intel

This is tedious but going well so far. Still don't know if the MPI modules will be found; time will tell, I guess, but it looks likely that $MPICH_HOME will need to be specified for this stuff.

Similarly with netcdf
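
One way to stop guessing at the MPI directories: compiler wrappers from the MPICH family (which MVAPICH derives from) generally accept a -show option that prints the underlying compile command without running it. I have not verified that on this machine. Assuming it works, the -I and -L directories can be pulled out of that line. The sketch below is Python; parse_wrapper_flags is a made-up helper, and the sample command line is invented for illustration, not real wrapper output.

```python
import shlex


def parse_wrapper_flags(command_line):
    """Split a compiler command line into include dirs, library dirs, and libraries.

    Handles both the attached form (-Idir) and the separated form (-I dir).
    """
    include_dirs, library_dirs, libraries = [], [], []
    buckets = (("-I", include_dirs), ("-L", library_dirs), ("-l", libraries))
    tokens = shlex.split(command_line)
    i = 0
    while i < len(tokens):
        token = tokens[i]
        for prefix, bucket in buckets:
            if token.startswith(prefix):
                value = token[len(prefix):]
                if not value and i + 1 < len(tokens):
                    # Separated form: the value is the next token.
                    i += 1
                    value = tokens[i]
                if value:
                    bucket.append(value)
                break
        i += 1
    return include_dirs, library_dirs, libraries


# Invented example line; the real one would come from running "mpif90 -show".
SAMPLE = "pgf90 -I/path/to/mpi/include -L/path/to/mpi/lib -lmpich"
print(parse_wrapper_flags(SAMPLE))
```

The first two lists are the candidates for INC_MPI and LIB_MPI. The third shows which libraries the wrapper would have added to the link line on its own.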

Saturday, November 20, 2010

The struggle recapitulated


Note: I am talking to myself. Will move these to another place soon.

setenv USER_CC pgcc
setenv USER_FC mpif90
module load netcdf
setenv INC_NETCDF /opt/apps/pgi7_2/netcdf/3.6.2/include
setenv LIB_NETCDF /opt/apps/pgi7_2/netcdf/3.6.2/lib/
cd cam1/models/atm/cam/bld

Trying WITHOUT setting LIB_MPI or INC_MPI ; for reference they are in


Basis for not setting them:

INC_MPI: Directory containing the MPI include files. This is only required when CAM is built with SPMD enabled.

LIB_MPI: Directory containing the MPI library. This is only required when CAM is built with SPMD enabled.

-spmd enables an SPMD configuration of CAM (via MPI). -nospmd disables an SPMD configuration of CAM. SPMD is enabled by default only on IBM systems.


** Invalid build directory: /share/home/00671/tobis/CAM3Bld/cam1/models/atm/cam/bld
** The specified build directory is the same as the configuration script
** directory. This is not allowed because the Makefile produced by configure
** would overwrite the standard Makefile. Use a different build directory.

go back to CAM root

cd ~/CAM3Bld
mkdir build
cd build/

No complaints from configure. This makes 6 files and an empty esmf directory. One of the files is a Makefile. So



cd /share/home/00671/tobis/CAM3Bld/cam1/models/utils/esmf; \
echo "Build the ESMF library."; \
echo "ESMF is NOT supported by the CCSM project, but by the ESMF core team in NCAR/SCD"; \
echo "See http://www.esmf.ucar.edu"; \
gmake -j 1 BOPT=O ESMF_BUILD=/share/home/00671/tobis/CAM3Bld/build/esmf ESMF_DIR=/share/home/00671/tobis/CAM3Bld/cam1/models/utils/esmf ESMF_ARCH=;
Build the ESMF library.
ESMF is NOT supported by the CCSM project, but by the ESMF core team in NCAR/SCD
See http://www.esmf.ucar.edu
gmake[1]: Entering directory `/share/home/00671/tobis/CAM3Bld/cam1/models/utils/esmf'

makefile:15: /share/home/00671/tobis/CAM3Bld/cam1/models/utils/esmf/build//base: No such file or directory
make[1]: *** No rule to make target `/share/home/00671/tobis/CAM3Bld/cam1/models/utils/esmf/build//base'. Stop.

OK, so a couple of clues here: ESMF_ARCH=; and /share/home/00671/tobis/CAM3Bld/cam1/models/utils/esmf/build//base

So THE LIST OF CAM ENVIRONMENT VARIABLES EXCLUDES ESMF! (You have to guess that. Maybe "ESMF is NOT supported by the CCSM project, but by the ESMF core team in NCAR/SCD" is supposed to be helpful.) Next, look in cam1/models/utils/esmf/build.

Here we see

alpha common_g config Darwin_xlf linux_altix linux_pathscale rs6000_sp
base_variables.defs common_O cray_x1 ES linux_gnupgf90 linux_pgi solaris
common common_variables cray_x1_ssp IRIX linux_intel README solaris_hpc
common_ conf.defs Darwin_absoft IRIX64 linux_lf95 rs6000_64 SX6

README is singularly unhelpful:
# $Id: README,v 2002/04/27 15:38:57 erik Exp $
The build directory contains all the base makefiles that are
included in your actual makefile. See the users manual for a
description of all the flags and rules in the makefiles.

It doesn't even give me a clue where to find "the user's manual". A person not up on NCAR politics might assume this was the CAM or CCSM manual.

So candidates in my case are linux_pgi and linux_gnupgf90. Looking inside the latter, it calls gnucc, which I think I don't want. So

setenv ESMF_ARCH linux_pgi
cd ~/CAM3Bld/
mv build buildx01
mkdir build
cd build
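
The empty ESMF_ARCH=; in the gmake output is what an unset or empty variable looks like, and it only showed up well into the build. Here is a small Python sketch of a pre-flight check that could be run before configure and gmake. The list of required names is my own guess from what has bitten me so far, not something taken from the CAM documentation, and missing_variables is a made-up helper.

```python
import os

# Assumed list, based only on the variables used in these notes.
REQUIRED = ["ESMF_ARCH", "INC_NETCDF", "LIB_NETCDF"]


def missing_variables(required=REQUIRED, environ=None):
    """Return the names that are unset or set to an empty string."""
    if environ is None:
        environ = os.environ
    return [name for name in required if not environ.get(name, "").strip()]


if __name__ == "__main__":
    problems = missing_variables()
    if problems:
        print("unset or empty:", ", ".join(problems))
    else:
        print("all required variables are set")
```

An empty value counts as missing on purpose: in csh, setenv NAME with no value leaves the variable defined but blank, which is easy to overlook.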

The only diff at this point is two new files in the old directory left over from the failed build, so the re-run of configure was not needed.

Make gets a lot further.

Now... gets past all the ESMF stuff (lots of it, which as we know has little purpose) and fails at

mpif90 -c -DHIDE_MPI /share/home/00671/tobis/CAM3Bld/cam1/models/atm/cam/src/control/string_utils.F90
mpif90 -c -DHIDE_MPI /share/home/00671/tobis/CAM3Bld/cam1/models/csm_share/shr/shr_kind_mod.F90
mpif90 -c -DHIDE_MPI /share/home/00671/tobis/CAM3Bld/cam1/models/csm_share/shr/shr_mpi_mod.F90
mpif90 -c -DHIDE_MPI /share/home/00671/tobis/CAM3Bld/cam1/models/csm_share/shr/shr_sys_mod.F90
mpif90 -c -DHIDE_MPI /share/home/00671/tobis/CAM3Bld/cam1/models/atm/cam/src/control/mpishorthand.F
PGF90-F-0226-Can't find include file misc.h (/share/home/00671/tobis/CAM3Bld/cam1/models/atm/cam/src/control/mpishorthand.F: 1)
PGF90/x86-64 Linux 7.2-5: compilation aborted

but, but, but misc.h is right there in the build directory. Why is PGF90 not looking in the current directory for includes?

Charles suggests using straight pgf90 rather than mpif90. This did nothing; then I thought to rerun configure. Haha! It built. First day out!

This is a serial CAM though. Not too much use. But interesting: if it can build under gnu in serial, it could run on a Mac, couldn't it?

However, must rebuild for parallel run.


setenv USER_CC pgcc
setenv USER_FC mpif90
module load netcdf
setenv INC_NETCDF /opt/apps/pgi7_2/netcdf/3.6.2/include
setenv LIB_NETCDF /opt/apps/pgi7_2/netcdf/3.6.2/lib/
setenv INC_MPI
setenv LIB_MPI
mkdir buildpar
cd buildpar
../cam1/models/atm/cam/bld/configure -spmd

This fails because INC_MPI is being ignored, as well as the current working directory.

The user guide says exactly nothing about command-line invocation. I cannot find a PGI document that does. The man page says

Add directory to the compiler's search path for include files. For include files surrounded by < >, each -I
directory is searched followed by the standard area. For include files surrounded by " ", the directory containing
the file containing the #include directive is searched, followed by the -I directories, followed by the standard

So is it

-I dir1 -I dir2
-I dir1 dir2
or what?
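
The usual convention for Unix compiler drivers is one -I per directory, written either as -Idir or -I dir, so the first form above is the one to expect. I have not confirmed that against PGI's own documentation. The Python sketch below builds the options that way; include_flags is a made-up helper, and putting "." first is my own choice to cover the case where headers such as misc.h live in the build directory.

```python
def include_flags(directories):
    """Build one -I option per directory, skipping empty entries."""
    return ["-I" + d for d in directories if d]


# "." is the build directory; the second entry is the MPI include path used in these notes.
flags = include_flags([".", "/opt/apps/pgi7_2/mvapich/1.0.1/include"])
print(" ".join(flags))
```

Skipping empty entries matters because a variable that is set but blank would otherwise produce a bare -I, which swallows the next argument on the command line.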

I already set INC_MPI per instructions. So why is that ignored?

The makefile has

ifeq ($(SPMD),TRUE)
LDFLAGS += -L$(LIB_MPI) -lmpi


By the time the routine in question comes around, the FFLAGS are lost.

I hate Makefiles anyway.


# setenv USER_CC pgcc
# setenv USER_FC mpif90
module load netcdf
setenv INC_NETCDF /opt/apps/pgi7_2/netcdf/3.6.2/include
setenv LIB_NETCDF /opt/apps/pgi7_2/netcdf/3.6.2/lib/
#setenv INC_MPI
#setenv LIB_MPI
mkdir buildpar
cd buildpar
../cam1/models/atm/cam/bld/configure -spmd

** Cannot find mpif.h in specified directory: /usr/local/include

setenv USER_CC pgcc
setenv USER_FC mpif90
module load netcdf
setenv INC_NETCDF /opt/apps/pgi7_2/netcdf/3.6.2/include
setenv LIB_NETCDF /opt/apps/pgi7_2/netcdf/3.6.2/lib/
#setenv INC_MPI
#setenv LIB_MPI
mkdir buildpar
cd buildpar
../cam1/models/atm/cam/bld/configure -spmd


#setenv USER_CC pgcc
#setenv USER_FC mpif90
unsetenv USER_CC
unsetenv USER_FC
module load netcdf
setenv INC_NETCDF /opt/apps/pgi7_2/netcdf/3.6.2/include
setenv LIB_NETCDF /opt/apps/pgi7_2/netcdf/3.6.2/lib/
setenv INC_MPI /opt/apps/pgi7_2/mvapich/1.0.1/include
setenv LIB_MPI /opt/apps/pgi7_2/mvapich/1.0.1/lib
#unsetenv INC_MPI
#unsetenv LIB_MPI
mkdir buildpar
cd buildpar
../cam1/models/atm/cam/bld/configure -spmd

Builds but fails to link:

pgf90 -o /share/home/00671/tobis/CAM3Bld/buildpar/cam BalanceCheckMod.o BareGroundFluxesMod.o Biogeophysics1Mod.o Biogeophysics2Mod.o BiogeophysicsLakeMod.o CanopyFluxesMod.o DGVMAllocationMod.o DGVMEcosystemDynMod.o DGVMEstablishmentMod.o DGVMFireMod.o DGVMKillMod.o DGVMLightMod.o DGVMMod.o DGVMMortalityMod.o DGVMReproductionMod.o DGVMRestMod.o DGVMTurnoverMod.o DriverInitMod.o FracWetMod.o FrictionVelocityMod.o Hydrology1Mod.o Hydrology2Mod.o HydrologyLakeMod.o QSatMod.o RtmMod.o RunoffMod.o STATICEcosysDynMod.o SnowHydrologyMod.o SoilHydrologyMod.o SoilTemperatureMod.o SurfaceAlbedoMod.o SurfaceRadiationMod.o TridiagonalMod.o VOCEmissionMod.o abortutils.o acbnd.o accFldsMod.o accumulMod.o advnce.o aer_optics.o aerosol_intr.o albice.o albocean.o areaMod.o atm_lndMod.o atmdrvMod.o bandij.o basdy.o basdz.o basiy.o bilin.o binary_io.o bnddyi.o bndexch.o buffer.o caer.o caerbnd.o cam.o camice.o camoce.o carbon_intr.o carbonscales.o ccsm_msg.o check_energy.o chem_surfvals.o chemistry.o cldconst.o cldinti.o cldsav.o cldwat.o clm_csmMod.o clm_varcon.o clm_varctl.o clm_varpar.o clm_varsur.o clmtype.o clmtypeInitMod.o cloud_fraction.o cloudsimulator.o cmparray_mod.o comhd.o commap.o comspe.o comsrf.o comsrfdiag.o constituents.o controlMod.o convect_deep.o convect_shallow.o courlim.o cpslec.o cubxdr.o cubydr.o cubzdr.o dadadj.o datetime.o decompMod.o decompinit.o diag_dynvar_ic.o diagnostics.o difcor.o diffusion_solver.o dmsbnd.o do_close_dispose.o do_restwrite.o dp_coupling.o driver.o drydep_mod.o dust.o dust_intr.o dust_sediment_mod.o dycore.o dyn.o dyn_grid.o dynconst.o dyndrv.o dynpkg.o engy_tdif.o engy_te.o error_messages.o esinti.o extx.o extys.o extyv.o f_wrappers.o fft99.o filenames.o fileutils.o filterMod.o flxint.o flxoce.o gauaw_mod.o geopotential.o get_memusage.o getdatetime.o gffgch.o ghg_defaults.o gptl.o gptl_papi.o gptlutil.o grcalc.o grdxy.o grmult.o gw_drag.o hb_diff.o hdinti.o herxin.o heryin.o herzin.o histFileMod.o histFldsMod.o history.o hk_conv.o 
hordif.o hordif1.o hrintp.o hycoef.o icarus_scops.o ice_constants.o ice_data.o ice_dh.o ice_diagnostics.o ice_globalcalcs.o ice_kinds_mod.o ice_ocn_flux.o ice_sfc_flux.o ice_srf.o ice_tstm.o infnan.o iniTimeConst.o iniTimeVar.o inicFileMod.o inidat.o initGridCellsMod.o inital.o initcom.o initext.o initializeMod.o initindx.o inti.o intp_util.o ioFileMod.o iobinary.o iop.o kdpfnd.o lagyin.o lcbas.o lcdbas.o limdx.o limdy.o limdz.o linebuf_stdout.o linemsdyn.o lininterp.o lnd2atmMod.o lp_coupling.o marsaglia.o massfix.o mkglacier.o mkgridMod.o mklai.o mklanwat.o mkpft.o mkrank.o mksoicol.o mksoitex.o mksrfdatMod.o mkurban.o molec_diff.o mpiinc.o mpishorthand.o nanMod.o ncdio.o ncdio_atm.o omcalc.o ozone_data.o param_cldoptics.o pdelb0.o pft2colMod.o pftvarcon.o phcs.o phys_adiabatic.o phys_buffer.o phys_gmean.o phys_grid.o phys_idealized.o physconst.o physics_types.o physpkg.o pkg_cld_sediment.o pkg_cldoptics.o plevs0.o pmgrid.o ppgrid.o prescribed_aerosols.o print_coverage.o print_memusage.o prognostics.o program_csm.o program_off.o pspect.o qmassa.o qmassd.o qneg3.o qneg4.o quad.o quicksort.o rad_constituents.o radae.o radheat.o radiation.o radlw.o radsw.o ramp_scon.o readinitial.o realloc4.o realloc7.o reordp.o restFileMod.o restart.o restart_dynamics.o restart_physics.o rgrid.o rstwr.o rtcrate.o runtime_opts.o scan2.o scandyn.o scanslt.o scm0.o scyc.o seasalt_intr.o settau.o sgexx.o shr_alarm_mod.o shr_cal_mod.o shr_const_mod.o shr_date_mod.o shr_file_mod.o shr_kind_mod.o shr_mpi_mod.o shr_msg_mod.o shr_orb_mod.o shr_sys_mod.o shr_timer_mod.o shr_vmath_fwrap.o shr_vmath_mod.o snowdp2lev.o soxbnd.o spegrd.o spetru.o sphdep.o spmdGathScatMod.o spmdMod.o spmd_dyn.o spmd_phys.o spmd_utils.o spmdinit.o srchutil.o srfoce.o srfxfer.o sst_data.o stats.o stepon.o stratiform.o string_utils.o subgridAveMod.o sulbnd.o sulchem.o sulemis.o sulfur_intr.o surfFileMod.o swap_comm.o system_messages.o tfilt_massfix.o threadutil.o time_manager.o timeinterp.o tphysac.o tphysbc.o 
tphysidl.o tracers.o tracers_suite.o trb_mtn_stress.o trjmps.o trunc.o tsinti.o tstep.o units.o upper_bc.o vertical_diffusion.o vertinterp.o virtem.o volcanicmass.o volcemission.o volcrad.o vrtmap.o wetdep.o wrap_mpi.o wrap_nf.o wv_saturation.o xqmass.o zenith.o zm_conv.o -L/opt/apps/pgi7_2/netcdf/3.6.2/lib/ -lnetcdf -L/share/home/00671/tobis/CAM3Bld/buildpar/esmf/lib/libO/linux_pgi -lesmf -L/opt/apps/pgi7_2/mvapich/1.0.1/lib -lmpich

/opt/apps/pgi7_2/mvapich/1.0.1/lib/libmpich.a(dreg.o): In function `flush_dereg_mrs_external':
dreg.c:(.text+0x3b8): undefined reference to `ibv_dereg_mr'
/opt/apps/pgi7_2/mvapich/1.0.1/lib/libmpich.a(dreg.o): In function `dreg_new_entry':
dreg.c:(.text+0xa23): undefined reference to `ibv_reg_mr'
dreg.c:(.text+0xa4a): undefined reference to `ibv_reg_mr'

and many more, all undefined references coming from inside the mvapich library (libmpich.a).

Now, is the linker actually looking in LIB_MPI ?

Have to cut and paste that mess into a file and grep for it. Sure is. Right at the end there.
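
Since the library directory is on the link line, the problem is more likely a missing library than a missing path. The ibv_ prefix belongs to the InfiniBand verbs library (libibverbs), which a statically linked libmpich.a would need alongside it, so one thing to try is adding -libverbs after -lmpich. That is an untested guess on this machine. To see the whole list of missing symbols without reading the wall of output, here is a small Python sketch; undefined_symbols is a made-up helper and the sample text is the three error lines quoted above.

```python
import re

# GNU ld quotes the symbol with a leading backtick (or quote) and a trailing quote.
UNDEFINED = re.compile(r"undefined reference to [`'](\w+)'")


def undefined_symbols(linker_output):
    """Return the sorted, de-duplicated symbol names the linker could not resolve."""
    return sorted(set(UNDEFINED.findall(linker_output)))


SAMPLE = """\
dreg.c:(.text+0x3b8): undefined reference to `ibv_dereg_mr'
dreg.c:(.text+0xa23): undefined reference to `ibv_reg_mr'
dreg.c:(.text+0xa4a): undefined reference to `ibv_reg_mr'
"""

print(undefined_symbols(SAMPLE))  # ['ibv_dereg_mr', 'ibv_reg_mr']
```

If every name in the full list starts with ibv_, a single missing library is the likely cause. A mix of prefixes would suggest more than one library is absent from the link line.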