Runtime error with TOUGH2-MP (Allocatable variable BCELEM)
Hello everyone,
I'm new to TOUGH2-MP, although we have had the TOUGH2 code for some time now. We received the TOUGH2-MP code just a few weeks ago. I tried to compile it on our cluster under Linux and had some difficulties with that.
I also don't know anything about METIS and Aztec. I hope compiling METIS (4.0.3) and Aztec (2.1) worked. I did get libaztec.a and libmetis.a, so I guess everything is correct.
Allocatable variable BCELEM
After successfully compiling TOUGH2-MP (Linux, Intel 12.0-64, OpenMPI 1.4.4) with the flags
ifort -O0 -fpp -r8 -i4 -check all -g -traceback
I get the following error message during runtime:
forrtl: severe (408): fort: (8): Attempt to fetch from allocatable variable BCELEM when it is not allocated
Image              PC                Routine            Line        Source
t2eos7_mp_intel    00000000007E1A8A  Unknown            Unknown     Unknown
t2eos7_mp_intel    00000000007E0605  Unknown            Unknown     Unknown
t2eos7_mp_intel    00000000007922C6  Unknown            Unknown     Unknown
t2eos7_mp_intel    0000000000750CA5  Unknown            Unknown     Unknown
t2eos7_mp_intel    00000000007510F9  Unknown            Unknown     Unknown
t2eos7_mp_intel    00000000005DB2E2  allreplicom_       2583        Paral_Subs.f
t2eos7_mp_intel    00000000004C04FC  cycit_             240         Main_Comp.f
t2eos7_mp_intel    00000000004EF633  MAIN__             477         TOUGH2.f
t2eos7_mp_intel    0000000000404A9C  Unknown            Unknown     Unknown
libc.so.6          00007F11A016AC36  Unknown            Unknown     Unknown
t2eos7_mp_intel    0000000000404999  Unknown            Unknown     Unknown
When I use gcc-4.3.4 instead of the Intel compiler, with the flags
gfortran -O0 -fdefault-real-8 -i4 -g -fbacktrace -Wall
I just get the error message:
Program received signal 11 (SIGSEGV): Segmentation fault.
../t2eos7_mp_gnu.sh: line 11: 28431 Segmentation fault
Any idea what the problem could be?
Holger
4 replies
-
Holger,
Could you send the errors directly to Noel Keen (ndkeen@lbl.gov)? He can help you in identifying the problem.
George
-
After some time I tried to replicate the error I described above. I did not manage to. Maybe they changed something on our cluster. I don't know. So I can't give a solution to this problem.
Thanks for the help anyway.
Holger
-
After testing different compiler flags last week, I found a solution to the problem from my previous post. I also want to present some results that could be useful for both users and developers. It appears to me that the present code version needs some fixes regarding uninitialized values. It should also be made more robust to the compiler options that are used.
Here are my findings:
EOS7 and EOS7R with Intel compiler (12.0-64, Linux)
The runtime error mentioned in my previous post (Attempt to fetch from allocatable variable BCELEM when it is not allocated) occurs for both EOS7 and EOS7R if the flag
-check all
is set. With any of the following sets of flags, TOUGH2-MP runs without error:
ifort -O0 -fpp -r8 -i4 -g -traceback
ifort -O0 -fpp -r8 -i4
ifort -O0 -r8 -i4
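For readers unfamiliar with this message: with -check all, ifort inserts a runtime test that stops the program whenever an allocatable array is read before it has been allocated. Without the check, the same access is still invalid but often goes unnoticed. How BCELEM is declared and used in Paral_Subs.f is not shown in this thread, so the following is only a minimal sketch of the usual guard, with bcelem and nbc as stand-in names and the zero-boundary-element case as an assumption:

```fortran
! Hypothetical sketch, not taken from the TOUGH2-MP sources.
! bcelem and nbc are stand-in names chosen for illustration.
program alloc_guard_demo
  implicit none
  integer, allocatable :: bcelem(:)
  integer :: nbc

  nbc = 0                      ! assumed case: no boundary elements on this process

  ! The array is only allocated when there is something to store.
  if (nbc > 0) then
     allocate(bcelem(nbc))
     bcelem = 0
  end if

  ! Guard every read with allocated() so that the unallocated case
  ! is handled explicitly instead of relying on undefined behavior.
  if (allocated(bcelem)) then
     print *, 'sum of bcelem =', sum(bcelem)
  else
     print *, 'bcelem not allocated, skipping'
  end if
end program alloc_guard_demo
```

An alternative with the same effect is to always allocate the array, with size zero when there is nothing to store, so that later reads are valid in every case.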
EOS7R with GCC (gfortran) compiler (4.3.6, Linux)
Segmentation fault, independent of which of the following sets of compiler flags I use:
gfortran -O0 -fdefault-real-8 -i4
gfortran -O0 -fdefault-real-8 -i4 -g -fbacktrace
gfortran -O0 -fdefault-real-8 -i4 -g -fbacktrace -Wall
The segmentation fault disappears if the following compiler option is set:
-fno-automatic
EOS7 with GCC (gfortran) compiler (4.3.6, Linux)
NaN in the TOUGH output after the second timestep:
36 ( 1, 1) ST = 0.100000E-08 DT = 0.100000E-08 DX1= 0.000000E+00 DX2= 0.000000E+00 T = 25.000 P = 100000. S = 0.100000E+01
1 ( 2, 3) ST = 0.200000E+01 DT = 0.200000E+01 DX1= NaN DX2= NaN T = NaN P = NaN S = 0.100000E-05
The NaN errors disappear if one or both of the following compiler options are set:
-fno-automatic
-finit-local-zero
Discussion
Using the gfortran -Wall option I noticed some "uninitialized value" warnings. Since EOS7 ran without error using the Intel compiler, this suggested that the two compilers might treat uninitialized values differently (by setting them to 0 or NaN).
Solutions were found using the options -fno-automatic and -finit-local-zero. The compiler option -fno-automatic treats each program unit as if the SAVE statement were specified for every local variable and array referenced in it. Consequently, the previously uninitialized variables keep their values between calls and, being held in static storage, typically start out as zero. The compiler option -finit-local-zero reaches a similar result more directly: it instructs the compiler to initialize local INTEGER, REAL, and COMPLEX variables to zero.
Using these compiler options, it seems that I get the correct TOUGH behavior with correct calculation results. However, which of the two initialization methods (SAVE or =0) is correct cannot be answered from a user perspective. I therefore suggest that the code be revised in order to get a well-defined model state independent of the compiler options that are used.
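To illustrate what such a revision could look like, here is a minimal sketch with made-up names (accumulate, work); it is not taken from the TOUGH2-MP sources. Each local variable is assigned explicitly before its first use, so the result no longer depends on -fno-automatic, -finit-local-zero, or the compiler's default behavior:

```fortran
! Hypothetical sketch, not taken from the TOUGH2-MP sources.
program init_demo
  implicit none

  print *, accumulate(1.0d0)
  print *, accumulate(2.0d0)

contains

  function accumulate(x) result(total)
    double precision, intent(in) :: x
    double precision :: total
    double precision :: work   ! local scratch variable

    ! Explicit assignment on every call. Without this line the value of
    ! work is undefined and may be 0, NaN, or leftover stack contents.
    ! Note: writing "double precision :: work = 0.0d0" in the declaration
    ! would instead imply SAVE and initialize the variable only once.
    work = 0.0d0

    work = work + x
    total = work
  end function accumulate

end program init_demo
```

Whether a given variable should be reset on every call (as here) or should keep its value between calls (SAVE) has to be decided per variable by someone who knows the intended algorithm.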
It should also be clarified why the Intel compiler produces a runtime error in debug mode.
I have also sent an email to Noel Keen with these results.
Holger
-
Hi Holger,
You are right. We always use a compiler flag that initializes all numbers to zero. This initialization issue will be dealt with in the future.