TOUGH2-MP: simulations halt after 1 day of simulation
Hi everybody,
I'm using TOUGH2-MP version 2.01 under Linux (Debian). I made some changes to the code and compiled it without any problem (the executable is referred to below as testv2.1). A sample problem with 55000 elements ran for 13 hours on 64 CPUs without any problem. However, when I try another sample problem with 90483 elements, the simulation halts after 10 min without finishing the job and gives the error message shown below (even with 4 CPUs):
-------------------------------------------------------------------------------------------------------------
$ mpirun -n 4 ./testv2.1
nex2adj:: rank= 0 MNCON=NCON= 265472 MNEL=NEL= 90483
Calling METIS (Version5)METIS_Kway
rank= 0 done calling Metis istatus= 1 edgecut= 6896
0sending to pid= 1 22438 90484 22438 151973 151974 22437
1recving 1 22438 90484 22438 151973 151974 22437
0sending to pid= 2 22656 90484 22656 153425 153426 22655
2recving 2 22656 90484 22656 153425 153426 22655
0sending to pid= 3 22667 90484 22667 157812 157813 22666
3recving 3 22667 90484 22667 157812 157813 22666
time for preprocessing ---- including data input, partition, distribution: 36.350855827331543
*** Error in `./testv2.1': double free or corruption (out): 0x00000000017d7000 ***
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
#0 0x7F51AF0C4407
#1 0x7F51AF0C4A1E
#2 0x7F51AE3C41BF
#3 0x7F51AE3C4147
#4 0x7F51AE3C5527
#5 0x7F51AE402293
#6 0x7F51AE407A6D
#7 0x7F51AE408775
#8 0x4D5A0A in AZ_fact_ilut
#9 0x4A9743 in AZ_factor_subdomain
#10 0x4E278E in AZ_domain_decomp
#11 0x4D8EFC in AZ_precondition
#12 0x4C0E7D in AZ_pbicgstab
#13 0x4A40EE in AZ_oldsolve
#14 0x4A4BED in AZ_solve
#15 0x445FA5 in lineq_
#16 0x44C84B in cycit_
#17 0x427686 in MAIN__ at tough2.01-mp.f90:?
----------------------------------------
I think the problem comes from the AZTEC solver. Should I run the simulations with a PARAL.prn file that changes the default Aztec solver options? Or is it a memory problem?
Thanks for your help
5 replies
-
Hello there,
I got a similar error (screenshot attached). I wonder if you figured out what the problem is?
Thanks,
Haiyan
-
The AZTEC bug was fixed a long time ago. You seem to have a different problem (the AZTEC bug halts the run at the first time step, but your model runs to time step 155). This may be caused by a memory leak in a subroutine for CO2 properties. Please try continuing the simulation by using SAVE as the initial condition.
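For anyone scripting the restart, here is a minimal Python sketch of that step. It assumes the run directory holds the SAVE file written by the previous run and that the simulator reads its initial conditions from a file named INCON; the function name and the INCON.bak backup are only illustrative and are not part of TOUGH.

```python
import shutil
from pathlib import Path


def promote_save_to_incon(run_dir: str) -> Path:
    """Copy SAVE over INCON in run_dir, backing up any existing INCON.

    Assumption: the file names SAVE and INCON match what the simulator
    writes and reads in this directory. INCON.bak is a hypothetical
    backup name chosen for this sketch.
    """
    run = Path(run_dir)
    save = run / "SAVE"
    incon = run / "INCON"
    if not save.is_file():
        raise FileNotFoundError(f"no SAVE file in {run}")
    if incon.exists():
        # Keep the previous initial conditions so the original run
        # can still be reproduced.
        shutil.copy2(incon, run / "INCON.bak")
    shutil.copy2(save, incon)
    return incon
```

After calling promote_save_to_incon("."), rerun the model with the same command as before.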
-
Hi Kenny,
Thanks for looking into this.
I restarted the simulation but got a similar error immediately (screenshot attached). I also attached the input file as well as the INCON and GENER files. This is the command I used to run the model: mpiexec -n 16 tough3-eos7c INFILE_6w.txt OUTPUT_6wr
Regards,
Haiyan
-
You did not specify the linear solver in the input file. The program picks AZTEC by default for solving the linear equations, but it does not work properly for your model. Please use the PETSc solver for your problem (insert the SOLVR keyword with 8 as the solver input).
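A quick way to see whether an input file already selects a solver is to scan it for the SOLVR keyword. The Python sketch below is only a rough check: it assumes block keywords occupy the first five columns of a line, and the helper names are made up for illustration. It does not validate the solver record that follows the keyword.

```python
from pathlib import Path


def has_keyword(lines, keyword="SOLVR"):
    """Return True if any line starts with the given block keyword.

    Assumption: keywords sit in the first five columns of a line.
    """
    return any(line[:5].upper() == keyword for line in lines)


def input_has_solver_block(path):
    """Read an input file and report whether a SOLVR block is present."""
    text = Path(path).read_text(errors="replace")
    return has_keyword(text.splitlines())
```

For example, input_has_solver_block("INFILE_6w.txt") returning False would mean that no SOLVR block was found in the file.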