TOUGH2-MP: simulations halted after 1 day of simulation
Hi everybody,
I'm using TOUGH2-MP version 2.01 under Linux (Debian). I made some changes to the code and compiled it without any problem (the executable is referred to below as testv2.1). A sample problem with 55000 elements ran for 13 hours on 64 CPUs without any problem. However, when I try another sample problem with 90483 elements, the simulation halts after 10 min without finishing the job and gives the error message shown below (even with 4 CPUs):
-------------------------------------------------------------------------------------------------------------
 $ mpirun -n 4 ./testv2.1
   nex2adj:: rank=           0  MNCON=NCON=      265472  MNEL=NEL=        90483
  Calling METIS (Version5)METIS_Kway              
   rank=           0  done calling Metis istatus=           1  edgecut=        6896
    0sending to pid=   1   22438   90484   22438  151973  151974   22437
    1recving           1   22438   90484   22438  151973  151974   22437
    0sending to pid=   2   22656   90484   22656  153425  153426   22655
    2recving           2   22656   90484   22656  153425  153426   22655
    0sending to pid=   3   22667   90484   22667  157812  157813   22666
    3recving           3   22667   90484   22667  157812  157813   22666
  time for preprocessing ---- including data input, partition, distribution:   36.350855827331543    
 *** Error in `./testv2.1': double free or corruption (out): 0x00000000017d7000 *** 
Program received signal SIGABRT: Process abort signal.
 Backtrace for this error:
 #0  0x7F51AF0C4407
 #1  0x7F51AF0C4A1E
 #2  0x7F51AE3C41BF
 #3  0x7F51AE3C4147
 #4  0x7F51AE3C5527
 #5  0x7F51AE402293
 #6  0x7F51AE407A6D
 #7  0x7F51AE408775
 #8  0x4D5A0A in AZ_fact_ilut
 #9  0x4A9743 in AZ_factor_subdomain
 #10  0x4E278E in AZ_domain_decomp
 #11  0x4D8EFC in AZ_precondition
 #12  0x4C0E7D in AZ_pbicgstab
 #13  0x4A40EE in AZ_oldsolve
 #14  0x4A4BED in AZ_solve
 #15  0x445FA5 in lineq_
 #16  0x44C84B in cycit_
 #17  0x427686 in MAIN__ at tough2.01-mp.f90:? 
----------------------------------------
I think the problem comes from the AZTEC solver. Should I run the simulations with a PARAL.prn file that changes the default options of the Aztec solver? Or is it a memory problem?
Thanks for your help
5 replies
- 
  Hello there, I got a similar error (screenshot attached). I wonder if you figured out what the problem is? Thanks, Haiyan
- 
  The AZTEC bug was fixed a long time ago. You seem to have a different problem (the AZTEC bug halts the run at the first time step, but your model runs to time step 155). It may be caused by a memory leak in a subroutine for CO2 properties. Please try continuing the simulation using SAVE as the initial condition.
- 
  Hi Kenny, Thanks for looking into this. I restarted the simulation but got a similar error immediately (screenshot attached). I have also attached the input file as well as the INCON and GENER files. This is the command I used to run the model: mpiexec -n 16 tough3-eos7c INFILE_6w.txt OUTPUT_6wr Regards, Haiyan
- 
  You did not specify the linear solver in the input file. The program therefore picks AZTEC for solving the linear equations, but AZTEC does not work properly for your model. Please use the PETSc solver instead (insert the SOLVR keyword with 8 as the input for the solver).
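
  For anyone hitting the same crash, a quick way to confirm whether an input file actually contains a SOLVR block before launching a long parallel run is sketched below. This is only a minimal illustration, not part of TOUGH: the helper name find_solvr_record is hypothetical, and it assumes that the block starts with the keyword SOLVR in the first five columns of a line and that the solver settings sit on the following record. The exact layout of that record should be taken from the user's guide.

```python
def find_solvr_record(text):
    """Return the record that follows the SOLVR keyword, or None if absent.

    Assumption: a keyword block begins with its name in columns 1-5,
    and the solver settings are on the next line of the input file.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line[:5].upper() == "SOLVR":
            # Return the next record, or an empty string if the file ends here.
            return lines[i + 1] if i + 1 < len(lines) else ""
    return None


# Placeholder input text for illustration only (not a valid TOUGH deck).
sample_without = "PARAM\nENDCY\n"
sample_with = "PARAM\nSOLVR\n8\nENDCY\n"

for name, text in (("without SOLVR", sample_without), ("with SOLVR", sample_with)):
    record = find_solvr_record(text)
    if record is None:
        print(f"{name}: no SOLVR block, the default solver will be used")
    else:
        print(f"{name}: SOLVR record = {record!r}")
```

  To check a real file, pass its contents instead, for example find_solvr_record(open("INFILE_6w.txt").read()).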

 
        