0

Problem with running the sample (r2dl) using TOUGH3/TMVOC

Hi, everyone,

 

I have been using the TOUGH2 family of codes for some time. Recently, I began to use TOUGH3, particularly the TMVOC module, for my research. I first tested Problem No. 7 (*r2dl*) - NAPL Spill in the Unsaturated Zone - using TOUGH3/TMVOC. The problem is run in four segments: (1) generation of steady flow prior to introduction of NAPL, (2) NAPL spill in the unsaturated zone, (3) redistribution of NAPL, and (4) extraction simultaneously in the saturated and unsaturated zones.

The input files corresponding to the above four segments are provided in the distributed package (sample directory). I was surprised to find the following run-time errors:

(1) There are run-time errors for steps 2 and 4; however, the results for steps 1 and 3 are correct compared with those from TOUGH2/TMVOC. Note that for step 3, I used the step 2 output file from TOUGH2/TMVOC as the initial condition for the step 3 run with TOUGH3/TMVOC.

(2) Attached please find the main input file for step 2. The error is:

 +++++++++ CONVERGENCE FAILURE ON TIME STEP #  59 WITH DT = 0.200000E+02 SECONDS, FOLLOWING TWO STEPS THAT CONVERGED ON ITER = 1
           STOP EXECUTION AFTER NEXT TIME STEP
 +++++++++ REDUCE TIME STEP AT (   59, 9) ++++++++ ++++++++   NEW DELT =0.500000E+01
 A6111(   59, 1) ST = 0.280040E+07 DT = 0.500000E+01

However, the equivalent input files run correctly with TOUGH2/TMVOC.

Note that:

(1) I first tested the sample on Windows using the distributed executable. When I got the above errors, I tested the same case on a Linux platform using a TOUGH3-TMVOC executable I compiled myself. I got the same errors on both Windows and Linux.

(2) I also noted the differences between the AZTEC and PETSc solvers. To rule out an error originating in the PETSc solver, I also tested the sample with the serial solvers (serial TOUGH3 executable).

Any suggestions to fix this? Thanks very much.

11 replies

    • Yue_Luo
    • 3 yrs ago

    the input file for step 2

    • yqzhang
    • 3 yrs ago

    Hello Yue Luo,

    There are two things:

    1. Please use the SAVE file as the initial condition for r2dl2 since that provides the steady state condition.

    2. Please set MOMOP(1)=2. The maximum number of time steps you specified (999) does not seem to be enough (with your 999, the run had only reached about half a year when it hit the limit). Make it bigger, test it out, and let me know.

    MOMOP----1----*----2----*----3----*----4----*----5----*----6----*----7----*----8
    2

     

    Thanks

    Yingqi

    • Yue_Luo
    • 3 yrs ago

    Hi, Yingqi,

     

    Thanks a lot. The problem is solved once MOMOP(1)=2 is added to the input file. Following your suggestion, I have tested steps 2 and 4 on both Windows and Linux.

    (1) I noticed that the MOMOP data block is new compared with the previous TOUGH2/TMVOC and provides additional options. Setting MOMOP(1)=2 means the program performs at least two iterations, and primary variables are always updated (according to the TOUGH3 user's guide). Could you explain more about this option? Why do we need this setting for steps 2 and 4, while step 3 does not need it?

    Meanwhile, I also noticed that for step 2, TOUGH2/TMVOC needs only 118 time steps to simulate 365 days, whereas TOUGH3/TMVOC needs 2785.

     

    (2) I compared the results from TOUGH2/TMVOC and TOUGH3/TMVOC. There are some slight differences (taking the mass balance as an example):

     

    a. TOUGH3/TMVOC

                             COMPONENTS MOLES IN PLACE - BY PHASES                                  COMPONENTS IN PLACE - OVERALL

     COMPONENTS              GAS PHASE       AQUEOUS PHASE          NAPL            ADSORBED         TOTAL MOLES     TOTAL MASS (KG)
     WATER                 0.10449618E+03    0.18834684E+08    0.19587078E+01                      0.18834791E+08    0.33932759E+06
     AIR                   0.43468886E+04    0.18673571E+03    0.63915307E-01                      0.45336882E+04    0.13129561E+03
     n-DECANE              0.35713478E+00    0.95120313E-01    0.31860272E+04    0.00000000E+00    0.31864795E+04    0.45339142E+03
     BENZENE               0.64115634E+00    0.26438769E+01    0.10752195E+03    0.00000000E+00    0.11080698E+03    0.86555765E+01
     Toluene               0.11594680E+01    0.38773543E+01    0.58899346E+03    0.00000000E+00    0.59403028E+03    0.54734544E+02
     p-XYLENE              0.57813372E+00    0.18258852E+01    0.87949602E+03    0.00000000E+00    0.88190004E+03    0.93629564E+02
     n-propylbenzene       0.20986126E+00    0.51569879E+00    0.78282600E+03    0.00000000E+00    0.78355156E+03    0.94178980E+02
     n-pentane             0.63970423E+02    0.13199568E-01    0.21903393E+04    0.00000000E+00    0.22543229E+04    0.16265165E+03
     TOTAL VOCS            0.66916177E+02    0.89711352E+01    0.77352040E+04    0.00000000E+00    0.78110913E+04    0.86724174E+03

     *********************************************************************************************************************************

    b. TOUGH2/TMVOC
                             COMPONENTS MOLES IN PLACE - BY PHASES                                  COMPONENTS IN PLACE - OVERALL

     COMPONENTS              GAS PHASE       AQUEOUS PHASE          NAPL            ADSORBED         TOTAL MOLES     TOTAL MASS (KG)
     WATER                 0.10449393E+03    0.18834808E+08    0.19590235E+01                      0.18834915E+08    0.33932982E+06
     AIR                   0.43467221E+04    0.18673752E+03    0.63913415E-01                      0.45335235E+04    0.13129084E+03
     n-DECANE              0.35703551E+00    0.94671422E-01    0.31860844E+04    0.00000000E+00    0.31865361E+04    0.45339947E+03
     BENZENE               0.64252559E+00    0.26239548E+01    0.10756651E+03    0.00000000E+00    0.11083299E+03    0.86576078E+01
     Toluene               0.11616177E+01    0.38425836E+01    0.58908759E+03    0.00000000E+00    0.59409179E+03    0.54740212E+02
     p-XYLENE              0.57870692E+00    0.18046091E+01    0.87956551E+03    0.00000000E+00    0.88194882E+03    0.93634743E+02
     n-propylbenzene       0.20998018E+00    0.50963406E+00    0.78285752E+03    0.00000000E+00    0.78357713E+03    0.94182054E+02
     n-pentane             0.64037746E+02    0.13162394E-01    0.21912892E+04    0.00000000E+00    0.22553402E+04    0.16272505E+03
     TOTAL VOCS            0.66987612E+02    0.88886154E+01    0.77364507E+04    0.00000000E+00    0.78123269E+04    0.86733913E+03
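
    For scale, the size of the discrepancy can be quantified directly from the two tables. A minimal check (TOTAL VOCS "TOTAL MOLES" values copied from tables a and b above):

```python
# Quick check of the mass-balance discrepancy between the two codes.
# Values are the TOTAL VOCS "TOTAL MOLES" entries from tables a and b above.
tough3_total_voc_moles = 0.78110913e4  # TOUGH3/TMVOC
tough2_total_voc_moles = 0.78123269e4  # TOUGH2/TMVOC

rel_diff = abs(tough3_total_voc_moles - tough2_total_voc_moles) / tough2_total_voc_moles
print(f"relative difference: {rel_diff:.2e}")  # about 1.6e-04, i.e. roughly 0.02%
```

    A discrepancy of this order (~0.02%) is small, but still worth understanding given the very different time-step counts between the two runs.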

    • Yue_Luo
    • 3 yrs ago

    Hi, Yingqi,

    For this case, I also tested parallel execution on 4 CPUs and got the following errors:

    hydro-0-2
    hydro-0-2
    hydro-0-2
    hydro-0-2
     MESH: NEL=         680  NCON=        1303
    [hydro-0-2:31539] *** An error occurred in MPI_Bcast
    [hydro-0-2:31539] *** reported by process [465436673,2]
    [hydro-0-2:31539] *** on communicator MPI COMMUNICATOR 3 DUP FROM 0
    [hydro-0-2:31539] *** MPI_ERR_TRUNCATE: message truncated
    [hydro-0-2:31539] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
    [hydro-0-2:31539] ***    and potentially your MPI job)
    [3]PETSC ERROR: ------------------------------------------------------------------------
    [3]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
    [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
    [3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
    [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
    [3]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run 
    [3]PETSC ERROR: to get more information on the crash.
    [3]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
    [3]PETSC ERROR: Signal received
    [3]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
    [3]PETSC ERROR: Petsc Release Version 3.7.6, Apr, 24, 2017 
    [3]PETSC ERROR: /home/shi_xiaoqing/TOUGH3/esd-tough3/tough3-install/bin/tough3-tmvoc on a arch-linux2-c-opt named hydro-0-2.local by shi_xiaoqing Sun Apr 26 18:03:01 2020
    [3]PETSC ERROR: Configure options --with-debugging=0 --with-shared-libraries=0 --with-x=0 --with-ssl=0 --with-mpi-dir=/share/apps/openmpi-1.10.4 --download-metis=1 --download-parmetis=1 --prefix=/home/shi_xiao
    qing/TOUGH3/esd-tough3/tough3-install/tpls
    [3]PETSC ERROR: #1 User provided function() line 0 in  unknown file
    [hydro-0-2.local:31535] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
    [hydro-0-2.local:31535] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

     

    Can you test this case with parallel execution? I am confused by this error because I have also run the chen-5 problem in parallel mode (2 CPUs) and obtained correct results:

     TOUGH STATUS: Write SAVE file for time =    864000.00000000000     
     EEE  
     EEE  Number of processors =            2
     EEE Time perform model computation  =    1.7212919999146834     
     EEE   of which spent in lin. solv. =    4.9040999729186296E-002
     EEE   and spent on other           =    1.6722510001854971     
     EEE  
     EEE Total number of time steps =          110
     EEE Average time in solver per time step =    4.4582727026532996E-004
     EEE Average time spent on other per time step =    1.5202281819868156E-002
     EEE  
     EEE Total number Newton steps =          172
     EEE Average number of Newton steps per time step   1.5636363636363637     
     EEE Average time per Newton step =    2.8512209144875754E-004
     EEE Average time spent on other per Newton st =    9.7223895359621927E-003
     EEE  
     EEE Total number of iter =         1611
     EEE Average number of iter per call    9.3662790697674421     
     EEE Average time per iter =    3.0441340614020047E-005
     EEE  
     EEE  =============================================
     EEE  
     EEE  

    Any suggestions are welcome; thanks very much.

    • yqzhang
    • 3 yrs ago

    I agree with you that when the same solver option is used, TOUGH3 should provide the same results as TOUGH2. I am not sure where the difference comes from. That will require some time to investigate. At the same time, I will run the problem in parallel and let you know where the error comes from.

    Thank you for your detailed report.

    Yingqi

      • Yue_Luo
      • 3 yrs ago

      Yingqi Zhang  Thanks for your attention. I really appreciate the effort you have all made to solve this problem, and I look forward to your follow-up reply.

      • Mikey_Hannon
      • 3 yrs ago

      Yue Luo 

      Yingqi Zhang

      I also ran into this issue when attempting to run from the cluster at our company, which runs a Linux OS.  I successfully ran problems r2dl1 and r2dl2 in serial (the latter in only 179 time steps after I used the SAVE file from r2dl1 as INCON).  However, when I attempted running the problem in parallel (mpiexec -n 4), I get the following error:

      Fatal error in PMPI_Bcast: Message truncated, error stack:
      PMPI_Bcast(2112)..................: MPI_Bcast(buf=0x1f27070, count=21, MPI_DOUBLE_PRECISION, root=0, comm=0x84000000) failed
      MPIR_Bcast_impl(1670).............:
      I_MPIR_Bcast_intra(1887)..........: Failure during collective
      MPIR_Bcast_intra(1461)............:
      MPIR_Bcast_binomial(147)..........:
      MPIDI_CH3U_Receive_data_found(129): Message from rank 0 and tag 2 truncated; 224 bytes received but buffer size is 168
      Fatal error in PMPI_Bcast: Message truncated, error stack:
      PMPI_Bcast(2112)..................: MPI_Bcast(buf=0x1adc070, count=21, MPI_DOUBLE_PRECISION, root=0, comm=0x84000000) failed
      MPIR_Bcast_impl(1670).............:
      I_MPIR_Bcast_intra(1887)..........: Failure during collective
      MPIR_Bcast_intra(1461)............:
      MPIR_Bcast_binomial(147)..........:
      MPIDI_CH3U_Receive_data_found(129): Message from rank 0 and tag 2 truncated; 224 bytes received but buffer size is 168
      rank 1 in job 19  node01.cluster_52385   caused collective abort of all ranks
        exit status of rank 1: return code 1


      It appears to be a mismatch between the message sizes sent and received by the processes.
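
      The byte counts in the traceback already localize the mismatch. A minimal decoding of those numbers (assuming the 8-byte MPI_DOUBLE_PRECISION named in the error text):

```python
# Decode the MPI_ERR_TRUNCATE numbers from the error text above.
# The receivers posted count=21 doubles, but rank 0 sent a larger message.
BYTES_PER_DOUBLE = 8          # size of MPI_DOUBLE_PRECISION

sent_bytes = 224              # "224 bytes received" in the error text
posted_bytes = 168            # "buffer size is 168" in the error text

print(sent_bytes // BYTES_PER_DOUBLE)    # 28 doubles broadcast by rank 0
print(posted_bytes // BYTES_PER_DOUBLE)  # 21 doubles expected by the receivers
```

      So the root rank broadcasts 28 doubles while the other ranks post a 21-double buffer; MPI_Bcast requires the count/datatype signature to match on every rank, hence the truncation abort.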

    • yqzhang
    • 3 yrs ago

    Thank you Mikey and Yue Luo. I apologize for the delay. I have not forgotten the issue. I will let you know once it is solved. Thank you for your patience.

    Yingqi

      • Yue_Luo
      • 3 yrs ago

      Yingqi Zhang  Hi, Yingqi, I hope this message finds you well. How is the issue coming along? Please keep me informed.

    • yqzhang
    • 3 yrs ago

    Hi Yue,

    The problem Mikey described is fixed, but there were still other problems. Can you send me a private email at yqzhang@lbl.gov?

    Thanks

    Yingqi

      • Haiyan_Zhou
      • 1 yr ago

      Yingqi Zhang Mikey Hannon

      Hi Yingqi and Mikey,

      I got the same error while running the problem 'r2dl1' in parallel mode. I can run the problem in serial mode but get errors in parallel mode (please see below for the command and error screenshot). Yingqi mentioned that the problem was fixed. Can you please let me know how?

      I also tried to run another example, 'rblm', from the user's guide. I can run it in either serial or parallel mode, so I am confused.

      The command I use:  mpiexec -n 4 /tough3-install/bin/tough3-tmvoc INFILE_r2dl1 OUTPUT

      The error message:
