Compiling TOUGHREACT V3.0-OMP using Cygwin and gfortran
I have so far been able to compile all EOS modules of TOUGHREACT V3.0-OMP using gfortran under Cygwin on Windows. This was done using the following options: "-mcmodel=large -fstack-arrays -fdefault-real-8 -fdefault-integer-8 -fdefault-double-8 -fopenmp -Wl,--stack,0x80000000"
However, when I attempted to run the executable for the first example problem using eos9 (P1_eos9_nacl_v3omp), I received the following error message:
"Program received signal SIGILL: Illegal instruction"
Using the gdb debugger tool, this error appears to occur on line 267 of the source file t2f_v3.f, which reads:
" 1700 NM=1".
It is preceded by a continuation line composed of lines 258 and 261. If I'm reading the gdb output correctly, it appears the instruction on line 248, "IF(WORD.EQ.VER(K)) GOTO 920", bypasses the first portion of the continuation line (line 258), which is:
" 920 GOTO(1000,1100,1200,1300,1900,1450,5019,1500,1600,1700,1800,"
and goes straight to the second portion (line 261), and this returns the illegal instruction error.
Does anyone have an idea what I can do about this? Thanks!
On second thought, the code execution may not be skipping line 258. Rather, gdb may only report the end of the continuation line, rather than showing each component. It looks like the code likely runs as expected up to this point: the first value of K is 10, which leads to an attempt to read from the ROCKS block of flow.inp. This is where the issue pops up. As far as I can tell, no other operations occur between the continuation line (258/261) and line 267 (" 1700 NM=1"), but this is where the error message occurs.
After starting from scratch with the original source code, I was able to compile it and execute the example problem "P1_eos9_nacl_v3omp".
I did this by running the command:
"gfortran -O3 -fstack-arrays -fdefault-integer-8 -fdefault-real-8 -fdefault-double-8 -malign-double -fopenmp -Wl,--stack,0x80000000"
This leads me to think that I may have inadvertently altered one of the source files in a way that caused the mysterious error I reported earlier. I apologize if anyone spent much time trying to get to the bottom of this.
A couple of additional notes:
1. On my system, I was only able to build an executable with a maximum number of elements (MNEL) of 50000 and maximum number of connections (MNCON) of 200000. This is a little less than half of what was originally written in the file flowpar_v3.inc that was provided from the LBL Software Center download.
2. I've yet to check the validity of the output files for this example problem, nor have I run the other example problems. Once I do, I'll post another update.
On Windows, the maximum number of grid blocks I've been able to compile TOUGHREACT for is about 64,000. I've never tried Cygwin, but it sounds like it is behaving like Windows. You may be able to increase the problem size by increasing the stack size from -Wl,--stack,0x80000000 to 0x120000000. Also, if you reduce the chemical system dimensions (maximum number of species, minerals, etc.) in chempar_v3.inc, then you will be able to increase the number of grid blocks and connections. On Linux or Mac, it can be compiled for 1 million grid blocks and over 3 million connections.
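For reference (my arithmetic, not from the original posts), converting those --stack values from hexadecimal bytes to MiB shows what they amount to, and why 0x80000000 sits exactly at the 2 GB Windows limit discussed later in the thread:

```shell
# Convert the --stack linker values from hex bytes to MiB
printf '0x80000000  = %d MiB (2 GiB)\n'   $(( 0x80000000  / 1024 / 1024 ))
printf '0x120000000 = %d MiB (4.5 GiB)\n' $(( 0x120000000 / 1024 / 1024 ))
```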
One comment: I would not trust that all problems will run with -O3. It is safer to use -O2, which in some cases may be slightly slower, but in other cases it can be faster, because -O3 can lead to convergence problems.
Thanks for the advice! I'll be sure to recompile with -O2. So far, I've been able to run problems P1 - P11 with -O3. P12 is running now and has been for a while. Perhaps it's running into the convergence issues you mentioned.
As for the stack size, I think I've maxed it out. I had already tried increasing it explicitly in the --stack option like you suggested. I also ran the command "ulimit -s unlimited" to maximize the default stack size. Neither of these tactics appeared to increase the max problem size. I'll stick with what I have for now and migrate to Linux or Mac if necessary.
I wanted to provide a couple more updates:
- P12 appears to have run successfully, but took quite a while longer to complete. After digging into the input files, I could see why. Is there any documentation for this problem? I was able to find info from P1 to P11 in the Example Problem documentation, but nothing about P12.
- I have dual-booted my machine with Linux-Ubuntu, and I've since been able to increase the max problem size to 1 million elements/3 million connections. If anyone else tries this, I have a couple of suggestions:
- Rather than set your stack limit in the compile options, you should simply set it in the terminal window with the command "ulimit -s unlimited" to max out the stack size based on your computer's capability, or by replacing "unlimited" with a set stack size in kbytes.
- The inclusion of the "-mcmodel=medium" option significantly increased the maximum number of elements and connections. In fact, by not including it in the Linux environment, I was running into the same limit as I was with Windows/Cygwin, even when that option was included. It makes me wonder if the "-mcmodel=medium" or "-mcmodel=large" options have much effect in Windows.
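Put together, the Linux setup described above might look like the following terminal session. The file names and executable name are placeholders, not the actual build targets:

```shell
# Raise the stack limit for this shell session; the value is in kbytes,
# or "unlimited" to use whatever the hard limit allows
ulimit -s unlimited
# Hypothetical compile line (file names are placeholders);
# -mcmodel=medium lets static data exceed 2 GB on x86-64 Linux
echo gfortran -O2 -mcmodel=medium -fstack-arrays -fdefault-integer-8 \
  -fdefault-real-8 -fdefault-double-8 -fopenmp -o treact *.f
```

Note that ulimit only applies to the current shell session, so it needs to be set in the same terminal (or startup file) used to launch the executable.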
Problem 12 is not documented. I included it since it was part of a series of ECO2n test problems that were passed on from Version 2. It is a larger problem that benefits greatly from the OpenMP parallelization. Since it was never documented, I dropped it from V3.3-OMP which was just released, and replaced it with a gas species injection problem.
Yes, on Linux it is not a problem to compile for 1 million grid blocks/3 million connections, which requires -mcmodel=medium and ulimit -s unlimited. On Mac it is also easy to compile for 1 million grid blocks, with a different Intel Fortran option (in the supplied makefiles). I've never been able to get it to work for large problems on Windows. I think it is the 2 GB limit with Windows. I rarely use (actually avoid at all costs!) Windows, so if anyone can figure out how to compile for a larger problem let me know. By reducing the maximum dimensions of the chemical system (in chempar_v3.inc), larger problems can be run, but probably not more than 200,000 grid blocks.
Thanks for passing along your tips to the group.
So I have a not-so-good update on my last post. It appears that, while gfortran compiles TOUGHREACT with very large physical domains (1 million elements/3 million connections), I run into a "Segmentation Fault" runtime error; the largest build that actually runs is about 170,000 maximum elements & 650,000 maximum connections. While this is still about 3x more than I could get with the Cygwin compilation, it doesn't quite approach the 1,000,000 elements/3,000,000 connections I could compile.
I did a little digging into this, and found this could be caused by the code's implementation in OpenMP. I ran across this site: http://whoochee.blogspot.com/2009/11/segmentation-fault-while-using-openmp.html .
It states the problem could be that "all the variables declared in the parallel region will go into the process stack", whereas the mother process stores them in global memory.
It states that one possible solution is to use the keyword "save" when declaring all variables that should be shared among all of the processes.
Not knowing the makeup of the source code well enough to know if this is already done, or where the right places would be to include this, I thought I'd reach out to you guys to see if this strategy is already implemented. If not, could it be applicable to the TOUGHREACT source code?
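One related knob worth noting (my assumption that it applies here, since I haven't traced the TOUGHREACT parallel regions): ulimit only governs the main process stack, while OpenMP worker threads get their own stacks, sized by a standard environment variable:

```shell
# ulimit -s sets the main process stack. OpenMP worker threads get
# their own stacks, sized by OMP_STACKSIZE (part of the OpenMP spec;
# GOMP_STACKSIZE is the older GNU libgomp name for the same thing).
# The 512M value here is just an example, to be tuned to problem size.
export OMP_STACKSIZE=512M
echo "OMP_STACKSIZE=$OMP_STACKSIZE"
```

If large arrays are being placed on thread stacks by -fstack-arrays or the parallel regions, a too-small OMP_STACKSIZE could produce exactly this kind of segmentation fault.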
Unfortunately, this is an issue with running Cygwin under Windows. The Windows 2GB static memory limit still applies. Just to make sure, I just started a 3-D EOS1 reactive transport problem with over 600,000 grid blocks on my Mac, and it has been running fine. So, if you have a PC, and want to run a very large problem, you would need to partition the disk and install Linux (Ubuntu is typical now), rather than using Cygwin under Windows.
Here are the problem specifications I am running now:
MNEL = 999999, MNCON = 3000000, MNEQ = 4, MNK = 3, MNPH = 3, MNB = 12, MNOGN = 2000, MGTAB = 40000
600002 grid blocks, 1778002 connections
47 rock types
13 primary chemical components
2 equilibrium and 5 kinetic minerals
1 gas species
I guess I didn't read your earlier message, because now I see that you did use Ubuntu Linux! Maybe the memory on your machine is not large enough? I have 64 GB on my Mac, and 128 GB on my Linux cluster. Gfortran is sometimes a little flaky. It seems that every time I get it to work, I then install a newer version and it doesn't work anymore. I can only attest to TOUGHREACT V3.0-OMP working with Intel Fortran on Linux, since it yields much faster code than gfortran. I usually just do some basic testing with gfortran to catch any bugs that Intel Fortran misses. We just released TOUGHREACT V3.32-OMP, which has several improvements over V3.0-OMP, but there aren't any changes that would cause it to work differently in terms of the OpenMP memory allocations.
Yeah, the topic heading, which mentions compiling TOUGHREACT using Cygwin, doesn't help clarify some of the efforts I made setting up my machine with Ubuntu.
It's likely that memory shortage is my issue. I'm running this on my personal laptop that has 8 GB of memory. I have a desktop at my job that has 128 GB of RAM, which I just set up with Ubuntu. I'm going to see how much I can scale up my domain sizes with that and post an update later on.
As far as using the keyword "save" during variable declarations goes, do you think this could help? I noticed a lot of COMMON blocks in the source code. Would it be helpful to replace them with MODULEs? At the very least, it could cut back on the number of repetitive variable entries. Obviously, the source code is massive, so it wouldn't be a trivial effort to convert those, but do you think it could have an effect on its performance?
Not sure whether this applies, but most compilers have a flag that allows you to SAVE all local variables (without the need to go into the source code); for gfortran it is -fno-automatic. Note that it does not affect COMMON blocks, just local variables. Here is what I found in the gfortran documentation:
Treat each program unit (except those marked as RECURSIVE) as if the "SAVE" statement were specified for every local variable and array referenced in it. Does not affect common blocks. (Some Fortran compilers provide this option under the name -static or -save.) The default, which is -fautomatic, uses the stack for local variables smaller than the value given by -fmax-stack-var-size. Use the option -frecursive to use no static memory.
Disregard if irrelevant.
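A sketch of how that flag might be added to the build line (the file names are placeholders, and note that it may interact with the '-frecursive' behavior implied by '-fopenmp'):

```shell
# -fno-automatic treats every local variable as if it had the SAVE
# attribute, moving it from the stack into static storage; it does not
# affect COMMON blocks. File names below are placeholders.
echo gfortran -O2 -fno-automatic -fdefault-integer-8 -fdefault-real-8 \
  -fdefault-double-8 -fopenmp -o treact *.f
```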
I compiled the TOUGHREACT code using my desktop workstation with 128 GB of RAM, and it appears to compile and run simulations with MNEL=1e6 and MNCON=6e6. Thus far, I've only run it with example P1, which has way fewer than MNEL elements and MNCON connections, but this is still an improvement from the executables built with my laptop. Note that I did have to change the "-mcmodel=medium" option to "-mcmodel=large" to get there.
I'll have to try out your suggestion. Note that the '-fopenmp' option that TOUGHREACT-v3-OMP requires implies the '-frecursive' option, which gets overridden when using '-fno-automatic'.
I think the main issue I'm trying to address, though, is in the COMMON blocks, which generate large arrays whose dimensions are based on the maximum number of elements MNEL and maximum number of connections MNCON. It's my understanding that when variables shared across threads are declared this way, each of them is placed on the thread's stack rather than in global memory, where they would be stored if declared in MODULEs. I'm not adept enough in FORTRAN coding or the particulars of the TOUGHREACT source code to know if this would help or not.
Unfortunately, this is an uncommon case where I can't send out that particular problem. At some point I'll create a large problem and add it to the Sample Problems. Although I updated TOUGHREACT to run up to 1 million grid blocks, most of my simulations are less than 200K grid blocks. With 128 GB, I think you should be able to run at least 800K grid blocks, since I ran 600K grid blocks with 64 GB.