Hello, I am from IWS, Uni Stuttgart, and I am new to iTOUGH2. I get an error when running iTOUGH2 on a cluster; see the details below and the attachment.
I run my simulations with DuMuX (our in-house simulator) and use iTOUGH2 to estimate three parameters: permeability, porosity, and the relative permeability parameter n.
I run the simulations on a cluster. In the iTOUGH2 input file I have set:
>>> PARALLEL: 3 (JACOBIAN only) SLEEP: 1
When I start the inversion, the DuMuX simulation runs in the parent folder, but in the other folders I get a memory allocation error (see the attached error_msg) and the DuMuX simulation does not start.
These are the steps I follow to submit the job:
1) I use a script file that starts iTOUGH2 (`itough2 -no_delete XYZ.input dummy 3`).
2) In my iTOUGH2 input file, I run the DuMuX simulation through the executable tag (`mpirun -np 80 -hostfile myhostfile Executable >output.log ;`).
3) I then submit the script to the cluster with
(`qsub -q xyz -l nodes=30:ppn=8,walltime=100:00:00 Submit_script`).
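A quick resource check for these steps compares what qsub requests with what the concurrent mpirun calls need. This is a sketch that assumes all three parallel Jacobian runs are active at the same time and each launches 80 ranks, as in the executable line above; the variable names are illustrative only.

```shell
#!/bin/sh
# Resources requested from the scheduler (values from the qsub line)
nodes=30
ppn=8
# Assumed concurrent model runs (PARALLEL: 3) and ranks per run (-np 80)
nruns=3
np=80

total=$((nodes * ppn))   # slots allocated to the whole job
needed=$((nruns * np))   # ranks in use if all runs overlap in time

echo "allocated slots: $total"
echo "ranks needed:    $needed"
if [ "$needed" -gt "$total" ]; then
    echo "oversubscribed: the runs cannot all fit" >&2
fi
```

Both numbers come out at 240, so the allocation only suffices if each run gets its own, non-overlapping set of 80 slots. If all three mpirun calls read the same myhostfile, they may land on the same nodes; this is one plausible, unconfirmed way to run out of memory.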
Please help me with this; I would be really thankful.
These types of problems are very difficult to resolve remotely. First, try to figure out where the issue is: can you run the mpirun command by itself? Can you run it through iTOUGH2-PEST, i.e., as just a single forward run? Next, check whether you can do it in serial, i.e., without >>> PARALLEL. Then try to find out more about the actual error message you got (it seems you simply do not have enough memory for your DuMuX run, but this would be revealed in your first testing step). Also make sure mpirun starts properly within your scheme.
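For the first test (running mpirun by itself), it helps to confirm beforehand that the hostfile provides at least as many slots as the -np value. The functions below are a sketch that assumes an Open MPI-style hostfile, with either a bare node name per line (one slot) or lines of the form `node slots=N`; the names count_slots and check_np are hypothetical.

```shell
#!/bin/sh
# count_slots FILE: sum the slots listed in an Open MPI-style hostfile.
# "node01 slots=8" counts 8; a bare "node01" counts 1.
# Blank lines and lines starting with '#' are skipped.
count_slots() {
    awk '
        NF == 0    { next }
        $1 ~ /^#/  { next }
        {
            n = 1
            for (i = 2; i <= NF; i++)
                if ($i ~ /^slots=/) n = substr($i, 7) + 0
            total += n
        }
        END { print total + 0 }
    ' "$1"
}

# check_np FILE NP: succeed only if FILE provides at least NP slots.
check_np() {
    have=$(count_slots "$1")
    if [ "$have" -lt "$2" ]; then
        echo "$1 provides $have slots, but -np $2 was requested" >&2
        return 1
    fi
    echo "$1 provides $have slots (enough for -np $2)"
}
```

For example, `check_np myhostfile 80` should succeed before the mpirun line is tried on its own, e.g. inside an interactive job (qsub -I).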
Note that what you are doing is quite complex: you are trying to run multiple (MPI) parallel runs in parallel through a scheduler. You have to make sure mpirun sees the same nodes you were assigned when launching iTOUGH2 through qsub, which is not trivial (at a minimum, you need to invoke the mpirun option -hostfile with the node names and slots listed in a file, which needs to be created as part of the qsub script, etc.). Again, this is all too complex to describe in this post, but we have solved these issues in the past (actually for the first time in Stuttgart; try to get hold of Lena Walter's notes, or of Lena herself!).
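The hostfile step could look roughly like this. It is a sketch assuming a PBS/Torque scheduler, which provides $PBS_NODEFILE with one host name per allocated slot, and Open MPI's `host slots=N` hostfile format. The function split_nodefile and the file names hostfile_run_0, hostfile_run_1, ... are hypothetical, and how each iTOUGH2 run folder picks up its own file is not shown here.

```shell
#!/bin/sh
# split_nodefile NODEFILE NRUNS
# Splits a PBS/Torque node file (one host name per allocated slot) into
# NRUNS disjoint hostfiles, hostfile_run_0 ... hostfile_run_<NRUNS-1>,
# each written as "host slots=N" lines.
split_nodefile() {
    nodefile=$1
    nruns=$2
    # tr strips the padding some wc implementations add
    total=$(wc -l < "$nodefile" | tr -d ' ')
    per_run=$((total / nruns))
    if [ "$per_run" -lt 1 ]; then
        echo "not enough slots in $nodefile for $nruns runs" >&2
        return 1
    fi
    i=0
    while [ "$i" -lt "$nruns" ]; do
        start=$((i * per_run + 1))
        # Take this run's block of lines, then merge repeated host names
        # into one "host slots=N" line per host.
        tail -n "+$start" "$nodefile" | head -n "$per_run" \
            | sort | uniq -c | awk '{ print $2 " slots=" $1 }' \
            > "hostfile_run_$i"
        i=$((i + 1))
    done
}
```

In the qsub script this would be called as `split_nodefile "$PBS_NODEFILE" 3`, and each run would then use its own file, e.g. `mpirun -np 80 -hostfile hostfile_run_1 Executable`. With 30 nodes, 8 slots each, and 3 runs, every block covers exactly 10 whole nodes, provided the node file lists each node's slots consecutively (worth checking on your cluster).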
Again, I encourage you to track down the issue one step at a time as described above. For most of these tests, I recommend replacing your DuMuX run with a simple script or command (e.g., `date`), which lets you check most of the chain quickly.
In case you get stuck, I'll be in Stuttgart in a month; we may find the time to resolve it then and there.
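A stand-in for the DuMuX run could be a small script like the one below, called from the executable line instead of mpirun (e.g. as a hypothetical `./fake_model.sh > output.log`). It is a sketch: the marker file name model_ran.txt is an assumption, and the script only records when, on which host, and in which folder it was started, so you can see whether every iTOUGH2 run folder actually launches the model.

```shell
#!/bin/sh
# Stand-in for the DuMuX executable: record where and when it ran.
# Each run folder should end up with its own model_ran.txt.
{
    echo "date: $(date)"
    echo "host: $(uname -n)"
    echo "dir:  $(pwd)"
    # If a hostfile is passed as the first argument, list the nodes
    # this run would have been given.
    if [ -n "$1" ] && [ -r "$1" ]; then
        echo "hostfile $1:"
        cat "$1"
    fi
} > model_ran.txt
```

Once every folder produces its marker file, the real mpirun line can be put back.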
Thanks a lot for the detailed reply. Today I also managed to talk to Lena, who no longer works in Stuttgart. She will send me the documentation by Monday.
I will follow the steps you suggested and try to narrow down the problem. I will also look into the hostfile Lena kept as a backup; maybe it will give me some ideas.
It would be great to sit down with you and learn more about iTOUGH2.