Fortran90 + OpenMP Program Hangs on Parallelized Nested Loop
Posted by ducks_over_IP@reddit | learnprogramming | 3 comments
I have an inherited Fortran90 program that I'm attempting to parallelize using OpenMP. The main issue is a nested loop that calls a computationally expensive subroutine many times (specifically, it calculates the confluent hypergeometric function with complex parameters and argument). I first did some local testing with the following program:
    PROGRAM Parallel_Loop_Test
    USE OMP_LIB
    INTEGER :: numprod(10, 10)
    numprod(1,1) = 0
    OPEN(10,file='mptest.dat',status='unknown')
    ! Enclose nestable loop in parallel, then !$OMP DO commands
    !$OMP PARALLEL
    PRINT *, "Hello from process: ", OMP_GET_THREAD_NUM()
    !$OMP DO
    DO i=1,10
      DO j = 1,10
        numprod(i,j) = i*j
      ENDDO
    ENDDO
    !$OMP ENDDO
    !$OMP END PARALLEL
    ! array structure preserves ordering, can do serial write
    DO i = 1, 10
      DO j = 1, 10
        write(10,*) i,j, numprod(i,j)
      ENDDO
    ENDDO
    CLOSE(10)
    END
My actual code has 5 nested loops and runs on up to 112 cores; when I attempt to implement the above framework with that code, it never leaves the loop. I can tell this because I made it write to file in the parallelized loop for testing, and even though it writes all the values I expect it to, it never hits the print statement after the loop saying that all the values have been calculated. I suspect I just don't understand something about how OpenMP behaves with nested loops, but I'm having a hard time finding a clear explanation on that front.
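For comparison, here is a minimal sketch of how the test framework above might extend to deeper nesting. It rests on several assumptions: `expensive_call` is a hypothetical stand-in for the real subroutine, the bound `n` is made up, and four loops are shown rather than five. Results go into an array and are reported after the parallel region instead of being written from inside the loop. COLLAPSE(2) merges the two outer loops into one iteration space so that more than `n` threads can receive work.

```fortran
PROGRAM Nested_Loop_Sketch
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 4            ! made-up loop bound
  INTEGER :: i, j, k, l
  COMPLEX :: res(n, n, n, n)

  ! Combined construct: open the parallel region and share out the
  ! collapsed (i, j) iteration space in one directive.
  ! All loop indices are listed as PRIVATE explicitly.
  !$OMP PARALLEL DO COLLAPSE(2) PRIVATE(i, j, k, l)
  DO i = 1, n
    DO j = 1, n
      DO k = 1, n
        DO l = 1, n
          ! Each (i, j, k, l) writes a distinct element, so no two
          ! threads touch the same location.
          res(l, k, j, i) = expensive_call(i, j, k, l)
        ENDDO
      ENDDO
    ENDDO
  ENDDO
  !$OMP END PARALLEL DO

  ! Reached only once every thread has finished its share of the loop.
  PRINT *, "All values calculated; last element:", res(n, n, n, n)

CONTAINS

  ! Hypothetical placeholder for the expensive routine. PURE means it
  ! has no side effects, which keeps it safe to call from many threads.
  PURE FUNCTION expensive_call(i, j, k, l) RESULT(z)
    INTEGER, INTENT(IN) :: i, j, k, l
    COMPLEX :: z
    z = CMPLX(REAL(i*j), REAL(k*l))
  END FUNCTION expensive_call

END PROGRAM Nested_Loop_Sketch
```

With gfortran this would be compiled with `-fopenmp`. Whether the real subroutine is free of shared state (SAVE variables, COMMON blocks, module variables) is not something the sketch can answer.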
esaule@reddit
Isn't the default data sharing in OpenMP for Fortran that variables are shared? So aren't your loops sharing the j counter, so the progress is off? It false-shares j, so it technically makes progress, but just WAY slower than you think?
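One way to check this concern is to spell out the data-sharing attributes rather than rely on the defaults. Under the OpenMP specification's Fortran rules, the index of a sequential loop nested inside a parallel construct is private by default (unlike in C), but other scalars assigned inside the loop, such as a temporary, are shared unless declared otherwise. The sketch below adds a hypothetical temporary `tmp` to the test program and uses DEFAULT(NONE), so the compiler rejects any variable whose sharing was not stated.

```fortran
PROGRAM Explicit_Sharing_Sketch
  IMPLICIT NONE
  INTEGER :: i, j, tmp                   ! tmp is a hypothetical temporary
  INTEGER :: numprod(10, 10)

  ! DEFAULT(NONE) forces every variable used in the region to appear
  ! in a SHARED or PRIVATE clause; anything omitted is a compile error.
  !$OMP PARALLEL DEFAULT(NONE) SHARED(numprod) PRIVATE(i, j, tmp)
  !$OMP DO
  DO i = 1, 10
    DO j = 1, 10
      tmp = i*j                          ! a data race if tmp were shared
      numprod(i, j) = tmp
    ENDDO
  ENDDO
  !$OMP END DO
  !$OMP END PARALLEL

  PRINT *, "numprod(10,10) =", numprod(10, 10)
END PROGRAM Explicit_Sharing_Sketch
```

If the real code compiles cleanly with DEFAULT(NONE) and every index and temporary listed as PRIVATE, a shared counter is unlikely to be the cause.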
Knarfnarf@reddit
Crazy thought; it’s not assigning weird numbers to i,j on that line, right?
ducks_over_IP@reddit (OP)
No, I had it write to file (for the full loop):
i, j, k, l, res (where i, j, k, and l are integer loop indices, and res is the complex result of the gross subroutine call), and sure enough, the output file shows 4 integers then a complex number every line.