Quantcast
Channel: Recent posts
Viewing all articles
Browse latest Browse all 415

Memory Leak in OpenMP Task parallel application with Ifort 17

$
0
0

I am trying to parallelize an algorithm using DAG-Scheduling via OpenMP tasks and there many programs are killed by the Linux kernel due to Out-Of-Memory after a calls to the parallelized code although the allocated memory is only 1% of the servers main memory. But this happens only if I use the Intel Compilers from 2015, 2016 or even the new 2017 edition.

Here is a small example building the same task dependency graph as the algorithm crashing:

    PROGRAM OMP_TASK_PROBLEM
        IMPLICIT NONE

        INTEGER M, N
        PARAMETER(M = 256, N=256)
        DOUBLE PRECISION X(M,N)

        X(1:M,1:N) = 0.0D0

        CALL COMPUTE_X(M, N, X, M)

        ! WRITE(*,*) X (1:M, 1:N)

    END PROGRAM


    SUBROUTINE COMPUTE_X(M,N, X, LDX)
        IMPLICIT NONE
        INTEGER M, N, LDX
        DOUBLE PRECISION X(LDX, N)
        INTEGER K, L, KOLD, LOLD


        !$omp parallel default(shared)
        !$omp master
        L = 1
        DO WHILE ( L .LE. N )
            K = M
            DO WHILE (K .GT. 0)
                IF ( K .EQ. M .AND. L .EQ. 1) THEN
                    !$omp  task depend(out:X(K,L)) firstprivate(K,L) default(shared)
                    X(K,L) = 0
                    !$omp end task
                ELSE IF ( K .EQ. M .AND. L .GT. 1) THEN
                    !$omp  task depend(out:X(K,L)) depend(in:X(K,LOLD)) firstprivate(K,L,LOLD) default(shared)
                    X(K,L) = 1 + X(K,LOLD)
                    !$omp end task
                ELSE IF ( K .LT. M .AND. L .EQ. 1) THEN
                    !$omp  task depend(out:X(K,L)) depend(in:X(KOLD,L)) firstprivate(K,L,KOLD) default(shared)
                    X(K,L) = 2 + X(KOLD,L)
                    !$omp end task
                ELSE
                    !$omp  task depend(out:X(K,L)) depend(in:X(KOLD,L),X(K,LOLD)) firstprivate(K,L,KOLD, LOLD) default(shared)
                    X(K,L) = X(KOLD, L) + X(K,LOLD)
                    !$omp end task
                END IF

                KOLD = K
                K = K - 1
            END DO

            LOLD = L
            L = L + 1
        END DO
        !$omp end master
        !$omp taskwait
        !$omp end parallel
    END SUBROUTINE

After compiling it using `ifort -qopenmp -g omp_test.f90` and running it via `valgrind` it reports:

==23255== 1,048,576 bytes in 1 blocks are possibly lost in loss record 22 of 27
==23255==    at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==23255==    by 0x516AF47: bget(kmp_info*, long) (kmp_alloc.c:741)
==23255==    by 0x516AC6D: ___kmp_fast_allocate (kmp_alloc.c:2012)
==23255==    by 0x51CFAC7: __kmp_task_alloc (kmp_tasking.c:997)
==23255==    by 0x51CFA36: __kmpc_omp_task_alloc (kmp_tasking.c:1134)
==23255==    by 0x40367E: compute_x_ (omp_task_problem.f90:31)
==23255==    by 0x51DC412: __kmp_invoke_microtask (in /scratch/software/intel-2017/compilers_and_libraries_2017.0.098/linux/compiler/lib/intel64_lin/libiomp5.so)
==23255==    by 0x51AC186: __kmp_invoke_task_func (kmp_runtime.c:7055)
==23255==    by 0x51AD229: __kmp_fork_call (kmp_runtime.c:2361)
==23255==    by 0x5184EE7: __kmpc_fork_call (kmp_csupport.c:339)
==23255==    by 0x4034D3: compute_x_ (omp_task_problem.f90:24)
==23255==    by 0x4030CC: MAIN__ (omp_task_problem.f90:10)
==23255==
==23255== 1,048,576 bytes in 1 blocks are possibly lost in loss record 23 of 27
==23255==    at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==23255==    by 0x516AF47: bget(kmp_info*, long) (kmp_alloc.c:741)
==23255==    by 0x516AC6D: ___kmp_fast_allocate (kmp_alloc.c:2012)
==23255==    by 0x51CE30A: __kmp_add_node (kmp_taskdeps.cpp:204)
==23255==    by 0x51CE30A: __kmp_process_deps (kmp_taskdeps.cpp:320)
==23255==    by 0x51CE30A: __kmp_check_deps (kmp_taskdeps.cpp:365)
==23255==    by 0x51CE30A: __kmpc_omp_task_with_deps (kmp_taskdeps.cpp:523)
==23255==    by 0x403A11: compute_x_ (omp_task_problem.f90:35)
==23255==    by 0x51DC412: __kmp_invoke_microtask (in /scratch/software/intel-2017/compilers_and_libraries_2017.0.098/linux/compiler/lib/intel64_lin/libiomp5.so)
==23255==    by 0x51AC186: __kmp_invoke_task_func (kmp_runtime.c:7055)
==23255==    by 0x51AD229: __kmp_fork_call (kmp_runtime.c:2361)
==23255==    by 0x5184EE7: __kmpc_fork_call (kmp_csupport.c:339)
==23255==    by 0x4034D3: compute_x_ (omp_task_problem.f90:24)
==23255==    by 0x4030CC: MAIN__ (omp_task_problem.f90:10)

 

The line numbers of the `compute_x_` function in the backtrace correspond to the `!$omp task` statements. These memory leaks accumulated rapidly to an amount of memory such that the program crashes.

Using gcc-6.2 for this `valgrind` ends up with:

   ==21246== LEAK SUMMARY:
    ==21246==    definitely lost: 0 bytes in 0 blocks
    ==21246==    indirectly lost: 0 bytes in 0 blocks
    ==21246==      possibly lost: 8,640 bytes in 15 blocks
    ==21246==    still reachable: 4,624 bytes in 4 blocks
    ==21246==         suppressed: 0 bytes in 0 blocks
    ==21246==

where the leaks are only from the first initialization of the OpenMP runtime system.

So my question is: Why does the Intel Compiler/Intel OpenMP runtime system produce theses leaks or alternatively is there an error in the way I have
designed the task parallelism.

 


Viewing all articles
Browse latest Browse all 415

Trending Articles