
Segfault in omp program. Need support from Intel!!

Hi,

One of my programs crashes when running the threaded version. Running it inside gdb produced output that left me helpless:

[New LWP 397493]

Program received signal SIGSEGV, Segmentation fault.
[Switching to LWP 397493]
0x0000000001dad557 in _INTERNAL_25_______src_kmp_barrier_cpp_5de9139b::__kmp_hyper_barrier_release(barrier_type, kmp_info*, int, int, int, void*) ()

The gdb backtrace (bt) yielded:

#0  0x0000000001dad557 in _INTERNAL_25_______src_kmp_barrier_cpp_5de9139b::__kmp_hyper_barrier_release(barrier_type, kmp_info*, int, int, int, void*) ()
#1  0x0000000001dae38b in __kmp_fork_barrier(int, int) ()
#2  0x0000000001d150c0 in __kmp_launch_thread ()
#3  0x0000000001d5d341 in _INTERNAL_26_______src_z_Linux_util_cpp_47afea4b::__kmp_launch_worker(void*) ()
#4  0x0000000001eb3ff7 in start_thread ()
#5  0x0000000001f2507b in clone ()

To give an idea of the program's structure, the code snippet below mimics what the program does. However, it is only for illustration; I have not tested whether the snippet produces the same segfault.

Module Mod_Root
  Implicit none
  Type :: root
  End type root
End Module Mod_Root
Module Mod_Sigma
  use Mod_Root, only: root
  Implicit None
  Type, abstract, extends(root) :: Sigma
    Real, Pointer, contiguous :: PreMult(:,:), PostMult(:,:)
  contains
    Procedure(SubMult), Pass, Public, Deferred :: Mult
  end type Sigma
  Abstract Interface
    Subroutine SubMult(this)
      Import Sigma
      Class(Sigma), Intent(In) :: this
    End Subroutine SubMult
  End Interface
  Private :: SubMult
End Module Mod_Sigma
Module Mod_Sigma_Type_A
  use Mod_Sigma, only: Sigma
  Type, extends(Sigma) :: Sigma_Type_A
    Real, Allocatable :: Mat(:,:,:)
  contains
    Procedure, Pass, Public :: Mult=>SubMult
  End type Sigma_Type_A
  Private :: SubMult
contains
  Subroutine SubMult(this)
    Implicit None
    Class(Sigma_Type_A), Intent(In) :: this
    Integer :: i
    Do i=1,size(this%Mat,3)
      this%PostMult(i,:)=matmul(this%PreMult(i,:),this%Mat(:,:,i))
    End Do
  End Subroutine SubMult
End Module Mod_Sigma_Type_A
Module Mod_Sigma_Type_B
  use Mod_Sigma, only: Sigma
  Type, extends(Sigma) :: Sigma_Type_B
    Real, Allocatable :: Mat(:,:)
  contains
    Procedure, Pass, Public :: Mult=>SubMult
  End type Sigma_Type_B
  Private :: SubMult
contains
  Subroutine SubMult(this)
    Implicit None
    Class(Sigma_Type_B), Intent(In) :: this
    this%PostMult=matmul(this%PreMult,this%Mat)
  End Subroutine SubMult
End Module Mod_Sigma_Type_B
Module Mod_Struct
  use Mod_Root, only: root
  use Mod_Sigma, only: sigma
  Type,extends(root), abstract :: Struct
    Class(Sigma), Allocatable :: Sigma
  Contains
    Procedure(SubMult), Public, Pass, Deferred :: Mult
  End type Struct
  Type :: StructPt
    Class(Struct), Pointer :: pt
  end type StructPt
  Abstract interface
    Subroutine SubMult(this)
      Import Struct
      Class(Struct), Intent(InOut), Target :: this
    end Subroutine SubMult
  End interface
End Module Mod_Struct
Module Mod_Struct_A
  use Mod_Struct
  Type, extends(Struct) :: Struct_Type_A
    Real, Allocatable :: Mat1(:,:), Mat2(:,:)
  Contains
    Procedure, Pass, Public :: Mult => SubMultSigma
  End type Struct_Type_A
  Private :: SubMultSigma
contains
  Subroutine SubMultSigma(this)
    Implicit None
    Class(Struct_Type_A), Intent(InOut), Target :: this
    this%Sigma%PreMult=>this%Mat1
    this%Sigma%PostMult=>this%Mat2
    call this%Sigma%Mult()
  End Subroutine SubMultSigma
End Module Mod_Struct_A
Program Test
  use Mod_Struct
  use Mod_Struct_A
  use Mod_Sigma_Type_A
  use Mod_Sigma_Type_B
  Type(Struct_Type_A), Target :: a, b
  Class(StructPt), Allocatable :: x(:)
  Integer :: i
  allocate(Sigma_Type_A::a%sigma)
  allocate(Sigma_Type_B::b%sigma)
  Allocate(x(2))
  x(1)%pt=>a;x(2)%pt=>b
  !$OMP PARALLEL DO PRIVATE(i)
  Do i=1,2
    call x(i)%pt%Mult()
  End Do
  !$OMP END PARALLEL DO
End Program Test

The segfault in my program occurs at a location similar to the call x(i)%pt%Mult(), but only if b%sigma has been allocated as type "Sigma_Type_B". If both a%sigma and b%sigma have been allocated as type "Sigma_Type_A", the program runs fine regardless of the size of the relevant arrays. Moreover, threaded or unthreaded, the program always runs when the involved arrays are small. However, when the arrays occupy up to 200GB of RAM and different type allocations are used, it crashes.

The ifort version is 17.01, the Linux distribution is CentOS 7 with kernel 3.10, the stack size is set to unlimited, and OMP_STACKSIZE is set to 32MB.

The compiler flags were:

-assume byterecl -warn nounused -warn declarations -O0 -static -check all -traceback -warn interface -check noarg_temp_created -mkl=parallel -qopenmp
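
For anyone trying to reproduce the setup, the settings and flags above correspond roughly to the following shell session. This is only a sketch: it assumes a bash-like shell, assumes that "stack size is set to unlimited" refers to `ulimit -s`, and uses `test.f90` and `./test` as placeholder names rather than the real source file and executable.

```shell
# Sketch of the build and run environment described above (bash-like shell assumed).
# "test.f90" and "./test" are placeholder names, not the real program.

# Compiler flags exactly as listed in the post
FFLAGS="-assume byterecl -warn nounused -warn declarations -O0 -static -check all -traceback -warn interface -check noarg_temp_created -mkl=parallel -qopenmp"

# Build step (commented out here because it requires ifort to be installed)
# ifort $FFLAGS test.f90 -o test

# Remove the limit on the main thread's stack (assumed meaning of "unlimited");
# the shell may refuse this if the hard limit is lower
ulimit -s unlimited || echo "could not raise the stack limit" >&2

# Stack size for each OpenMP worker thread
export OMP_STACKSIZE=32M

# Run step (commented out because the executable name is a placeholder)
# ./test
```

Note that `ulimit -s` only affects the main thread's stack, while the OpenMP worker threads get their stack size from OMP_STACKSIZE.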

No errors or warnings occurred at either compile time or run time. The program ran on a machine with 56 "Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz" processors and 512GB RAM.

Given the compiler flags I used and having run the program inside gdb, I am running out of ideas at this point. It would be great if someone from Intel could look into this. I could supply an executable and a data set which triggers the segfault.

Thanks a lot.
