Hi,
one of my programs crashes when running a threaded version. Running it inside gdb produced output that left me helpless:
[New LWP 397493]
Program received signal SIGSEGV, Segmentation fault.
[Switching to LWP 397493]
0x0000000001dad557 in _INTERNAL_25_______src_kmp_barrier_cpp_5de9139b::__kmp_hyper_barrier_release(barrier_type, kmp_info*, int, int, int, void*) ()
gdb bt yielded:
#0 0x0000000001dad557 in _INTERNAL_25_______src_kmp_barrier_cpp_5de9139b::__kmp_hyper_barrier_release(barrier_type, kmp_info*, int, int, int, void*) ()
#1 0x0000000001dae38b in __kmp_fork_barrier(int, int) ()
#2 0x0000000001d150c0 in __kmp_launch_thread ()
#3 0x0000000001d5d341 in _INTERNAL_26_______src_z_Linux_util_cpp_47afea4b::__kmp_launch_worker(void*) ()
#4 0x0000000001eb3ff7 in start_thread ()
#5 0x0000000001f2507b in clone ()
To give an idea of the relevant parts of the program's structure, a code snippet that mimics what the program does is given below. It is for illustration only; I have not tested whether the snippet produces the same segfault.
Module Mod_Root
  Implicit none
  Type :: root
  End type root
End Module Mod_Root

Module Mod_Sigma
  use Mod_Root, only: root
  Implicit None
  Type, abstract, extends(root) :: Sigma
    Real, Pointer, contiguous :: PreMult(:,:), PostMult(:,:)
  contains
    Procedure(SubMult), Pass, Public, Deferred :: Mult
  end type Sigma
  Abstract Interface
    Subroutine SubMult(this)
      Import Sigma
      Class(Sigma), Intent(In) :: this
    End Subroutine SubMult
  End Interface
  Private :: SubMult
End Module Mod_Sigma

Module Mod_Sigma_Type_A
  use Mod_Sigma, only: Sigma
  Type, extends(Sigma) :: Sigma_Type_A
    Real, Allocatable :: Mat(:,:,:)
  contains
    Procedure, Pass, Public :: Mult=>SubMult
  End type Sigma_Type_A
  Private :: SubMult
contains
  Subroutine SubMult(this)
    Implicit None
    Class(Sigma_Type_A), Intent(In) :: this
    Integer :: i
    Do i=1,size(this%Mat,3)
      this%PostMult(i,:)=matmul(this%PreMult(i,:),this%Mat(:,:,i))
    End Do
  End Subroutine SubMult
End Module Mod_Sigma_Type_A

Module Mod_Sigma_Type_B
  use Mod_Sigma, only: Sigma
  Type, extends(Sigma) :: Sigma_Type_B
    Real, Allocatable :: Mat(:,:)
  contains
    Procedure, Pass, Public :: Mult=>SubMult
  End type Sigma_Type_B
  Private :: SubMult
contains
  Subroutine SubMult(this)
    Implicit None
    Class(Sigma_Type_B), Intent(In) :: this
    this%PostMult=matmul(this%PreMult,this%Mat)
  End Subroutine SubMult
End Module Mod_Sigma_Type_B

Module Mod_Struct
  use Mod_Root, only: root
  use Mod_Sigma, only: sigma
  Type, extends(root), abstract :: Struct
    Class(Sigma), Allocatable :: Sigma
  Contains
    Procedure(SubMult), Public, Pass, Deferred :: Mult
  End type Struct
  Type :: StructPt
    Class(Struct), Pointer :: pt
  end type StructPt
  Abstract interface
    Subroutine SubMult(this)
      Import Struct
      Class(Struct), Intent(InOut), Target :: this
    end Subroutine SubMult
  End interface
End Module Mod_Struct

Module Mod_Struct_A
  use Mod_Struct
  Type, extends(Struct) :: Struct_Type_A
    Real, Allocatable :: Mat1(:,:), Mat2(:,:)
  Contains
    Procedure, Pass, Public :: Mult => SubMultSigma
  End type Struct_Type_A
  Private :: SubMultSigma
contains
  Subroutine SubMultSigma(this)
    Implicit None
    Class(Struct_Type_A), Intent(InOut), Target :: this
    this%Sigma%PreMult=>this%Mat1
    this%Sigma%PostMult=>this%Mat2
    call this%Sigma%Mult()
  End Subroutine SubMultSigma
End Module Mod_Struct_A

Program Test
  use Mod_Struct
  use Mod_Struct_A
  use Mod_Sigma_Type_A
  use Mod_Sigma_Type_B
  Type(Struct_Type_A), Target :: a, b
  Class(StructPt), Allocatable :: x(:)
  Integer :: i
  allocate(Sigma_Type_A::a%sigma)
  allocate(Sigma_Type_B::b%sigma)
  Allocate(x(2))
  x(1)%pt=>a
  x(2)%pt=>b
  !$OMP PARALLEL DO PRIVATE(i)
  Do i=1,2
    call x(i)%pt%Mult()
  End Do
  !$OMP END PARALLEL DO
End Program Test
The segfault in my program occurs at a location corresponding to the call x(i)%pt%Mult() in the snippet, but only if b%sigma has been allocated as type "Sigma_Type_B". If both a and b have been allocated as type "Sigma_Type_A", the program runs fine regardless of the size of the relevant arrays. Moreover, threaded or unthreaded, the program always runs when the arrays involved are small. However, when the arrays occupy up to 200GB of RAM and different type allocations are used, it crashes.
The ifort version is 17.01, the Linux version is CentOS 7 with kernel 3.10, the stack size is set to unlimited, and OMP_STACKSIZE is set to 32MB.
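Expressed as shell commands, the stack settings above amount to roughly the following. This is only a sketch assuming a bash-like shell; the exact commands used in the actual job environment may differ.

```shell
# Raise the stack limit of the initial (main) thread; report if the hard limit forbids it.
ulimit -s unlimited 2>/dev/null || echo "could not set stack size to unlimited" >&2

# Stack size of the OpenMP worker threads (32 MB, as stated above).
export OMP_STACKSIZE=32M

# Show the values that are actually in effect.
echo "stack limit:   $(ulimit -s)"
echo "OMP_STACKSIZE: ${OMP_STACKSIZE}"
```

Note that the shell limit applies to the main thread only, while the worker threads get the size given by OMP_STACKSIZE.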
The compiler flags were:
-assume byterecl -warn nounused -warn declarations -O0 -static -check all -traceback -warn interface -check noarg_temp_created -mkl=parallel -qopenmp
No errors or warnings occurred at compile time or at run time. The program ran on a machine with 56 "Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz" processors and 512GB RAM.
Given the compiler flags I used, and having already run the program inside gdb, I am running out of ideas at this point. It would be great if someone from Intel could look into this. I could supply an executable and a data set that triggers the segfault.
Thanks a lot.