LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.

Bug 41584 - Assertion failure at kmp_runtime.cpp(4277): new_thr->th.th_active == (!0)
Status: RESOLVED FIXED
Alias: None
Product: OpenMP
Classification: Unclassified
Component: Runtime Library
Version: unspecified
Hardware: PC Linux
Importance: P normal
Assignee: Andrey Churbanov
Reported: 2019-04-24 10:10 PDT by Joel E. Denny
Modified: 2019-05-28 09:05 PDT
Fixed By Commit(s): r360784, r360919


Description Joel E. Denny 2019-04-24 10:10:16 PDT
Using clang and openmp built from r359012, I see the following assertion failure:

$ cat test.c
int main() {
  #pragma omp target teams num_teams(3)
  ;
  return 0;
}
$ clang -fopenmp test.c
$ ./a.out
Assertion failure at kmp_runtime.cpp(4270): new_thr->th.th_active == (!0).
OMP: Error #13: Assertion failure at kmp_runtime.cpp(4270).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.
Aborted (core dumped)

If I remove num_teams(3), I see the following instead:

$ clang -fopenmp test.c
$ ./a.out
Assertion failure at z_Linux_util.cpp(1469): (__kmp_thread_pool_active_nth) >= 0.
OMP: Error #13: Assertion failure at z_Linux_util.cpp(1469).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.
Aborted (core dumped)

These assertions do not fail every time, so there's a race.  I also see all this at r357927 but so far never at r357926, so r357927 is the likely culprit.

I built clang and openmp using:

$ clang --version
clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
$ cat /etc/issue
Ubuntu 18.04.2 LTS \n \l
Comment 1 Andrey Churbanov 2019-05-08 04:35:02 PDT
Issue reproduced, working on a fix.
Comment 2 Joel E. Denny 2019-05-15 10:20:23 PDT
(In reply to Andrey Churbanov from comment #1)
> Issue reproduced, working on a fix.

As discussed in the review, D61944 fixes my second reproducer (the one without `num_teams(3)`), but it does not fix the first reproducer for me.  I tried with the patch applied on top of r360778.

I'm noticing now that I don't always see the same assert fail from the first reproducer.  Currently, I more often see:

a.out: ../nptl/pthread_mutex_lock.c:79: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
Aborted (core dumped)

But I still frequently see the one I originally reported:

Assertion failure at kmp_runtime.cpp(4294): new_thr->th.th_active == (!0).
OMP: Error #13: Assertion failure at kmp_runtime.cpp(4294).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.
Aborted (core dumped)

Again, it's racy.  If I run the compiled executable in a shell while loop, I usually see the failure within just a few seconds.
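
The shell loop mentioned above is not shown in the report. A minimal sketch of what such a loop might look like follows; the helper name `run_until_fail` and the run limit are hypothetical and only serve the illustration. It reruns a command until it exits with a nonzero status (an abort from a failed assertion counts) or until the limit is reached.

```shell
# Hypothetical helper, not part of the original report.
# Usage: run_until_fail MAX_RUNS COMMAND [ARGS...]
# Reruns COMMAND until it exits nonzero or MAX_RUNS runs have completed.
# Prints which run failed and returns 1, or returns 0 if no run failed.
run_until_fail() {
  max_runs=$1
  shift
  i=1
  while [ "$i" -le "$max_runs" ]; do
    # A process killed by SIGABRT reports a nonzero status to the shell.
    if ! "$@"; then
      echo "failed on run $i"
      return 1
    fi
    i=$((i + 1))
  done
  echo "no failure in $max_runs runs"
  return 0
}
```

For the reproducer in this report, an invocation could be `run_until_fail 100000 ./a.out`; the limit is arbitrary and only keeps the loop from running forever once the race is fixed.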

If there's any more information I can provide, please let me know.
Comment 3 Andrey Churbanov 2019-05-15 11:01:58 PDT
(In reply to Joel E. Denny from comment #2)
> (In reply to Andrey Churbanov from comment #1)
> > Issue reproduced, working on a fix.
> 
> As discussed in the review, D61944 fixes my second reproducer (the one
> without `num_teams(3)`), but it does not fix the first reproducer for me.  I
> tried with the patch applied on top of r360778.
> 
> I'm noticing now that I don't always see the same assert fail from the first
> reproducer.  Currently, I more often see:
> 
> a.out: ../nptl/pthread_mutex_lock.c:79: __pthread_mutex_lock: Assertion
> `mutex->__data.__owner == 0' failed.
> Aborted (core dumped)
> 
> But I still frequently see the one I originally reported:
> 
> Assertion failure at kmp_runtime.cpp(4294): new_thr->th.th_active == (!0).
> OMP: Error #13: Assertion failure at kmp_runtime.cpp(4294).
> OMP: Hint Please submit a bug report with this message, compile and run
> commands used, and machine configuration info including native compiler and
> operating system versions. Faster response will be obtained by including all
> program sources. For information on submitting this issue, please see
> https://bugs.llvm.org/.
> Aborted (core dumped)
> 
> Again, it's racy.  If I run the compiled executable in a shell while loop, I
> usually see the failure within just a few seconds.
> 
> If there's any more information I can provide, please let me know.

Yes, please provide some more info.
Which hardware are you using? Does real offload happen in your execution, or does the target region run on the host?

To me the pthreads failure looks like memory corruption, but I am not 100% sure.

Anyway, I will try to reproduce the failure once more.

Thanks,
Andrey
Comment 4 Joel E. Denny 2019-05-15 11:14:59 PDT
(In reply to Andrey Churbanov from comment #3)
> Yes, please provide some more info.
> Which hardware are you using? Does real offload happen in your execution,
> or does the target region run on the host?

Host (x86_64).  I'm compiling with only -fopenmp.

I just tried with -fopenmp-targets=nvptx64, and so far it doesn't reproduce then.
Comment 5 Andrey Churbanov 2019-05-16 11:03:14 PDT
Joel,

I've just committed a second fix for another assertion (that was indeed a different problem).  Thanks to Johnny Peyton for the investigation and the fix provided.  Please check if it works for you, when you have time.
Comment 6 Joel E. Denny 2019-05-16 15:39:00 PDT
(In reply to Andrey Churbanov from comment #5)
> Joel,
> 
> I've just committed a second fix for another assertion (that was indeed a
> different problem).  Thanks to Johnny Peyton for the investigation and the
> fix provided.  Please check if it works for you, when you have time.

That assert seems to be fixed.  Thanks.

However, the same test case (with `num_teams(3)`) targeting host now sometimes fails a nearby assert:

Assertion failure at kmp_runtime.cpp(4300): new_thr->th.th_active == 0.
OMP: Error #13: Assertion failure at kmp_runtime.cpp(4300).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.
Aborted (core dumped)

Again, I run the executable in a shell while loop.  Sometimes it fails in a few seconds.  One time it took nearly 20 minutes.
Comment 7 Joel E. Denny 2019-05-28 09:05:17 PDT
Last problem fixed at <https://reviews.llvm.org/D62251>.  Thanks.