Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HIP] Support managed variables using the new driver #123437

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jhuber6
Copy link
Contributor

@jhuber6 jhuber6 commented Jan 18, 2025

Summary:
Previously, managed variables didn't work in rdc mode using the new
driver because we just didn't register them. This was previously ignored
because we didn't have enough space in the current struct format. This
patch amends that by just emitting a struct pair for the two variables
and using the single pointer.

In the future, a more extensible entry format would be nice, but that
can be done later.

Summary:
Previously, managed variables didn't work in rdc mode using the new
driver because we just didn't register them. This was previously ignored
because we didn't have enough space in the current struct format. This
patch amends that by just emitting a struct pair for the two variables
and using the single pointer.

In the future, a more extensible entry format would be nice, but that
can be done later.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:codegen labels Jan 18, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 18, 2025

@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)

Changes

Summary:
Previously, managed variables didn't work in rdc mode using the new
driver because we just didn't register them. This was previously ignored
because we didn't have enough space in the current struct format. This
patch amends that by just emitting a struct pair for the two variables
and using the single pointer.

In the future, a more extensible entry format would be nice, but that
can be done later.


Patch is 22.56 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/123437.diff

6 Files Affected:

  • (modified) clang/lib/CodeGen/CGCUDANV.cpp (+25-6)
  • (modified) clang/test/CodeGenCUDA/offloading-entries.cu (+36-42)
  • (modified) clang/test/Driver/linker-wrapper-image.c (+18-10)
  • (modified) llvm/include/llvm/Frontend/Offloading/Utility.h (+4)
  • (modified) llvm/lib/Frontend/Offloading/OffloadWrapper.cpp (+16)
  • (modified) llvm/lib/Frontend/Offloading/Utility.cpp (+10)
diff --git a/clang/lib/CodeGen/CGCUDANV.cpp b/clang/lib/CodeGen/CGCUDANV.cpp
index ae14d74f2d9151..0fc81491c40855 100644
--- a/clang/lib/CodeGen/CGCUDANV.cpp
+++ b/clang/lib/CodeGen/CGCUDANV.cpp
@@ -1221,12 +1221,31 @@ void CGNVCUDARuntime::createOffloadingEntries() {
              ? static_cast<int32_t>(llvm::offloading::OffloadGlobalNormalized)
              : 0);
     if (I.Flags.getKind() == DeviceVarFlags::Variable) {
-      llvm::offloading::emitOffloadingEntry(
-          M, I.Var, getDeviceSideName(I.D), VarSize,
-          (I.Flags.isManaged() ? llvm::offloading::OffloadGlobalManagedEntry
-                               : llvm::offloading::OffloadGlobalEntry) |
-              Flags,
-          /*Data=*/0, Section);
+      if (I.Flags.isManaged()) {
+        assert(I.Var->getName().ends_with(".managed") &&
+               "HIP managed variables not transformed");
+
+        // Create a struct to contain the two variables.
+        auto *ManagedVar = M.getNamedGlobal(
+            I.Var->getName().drop_back(StringRef(".managed").size()));
+        llvm::Constant *StructData[] = {ManagedVar, I.Var};
+        llvm::Constant *Initializer = llvm::ConstantStruct::get(
+            llvm::offloading::getManagedTy(M), StructData);
+        auto *Struct = new llvm::GlobalVariable(
+            M, llvm::offloading::getManagedTy(M),
+            /*IsConstant=*/true, llvm::GlobalValue::PrivateLinkage, Initializer,
+            I.Var->getName());
+
+        llvm::offloading::emitOffloadingEntry(
+            M, Struct, getDeviceSideName(I.D), VarSize,
+            llvm::offloading::OffloadGlobalManagedEntry | Flags,
+            /*Data=*/static_cast<uint32_t>(I.Var->getAlignment()), Section);
+      } else {
+        llvm::offloading::emitOffloadingEntry(
+            M, I.Var, getDeviceSideName(I.D), VarSize,
+            llvm::offloading::OffloadGlobalEntry | Flags,
+            /*Data=*/0, Section);
+      }
     } else if (I.Flags.getKind() == DeviceVarFlags::Surface) {
       llvm::offloading::emitOffloadingEntry(
           M, I.Var, getDeviceSideName(I.D), VarSize,
diff --git a/clang/test/CodeGenCUDA/offloading-entries.cu b/clang/test/CodeGenCUDA/offloading-entries.cu
index 259e3324e8ac94..d46a25969e3ecd 100644
--- a/clang/test/CodeGenCUDA/offloading-entries.cu
+++ b/clang/test/CodeGenCUDA/offloading-entries.cu
@@ -1,4 +1,4 @@
-// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals --global-value-regex ".offloading.entry.*"
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals --global-value-regex ".offloading.entry.*" "managed.*"
 // RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-linux-gnu -fgpu-rdc \
 // RUN:   --offload-new-driver -emit-llvm -o - -x cuda  %s | FileCheck \
 // RUN:   --check-prefix=CUDA %s
@@ -14,50 +14,68 @@
 
 #include "Inputs/cuda.h"
 
+#define __managed__ __attribute__((managed))
+
 //.
+// CUDA: @managed = global i32 undef, align 4
 // CUDA: @.offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00", section ".llvm.rodata.offloading", align 1
 // CUDA: @.offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z18__device_stub__foov, ptr @.offloading.entry_name, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries", align 1
 // CUDA: @.offloading.entry_name.1 = internal unnamed_addr constant [11 x i8] c"_Z6kernelv\00", section ".llvm.rodata.offloading", align 1
 // CUDA: @.offloading.entry._Z6kernelv = weak constant %struct.__tgt_offload_entry { ptr @_Z21__device_stub__kernelv, ptr @.offloading.entry_name.1, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries", align 1
 // CUDA: @.offloading.entry_name.2 = internal unnamed_addr constant [4 x i8] c"var\00", section ".llvm.rodata.offloading", align 1
 // CUDA: @.offloading.entry.var = weak constant %struct.__tgt_offload_entry { ptr @var, ptr @.offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "cuda_offloading_entries", align 1
-// CUDA: @.offloading.entry_name.3 = internal unnamed_addr constant [5 x i8] c"surf\00", section ".llvm.rodata.offloading", align 1
-// CUDA: @.offloading.entry.surf = weak constant %struct.__tgt_offload_entry { ptr @surf, ptr @.offloading.entry_name.3, i64 4, i32 2, i32 1 }, section "cuda_offloading_entries", align 1
-// CUDA: @.offloading.entry_name.4 = internal unnamed_addr constant [4 x i8] c"tex\00", section ".llvm.rodata.offloading", align 1
-// CUDA: @.offloading.entry.tex = weak constant %struct.__tgt_offload_entry { ptr @tex, ptr @.offloading.entry_name.4, i64 4, i32 3, i32 1 }, section "cuda_offloading_entries", align 1
+// CUDA: @.offloading.entry_name.3 = internal unnamed_addr constant [8 x i8] c"managed\00", section ".llvm.rodata.offloading", align 1
+// CUDA: @.offloading.entry.managed = weak constant %struct.__tgt_offload_entry { ptr @managed, ptr @.offloading.entry_name.3, i64 4, i32 0, i32 0 }, section "cuda_offloading_entries", align 1
+// CUDA: @.offloading.entry_name.4 = internal unnamed_addr constant [5 x i8] c"surf\00", section ".llvm.rodata.offloading", align 1
+// CUDA: @.offloading.entry.surf = weak constant %struct.__tgt_offload_entry { ptr @surf, ptr @.offloading.entry_name.4, i64 4, i32 2, i32 1 }, section "cuda_offloading_entries", align 1
+// CUDA: @.offloading.entry_name.5 = internal unnamed_addr constant [4 x i8] c"tex\00", section ".llvm.rodata.offloading", align 1
+// CUDA: @.offloading.entry.tex = weak constant %struct.__tgt_offload_entry { ptr @tex, ptr @.offloading.entry_name.5, i64 4, i32 3, i32 1 }, section "cuda_offloading_entries", align 1
 //.
+// HIP: @managed.managed = global i32 0, align 4
+// HIP: @managed = externally_initialized global ptr null
 // HIP: @.offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00", section ".llvm.rodata.offloading", align 1
 // HIP: @.offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z3foov, ptr @.offloading.entry_name, i64 0, i32 0, i32 0 }, section "hip_offloading_entries", align 1
 // HIP: @.offloading.entry_name.1 = internal unnamed_addr constant [11 x i8] c"_Z6kernelv\00", section ".llvm.rodata.offloading", align 1
 // HIP: @.offloading.entry._Z6kernelv = weak constant %struct.__tgt_offload_entry { ptr @_Z6kernelv, ptr @.offloading.entry_name.1, i64 0, i32 0, i32 0 }, section "hip_offloading_entries", align 1
 // HIP: @.offloading.entry_name.2 = internal unnamed_addr constant [4 x i8] c"var\00", section ".llvm.rodata.offloading", align 1
 // HIP: @.offloading.entry.var = weak constant %struct.__tgt_offload_entry { ptr @var, ptr @.offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "hip_offloading_entries", align 1
-// HIP: @.offloading.entry_name.3 = internal unnamed_addr constant [5 x i8] c"surf\00", section ".llvm.rodata.offloading", align 1
-// HIP: @.offloading.entry.surf = weak constant %struct.__tgt_offload_entry { ptr @surf, ptr @.offloading.entry_name.3, i64 4, i32 2, i32 1 }, section "hip_offloading_entries", align 1
-// HIP: @.offloading.entry_name.4 = internal unnamed_addr constant [4 x i8] c"tex\00", section ".llvm.rodata.offloading", align 1
-// HIP: @.offloading.entry.tex = weak constant %struct.__tgt_offload_entry { ptr @tex, ptr @.offloading.entry_name.4, i64 4, i32 3, i32 1 }, section "hip_offloading_entries", align 1
+// HIP: @managed.managed.3 = private constant %struct.__managed_var { ptr @managed, ptr @managed.managed }
+// HIP: @.offloading.entry_name.4 = internal unnamed_addr constant [8 x i8] c"managed\00", section ".llvm.rodata.offloading", align 1
+// HIP: @.offloading.entry.managed = weak constant %struct.__tgt_offload_entry { ptr @managed.managed.3, ptr @.offloading.entry_name.4, i64 4, i32 1, i32 4 }, section "hip_offloading_entries", align 1
+// HIP: @.offloading.entry_name.5 = internal unnamed_addr constant [5 x i8] c"surf\00", section ".llvm.rodata.offloading", align 1
+// HIP: @.offloading.entry.surf = weak constant %struct.__tgt_offload_entry { ptr @surf, ptr @.offloading.entry_name.5, i64 4, i32 2, i32 1 }, section "hip_offloading_entries", align 1
+// HIP: @.offloading.entry_name.6 = internal unnamed_addr constant [4 x i8] c"tex\00", section ".llvm.rodata.offloading", align 1
+// HIP: @.offloading.entry.tex = weak constant %struct.__tgt_offload_entry { ptr @tex, ptr @.offloading.entry_name.6, i64 4, i32 3, i32 1 }, section "hip_offloading_entries", align 1
 //.
+// CUDA-COFF: @managed = dso_local global i32 undef, align 4
 // CUDA-COFF: @.offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00", section ".llvm.rodata.offloading", align 1
 // CUDA-COFF: @.offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z18__device_stub__foov, ptr @.offloading.entry_name, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1
 // CUDA-COFF: @.offloading.entry_name.1 = internal unnamed_addr constant [11 x i8] c"_Z6kernelv\00", section ".llvm.rodata.offloading", align 1
 // CUDA-COFF: @.offloading.entry._Z6kernelv = weak constant %struct.__tgt_offload_entry { ptr @_Z21__device_stub__kernelv, ptr @.offloading.entry_name.1, i64 0, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1
 // CUDA-COFF: @.offloading.entry_name.2 = internal unnamed_addr constant [4 x i8] c"var\00", section ".llvm.rodata.offloading", align 1
 // CUDA-COFF: @.offloading.entry.var = weak constant %struct.__tgt_offload_entry { ptr @var, ptr @.offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1
-// CUDA-COFF: @.offloading.entry_name.3 = internal unnamed_addr constant [5 x i8] c"surf\00", section ".llvm.rodata.offloading", align 1
-// CUDA-COFF: @.offloading.entry.surf = weak constant %struct.__tgt_offload_entry { ptr @surf, ptr @.offloading.entry_name.3, i64 4, i32 2, i32 1 }, section "cuda_offloading_entries$OE", align 1
-// CUDA-COFF: @.offloading.entry_name.4 = internal unnamed_addr constant [4 x i8] c"tex\00", section ".llvm.rodata.offloading", align 1
-// CUDA-COFF: @.offloading.entry.tex = weak constant %struct.__tgt_offload_entry { ptr @tex, ptr @.offloading.entry_name.4, i64 4, i32 3, i32 1 }, section "cuda_offloading_entries$OE", align 1
+// CUDA-COFF: @.offloading.entry_name.3 = internal unnamed_addr constant [8 x i8] c"managed\00", section ".llvm.rodata.offloading", align 1
+// CUDA-COFF: @.offloading.entry.managed = weak constant %struct.__tgt_offload_entry { ptr @managed, ptr @.offloading.entry_name.3, i64 4, i32 0, i32 0 }, section "cuda_offloading_entries$OE", align 1
+// CUDA-COFF: @.offloading.entry_name.4 = internal unnamed_addr constant [5 x i8] c"surf\00", section ".llvm.rodata.offloading", align 1
+// CUDA-COFF: @.offloading.entry.surf = weak constant %struct.__tgt_offload_entry { ptr @surf, ptr @.offloading.entry_name.4, i64 4, i32 2, i32 1 }, section "cuda_offloading_entries$OE", align 1
+// CUDA-COFF: @.offloading.entry_name.5 = internal unnamed_addr constant [4 x i8] c"tex\00", section ".llvm.rodata.offloading", align 1
+// CUDA-COFF: @.offloading.entry.tex = weak constant %struct.__tgt_offload_entry { ptr @tex, ptr @.offloading.entry_name.5, i64 4, i32 3, i32 1 }, section "cuda_offloading_entries$OE", align 1
 //.
+// HIP-COFF: @managed.managed = dso_local global i32 0, align 4
+// HIP-COFF: @managed = dso_local externally_initialized global ptr null
 // HIP-COFF: @.offloading.entry_name = internal unnamed_addr constant [8 x i8] c"_Z3foov\00", section ".llvm.rodata.offloading", align 1
 // HIP-COFF: @.offloading.entry._Z3foov = weak constant %struct.__tgt_offload_entry { ptr @_Z3foov, ptr @.offloading.entry_name, i64 0, i32 0, i32 0 }, section "hip_offloading_entries$OE", align 1
 // HIP-COFF: @.offloading.entry_name.1 = internal unnamed_addr constant [11 x i8] c"_Z6kernelv\00", section ".llvm.rodata.offloading", align 1
 // HIP-COFF: @.offloading.entry._Z6kernelv = weak constant %struct.__tgt_offload_entry { ptr @_Z6kernelv, ptr @.offloading.entry_name.1, i64 0, i32 0, i32 0 }, section "hip_offloading_entries$OE", align 1
 // HIP-COFF: @.offloading.entry_name.2 = internal unnamed_addr constant [4 x i8] c"var\00", section ".llvm.rodata.offloading", align 1
 // HIP-COFF: @.offloading.entry.var = weak constant %struct.__tgt_offload_entry { ptr @var, ptr @.offloading.entry_name.2, i64 4, i32 0, i32 0 }, section "hip_offloading_entries$OE", align 1
-// HIP-COFF: @.offloading.entry_name.3 = internal unnamed_addr constant [5 x i8] c"surf\00", section ".llvm.rodata.offloading", align 1
-// HIP-COFF: @.offloading.entry.surf = weak constant %struct.__tgt_offload_entry { ptr @surf, ptr @.offloading.entry_name.3, i64 4, i32 2, i32 1 }, section "hip_offloading_entries$OE", align 1
-// HIP-COFF: @.offloading.entry_name.4 = internal unnamed_addr constant [4 x i8] c"tex\00", section ".llvm.rodata.offloading", align 1
-// HIP-COFF: @.offloading.entry.tex = weak constant %struct.__tgt_offload_entry { ptr @tex, ptr @.offloading.entry_name.4, i64 4, i32 3, i32 1 }, section "hip_offloading_entries$OE", align 1
+// HIP-COFF: @managed.managed.3 = private constant %struct.__managed_var { ptr @managed, ptr @managed.managed }
+// HIP-COFF: @.offloading.entry_name.4 = internal unnamed_addr constant [8 x i8] c"managed\00", section ".llvm.rodata.offloading", align 1
+// HIP-COFF: @.offloading.entry.managed = weak constant %struct.__tgt_offload_entry { ptr @managed.managed.3, ptr @.offloading.entry_name.4, i64 4, i32 1, i32 4 }, section "hip_offloading_entries$OE", align 1
+// HIP-COFF: @.offloading.entry_name.5 = internal unnamed_addr constant [5 x i8] c"surf\00", section ".llvm.rodata.offloading", align 1
+// HIP-COFF: @.offloading.entry.surf = weak constant %struct.__tgt_offload_entry { ptr @surf, ptr @.offloading.entry_name.5, i64 4, i32 2, i32 1 }, section "hip_offloading_entries$OE", align 1
+// HIP-COFF: @.offloading.entry_name.6 = internal unnamed_addr constant [4 x i8] c"tex\00", section ".llvm.rodata.offloading", align 1
+// HIP-COFF: @.offloading.entry.tex = weak constant %struct.__tgt_offload_entry { ptr @tex, ptr @.offloading.entry_name.6, i64 4, i32 3, i32 1 }, section "hip_offloading_entries$OE", align 1
 //.
 // CUDA-LABEL: @_Z18__device_stub__foov(
 // CUDA-NEXT:  entry:
@@ -91,6 +109,7 @@ __global__ void foo() {}
 __device__ int var = 1;
 const __device__ int constant = 1;
 extern __device__ int external;
+__device__ __managed__ int managed = 0;
 
 // CUDA-LABEL: @_Z21__device_stub__kernelv(
 // CUDA-NEXT:  entry:
@@ -137,28 +156,3 @@ template <typename T, int dim = 1, int mode = 0>
 struct __attribute__((device_builtin_texture_type)) texture : public textureReference {};
 
 texture<void> tex;
-//.
-// CUDA: [[META0:![0-9]+]] = !{ptr @.offloading.entry_name}
-// CUDA: [[META1:![0-9]+]] = !{ptr @.offloading.entry_name.1}
-// CUDA: [[META2:![0-9]+]] = !{ptr @.offloading.entry_name.2}
-// CUDA: [[META3:![0-9]+]] = !{ptr @.offloading.entry_name.3}
-// CUDA: [[META4:![0-9]+]] = !{ptr @.offloading.entry_name.4}
-//.
-// HIP: [[META0:![0-9]+]] = !{ptr @.offloading.entry_name}
-// HIP: [[META1:![0-9]+]] = !{ptr @.offloading.entry_name.1}
-// HIP: [[META2:![0-9]+]] = !{ptr @.offloading.entry_name.2}
-// HIP: [[META3:![0-9]+]] = !{ptr @.offloading.entry_name.3}
-// HIP: [[META4:![0-9]+]] = !{ptr @.offloading.entry_name.4}
-//.
-// CUDA-COFF: [[META0:![0-9]+]] = !{ptr @.offloading.entry_name}
-// CUDA-COFF: [[META1:![0-9]+]] = !{ptr @.offloading.entry_name.1}
-// CUDA-COFF: [[META2:![0-9]+]] = !{ptr @.offloading.entry_name.2}
-// CUDA-COFF: [[META3:![0-9]+]] = !{ptr @.offloading.entry_name.3}
-// CUDA-COFF: [[META4:![0-9]+]] = !{ptr @.offloading.entry_name.4}
-//.
-// HIP-COFF: [[META0:![0-9]+]] = !{ptr @.offloading.entry_name}
-// HIP-COFF: [[META1:![0-9]+]] = !{ptr @.offloading.entry_name.1}
-// HIP-COFF: [[META2:![0-9]+]] = !{ptr @.offloading.entry_name.2}
-// HIP-COFF: [[META3:![0-9]+]] = !{ptr @.offloading.entry_name.3}
-// HIP-COFF: [[META4:![0-9]+]] = !{ptr @.offloading.entry_name.4}
-//.
diff --git a/clang/test/Driver/linker-wrapper-image.c b/clang/test/Driver/linker-wrapper-image.c
index f553f20f7ee897..7f96f629e91272 100644
--- a/clang/test/Driver/linker-wrapper-image.c
+++ b/clang/test/Driver/linker-wrapper-image.c
@@ -87,7 +87,7 @@
 // CUDA-NEXT:   br i1 %1, label %while.entry, label %while.end
 
 //      CUDA: while.entry:
-// CUDA-NEXT:   %entry1 = phi ptr [ @__start_cuda_offloading_entries, %entry ], [ %12, %if.end ]
+// CUDA-NEXT:   %entry1 = phi ptr [ @__start_cuda_offloading_entries, %entry ], [ %13, %if.end ]
 // CUDA-NEXT:   %2 = getelementptr inbounds %struct.__tgt_offload_entry, ptr %entry1, i64 0, i32 0
 // CUDA-NEXT:   %addr = load ptr, ptr %2, align 8
 // CUDA-NEXT:   %3 = getelementptr inbounds %struct.__tgt_offload_entry, ptr %entry1, i64 0, i32 1
@@ -125,7 +125,11 @@
 // CUDA-NEXT:   br label %if.end
 
 //      CUDA: sw.managed:
-// CUDA-NEXT:   br label %if.end
+// CUDA-NEXT:  %managed.addr = load ptr, ptr %addr, align 8
+// CUDA-NEXT:  %12 = getelementptr inbounds ptr, ptr %addr, i64 1
+// CUDA-NEXT:  %managed.addr2 = load ptr, ptr %12, align 8
+// CUDA-NEXT:  call void @__cudaRegisterManagedVar(ptr %0, ptr %managed.addr, ptr %managed.addr2, ptr %name, i64 %size, i32 %textype)
+// CUDA-NEXT:  br label %if.end
 
 //      CUDA: sw.surface:
 // CUDA-NEXT:   br label %if.end
@@ -134,9 +138,9 @@
 // CUDA-NEXT:   br label %if.end
 
 //      CUDA: if.end:
-// CUDA-NEXT:   %12 = getelementptr inbounds %struct.__tgt_offload_entry, ptr %entry1, i64 1
-// CUDA-NEXT:   %13 = icmp eq ptr %12, @__stop_cuda_offloading_entries
-// CUDA-NEXT:   br i1 %13, label %while.end, label %while.entry
+// CUDA-NEXT:   %13 = getelementptr inbounds %struct.__tgt_offload_entry, ptr %entry1, i64 1
+// CUDA-NEXT:   %14 = icmp eq ptr %13, @__stop_cuda_offloading_entries
+// CUDA-NEXT:   br i1 %14, label %while.end, label %while.entry
 
 //      CUDA: while.end:
 // CUDA-NEXT:   ret void
@@ -187,7 +191,7 @@
 // HIP-NEXT:   br i1 %1, label %while.entry, label %while.end
 
 //      HIP: while.entry:
-// HIP-NEXT:   %entry1 = phi ptr [ @__start_hip_offloading_entries, %entry ], [ %12, %if.end ]
+// HIP-NEXT:   %entry1 = phi ptr [ @__start_hip_offloading_entries, %entry ], [ %13, %if.end ]
 // HIP-NEXT:   %2 = getelementptr inbounds %struct.__tgt_offload_entry, ptr %entry1, i64 0, i32 0
 // HIP-NEXT:   %addr = load ptr, ptr %2, align 8
 // HIP-NEXT:   %3 = getelementptr inbounds %struct.__tgt_offload_entry, ptr %entry1, i64 0, i32 1
@@ -225,7 +229,11 @@
 // HIP-NEXT:   br label %if.end
 
 //      HIP: sw.managed:
-// HIP-NEXT:   br label %if.end
+// HIP-NEXT:  %managed.addr = load ptr, ptr %addr, align 8
+// HIP-NEXT:  %12 = getelementptr inbounds ptr, ptr %addr, i64 1
+// HIP-NEXT:  %managed.addr2 = load ptr, ptr %12, align 8
+// HIP-NEXT:  call void @__hipRegisterManagedVar(ptr %0, ptr %managed.addr, ptr %managed.addr2, ptr %name, i64 %size, i32 %textype)
+// HIP-NEXT:  br label %if.end
 
 //      HIP: sw.surface:
 // HIP-NEXT:   call void @__hipRegisterSurface(ptr %0, ptr %addr, ptr %name, ptr %name, i32 %textype, i32 %extern)
@@ -236,9 +244,9 @@
 // HIP-NEXT:   br label %if.end
 
 //      HIP: if.end:
-// HIP-NEXT:   %12 = getelementptr inbounds %struct.__tgt_offload_entry, ptr %entry1, i64 1
-// HIP-NEXT:   %13 = icmp eq ptr %12, @__stop_hip_offloading_entries
-// HIP-NEXT:   br i1 %13, label %while.end, label %while.entry
+// HIP-NEXT:   %13 = getelementptr inbounds %struct.__tgt_offload_entry, ptr %entry1, i64 1
+// HIP-NEXT:   %14 = icmp eq ptr %13, @__stop_hip_offloading_entries
+// HIP-NEXT:   br i1 %14, label %while.end, label %while.entry
 
 //      HIP: while.end:
 // HIP-NEXT:   ret void
diff --git a/llvm/include/llvm/Frontend/Offloading/Utility.h b/llvm/include/llvm/Frontend/Offloading/Utility.h
index abaea843848b21..b30fd1bd8e56a0 100644
--- a/llvm/include/llvm/Frontend/Offloading/Utility.h
+++ b/llvm/include/llvm/Frontend/Offloading/Utility.h
@@ -45,6 +45,10 @@ enum OffloadEntryKindFlag : uint32_t {
 /// globals that will be registered with the offloading runtime.
 StructType *getEntryTy(Module &M);
 
+/// Returns the struct type we store the two pointers for CUDA / HIP managed
+/// variables in. Necessary until we widen the offload entry struct.
+StructType *getManagedTy(Module &M);
+
 /// Create an offloading section struct used to register this global at
 /// runtime.
 ///
diff --git a/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp b/llvm/lib/Frontend/Offloading/OffloadWrapper.cpp
index...
[truncated]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants