kThreads: 128
RegA: RowMajor(16, 16)
RegB: ColMajor(16, 8)
RegC: RowMajor(16, 16)
IteratorA: numel = 524288, ChunkShape = (256, 64), stripe count = (1, 32)
IteratorB: numel = 262144, ChunkShape = (64, 128), stripe count = (32, 1)
blocks: [4, 8]
terminate called after throwing an instance of 'thrust::THRUST_200400_860_NS::system::system_error'
what(): trivial_device_copy D->H failed: cudaErrorIllegalAddress: an illegal memory access was encountered
Aborted (core dumped)
Example 1 for GEMM
01_gemm_global_reg crashes during kernel execution for the default problem size.
System:
OS: Ubuntu 24.04.2
Device: RTX 3070 Mobile (SM86)
Driver: 560.35.03
CUDA Version: 12.6
Output: