Random123 alignment violation leads to crash
Here is the code that copies a vector into an array of scalars:
// output bulk
unsigned long idx = get_global_id(0)*4;
while (idx + 4 < out_size)
{
    *(global output_vec_t *) (output + idx) =
        GET_RANDOM_NUM(RANLUX_FUNC(&ranluxclstate));
    idx += 4*NUM_WORKITEMS;
}
output is only guaranteed to be aligned to a scalar boundary, but this code performs a vector assignment, which may have stricter alignment requirements. The following script crashes on my system (POCL 1.0 + AVX). For some reason, it only fails on Darwin and not on Linux.
import pyopencl as cl
import pyopencl.clrandom as clrandom
import numpy as np
if __name__ == "__main__":
    c = cl.create_some_context(0)
    rng = clrandom.PhiloxGenerator(c)
    q = cl.CommandQueue(c)
    rng.normal(q, 10, np.float64)
The stack trace indicates an alignment violation: the faulting instruction is vmovapd on a ymm register, which requires its memory operand to be aligned to 32 bytes:
(inteq) $ lldb python test.py
(lldb) target create "python"
Current executable set to 'python' (x86_64).
(lldb) settings set -- target.run-args "test.py"
(lldb) run
Process 93926 launched: '/Users/matt/miniconda3/envs/inteq/bin/python' (x86_64)
Process 93926 stopped
* thread #3, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
frame #0: 0x000000018e7f69e0 rng_gen_philox4x32_normal.so`_pocl_launcher_rng_gen_philox4x32_normal + 560
rng_gen_philox4x32_normal.so`_pocl_launcher_rng_gen_philox4x32_normal:
-> 0x18e7f69e0 <+560>: vmovapd %ymm0, (%r12)
0x18e7f69e6 <+566>: addq 0xc8(%rbx), %r15
0x18e7f69ed <+573>: addq 0xd0(%rbx), %r12
0x18e7f69f4 <+580>: leaq 0x4(%r15), %rax
Target 0: (python) stopped.
There is another spot in the same file that uses the same vector-copy pattern and may also be susceptible to alignment violations.
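One possible way to avoid the stricter requirement is to store the four values through a path that only assumes scalar alignment. In OpenCL C the built-in vstore4(value, 0, output + idx) has that property: it only requires the address to be aligned to the element type. The plain C sketch below is a host-side analogue, not the library's actual fix. The names is_aligned, double4_t and store4_unaligned are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if p is aligned to `alignment` bytes. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t) p % alignment) == 0;
}

/* Hypothetical stand-in for the kernel's 4-wide double vector type. */
typedef struct { double s[4]; } double4_t;

/*
 * Store four doubles at output[idx .. idx+3], assuming only that the
 * destination is aligned for a single double. memcpy places no alignment
 * requirement on its destination, so this is safe even when the address
 * is not a multiple of 32 bytes.
 */
static void store4_unaligned(double *output, size_t idx, double4_t v)
{
    assert(is_aligned(output + idx, sizeof(double)));
    memcpy(output + idx, v.s, sizeof v.s);
}
```

Calling store4_unaligned on a pointer that is 8-byte aligned but not 32-byte aligned writes all four values without faulting, whereas casting that same pointer to a 32-byte-aligned vector type and assigning through it is the pattern that crashes above.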