Allocate memory dynamically from a fixed-size heap in global memory.
The CUDA in-kernel malloc() function allocates at least the requested
number of bytes from the device heap and returns a pointer to the
allocated memory, or NULL if insufficient memory exists to fulfill the
request.
The returned pointer is guaranteed to be aligned to a 16-byte boundary.
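
As an illustration of the behavior described above, the following is a minimal sketch of per-thread allocation. The kernel name mallocTest, the 123-byte request, and the 128 MB heap size are arbitrary choices made for this example, not values required by the API. Each thread checks for NULL before touching the memory, reports whether its pointer is 16-byte aligned, and releases the block again.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstdint>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread allocates, uses, and frees its own block.
__global__ void mallocTest()
{
    const size_t size = 123;  // arbitrary request size for this sketch
    char* ptr = static_cast<char*>(malloc(size));

    // malloc() returns NULL when the device heap cannot satisfy the request.
    if (ptr == NULL) {
        printf("Thread %d: allocation failed\n", threadIdx.x);
        return;
    }

    memset(ptr, 0, size);

    // Report whether the address is a multiple of 16.
    int aligned = (reinterpret_cast<uintptr_t>(ptr) % 16 == 0) ? 1 : 0;
    printf("Thread %d: ptr %p, 16-byte aligned: %d\n",
           threadIdx.x, static_cast<void*>(ptr), aligned);

    free(ptr);
}

int main()
{
    // The device heap has a fixed size; set it before launching any kernel
    // that calls malloc(). 128 MB is an assumed value for illustration.
    cudaError_t err = cudaDeviceSetLimit(cudaLimitMallocHeapSize,
                                         128 * 1024 * 1024);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSetLimit: %s\n", cudaGetErrorString(err));
        return 1;
    }

    mallocTest<<<1, 5>>>();

    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

The addresses printed will vary from run to run, so no particular output should be expected beyond one line per thread.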
The memory allocated by a given CUDA thread via malloc() remains
allocated for the lifetime of the CUDA context, or until it is explicitly
released by a call to free(). It can be used by any other CUDA thread,
including threads from subsequent kernel launches.
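
The persistence of an allocation across kernel launches might be sketched as follows. The names dataptr, allocmem, usemem, and freemem, as well as the block and thread counts, are hypothetical and chosen only for this example. One thread per block allocates a buffer in the first kernel, all threads of the block use it in later launches, and a final kernel frees it.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define NUM_BLOCKS 4          // assumed grid size for this sketch
#define THREADS_PER_BLOCK 32  // assumed block size for this sketch

// One pointer per block, kept in global memory so later kernels can find it.
__device__ int* dataptr[NUM_BLOCKS];

// First launch: thread 0 of each block allocates, then all threads initialize.
__global__ void allocmem()
{
    if (threadIdx.x == 0) {
        dataptr[blockIdx.x] =
            static_cast<int*>(malloc(THREADS_PER_BLOCK * sizeof(int)));
    }
    __syncthreads();  // make the pointer visible to the whole block

    int* data = dataptr[blockIdx.x];
    if (data == NULL) return;  // allocation failed; nothing to initialize
    data[threadIdx.x] = 0;
}

// Later launches: different threads reuse the memory allocated earlier.
__global__ void usemem()
{
    int* data = dataptr[blockIdx.x];
    if (data != NULL) data[threadIdx.x] += threadIdx.x;
}

// Final launch: print the values, then release each buffer exactly once.
__global__ void freemem()
{
    int* data = dataptr[blockIdx.x];
    if (data != NULL) {
        printf("Block %d, thread %d: value %d\n",
               blockIdx.x, threadIdx.x, data[threadIdx.x]);
    }
    __syncthreads();  // all reads must finish before the buffer is freed

    if (threadIdx.x == 0 && data != NULL) {
        free(data);
        dataptr[blockIdx.x] = NULL;
    }
}

int main()
{
    allocmem<<<NUM_BLOCKS, THREADS_PER_BLOCK>>>();
    usemem<<<NUM_BLOCKS, THREADS_PER_BLOCK>>>();
    usemem<<<NUM_BLOCKS, THREADS_PER_BLOCK>>>();
    freemem<<<NUM_BLOCKS, THREADS_PER_BLOCK>>>();

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Storing the pointers in a __device__ array is only one way to hand them from one launch to the next; the point of the sketch is that the buffers stay valid between launches until free() is called.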