nvptx64-nvidia-cuda
Tier: 2
This is the target meant for deploying code for Nvidia® accelerators based on their CUDA platform.
Target maintainers
Requirements
This target is no_std and will typically be built with crate-type cdylib and -C linker-flavor=llbc, which generates PTX.
The necessary components for this workflow are:
$ rustup toolchain add nightly
$ rustup component add llvm-tools --toolchain nightly
$ rustup component add llvm-bitcode-linker --toolchain nightly
There are two options for using the core library:
- rustup component add rust-src --toolchain nightly and build using -Z build-std=core.
- rustup target add nvptx64-nvidia-cuda --toolchain nightly
Target and features
It is generally necessary to specify the target processor, such as -C target-cpu=sm_89, because the default is very old. This implies two target features: sm_89 and ptx78 (and all preceding features within sm_* and ptx*). Rust will default to using the oldest PTX version that supports the target processor (see the PTX release history table in NVIDIA's PTX ISA documentation), which maximizes driver compatibility.
One can use -C target-feature=+ptx80 to choose a later PTX version without changing the target (the default in this case, ptx78, requires CUDA driver version 11.8, while ptx80 would require driver version 12.0).
Later PTX versions may allow more efficient code generation.
Although Rust follows LLVM in representing ptx* and sm_* as target features, they should be thought of as having crate granularity, set via -Ctarget-cpu and optionally -Ctarget-feature.
While the compiler accepts #[target_feature(enable = "ptx80", enable = "sm_89")], it is not supported, may not behave as intended, and may become erroneous in the future.
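Because these features apply to the whole crate, code that depends on a PTX version can branch on cfg rather than on a per-function attribute. The following is a minimal sketch, not an official pattern: the function name ptx_level is hypothetical, and it assumes that the ptx* features are visible to cfg on the nightly toolchain in use.

```rust
/// Reports which PTX feature level this crate was compiled for.
///
/// `cfg!` is resolved at compile time from -Ctarget-cpu and
/// -Ctarget-feature, so every function in the crate sees the same answer.
/// (`ptx_level` is a hypothetical helper used only for illustration.)
pub fn ptx_level() -> &'static str {
    // Later PTX versions imply the earlier ones, so test the newest first.
    if cfg!(target_feature = "ptx80") {
        "ptx80 or later"
    } else if cfg!(target_feature = "ptx78") {
        "ptx78"
    } else {
        "older than ptx78, or not an NVPTX build"
    }
}
```

When compiled for a host target, neither feature is set and the last branch is taken. With -Ctarget-cpu=sm_89 the ptx78 branch would be expected, and adding -Ctarget-feature=+ptx80 would be expected to select the first.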
Building Rust kernels
A no_std crate containing one or more functions with extern "ptx-kernel" can be compiled to PTX using a command like the following.
$ RUSTFLAGS='-Ctarget-cpu=sm_89' cargo +nightly rustc --target=nvptx64-nvidia-cuda -Zbuild-std=core --crate-type=cdylib -- -Clinker-flavor=llbc -Zunstable-options
Intrinsics in core::arch::nvptx may use #[cfg(target_feature = "...")], thus it's necessary to use -Zbuild-std=core with appropriate RUSTFLAGS. The following components are needed for this workflow:
$ rustup component add rust-src --toolchain nightly
$ rustup component add llvm-tools --toolchain nightly
$ rustup component add llvm-bitcode-linker --toolchain nightly
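For illustration, a crate of the kind described in this section might look like the sketch below. It is not taken from the target documentation: the names saxpy and saxpy_element are hypothetical, and the nightly feature gates abi_ptx and stdarch_nvptx are assumptions about the current toolchain that may change. The GPU-only items are gated on target_arch so that the arithmetic can also be compiled and checked on the host.

```rust
// Assumption: these nightly gates enable `extern "ptx-kernel"` and
// `core::arch::nvptx`. They are only applied when building for NVPTX.
#![cfg_attr(target_arch = "nvptx64", no_std, feature(abi_ptx, stdarch_nvptx))]

/// Per-element work, kept as plain safe Rust so it can be tested on the host.
pub fn saxpy_element(a: f32, x: f32, y: f32) -> f32 {
    a * x + y
}

/// Hypothetical kernel: y[i] = a * x[i] + y[i] for a one-dimensional launch.
///
/// Safety: `x` and `y` must point to at least `n` valid elements.
#[cfg(target_arch = "nvptx64")]
#[unsafe(no_mangle)]
pub unsafe extern "ptx-kernel" fn saxpy(n: u32, a: f32, x: *const f32, y: *mut f32) {
    use core::arch::nvptx::{_block_dim_x, _block_idx_x, _thread_idx_x};

    // Global thread index across all blocks.
    let i = unsafe { _block_idx_x() * _block_dim_x() + _thread_idx_x() } as usize;

    // Threads past the end of the arrays do nothing.
    if i < n as usize {
        unsafe { *y.add(i) = saxpy_element(a, *x.add(i), *y.add(i)) };
    }
}

/// A no_std crate needs a panic handler; this one simply spins.
#[cfg(target_arch = "nvptx64")]
#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}
```

Built with the command shown earlier in this section, a crate like this would be expected to produce a PTX file exporting saxpy. Loading and launching that PTX is done from host code through the CUDA driver and is outside the scope of this target.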