nvidia-cutlass 3.8.0.0-gfbf-2024a-CUDA-12.6.0

CUTLASS is a collection of CUDA C++ template abstractions for implementing high-performance matrix-matrix multiplication (GEMM) and related computations at all levels and scales within CUDA. It incorporates strategies for hierarchical decomposition and data movement similar to those used to implement cuBLAS and cuDNN. CUTLASS decomposes these "moving parts" into reusable, modular software components abstracted by C++ template classes. Primitives for different levels of a conceptual parallelization hierarchy can be specialized and tuned via custom tiling sizes, data types, and other algorithmic policy. The resulting flexibility simplifies their use as building blocks within custom kernels and applications.
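
The tiling idea in the description above can be illustrated with a minimal pure-Python sketch. This is not CUTLASS code and does not use its API; the function name `tiled_gemm` and the parameters `tile_m`, `tile_n` and `tile_k` are hypothetical and chosen only for illustration. It shows how a matrix product can be decomposed into output tiles that are accumulated over slices of the inner dimension, which is the kind of decomposition CUTLASS expresses through C++ templates.

```python
def tiled_gemm(a, b, tile_m=2, tile_n=2, tile_k=2):
    """Compute C = A * B tile by tile (illustrative sketch, not CUTLASS)."""
    m, k = len(a), len(a[0])
    k_b, n = len(b), len(b[0])
    if k != k_b:
        raise ValueError("inner dimensions of A and B must match")

    c = [[0.0] * n for _ in range(m)]

    # The two outer loops select one output tile of C.
    for i0 in range(0, m, tile_m):
        for j0 in range(0, n, tile_n):
            # Walk the inner (K) dimension one tile at a time.
            for k0 in range(0, k, tile_k):
                # Accumulate this K-slice into every element of the tile.
                for i in range(i0, min(i0 + tile_m, m)):
                    for j in range(j0, min(j0 + tile_n, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile_k, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(tiled_gemm(a, b))
```

Changing the tile sizes changes the order in which the work is visited but not the result. On a GPU, tile sizes of this kind are what determine how the work maps onto thread blocks, warps and threads.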

Accessing nvidia-cutlass 3.8.0.0-gfbf-2024a-CUDA-12.6.0

To load the module for nvidia-cutlass 3.8.0.0-gfbf-2024a-CUDA-12.6.0, run these commands on the BEAR systems (BlueBEAR and BEAR Cloud VMs):

module load bear-apps/2024a
module load nvidia-cutlass/3.8.0.0-gfbf-2024a-CUDA-12.6.0

BEAR Apps Version

2024a

Architectures

EL8-icelake (GPUs: NVIDIA A100, NVIDIA A30)

Each listed architecture has two parts, in the form OS-CPU. The OS is represented by EL, and the second part names the processor (CPU) type; several different processor types are available on BlueBEAR. More information about the processor types on BlueBEAR is available on the BlueBEAR Job Submission page.

Extensions

nvidia-cutlass 3.8.0.0
treelib 1.8.0
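
Once the module is loaded, a quick check such as the sketch below can confirm that the bundled extensions are visible to Python. The import names `cutlass` and `treelib` are assumptions based on the package names listed above and are not stated on this page; the helper `find_missing` is hypothetical.

```python
import importlib.util


def find_missing(names=("cutlass", "treelib")):
    """Return the top-level module names that Python cannot locate.

    The default names are assumed import names for the extensions
    listed above; adjust them if the packages install differently.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All listed extensions are importable")
```

The check only locates the modules and does not import them, so it can be run on a node without a GPU.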

More Information

For more information visit the nvidia-cutlass website.

Dependencies

This version of nvidia-cutlass has a direct dependency on:

CUDA/12.6.0
CUDA-Python/12.6.2.post1-gfbf-2024a-CUDA-12.6.0
gfbf/2024a
networkx/3.4.2-gfbf-2024a
pydot/3.0.3-GCCcore-13.3.0
Python/3.12.3-GCCcore-13.3.0
Python-bundle-PyPI/2024.06-GCCcore-13.3.0
SciPy-bundle/2024.05-gfbf-2024a

Required By

This version of nvidia-cutlass is a direct dependency of: PyTorch/2.7.1-foss-2024a-CUDA-12.6.0

Last modified on 19th May 2026