
Loop Unrolling Impact on CUDA Matrix Multiplication Operations

Date Issued
2024-11-26
Author(s)
Stefkovski, Vojdan
Mileski, Dimitar
Gusev, Marjan
DOI
10.1109/telfor63250.2024.10819077
Abstract
This paper investigates the impact of loop unrolling on the performance of CUDA matrix multiplication across NVIDIA GPUs. We benchmark basic and unrolled kernels with unroll factors of 2, 4, 8, and 16 and CUDA block sizes of 8, 16, and 32 on matrices ranging from 128 × 128 to 4096 × 4096. Using two GPUs, the GeForce RTX 4060 and the GTX TITAN X, we analyze how the unroll factor affects execution time. Our findings indicate that loop unrolling, particularly with factors of 8 and 16 and a block size of 32, yields significant performance gains on larger matrices. These results confirm loop unrolling as an effective optimization technique for CUDA matrix operations and offer developers guidance for improving computational efficiency across different GPU architectures.
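As a rough illustration of the transformation the abstract describes, the sketch below unrolls the inner dot-product loop of a matrix multiplication by a factor of 4. It is plain C++ running on the CPU rather than a CUDA kernel, and it is not the authors' benchmark code: the function names `matmul_basic` and `matmul_unrolled4`, the row-major layout, and the factor of 4 are assumptions chosen for illustration. In a CUDA kernel the two outer loops would typically be replaced by per-thread `row` and `col` indices derived from the block and thread indices, with the same inner loop over `k`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Baseline: one multiply-add per iteration of the inner loop.
// A, B, and C are n x n matrices stored in row-major order (an assumption).
void matmul_basic(const std::vector<float>& A, const std::vector<float>& B,
                  std::vector<float>& C, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t row = 0; row < n; ++row) {
        for (std::size_t col = 0; col < n; ++col) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                sum += A[row * n + k] * B[k * n + col];
            }
            C[row * n + col] = sum;
        }
    }
}

// Inner loop manually unrolled by a factor of 4 (illustrative choice).
// Four multiply-adds share one loop-counter update and one bounds check.
void matmul_unrolled4(const std::vector<float>& A, const std::vector<float>& B,
                      std::vector<float>& C, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t row = 0; row < n; ++row) {
        for (std::size_t col = 0; col < n; ++col) {
            float sum = 0.0f;
            std::size_t k = 0;
            // Main unrolled body: handles k in groups of four.
            for (; k + 4 <= n; k += 4) {
                sum += A[row * n + k]     * B[k * n + col];
                sum += A[row * n + k + 1] * B[(k + 1) * n + col];
                sum += A[row * n + k + 2] * B[(k + 2) * n + col];
                sum += A[row * n + k + 3] * B[(k + 3) * n + col];
            }
            // Remainder loop: covers the case where n is not a multiple of 4.
            for (; k < n; ++k) {
                sum += A[row * n + k] * B[k * n + col];
            }
            C[row * n + col] = sum;
        }
    }
}
```

Both functions accumulate the products in the same order, so they should produce the same results; the unrolled version only reduces the loop-control work per multiply-add. Whether that translates into a speedup depends on the compiler and the hardware, which is the kind of question the paper's benchmarks address for GPUs.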
Subjects

Processor scheduling ...

File(s)
Name
Loop Unrolling Impact on CUDA Matrix Multiplication Operations - accepted version.pdf
Description
Accepted version
Size
217.67 KB
Format
Adobe PDF
Checksum
MD5: 22be5bc806c3588309197c1d37429481
