Hacker News
user: skidrow
created: 2024-07-02
karma: 364

submissions:

Creating custom kernels for the AMD MI300
2 points | 0 comments
Implementing a Fast Tensor Core Matmul on the Ada Architecture
4 points | 0 comments
Matrix Core Programming on AMD GPUs
116 points | 5 comments
Implementing a Fast Tensor Core Matmul on the Ada Architecture
3 points | 0 comments
Matrix Core Programming on AMD GPUs
2 points | 0 comments
Creating custom kernels for the AMD MI300
1 point | 0 comments
Implementing a Fast Tensor Core Matmul on the Ada Architecture
2 points | 0 comments
Matrix Core Programming on AMD CDNA3 and CDNA4 Architecture
24 points | 3 comments
Creating custom kernels for the AMD MI300
2 points | 0 comments
Implementing a Fast Tensor Core Matmul on the Ada Architecture
2 points | 0 comments
Advanced Matrix Multiplication Optimization on Multi-Core Processors (2024)
85 points | 3 comments
Creating custom kernels for the AMD MI300
2 points | 0 comments
Introduction to Matrix Core Programming on AMD CDNA3 and CDNA4 Architecture
2 points | 0 comments
Creating custom kernels for the AMD MI300
2 points | 0 comments
Implementing a Fast Tensor Core Matmul on the Ada Architecture
2 points | 0 comments
Creating custom kernels for the AMD MI300
1 point | 0 comments
Implementing a Fast Tensor Core Matmul on the Ada Architecture
4 points | 0 comments
Implementing a Fast Tensor Core Matmul on the Ada Architecture
2 points | 1 comment
Compiler Explorer: An Essential Kernel Playground for CUDA Developers
2 points | 0 comments
Creating custom kernels for the AMD MI300
1 point | 0 comments
DeepSeek-R1 and FP8 Mixed-Precision Training
2 points | 0 comments
How to Write a Fast Matrix Multiplication from Scratch with Tensor Cores (2024)
147 points | 17 comments
DeepSeek-R1 and FP8 Mixed-Precision Training
2 points | 0 comments
Implementing a Fast Tensor Core Matmul on the Ada Architecture
1 point | 0 comments
How to Write a Fast Matrix Multiplication from Scratch with Tensor Cores
2 points | 0 comments
Understanding Peak, Max-Achievable and Delivered FLOPs
1 point | 0 comments
DeepSeek-R1 and FP8 Mixed-Precision Training
1 point | 0 comments
Outperforming cuBLAS on H100: A Worklog
3 points | 0 comments