Hacker News
user: skidrow
created: 2024-07-02
karma: 364
submissions:
Creating custom kernels for the AMD MI300
2 points
|
0 comments
Implementing a Fast Tensor Core Matmul on the Ada Architecture
4 points
|
0 comments
Matrix Core Programming on AMD GPUs
116 points
|
5 comments
Implementing a Fast Tensor Core Matmul on the Ada Architecture
3 points
|
0 comments
Matrix Core Programming on AMD GPUs
2 points
|
0 comments
Creating custom kernels for the AMD MI300
1 point
|
0 comments
Implementing a Fast Tensor Core Matmul on the Ada Architecture
2 points
|
0 comments
Matrix Core Programming on AMD CDNA3 and CDNA4 Architecture
24 points
|
3 comments
Creating custom kernels for the AMD MI300
2 points
|
0 comments
Implementing a Fast Tensor Core Matmul on the Ada Architecture
2 points
|
0 comments
Advanced Matrix Multiplication Optimization on Multi-Core Processors (2024)
85 points
|
3 comments
Creating custom kernels for the AMD MI300
2 points
|
0 comments
Introduction to Matrix Core Programming on AMD CDNA3 and CDNA4 Architecture
2 points
|
0 comments
Creating custom kernels for the AMD MI300
2 points
|
0 comments
Implementing a Fast Tensor Core Matmul on the Ada Architecture
2 points
|
0 comments
Creating custom kernels for the AMD MI300
1 point
|
0 comments
Implementing a Fast Tensor Core Matmul on the Ada Architecture
4 points
|
0 comments
Implementing a Fast Tensor Core Matmul on the Ada Architecture
2 points
|
1 comment
Compiler Explorer: An Essential Kernel Playground for CUDA Developers
2 points
|
0 comments
Creating custom kernels for the AMD MI300
1 point
|
0 comments
DeepSeek-R1 and FP8 Mixed-Precision Training
2 points
|
0 comments
How to Write a Fast Matrix Multiplication from Scratch with Tensor Cores (2024)
147 points
|
17 comments
DeepSeek-R1 and FP8 Mixed-Precision Training
2 points
|
0 comments
Implementing a Fast Tensor Core Matmul on the Ada Architecture
1 point
|
0 comments
How to Write a Fast Matrix Multiplication from Scratch with Tensor Cores
2 points
|
0 comments
Understanding Peak, Max-Achievable and Delivered FLOPs
1 point
|
0 comments
DeepSeek-R1 and FP8 Mixed-Precision Training
1 point
|
0 comments
Outperforming cuBLAS on H100: A Worklog
3 points
|
0 comments