User: skidrow | HN Mirror

user: skidrow
created: 2024-07-02
karma: 408

submissions:

Matrix Multiplication on Blackwell

3 points | 0 comments

Toward Better HIP Kernel Generation for AMD GPUs

2 points | 0 comments

Matrix Multiplication on Blackwell

2 points | 0 comments

Toward Better HIP Kernel Generation for AMD GPUs

2 points | 0 comments

FlashAttention-4: Algorithm and Kernel Pipelining Co-Design

1 point | 0 comments

FlashAttention-4: Algorithm and Kernel Pipelining Co-Design

6 points | 0 comments

Toward Better HIP Kernel Generation for AMD GPUs

3 points | 0 comments

Matrix Multiplication on Blackwell

2 points | 0 comments

FlashAttention-4: Algorithm and Kernel Pipelining

4 points | 0 comments

Toward Better HIP Kernel Generation for AMD GPUs

3 points | 0 comments

Matrix Multiplication on Blackwell

2 points | 0 comments

FlashAttention-4: Algorithm and Kernel Pipelining

2 points | 0 comments

Toward Better HIP Kernel Generation for AMD GPUs

3 points | 0 comments

Toward Better HIP Kernel Generation for AMD GPUs

2 points | 0 comments

FlashAttention-4: Algorithm and Kernel Pipelining Co-Design

1 point | 0 comments

Occupancy Math on the AMD MI355X: A From-First-Principles Guide

2 points | 0 comments

Computer Vision – Lecture 1.1 (Introduction: Organization) [video]

2 points | 0 comments

FlashAttention-4: Algorithm and Kernel Pipelining Co-Design

2 points | 0 comments

CUTLASS Tutorial: Efficient GEMM Kernel Designs with Pipelining

2 points | 0 comments

Toward Better HIP Kernel Generation for AMD GPUs

2 points | 0 comments

FP8 GEMM Optimization on AMD CDNA4 Architecture

2 points | 0 comments

Occupancy Math on the AMD MI355X: A From-First-Principles Guide

2 points | 0 comments

FP8 GEMM Optimization on AMD CDNA4 Architecture

1 point | 0 comments

Occupancy Math on the AMD MI355X

1 point | 0 comments

FP8 GEMM Optimization on AMD CDNA4 Architecture

1 point | 0 comments

Occupancy Math on the AMD MI355X: A From-First-Principles Guide

1 point | 0 comments

FP8 GEMM Optimization on AMD CDNA4 Architecture

4 points | 0 comments

Occupancy Math on the AMD MI355X: A From-First-Principles Guide

50 points | 8 comments

FP8 GEMM Optimization on AMD CDNA4 Architecture

3 points | 0 comments

Deep Dive into 4-Wave Interleave FP8 GEMM

3 points | 0 comments