by Artgor 1015 days ago

> I wonder how hard it would be to modify this code to run on a 64GB M2 Mac.

It isn't that hard; I was able to run it on an M1. The changes are:

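Remove or modify the multiprocessing code. It doesn't work on a Mac the way the script assumes (possibly because Python on macOS starts worker processes with `spawn` rather than `fork` by default).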

Replace `device = "cuda"` with `device = "mps"`.

In the line `att_idxs = (torch.clamp(torch.arange(context_size)[None, :] - torch.arange(context_size)[:, None], -pos_emb_radius, pos_emb_radius-1) % pos_emb_size).to("cuda")`, replace `"cuda"` with `"mps"`.

In `optim.AdamW`, remove `fused=True`; we can't use it without CUDA.

Replace

```
with autocast(device_type='cuda', dtype=torch.float16):
    _, loss = mlm_head(bert(batch_data_torch_xs[mb_start_idx:mb_end_idx]), batch_data_torch_ys[mb_start_idx:mb_end_idx])
```

with simply `_, loss = mlm_head(bert(batch_data_torch_xs[mb_start_idx:mb_end_idx]), batch_data_torch_ys[mb_start_idx:mb_end_idx])`.

Replace `scaler.scale(corrected_loss).backward()` with `corrected_loss.backward()`.

Replace

```
scaler.unscale_(optimizer)
scaler.step(optimizer)
scaler.update()
```

with `optimizer.step()`.

It should work.
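If you'd rather keep one script that runs on both CUDA and Apple Silicon, the switches above can be collected in one place instead of being hand-edited. This is only a rough sketch, not the original code: `TrainConfig` and `make_train_config` are names I made up, and the availability flags are passed in as plain booleans so the snippet runs without PyTorch installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical bundle of the switches listed in the comment above."""
    device: str            # value for `device = ...` and the `.to(...)` call
    fused_adamw: bool      # whether to pass fused=True to optim.AdamW
    use_autocast: bool     # whether to wrap the forward pass in autocast
    use_grad_scaler: bool  # whether to route backward/step through the scaler


def make_train_config(cuda_available: bool, mps_available: bool) -> TrainConfig:
    """Pick a device and turn off the CUDA-only pieces when CUDA is missing."""
    if cuda_available:
        # Keep the original CUDA path untouched.
        return TrainConfig("cuda", True, True, True)
    # No CUDA: use "mps" if present, else fall back to CPU, and drop
    # fused AdamW, autocast and the gradient scaler as described above.
    device = "mps" if mps_available else "cpu"
    return TrainConfig(device, False, False, False)


if __name__ == "__main__":
    # Example: an Apple Silicon machine without CUDA.
    print(make_train_config(cuda_available=False, mps_available=True))
```

In the real script the two flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and each flag would guard the corresponding branch (for example, calling `corrected_loss.backward()` and `optimizer.step()` directly when `use_grad_scaler` is false).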