Hacker News
by jackcook 1000 days ago
Yes, you're right, and I should have mentioned this in the post: I used pure greedy decoding for the GPT-2 outputs, since that was the only option I had for the Apple model. So temperature was effectively zero (the most likely token is always picked), and there was no repetition penalty.
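For anyone unfamiliar with the terms, here is a minimal sketch of what greedy decoding amounts to. It is not the code used for the post: `toy_next_logits` is a hypothetical stand-in for a real model, and the helper names are made up for illustration. It shows that picking the argmax at each step is what temperature sampling approaches as the temperature goes toward zero, with the logits left untouched (no repetition penalty).

```python
import math


def softmax(logits, temperature):
    """Turn logits into probabilities after dividing by a temperature > 0."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Greedy decoding step: take the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


def greedy_decode(next_logits, prompt, max_new_tokens):
    """Repeatedly append the most likely next token; logits are not modified."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(greedy_pick(next_logits(tokens)))
    return tokens


def toy_next_logits(tokens):
    """Hypothetical model over a 4-token vocabulary: favors (last token + 1) mod 4."""
    vocab_size = 4
    target = (tokens[-1] + 1) % vocab_size
    return [2.0 if i == target else 0.0 for i in range(vocab_size)]


if __name__ == "__main__":
    print(greedy_decode(toy_next_logits, [0], 5))
    # As temperature shrinks, nearly all probability mass lands on the argmax.
    print(softmax([2.0, 0.0, 0.0, 0.0], 1.0))
    print(softmax([2.0, 0.0, 0.0, 0.0], 0.01))
```

Because there is no randomness, running the same prompt twice gives the same output, which is what makes greedy outputs from two models directly comparable.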