

by njkumarr 577 days ago

Thank you for taking the time to read my article!

On your 2nd point, to clarify: I generate 300 new tokens on top of the initial prompt rather than only processing the short prompt, so with prompt precomputation plus token generation it should come out to about 306 tokens.

On your 1st and 3rd points, you are definitely correct. Looking back, I should probably have used the torch profiler to track the point at which my CPU overhead started to decrease, so I could better assess the compute-bound regions of my workflow, rather than relying on napkin math from A100 specs.
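For anyone curious, here is a rough sketch of what that profiler-based check could look like. It is not the code from the article: the `cpu_share` helper, the single `Linear` layer, and the batch sizes are all made up for illustration. The idea is to compare the CPU-side op time reported by `torch.profiler` against wall-clock time. On a GPU, a ratio near 1 suggests the run is overhead-bound, and a falling ratio suggests the GPU is becoming the bottleneck. On a CPU-only machine the ratio stays high and tells you little.

```python
import time

import torch
from torch.profiler import ProfilerActivity, profile


def cpu_share(batch_size, hidden=1024, steps=20):
    """Return CPU-side op time divided by wall-clock time for a toy layer.

    Hypothetical helper for illustration; a real check would profile the
    actual model's decode loop instead of a single Linear layer.
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    layer = torch.nn.Linear(hidden, hidden).to(device)
    x = torch.randn(batch_size, hidden, device=device)

    with torch.no_grad():
        layer(x)  # warm-up so one-time setup cost is not measured
        if device == "cuda":
            torch.cuda.synchronize()

        with profile(activities=[ProfilerActivity.CPU]) as prof:
            start = time.perf_counter()
            for _ in range(steps):
                layer(x)
            if device == "cuda":
                # Wait for queued kernels so wall time includes GPU work.
                torch.cuda.synchronize()
            wall_us = (time.perf_counter() - start) * 1e6

    # self_cpu_time_total is reported in microseconds.
    cpu_us = sum(evt.self_cpu_time_total for evt in prof.key_averages())
    return cpu_us / wall_us


if __name__ == "__main__":
    # Assumed sweep; pick sizes that match your own workload.
    for bs in (1, 8, 64, 512):
        print(f"batch={bs:4d}  cpu_share={cpu_share(bs):.2f}")
```

The batch size where `cpu_share` starts dropping is roughly where the workload stops being dominated by CPU launch overhead, which is the transition I was trying to estimate from specs.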