by emacs28
1222 days ago
One good approach could be to base the architecture on the TPU v1 described in [1]. There are also open-source accelerators you could take inspiration from, for example [2] and [3]. If you want to do less work and avoid hand-coding the RTL yourself, you could look into methods for automatically mapping OpenCL to an FPGA accelerator architecture, or use something like [3], which provides pre-designed architectures for multiple FPGAs.

[1] https://arxiv.org/abs/1704.04760
[2] https://github.com/jofrfu/tinyTPU
[3] https://github.com/tensil-ai/tensil
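
To make the TPU v1 idea a bit more concrete, here is a rough Python sketch of a weight-stationary systolic array, the kind of matrix-multiply structure at the core of the design in [1]. It is a toy software model, not RTL: the function name `systolic_matmul`, the input skewing scheme, and the register layout are my own assumptions for illustration, and Python's unbounded ints do not model the 8-bit operands and 32-bit accumulators of the real chip.

```python
def systolic_matmul(A, W):
    """Cycle-by-cycle model of a weight-stationary systolic array.

    A is M x K (activations), W is K x N (weights). PE(k, j) holds W[k][j].
    Activations move left to right, partial sums move top to bottom.
    Hypothetical illustration only; not taken from the TPU paper or any repo.
    """
    M, K, N = len(A), len(W), len(W[0])
    a = [[0] * N for _ in range(K)]  # activation each PE saw last cycle
    p = [[0] * N for _ in range(K)]  # partial sum each PE produced last cycle
    C = [[0] * N for _ in range(M)]

    # The last result (row M-1, column N-1) leaves the array at
    # cycle (M-1) + (K-1) + (N-1).
    for t in range(M + K + N - 2):
        new_a = [[0] * N for _ in range(K)]
        new_p = [[0] * N for _ in range(K)]
        for k in range(K):
            for j in range(N):
                if j == 0:
                    # Row k of the array is fed A[m][k], delayed by k cycles.
                    m = t - k
                    a_in = A[m][k] if 0 <= m < M else 0
                else:
                    a_in = a[k][j - 1]  # from the left neighbour
                p_in = p[k - 1][j] if k > 0 else 0  # from the PE above
                new_a[k][j] = a_in
                new_p[k][j] = p_in + a_in * W[k][j]  # multiply-accumulate
        a, p = new_a, new_p

        # Finished sums drop out of the bottom row, skewed by column.
        for j in range(N):
            m = t - (K - 1) - j
            if 0 <= m < M:
                C[m][j] = p[K - 1][j]
    return C


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each PE only talks to its left and upper neighbours, which is what makes this structure map well onto FPGA fabric. In hardware, the two inner loops would become a grid of MAC units that all update in the same clock cycle.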