|
|
|
|
|
by icyfox
1814 days ago
|
|
Makes sense. A lot of my work involves parsing crawled pages, ie. html -> structured dom tree -> some heuristic search or other algorithm to tag or extract elements. So in this case the compute per page is relatively high but I still think the overall latency of having to sync to a remote endpoint is too heavy. One hybrid approach would be some hosted data product that is colocated with your compute cluster. On nonsensitive datasets I'd be happy to upload my initial data upfront if it means I can get linear scalability when I need to run compute. |
|
Whether or not that's possible with the realities of your application is another question of course :).
If you're curious, I'd be happy to provide some compute time (say 100 vCPU-hours) in exchange for feedback. Email me if that's at all interesting.