by e-kayrakli
889 days ago
Re Intel support: That's definitely in our plans. However, there are also many other areas that we are actively working on to add more features, fix bugs and improve performance. When prioritizing, we typically make decisions based on what our current and potential users might need in the language. Frankly, we are not seeing a big push for Intel GPU support so far, so it is currently not near the top of our priorities. If you (or other readers) have any input on that matter, where lack of Intel support might be a blocker for testing out Chapel and/or its GPU support, definitely let us know.

Re implicit serialization: To clarify, the serialization based on order-dependence is not implicit. Users should use a `for` loop if their loop is order-dependent and `foreach` (or `forall`) if their loop is order-independent. In other words, the Chapel compiler doesn't make decisions about order-dependence. In particular for GPU execution: a `for` loop will never turn into a GPU kernel.

There are, however, some cases where a `foreach` does not turn into a kernel. You may be referring to those cases, but that's not related to order-dependence. Some Chapel features cannot execute on a GPU. If your `foreach` loop's body uses any of those features, then it will not be launched as a kernel even though `foreach` signals order-independence. Now, a subset of the features that make an order-independent loop GPU-ineligible are there because we haven't gotten a chance to properly address them yet. Another subset will remain thwarters for a longer time, and maybe forever. For example, your `foreach` loop could be calling an extern host function.
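To make the `for` / `foreach` distinction above concrete, here is a minimal sketch. It assumes a Chapel build with GPU support enabled and at least one GPU on the locale; the array name `A` and the size `n` are purely illustrative.

```chapel
// Sketch only: assumes Chapel was built with GPU support and that
// this locale has at least one GPU. `n` and `A` are illustrative names.
config const n = 1024;

on here.gpus[0] {
  var A: [1..n] real;

  // Order-independent: the user says so by writing `foreach`,
  // which makes this loop eligible to be launched as a GPU kernel.
  foreach i in 1..n do
    A[i] = 2.0 * i;

  // Order-dependent: each iteration reads the previous element,
  // so the user writes `for`. A `for` loop never becomes a kernel.
  for i in 2..n do
    A[i] += A[i-1];

  writeln(A[n]);
}
```

The compiler is not inferring anything about dependences here: the choice of loop keyword is what signals it. If the body of the `foreach` used a feature that cannot run on a GPU, that loop would not be launched as a kernel either.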
|
Most AI in the press is done on expensive NVIDIA GPUs. Many papers have techniques with lower cost or higher effectiveness. Their algorithms are described in high-level form with little or limited implementation. Many in OSS and non-DL are using smaller models that can run on diverse hardware, if one has the expertise to program it. It would be helpful to have a language that maps the high-level techniques in papers to diverse hardware for use in training or review.
Can Chapel currently implement the concepts in papers on NNs and LLMs to run on multicore, clusters, and GPUs? If so, can it implement hybrids where specific functions are GPU-optimized but the overall design is split across machines? If not, what is missing for using Chapel for rapid prototyping of AI concepts?