|
|
|
|
|
by cgarciae
2264 days ago
|
|
I think the ML community really needs a better language than Python but not because of the ML part, that works really good, its because of the Data Engineering part (which is 80-90% of most projects) where python really struggles for being slow and not having true parallelism (multiprocessing is suboptimal). That said I love Python as a language, but if it doesn't fix its issues, on the (very) long run its inevitable the data science community will move to a better solution. Python 4 should focus 100% of JIT compilation. |
|
I really try to take advantage of the database to avoid ever having to munge very large CSVs in pandas. So like 80-90% of my work is done in query languages in a database, the remaining 10-20% is in Python (or sometimes R) once my data is cooked down to a small enough size to easily fit in local RAM. If the data is still too big, I will just sample it.