by kod
3839 days ago
If I'm reading this correctly, the Kafka topic only had 5 partitions, but they had 10 workers. With the Spark direct stream, Kafka partitions map 1:1 to Spark partitions, which means that without a shuffle (e.g. a repartition), at most half of the workers would be doing any work. Seems like a pretty basic oversight that should be addressed.
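A toy model of the arithmetic, assuming each worker runs one task at a time and tasks are placed round-robin (both are simplifications, and `busy_workers` is a hypothetical helper, not a Spark API). Each Spark partition becomes one task per batch, so the number of busy workers is capped at the partition count:

```python
def busy_workers(num_partitions: int, num_workers: int) -> int:
    """Count workers that get at least one task in a batch.

    Simplifying assumptions (not Spark's real scheduler): one task per
    partition, one task slot per worker, round-robin placement.
    """
    # Task i lands on worker i % num_workers; collect the distinct workers used.
    assigned = {i % num_workers for i in range(num_partitions)}
    return len(assigned)


# Direct stream: one Spark partition per Kafka partition.
kafka_partitions = 5
workers = 10
print(busy_workers(kafka_partitions, workers))  # 5 of 10 workers busy

# After repartitioning to the worker count (which costs a shuffle):
print(busy_workers(workers, workers))  # 10 of 10 workers busy
```

Real executors often have several cores, which changes the exact numbers but not the cap: a batch can't run more tasks than it has partitions. So the options are either more Kafka partitions or a repartition of the stream, paying for the shuffle.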