by dwagnerkc
481 days ago
If you want to try it out, you can lazily load the dataset from HF and apply the filtering like this (the predicates passed to .filter() are combined with AND):

import polars as pl

df = (
    pl.scan_parquet("hf://datasets/minimaxir/mtg-embeddings/mtg_embeddings.parquet")
    .filter(
        pl.col("type").str.contains("Sorcery"),
        pl.col("manaCost").str.contains("B"),
    )
    .collect()
)

Polars is great to use; I'd highly recommend it. On a single node it is excellent at saturating the CPUs. If you need to distribute the work, put it in a Ray Actor and set POLARS_MAX_THREADS for each actor depending on how much of a single node you want it to use.