by lmeyerov 763 days ago
We mostly use it via pygraphistry, and the demo folders have a bunch of examples close to what we do in the wild: https://github.com/graphistry/pygraphistry

Ex:

```
import graphistry

graphistry.nodes(alerts_df).umap().plot()
```

That's smart library sugar for:

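```
import graphistry

# alerts_df: your pandas DataFrame; cfg: dict of featurize() keyword options
g = graphistry.nodes(alerts_df)
g2 = g.featurize(**cfg)  # print('encoded', g2._node_features.shape)
g3 = g2.umap()  # print('similarity graph', g3._nodes.shape, g3._edges.shape)
url = g3.plot(render=False)  # return the visualization URL instead of rendering inline
print(f'<iframe src="{url}"></iframe>')
```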
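For intuition on the "similarity graph" part: UMAP starts from a nearest-neighbor graph over the encoded rows. Here's a rough stdlib-only sketch of that idea, not what pygraphistry or UMAP actually run (they use weighted, approximate neighbors and then optimize a layout). `knn_edges` and the toy `features` are made up for illustration.

```python
import math

def knn_edges(vectors, k=1):
    """Brute-force k-nearest-neighbor edges: (source_row, neighbor_row, distance)."""
    edges = []
    for i, vi in enumerate(vectors):
        # Distance from row i to every other row, nearest first
        dists = sorted(
            (math.dist(vi, vj), j) for j, vj in enumerate(vectors) if j != i
        )
        for d, j in dists[:k]:
            edges.append((i, j, d))
    return edges

# Hypothetical encoded rows: two tight pairs that sit far apart
features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.2]]
edges = knn_edges(features, k=1)
for src, dst, dist in edges:
    print(src, '->', dst, round(dist, 3))
```

Rows that look alike end up linked, which is why similar alerts cluster together in the plot.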

When automatic CPU/GPU feature engineering runs across heterogeneous dataframe columns, that's pygraphistry calling into our lower-level library cu_cat: https://github.com/graphistry/cu-cat
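To make "heterogeneous columns" concrete, here's a toy, CPU-only sketch of the general idea: scale the numeric columns and one-hot encode the string ones. This is an assumption-laden illustration, not cu_cat's implementation or API; `featurize_rows` and the sample `alerts` rows are hypothetical.

```python
def featurize_rows(rows):
    """Toy encoder: min-max scale numeric columns, one-hot encode everything else."""
    names, encoders = [], []
    for col in rows[0].keys():
        values = [r[col] for r in rows]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if is_numeric:
            lo, hi = min(values), max(values)
            span = (hi - lo) or 1.0  # avoid dividing by zero on constant columns
            names.append(col)
            encoders.append(lambda r, c=col, lo=lo, span=span: (r[c] - lo) / span)
        else:
            # One output column per distinct category value
            for cat in sorted({str(v) for v in values}):
                names.append(f"{col}={cat}")
                encoders.append(
                    lambda r, c=col, cat=cat: 1.0 if str(r[c]) == cat else 0.0
                )
    matrix = [[enc(r) for enc in encoders] for r in rows]
    return names, matrix

# Hypothetical alert rows mixing numeric and categorical fields
alerts = [
    {"severity": 1, "src": "vpn", "bytes": 100.0},
    {"severity": 5, "src": "web", "bytes": 300.0},
    {"severity": 3, "src": "vpn", "bytes": 200.0},
]
names, X = featurize_rows(alerts)
print(names)
for row in X:
    print(row)
```

The real thing handles far messier inputs (dirty strings, dates, text embeddings) and does it on the GPU; the point here is only that every column type ends up as numbers in one matrix.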

We've been meaning to write about cu_cat with the Nvidia RAPIDS team; it's a cool GPU fork of dirty_cat. We see anywhere from 2-100X speedups going from CPU to GPU.

It already has sentence_transformers built in. Because of our work connecting louie.ai <> various vector DBs, we're revisiting how to make it even easier to plug in outside embeddings, and would be curious which patterns would be useful there. Prior to this thread, we weren't even thinking folks would want images built in, as we find that so context-dependent...