by riku_iki
2427 days ago
GloVe vectors are stored in a table: token -> vector.
A function tokenizes the text and stores the result in another table:
text_id, token. Then you join the first and second tables. Also, I think the typical scenario is to resolve embeddings in your model code or data input pipeline.
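To make the two-table approach concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` with an in-memory database as a stand-in for PG, so the SQL is portable but not PG-specific. The table names `glove` and `text_tokens`, the toy 2-dimensional vectors, the JSON encoding of vectors, and the whitespace tokenizer are all hypothetical choices made for illustration. For contrast, it also shows the "resolve embeddings in the input pipeline" alternative as a plain dict lookup.

```python
import json
import sqlite3

# Hypothetical toy embeddings; real GloVe vectors have 50-300 dimensions.
GLOVE = {
    "cats": [0.1, 0.2],
    "like": [0.3, 0.4],
    "fish": [0.5, 0.6],
}

# First table: token -> vector (vectors stored as JSON text for simplicity).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE glove (token TEXT PRIMARY KEY, vector TEXT)")
conn.executemany(
    "INSERT INTO glove VALUES (?, ?)",
    [(tok, json.dumps(vec)) for tok, vec in GLOVE.items()],
)

# Second table: text_id, token (plus a position so token order survives the join).
conn.execute("CREATE TABLE text_tokens (text_id INTEGER, pos INTEGER, token TEXT)")


def tokenize_into_table(conn, text_id, text):
    """Naive lowercase whitespace tokenizer standing in for a real tokenizer."""
    rows = [(text_id, i, tok) for i, tok in enumerate(text.lower().split())]
    conn.executemany("INSERT INTO text_tokens VALUES (?, ?, ?)", rows)


def embeddings_for(conn, text_id):
    """Join the two tables; the inner join silently drops out-of-vocabulary tokens."""
    rows = conn.execute(
        """
        SELECT t.token, g.vector
        FROM text_tokens AS t
        JOIN glove AS g ON g.token = t.token
        WHERE t.text_id = ?
        ORDER BY t.pos
        """,
        (text_id,),
    ).fetchall()
    return [(tok, json.loads(vec)) for tok, vec in rows]


def embeddings_in_pipeline(text):
    """The in-pipeline alternative: look tokens up in an in-memory dict."""
    return [GLOVE[tok] for tok in text.lower().split() if tok in GLOVE]


tokenize_into_table(conn, 1, "Cats like fish")
tokenize_into_table(conn, 2, "Dogs like cats")
print(embeddings_for(conn, 2))  # -> [('like', [0.3, 0.4]), ('cats', [0.1, 0.2])]
```

Both paths give the same vectors; the difference is where the lookup happens. The dict version avoids a round trip to the database entirely, which is one reason the lookup usually lives in the model code or input pipeline.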
Correct. PG has no place in this workload other than being the final store for the model output. And even then, you'd be using a column store like Redshift or ClickHouse. PG is not even suitable for the n-gram counters, because its ingest rate is far too slow to keep up with a fanned-out model spitting out millions of n-grams per second on top of everything else going on in the pipeline.
You -could- probably do it all in PG. But that'd be a silly, esoteric challenge exercise and not something anyone would try on a real project. I am sure you recognise that.