by ivansavz
2540 days ago
Thanks for posting this. Very interesting dataset:
https://bigquery.cloud.google.com/table/patents-public-data:...

Do you know by any chance how the `embedding_v1` vectors were generated? The data field description says "Machine-learned vector embedding based on document contents and metadata, where two documents that have similar technical content have a high dot product score of their embedding vectors." Could this be word2vec, GloVe, or something else like that? Maybe a tf-idf-weighted sum of the word vectors for the tokens in the title + abstract of each patent?
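
To make that guess concrete, here is a minimal sketch of a tf-idf-weighted sum of word vectors, compared by dot product. This is only an illustration of the hypothesis, not how `embedding_v1` was actually produced. The toy word vectors, the three example documents, and the helper names (`idf_table`, `embed`, `dot`) are all made up; in practice the word vectors would come from something like word2vec or GloVe.

```python
import math
from collections import Counter

# Hypothetical toy word vectors (stand-ins for word2vec/GloVe output).
WORD_VECS = {
    "battery": [0.9, 0.1, 0.0],
    "lithium": [0.8, 0.2, 0.1],
    "cell":    [0.7, 0.3, 0.0],
    "antenna": [0.0, 0.2, 0.9],
    "signal":  [0.1, 0.1, 0.8],
}

# Hypothetical "title + abstract" text for three patents.
DOCS = {
    "patent_a": "lithium battery cell",
    "patent_b": "battery cell",
    "patent_c": "antenna signal",
}


def idf_table(docs):
    """Smoothed inverse document frequency for every token in the corpus."""
    n = len(docs)
    df = Counter(tok for text in docs.values() for tok in set(text.split()))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}


def embed(text, idf):
    """Sum the word vectors weighted by tf * idf, then L2-normalize."""
    dim = len(next(iter(WORD_VECS.values())))
    vec = [0.0] * dim
    for tok, tf in Counter(text.split()).items():
        if tok not in WORD_VECS:
            continue  # skip out-of-vocabulary tokens
        weight = tf * idf.get(tok, 0.0)
        for i, x in enumerate(WORD_VECS[tok]):
            vec[i] += weight * x
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def dot(u, v):
    """Dot product, the similarity score the field description mentions."""
    return sum(a * b for a, b in zip(u, v))


IDF = idf_table(DOCS)
EMB = {name: embed(text, IDF) for name, text in DOCS.items()}

print("a.b =", dot(EMB["patent_a"], EMB["patent_b"]))
print("a.c =", dot(EMB["patent_a"], EMB["patent_c"]))
```

With these toy vectors the two battery patents should score a higher dot product with each other than either does with the antenna patent, which is the behavior the field description promises.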