We’d like to present .veml (Vector Embedding Markup Language), a format that has the potential to greatly improve the editing, use, and sharing of vector embeddings across applications.
Adoption of VEML brings many benefits, like:
1. Standardization: VEML provides a standardized format for pre-processing and editing vector embeddings.
2. Interoperability: It ensures better interoperability among different applications and systems that utilize vector embeddings.
3. Extensibility: Just like XML, VEML has the potential to be extensible, allowing users to add new tags and attributes to represent additional properties or metadata associated with the vector embeddings.
4. Machine Readability: A well-defined markup language would also be easy for machines to parse, ensuring efficient processing and manipulation of vector embeddings by various software applications.
Which docs? I see just three files in that one repository, one of which is an example file that doesn’t specify which embedding it uses. Do we have to use some specific embedding that is specified in your docs somewhere?
Apologies for that. Let me check with my co-founder. It will be there. It would be great to know your thoughts about our GUI for editing embeddings, joining and splitting chunks, and filtering out punctuation and stop-words with one word. You can have a look at it in the /embedditor repo or on the web at embedditor.ai.
My take is you need to do more work on the value proposition.
My first take is that I can compute embeddings with one line of Python using sbert.net, and from there it is an automated process. I have a script that generates embeddings for 80,000 documents; it runs every day and I barely think about it.
When I think of a GUI, I picture somebody having to click through 80,000 documents to do the same, and to get the same throughput I’d have to raise venture capital and hire an army of people to go click… click… click… That is, it takes something easy and scalable and makes it difficult and expensive. It makes me think of the text retrieval experiments that Salton did with documents on IBM cards in the 1960s.
I know there is more to it than that. This simple approach is not so simple once you consider chunking and other choices that could make a big difference. Still, I think there would be some programming-language function that takes a document and returns an embedding, and some kind of suite to determine the parameters of that function (at the level of a document collection, not individual documents) could be quite useful. But I think a lot of people will want something that doesn’t have many knobs to turn.
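To make that last point concrete, here is a rough sketch of a document-to-embedding function whose chunking parameters are fixed once per collection. Everything in it is an assumption for illustration: the names EmbeddingConfig, chunk_words, toy_embed and embed_document are made up, chunks are measured in words, and toy_embed is a hashing stand-in so the example runs without a model. In practice you would pass a real model's encode function as embed_fn.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass(frozen=True)
class EmbeddingConfig:
    """Hypothetical knobs, chosen once per collection rather than per document."""
    chunk_size: int = 200  # words per chunk
    overlap: int = 20      # words shared between neighbouring chunks


def chunk_words(text: str, cfg: EmbeddingConfig) -> List[str]:
    """Split text into overlapping word windows according to cfg."""
    step = cfg.chunk_size - cfg.overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + cfg.chunk_size]))
        if start + cfg.chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def toy_embed(text: str, dim: int = 8) -> Vector:
    """Stand-in for a real model: hashes each word into a count vector."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    return vec


def embed_document(
    text: str,
    cfg: EmbeddingConfig,
    embed_fn: Callable[[str], Vector] = toy_embed,
) -> List[Vector]:
    """Return one vector per chunk of the document."""
    return [embed_fn(chunk) for chunk in chunk_words(text, cfg)]
```

With something like this, the daily batch job stays a plain loop over documents, and the only thing a tuning suite would need to produce is one EmbeddingConfig for the whole collection.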
We'd love to hear your thoughts.