by teaearlgraycold
1022 days ago
So you could place each project on a 2D map with X and Y being obvious dimensions (size and purpose), as another commenter mentioned. Or you could use something like an LLM's embedding model, which converts a description of a GitHub project into a series of numbers. Say each number is from 0 to 1. Hypothetically, the first number could represent the commercial-ness of the project, the second how closely related the project is to web development, the third its relation to AI, etc. Embeddings for use with LLMs often have between 100 and 10,000 dimensions.

You could then use a dimensionality reduction technique like t-SNE (https://www.datacamp.com/tutorial/introduction-t-sne) that takes these hundreds or thousands of dimensions and squishes them down to 2 or 3. That allows humans to explore the space intuitively, and related projects end up close together. PyTorch and scikit-learn will be near each other because they are both AI/ML related. Pandas will be in a different but nearby cluster. Ruby on Rails would be farther away. In the end you get a 2D map where similar projects are grouped.

This makes it much more like GeoGuessr. GeoGuessr only works because similar parts of the world tend to be near each other. The world doesn't have a square kilometer of desert in the middle of a rainforest. Locality is extremely relevant, and it makes GeoGuessr fun and intuitive. Just throwing a bunch of projects in a list ruins the game, because a list gives you at most one dimension along which projects are associated.

But given that the list of projects is small, it would be easier to simply have a person curate a map.
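To make the "related projects end up close together" idea concrete, here is a rough sketch in plain Python. The three-number vectors are hand-assigned for illustration (commercial-ness, web development, AI/ML) and are not the output of any real embedding model. The helper name `neighbors` is also made up. It only ranks projects by distance in that toy space and does not run t-SNE, which would be the extra step needed to draw a 2D map from real, much longer embeddings.

```python
import math

# Hypothetical, hand-assigned "embeddings" (NOT from a real model).
# Dimensions: (commercial-ness, web development, AI/ML), each from 0 to 1.
EMBEDDINGS = {
    "PyTorch":       (0.3, 0.1, 0.9),
    "scikit-learn":  (0.2, 0.1, 0.8),
    "Pandas":        (0.2, 0.2, 0.5),
    "Ruby on Rails": (0.4, 0.9, 0.0),
}


def neighbors(name):
    """Return the other projects, closest first, by Euclidean distance."""
    origin = EMBEDDINGS[name]
    others = [n for n in EMBEDDINGS if n != name]
    return sorted(others, key=lambda n: math.dist(origin, EMBEDDINGS[n]))


if __name__ == "__main__":
    for project in EMBEDDINGS:
        print(project, "->", neighbors(project))
```

With these made-up numbers, scikit-learn comes out as PyTorch's nearest neighbor, Pandas is next, and Ruby on Rails is last, which is the kind of locality a map-based guessing game would rely on.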