by hofrogs
397 days ago
This is really cool and actually useful for peeking behind those annoying login walls. What software do you use to store, index, and search so much data? How did you get the data in the first place? Discord isn't exactly known for making its data easily available. Did the administrators of the guilds ask you for this? Have you contacted them and made them aware after the fact?
Thanks for your feedback.
For software, I use ScyllaDB and Elasticsearch. It's split across 6 physical nodes (8 including the CDN). Data collection is handled using standard user accounts, accessing only public, discoverable servers. I plan to write a blog post soon about the technical side of how this was done.
Admins of these servers weren't contacted, as the indexed content is already publicly accessible, comparable to a forum like this or a public subreddit. That said, I understand the sensitivity around data visibility, and I've made it very simple for any user to opt out of indexing at any time. Private or invite-only servers are, of course, completely excluded.