Hacker News
by throwaway_19sz 40 days ago
Funny story no one will believe, but it’s true. A good friend of mine joined a startup as CTO 10 years ago, high growth phase, maybe 200 devs… In his first week he discovered the company had a microservice for generating new UUIDs. One endpoint with its own dedicated team of 3 engineers …including a database guy (the plot thickens). Other teams were instructed to call this service every time they needed a new ‘safe’ UUID. My pal asked wtf. It turned out this service had its own DB to store every previously issued UUID. Requests were handled as follows: it would generate a UUID, then ‘validate’ it by checking its own database to ensure the newly generated UUID didn’t match any previously generated UUIDs, then insert it, then return it to the client. Peace of mind I guess. The team had its own kanban board and sprints.
15 comments

> One endpoint with its own dedicated team of 3 engineers

> The team had its own kanban board and sprints.

My early jobs were at startups with limited resources. Every decision to build something or hire someone was carefully made after much consideration. This story would have looked like fiction to me at the time.

Later in my career I joined a startup like this where every new concern someone could think up turned into a new microservice with new hires to form a new team. It didn't matter how small it was, everything was a reason to hire new people and form a new team. I sat in meetings where the express goal of the quarter was communicated as growing the engineering team.

It was a weird time. We had this same situation where there were 3-4 person teams who had their own sprints and planning sessions where they would come up with more ways to make work for themselves. Some of them moved so slowly that they could spend entire sprints doing tiny changes. Others were working on the most over-engineered solutions you'd ever seen for trivial problems.

There was one meeting where I suggested we re-assign some people on a stable project to work on something that we needed urgently, but I got shut down. That would have removed another excuse to hire more people, which would have conflicted with someone's KPIs to grow the engineering team to a specific number.

> My early jobs were at startups with limited resources. Every decision to build something or hire someone was carefully made after much consideration. This story would have looked like fiction to me at the time.

This was pre-2015

> Later in my career I joined a startup like this where every new concern someone could think up turned into a new microservice with new hires to form a new team. It didn't matter how small it was, everything was a reason to hire new people and form a new team. I sat in meetings where the express goal of the quarter was communicated as growing the engineering team.

This was post-2015

---

Am I right?

You're describing exactly what I've tried to express in various comments. There was a point in the latter half of the 2010s when it became genuinely hard to find tech work where you were building useful stuff. Startups became increasingly absurd and the focuses of their engineering teams even more so.

In 2019 I was working for a company that was so desperate to hire new engineers that at one point they decided to just start offering jobs to candidates who had failed interviews. It was absolutely insane.

Ah, the heady days when we shipped a new AWS service with a team of 40, and when I came into work the next day we had 120 people and 80 of them were just inventing work out of whole cloth…
I need to hear more stories, I'm begging
That timeline correlates with new cell phones having fewer features than the previous generations, and was when Intel started releasing generation after generation of processor on the same fabrication process, with minimal improvement between generations.

Peak technology was 2015.

> someone's KPIs to grow the engineering team to a specific number

Sigh!

Specific numbers!

I believe a more common specific number nowadays is the yearly EBITDA or ARR (or some other acronym in this vein that I care zero about memorizing), for investors' sake. Like in our company. Since we were acquired, and for some time before, the only talk in company meetings has been EBITDA and ARR, compared to a number dreamed up by someone and to be reached in 5 years' time. Specific financial results in a specific timeframe. Our goals are specific numbers that sit above today's numbers by a chosen margin. The company talk is marketing campaigns and reach, campaign efficiency measurements, pricing strategies, subscription-centric licensing, sales strategies, churn, and other slang around customer bullying I also do not care about, plus organizational streamlining (what a loaded word!), bla bla bla, all for the specific sacred number put up on the pedestal.

What we have zero talk about? Functionality, engineering.

I seriously do not understand these people. Why are they fiddling around with selling software in a niche sensitive to global economic fluctuations instead of selling ... I don't know. Shoes? Or better yet sugary water ... no, better is vitamin water ... no, the trendiest is protein water. That is something that needs no balanced functionality and no laborious, resource-intensive engineering, the kind that gets in the way of reaching the sacred number put up there. Engineers are in the way of our goals. We are pulling back the cart! We are a cost center now!!

I do not stay long.

I had an internship at HP that taught me to never get lost in a large conglomerate. I worked with people whose career was working on a piece of a piece of a design, only ever touching the same tiny aspect of every new project.

I can't imagine getting a job like that at a startup.

At some point someone optimizes the system to a global company-wide incrementing 128 bit counter. Instead of needing a costly database lookup against a growing database the microservice just fetches the current counter, increments it by one and hands out the new value. Easy, fast O(1) operation.

This even allows you to shard the service to provide high availability and distribute the service globally to reduce latency. Just give each instance a dedicated id range it can hand out. I'd suggest reserving some of the high bits to indicate data center id, and a couple more bits for id-generator instance within that dc.

Wait a second, this starts to look familiar ... does Twitter still do that, or did they eventually switch?
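For what it's worth, the sharded counter described above can be sketched in a few lines. This is only an illustration under stated assumptions: the class name `ShardedCounter` and the 8-bit widths for the data center and instance fields are made up here, the counter lives in memory only (a real service would have to persist it), and this is not Twitter's actual Snowflake layout.

```python
import threading

DC_BITS = 8          # assumed width of the data center id field
INSTANCE_BITS = 8    # assumed width of the generator instance field
COUNTER_BITS = 128 - DC_BITS - INSTANCE_BITS  # remaining bits count upward


class ShardedCounter:
    """Hands out 128-bit ids from the range reserved for one (dc, instance) pair."""

    def __init__(self, dc_id: int, instance_id: int) -> None:
        if not 0 <= dc_id < (1 << DC_BITS):
            raise ValueError("dc_id out of range")
        if not 0 <= instance_id < (1 << INSTANCE_BITS):
            raise ValueError("instance_id out of range")
        # High bits identify the shard, so ranges of different shards never overlap.
        self._prefix = ((dc_id << INSTANCE_BITS) | instance_id) << COUNTER_BITS
        self._next = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        # Fetch the current counter and increment it: O(1), no lookup needed.
        with self._lock:
            value = self._next
            if value >= (1 << COUNTER_BITS):
                raise OverflowError("id range exhausted")
            self._next += 1
        return self._prefix | value
```

Two instances with different `(dc_id, instance_id)` pairs can hand out ids independently, which is the whole point of reserving the high bits.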

Define a random 128 bit key that you will never change. Use that key to encrypt 128 bit integers in sequence using AES-128, each one comes out as a, for all practical purposes, unique unpredictable ID.
> each one comes out as a, for all practical purposes, unique unpredictable ID

I don't have much cryptography experience, but this seems _suuuuper_ suspicious. I think the "for all practical purposes" is doing a lot of lifting here? If it was this easy, surely this is what we'd use, and there wouldn't be UUID v4 to begin with.

The value of uuid is the lack of coordination. “…integers in sequence…” requires quite a bit of coordination if you have more than one computer ;)
Twitter snowflakes haven't changed. Most of the bits go to the timestamp, which I guess is a global incrementing counter as you described
> At some point someone optimizes the system to a global company-wide incrementing 128 bit counter.

Some UUID versions include time, so there's a bit of a counter in that.

What is the arrow of time if not a single global monotonically increasing sequence?
I've seen similar, buried deep within a major SV tech co.

Their process was a bit more complex because the master list of in-use UUIDs was stored in an external CMDB service run by a different department. They got a daily dump of that db, so were able to check that when generating a "provisional" id. Only once it had been properly submitted to the CMDB did it become "confirmed".

They had guardrails in place to prevent "provisional" ids being used in production, and a process for recycling unused "confirmed" ids. Oh, and they did regular audits which were taken very seriously by management.

Last I heard, they were 18 months into a 6 month project to move their local database cache to Zookeeper...

They should upgrade to Zookeeper II: Zookeepier.

https://www.youtube.com/watch?v=_F-RyuDLR4o

I can believe it, and I often wondered, "can I win the UUID misfortune lottery?" I wonder if this is equally common with Microsoft's flavor, aka GUIDs.
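For a rough sense of the odds in that lottery, here is a back-of-the-envelope sketch using the standard birthday approximation. It assumes version 4 UUIDs drawn from a good random source (122 random bits, since 6 bits are fixed by the format); the function name `collision_probability` is made up for illustration.

```python
import math

RANDOM_BITS = 122  # a version 4 UUID has 122 random bits; 6 bits are fixed


def collision_probability(n: int, bits: int = RANDOM_BITS) -> float:
    """Approximate chance of at least one duplicate among n random ids.

    Uses the birthday approximation p = 1 - exp(-n(n-1) / (2 * 2**bits)).
    expm1 keeps precision when the probability is tiny.
    """
    space = 2 ** bits
    return -math.expm1(-n * (n - 1) / (2 * space))
```

Under this approximation, a trillion ids still leaves the collision probability below one in a trillion, and it takes on the order of 10^18 ids to reach even odds. A broken or badly seeded random generator changes the picture entirely, which this sketch does not model.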
GUIDs and UUIDs are effectively the same thing... the issues often come down to the means of generation and storage... where UUIDs have versions with specific implementation details that aren't always followed, MS has internal implementations that also aren't always followed. Also worth being aware of are COMB, sequential IDs (MS-SQL) and other serialization approaches, as well as how they affect indexes in practice.

Alternatives include sequential number generator services, or sequence services that may be entirely sequential, etc., but these may lead to out-of-order inserts in practice.

Also, generally worth considering UUIDv7, assuming your storage and indexing use the time portion at the front of the index process.
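To make the "time portion at the front" point concrete, here is a hand-rolled sketch of the UUIDv7 bit layout: a 48-bit Unix millisecond timestamp, then the version, then random bits. The function name `uuid7_sketch` is made up, and the sketch deliberately skips the optional monotonic counter, so ids minted within the same millisecond are not ordered among themselves. Prefer your platform's own UUIDv7 support if it has one.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a version 7 style UUID: timestamp first, randomness after."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    ts = unix_ms & ((1 << 48) - 1)                 # 48-bit millisecond timestamp
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits
    rand_a = rand >> 68                            # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 random bits
    value = (
        (ts << 80)          # bits 127..80: timestamp
        | (0x7 << 76)       # bits 79..76: version 7
        | (rand_a << 64)    # bits 75..64: random
        | (0b10 << 62)      # bits 63..62: RFC variant
        | rand_b            # bits 61..0: random
    )
    return uuid.UUID(int=value)
```

Because the timestamp occupies the most significant bits, ids created in later milliseconds sort after earlier ones, both as integers and as hex strings, which is what keeps inserts near the end of a B-tree index.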

You would think they could automate the entire process by “creating-ahead” a certain number of UUID values in the DB, storing them in memory to reduce DB latency, and then recording the assignment to the DB once it had been assigned.

And the microservice could easily be crafted to only accept assignment requests from other known endpoints.

We have had a service to add two numbers. What makes you think this is not realistic? :-)
I too have witnessed an "add two numbers" service! Turns out you can be too extreme with rules for isolating out business logic..
Same! It had validation on each number before adding them. Poor design, but that's how it worked.
I find this so hard to believe, but I've nearly always worked in small groups/companies. Can you, or any of the commenters above, explain why the reasoning that leads to such a service isn't rejected by, well, common sense? Some super-special requirements?
In the case I mentioned at https://news.ycombinator.com/item?id=48062322, it was because the Infrastructure org had grown out of what had previously been Datacenter Operations.

So they had a team of SWEs who knew the system they were responsible for was absurd, but they weren't able to adequately explain that to the senior management folk who came from that DCOps culture and held asset management & configuration tracking to be paramount. The uniqueness was seen less as an inherent property, and more as a constraint that needed to be enforced.

My team of DevOps-y proto-Platform Engineers struggled with the org's culture in similar ways, so I had a lot of sympathy for the situation they found themselves in and how they were handling it. I believe their Zookeeper-based system was intended to be more of a generic lightweight config registry which would eventually have replaced the gigantic SOAP-based CMDB nightmare - basically Consul a year or two before Consul existed.

The reason why they struggled to get it into production was that it would have been so obviously useful that they kept having additional requirements and use cases forced into their "MVP". That sort of scope creep, driven by tech leadership wanting to make their mark on a successful project, is also pretty common in large orgs.

Fortunately, I've never encountered that. But still, I can see the usefulness of a guaranteed globally unique UUID, at least for certain purposes. However, a service to add numbers baffles me. The operations needed to create, send, receive and check the message are so much more complex than addition...

I must say, I did experience some lousy tech+sales leadership in one company, which was indeed the biggest I ever worked at. A decent product with a well understood scope was completely scrapped and rewritten. Some team spent more than a year on the (waterfall) design of the new system, which was then scrapped too. When I joined, there was an 8 man team for just the message bus for the new new system. Which didn't even work correctly. The whole was flexible, but in nearly every other aspect inferior to the original product. And it needed much heavier hardware.

Sure. In this case, this started as a method with two parameters; each was validated internally before addition.

The validation was long running, as it required checking two other services to confirm both of the numbers were OK.

Because of issues calling those services, instead of two nasty synchronous calls, it turned into calling a microservice asynchronously and using a callback. Then that microservice was owned by the team that owned those two other services.

Don't underestimate the power of Conway's law.

I get the microservice to ensure this. But 3 people dedicated to it? I guarantee you they spent their days trudging dungeons, playing CoD and ping pong.
You need at least 3 for this. People go on vacation, turnover, can’t risk losing that critical institutional knowledge.
I'd believe it.

What I'd find harder to believe is that it wasn't really a table with more information than just "list of assigned UUIDs". I'd be really surprised (pleasantly!) if it was only that. I'd figure most startups would make sure that table links to customer info so that they know which customer has a specific UUID, for easy searching and crossreferencing with the main db

That sort of table can be quite handy when every entity in the business's data stew is identified with a UUID, and there is no way of telling just from looking at an identifier what kind of entity it is. Particularly when the business has disparate databases and/or microservices with their own sets of UUIDs.

In such businesses, inevitably, someone will ask you to run process X for widget 8dbcd950-14c1-4877-a8b0-90c081ce033c, and that particular identifier will actually be an ID of some associated data, not the widget. You can push back and say, "That isn't a widget identifier, can you please look up the widget identifier?" It's better to be able to look that ID up in your ID ⮕ entity type lookup table, and say "the ID you provided is a widget production run ID, which produced a copy of widget a84969be-137a-41ca-97c4-515497184df9. Can you confirm this is the widget you need process X done for?", with a link to the product-facing widget page.

(Also handy for the case where some code was intended to log an ID for one entity, but actually logs the ID for an associated entity with the wrong entity type indicated.)

Stripe handle this interestingly, with a prefix to the ID indicating the type of entity.

https://dev.to/4thzoa/designing-apis-for-humans-object-ids-3...

AWS as well, and I've done this myself, too.

It really helps debugging.
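A minimal sketch of the prefixed-ID idea, for illustration only. The prefixes, entity names, and helper functions below are invented for this example and are not Stripe's or AWS's actual scheme.

```python
import uuid

# Hypothetical prefix registry; the prefixes and entity names are made up.
PREFIXES = {
    "wid": "widget",
    "run": "widget production run",
}


def new_id(prefix: str) -> str:
    """Mint an id whose prefix says what kind of entity it identifies."""
    if prefix not in PREFIXES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return f"{prefix}_{uuid.uuid4().hex}"


def entity_type(identifier: str) -> str:
    """Recover the entity type from the id alone, with no lookup table."""
    prefix, sep, _rest = identifier.partition("_")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unrecognized id: {identifier!r}")
    return PREFIXES[prefix]
```

With this shape, `entity_type(new_id("run"))` returns `"widget production run"`, so a misfiled id can be spotted from a log line without querying anything.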

I worked at Dell EMC when it got acquired. Most of the work was duplicated between Dell and EMC, so every EMC project was stalled pending further decisions. In the Bangalore, India office the situation was such that, to keep their jobs, everyone used to write and file patents all day, and they even used to have patent lawyers visit the office and pitch patent ideas. The patent ideas that were approved and filed sometimes just used better algorithms than previous patents, like literally adding null checks.
Senior, Staff and Principal UUID Engineer.

UUID Database Admin.

At one of my previous jobs, there was a function `createEntityWithRandomUUID` which would basically do the same thing as a light wrapper around database inserts. If a conflict occurred, it would generate a new ID and try again, up to 5 times I think. No logging to indicate whether any conflict actually ever happened.
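A wrapper like that might look roughly like the sketch below. This is a guess at the shape, not the original code: the table name `entities`, its columns, and the use of SQLite are assumptions, and the `make_id` parameter exists only so the retry path can be exercised.

```python
import sqlite3
import uuid

MAX_ATTEMPTS = 5  # "up to 5 times", as described above


def create_entity_with_random_uuid(conn, name, make_id=uuid.uuid4):
    """Insert a row under a fresh random id, retrying if the id is taken."""
    for _attempt in range(MAX_ATTEMPTS):
        entity_id = str(make_id())
        try:
            # The connection context manager commits on success
            # and rolls back if the insert raises.
            with conn:
                conn.execute(
                    "INSERT INTO entities (id, name) VALUES (?, ?)",
                    (entity_id, name),
                )
        except sqlite3.IntegrityError:
            # Primary key conflict: silently try again with a new id.
            # A log line here would show whether this branch ever runs.
            continue
        return entity_id
    raise RuntimeError("could not find an unused id")
```

It assumes a table created with something like `CREATE TABLE entities (id TEXT PRIMARY KEY, name TEXT NOT NULL)`, so the database's own uniqueness constraint does the checking.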
No, that kind of critical data would be sent to pendo so it could be reported on or shown in a dashboard!
Who has the balls to form that team? Were they disbanded?
I will gladly assume that this team was formed after several collisions with UUIDs. My assumption is that they had a tremendous amount of data and enough revenue to justify all of this, at least financially. I would have re-evaluated the UUID version used, or whether adopting Snowflakes would be better, at some point.
Pffft - they didn't need to store the whole UUID, just a hash. Dummies.
They thought of that, but they were still working on hiring a team to maintain the hashing microservice.
Hashing microservice deployment was blocked by random generator microservice stuck in Pending because it needed a UUID from UUID microservice which was blocked by hashing.
"Learned a lot today, love Galactus"
already laughing from parent comment this is well done
one hash is insufficient, they need k-hashes.

i get the joke, but seriously a Bloom filter would be a good idea.
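For anyone who hasn't met one, a Bloom filter with k hashes can be sketched like this. The sizes are arbitrary and the k hash functions are derived by salting BLAKE2b, which is just one convenient choice made for this example. The filter can report false positives but never false negatives, so a "maybe seen" answer would still need a real lookup.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: 'not in' is definite, 'in' means 'possibly'."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self._bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive k bit positions by hashing the item with k different salts.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                item, digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(
            self._bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

Usage would be along the lines of `seen = BloomFilter(1 << 20, 4)`, then `seen.add(some_uuid.bytes)` and `some_uuid.bytes in seen`.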

This is the software industry version of "The Onion".
Any chance this company managed cap tables?