Hacker News new | ask | show | jobs
by justifier 3282 days ago
though a clever way to be able to ignore the real problem eventually time will force you to revisit from base principles

user_id%numb_server may work early on when user activity and uptake are consistent,

but what happens when user activity becomes more complex: increase in users, some users abandoning the platform, others using it more; and that complexity lacks homogeneous distribution through this only concerned property: 'user id';

what if over time you gain more users but the majority of people who drop the platform have a user_id%numb_servers==2|11|13|17

in this case you would have some servers working hard while others sitting dormant

what is the real distribution of the relation between activity and user_id over time? asymptotic(o)? similar to the prime numbers(i)? a gaussian distribution(ii)? a benford distribution(iii)?

whichever future dada will show to be the best fit, most distributions show a strong trend toward eventual favoring of values

which i think implies, to ensure an even distribution of work across servers, the problem requires something with greater dimensionality than modulo on an immutable value that is defined serially

(o) https://en.wikipedia.org/wiki/Asymptotic_analysis

(i) https://en.wikipedia.org/wiki/Prime_number_theorem

(ii) https://en.wikipedia.org/wiki/Probability_density_function

(iii) https://en.wikipedia.org/wiki/Benford's_law

1 comments

I can think of a few ways to rebalance things on the fly, but I would probably just hash some immutable values for the user, like Id and name, together with a nonce. If a server get's overloaded, slowly move people from it by changing their nonce.
if you are introducing a nonce why use a hash? with a performance reflective mutating nonce on the user id modulo works as is

if you are introducing monitoring on a per user precision why use modulo? with a per user scheduled monitoring moving users based on user ids works as is

maybe i was unclear in the above but i like the gp's simple solution.. especially because i personally have an affection for the modulo operator, but also because.. it only requires an operator that performs in a scale dependent finitely specific number of cycles and works as designed without any monitoring

the above was intended to bring attention to shortcomings and probable failures in an otherwise elegant attempt

the method is flawed but the direction is superb

You could use nonce to determine what server to use. But I didn't want to choose directly, just the ability to chance the output of the hash for whatever reason.

I did not want to use just any non random rebalancing mechanics to avoid advesaries attacking that implementation. With a hash the output is deterministic, but unpredictable.

What are your adversarial concerns?