Hacker News new | ask | show | jobs
by BrendanEich 2812 days ago
> If your users must rely on you obeying a policy, then please just say that.

I did just say that, several times -- but with conditions that do not make it a matter of "policy" only. We agree IP address should be masked for self-funded users. Working on it!

We are very familiar with how any user log can be used as a history, but anonized proofs that can't link to a user id except illegally to IP address are not on the same level of threat as the Blendle, Flattrplus, etc. histories taken in the clear -- never mind Google et al. surveillance. To equate the tech and not make any distinctions does a disservice to us in my view. I'm not sure you did equate, but see next paragraph.

Tech alone is never enough for anything like what we are doing. Addresses matter, if not IP then on blockchain. There are side channels. There will be bugs. IMHO you have to include the social and legal constraints, too. Even a p2p with ZKP solution has some risk due to the blockchain addresses, which need purpose-limited terms under GDPR too.

On road, will get you links to code for randomizing time between ANONIZE sessions as soon as I can.

2 comments

> Tech alone is never enough for anything like what we are doing.

You'd be surprised how far you can get. For example, protocol design exist that provide strong message anonymity: mixnets, DC-nets, and secure multiparty computation (MPC). Tor is great at its goals, but it accepts weaker security for low latency that Brave doesn't need. Tor is also unfortunately blocked in many companies and countries by technology and/or policy. Mixnets are freely available [0]. MPC is sold commercially [1]. (I have no personal or professional connection to either project.)

> On road, will get you links to code for randomizing time between ANONIZE sessions as soon as I can.

Looking forward to it.

Many thanks for the serious engagement. I look forward to recommending Brave to my friends and colleagues in the not-distant future!

[0] <https://katzenpost.mixnetworks.org> [1] <https://www.unboundtech.com/>

Here's a non-tech issue that already causes some idealists to scorn us: most publishers want to be paid in fiat. That means AML/KYC/anti-sanction-list/etc. Pretty unideal but we are not waiting for others to achieve Utopia. We want to help creators get users funding them sooner, and while some take crypto, most (esp. of size) want fiat.
Honestly, I don't know why you need a blockchain in the first place. Just run your own accounting servers, which you already are doing for the Anonize ballots. It is certainly possible to take in money from identified (i.e. non-anonymous) users and pay out money to identified publishers without being able to determine which users are responsible for which payments to the publishers. However, I also don't see any problem with using Ethereum for transferring money.
Oh, that q ("why you need a blockchain") is easy. Fiat only would require us to be an MSB or MTA, heavy licensing lift and no ability to grant users for free from the user growth pool we precreated before the BAT sale. The user growth pool is the number one reason in my view.

Also we like macro-auditability from funding wallet to omnibus settlement, to show we took the %ages we promised.

BTW I'm aware of mixnets, talking to Harry Halpin and others, but also cautious about new tech. Will look at your refs, thx.
Code links:

JS implementation in Muon-based https://github.com/brave/browser-laptop product:

1. https://github.com/brave-intl/bat-client/blob/master/index.j...

2. https://github.com/brave-intl/bat-client/blob/master/index.j...

3. https://github.com/brave-intl/bat-client/blob/master/index.j...

From Marshall Rose: "the first deals with the delay in before asking for ballots (after a contribution) and the second deals with each delay between submitting each ballot."

New "brave-core" (chromium refork to get front end but w/o Google accounts/sync), the new implementation; narration by Serg Zhukovsky:

"""

here we make a first call https://github.com/brave-intl/bat-native-ledger/blob/reconci...

which calls that function https://github.com/brave-intl/bat-native-ledger/blob/reconci...

after the random wait it calls that function https://github.com/brave-intl/bat-native-ledger/blob/reconci...

and in it’s callback it goes all over again if there are still votes to send

if you are interested in the function that does the randomization, it’s here: https://github.com/brave-intl/bat-native-ledger/blob/reconci... """

The C++ is all new code, not yet released -- bug reports welcome! Thanks.

Thanks! The protection of the delay-based ballot-mixing looks somewhat weak.

I see that the delay from one ballot batch to the next is set to a uniformly-random time between 10 and 60 seconds. I also see that the ballot batch size is 10.

Let's assume that you have fixed the problem of the ballots being submitted in the clear and that they are instead send through a proxy on different TCP connections (or through Tor on different circuits). Let's also say that the news reports are correct and Brave has 4 million active users, and moreover that all have enabled payments (this is generous). Furthermore, assume that (1) there is no reason payments would start being submitted on any particular day or time, and (2) say, 20 publishers are paid 20 tokens each on average. Then the average number of users uploading ballots in any given minute - let's call them "neighbors" - is roughly (20 ballots/user / 10 ballots/batch)(20 publishers/month)(4e6 users)/(30 days/month)/(24 hours/day)/(60 minute/hour) = 3704 neighbors. That's not a very big anonymity set, but it's not nothing. Early adopters got screwed, though.

However, it gets worse. Based on BatClient::prepareVoteBatch() (bat_client.cc), it looks like each batch has ballots for a single publisher (if I'm wrong, privacy erodes even further). How many of those 3704 simultaneous uses are uploading to the exact same domain? I don't know how to guess (to be conservative, we should assume none). Moreover, unless the last batch happens to be a multiple of 10, that one will be unique in being a size less than 10. Both of these things make it easy in many cases to determine when you are finished uploading ballots for one publisher and are moving onto the next. If many of your neighbors have split a publisher's votes across multiple batches, then it seems unlikely that they will move to the next publisher at the same time, making your ballot submissions more linkable across publishers, exactly the problem we were worried about.

In addition, the time of each batch isn't independent because the delay is applied after the last one. For example, if one batch appears is sent quickly by chance, then the next one is more likely to be sent at a time close to the initial batch. Thus, to link two batches, you only need to consider the batches that end in the 10-60 seconds preceding the start of the second one. How many of your 3704 neighbors ended a batch then? The longer a batch upload takes, (due to latency & bandwidth), the less likely that the batch of any other user has ended in that short time frame.

There is also the issue of semantic linking. If I'm a relatively rare group, the sites I visit are likely to be linkable to each other as distinct from those of my neighbors. For example, suppose I speak Catalán and am involved in regional politics, or that I am a teenage boy in Iceland. How many of my sites are linkable based on those characteristics? And this linking can happen across "gaps" in the ballot-upload stream, where there is uncertainly about how the uploads are linked together, allowing you to get the inference "back on track".

Fortunately, the timing stuff seems fixable. Hiding the IP and separating ballots across connections is an essential first step, of course. but to handle timing issues after that, simply do the following: (1) submit all ballots individually; (2) schedule each ballot upload independently of the other ballots and at a uniformly-random time over, say, the next week or two, and if the user isn't online, reschedule.

The semantic issues with small populations seems harder. I think Brave should prevent itself from learning about votes for publishers unless enough people vote for the publisher. I don't see how to do this without a more powerful cryptographic protocol (it is absolutely doable, though, using MPC for example).

Thanks. Marshall Rose said in reply "That is a very good analysis about the traffic pattern. You are right about the batching of ballots for a similar publisher. That is an optimization to reduce the total voting period."

We will work on improving the system. Please mail me first at brave dot com if you want to correspond further. Thanks again.