Discussion:
[whispersystems]
Chris Dew
2016-03-15 09:53:48 UTC
Permalink
This would be a great problem to really solve well. If anyone develops
any insight into a practical
privacy preserving mechanism for achieving this type of contact
discovery, or notices something
we’ve missed, please let us know here, on twitter
<https://twitter.com/whispersystems>, or on our development mailing list
<https://lists.riseup.net/www/subscribe/whispersystems>.
– Moxie Marlinspike <https://twitter.com/moxie>, 03 January 2014
This idea is probably full of holes, but there might be something useful
here...


Could you use layered system of bloom filters of shards of the hash(telno)
space?

(Figures are made up, but are roughly the right order of magnitude for
millions of service users.)

The filters could be made smaller, at the expense of having more of them.

Layer 0 is a 1MB bloom filter with a 10% error rate, for *all* hashes of
telnos
Layer 1 are 16 1MB bloom filters with 1% error rates, sharded by the last
nibble of the telno's hash
Layer 2 are 256 1MB bloom filters with 0.1% error rates, sharded by the
last byte of the telno's hash
Layer 3 are 4096 1MB bloom filters with 0.01% error rates, sharded by the
last byte of the telno's hash
...


A newly installed uses the following algorithm to determine which of its
contacts are already users of the service, without disclosing the phone's
contact list to the service:

The app makes a copy of the list of contacts' telephone numbers (E164
format).

Request the L0 bloom filter.
Using L0, 90% of non-member telnos are removed.

Request only the required L1 shards, for the remaining telnos.
Using the necessary L1 shards, more non-member telnos are removed (99% are
gone now).

Request only the required L2 shards, for the remaining telnos..
Using the necessary L2 shards, more non-member telnos are removed (99.9%
are gone now).

Repeat until required error rate is reached. i.e. The list only contains
telnos of other service users.


The benefit of this method is that bloom filter shards are only requested
for numbers which are *probably* service users, thus disclosing much less
information to the service provider.

90% of contacts' numbers will never be used to request a Layer 1 bloom
filter, and those that are, will only disclose 4 bits of the hash to the
service provider, narrowing them to 6% of the worlds telnos.

99% of contacts' numbers will never be used to request a Layer 2 bloom
filter, and those that are, will only disclose 8 bits of the hash,
narrowing them to 0.4% of the worlds telnos.

99.9% of contacts' numbers will never be used to request a Layer 3 bloom
filter, and those that are, will only disclose 12 bits of the hash,
narrowing them to 0.02% of the worlds telnos. (240 million telnos, if
there are 10 billion telnos, as per the webpage.)

All the best,

Chris.
Chris Dew
2016-03-15 09:55:46 UTC
Permalink
typo:

Layer 3 are 4096 1MB bloom filters with 0.01% error rates, sharded by the
last *12 bits* of the telno's hash
Post by Chris Dew
This would be a great problem to really solve well. If anyone develops
any insight into a practical
privacy preserving mechanism for achieving this type of contact
discovery, or notices something
we’ve missed, please let us know here, on twitter
<https://twitter.com/whispersystems>, or on our development mailing list
<https://lists.riseup.net/www/subscribe/whispersystems>.
– Moxie Marlinspike <https://twitter.com/moxie>, 03 January 2014
This idea is probably full of holes, but there might be something useful
here...
Could you use layered system of bloom filters of shards of the hash(telno)
space?
(Figures are made up, but are roughly the right order of magnitude for
millions of service users.)
The filters could be made smaller, at the expense of having more of them.
Layer 0 is a 1MB bloom filter with a 10% error rate, for *all* hashes of
telnos
Layer 1 are 16 1MB bloom filters with 1% error rates, sharded by the last
nibble of the telno's hash
Layer 2 are 256 1MB bloom filters with 0.1% error rates, sharded by the
last byte of the telno's hash
Layer 3 are 4096 1MB bloom filters with 0.01% error rates, sharded by the
last byte of the telno's hash
...
A newly installed uses the following algorithm to determine which of its
contacts are already users of the service, without disclosing the phone's
The app makes a copy of the list of contacts' telephone numbers (E164
format).
Request the L0 bloom filter.
Using L0, 90% of non-member telnos are removed.
Request only the required L1 shards, for the remaining telnos.
Using the necessary L1 shards, more non-member telnos are removed (99% are
gone now).
Request only the required L2 shards, for the remaining telnos..
Using the necessary L2 shards, more non-member telnos are removed (99.9%
are gone now).
Repeat until required error rate is reached. i.e. The list only contains
telnos of other service users.
The benefit of this method is that bloom filter shards are only requested
for numbers which are *probably* service users, thus disclosing much less
information to the service provider.
90% of contacts' numbers will never be used to request a Layer 1 bloom
filter, and those that are, will only disclose 4 bits of the hash to the
service provider, narrowing them to 6% of the worlds telnos.
99% of contacts' numbers will never be used to request a Layer 2 bloom
filter, and those that are, will only disclose 8 bits of the hash,
narrowing them to 0.4% of the worlds telnos.
99.9% of contacts' numbers will never be used to request a Layer 3 bloom
filter, and those that are, will only disclose 12 bits of the hash,
narrowing them to 0.02% of the worlds telnos. (240 million telnos, if
there are 10 billion telnos, as per the webpage.)
All the best,
Chris.
Loading...