• Primarily0617@kbin.social
    1 year ago

    anonymising data is a treadmill problem

    what might work now won’t hold up to the de-anonymising techniques of a few years from now

    so no, you can’t really

    • mannycalavera
      1 year ago

      Create an anonymous UUID, store interactions against it in a separate table, and make sure PII is removed before storing. So instead of “Max Reboo has purchased a subscription to jugs and hooters” it’s “user 12345678901234576 has purchased a subscription to jugs and hooters”. How could a future treadmill de-anonymise this? Sure, if the storage is done badly then you can trace it back to a particular user.

      Also, once again, can you link to the Netflix issue you mentioned above, please? Thanks.
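      A minimal sketch of the scheme described in this comment, assuming a simple in-memory store. The class name `PseudonymStore`, the `PII_FIELDS` set and the field names are hypothetical, chosen only for illustration. Each real user gets a random `uuid4` pseudonym, direct identifiers are dropped before a row is stored, and the identity mapping lives apart from the interaction table.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical set of direct identifiers; a real schema would differ.
PII_FIELDS = {"name", "email", "address", "card_number"}


@dataclass
class PseudonymStore:
    # Real user key -> random pseudonym. In practice this would sit in a
    # separate, more tightly controlled store than the interaction table.
    identity_map: dict = field(default_factory=dict)
    # The "anonymous" table: one row per interaction, keyed only by pseudonym.
    interactions: list = field(default_factory=list)

    def pseudonym_for(self, user_key: str) -> str:
        # uuid4 is random, so the pseudonym itself says nothing about the user.
        if user_key not in self.identity_map:
            self.identity_map[user_key] = str(uuid.uuid4())
        return self.identity_map[user_key]

    def record(self, user_key: str, event: dict) -> dict:
        # Strip direct identifiers, then attach the pseudonym and store the row.
        row = {k: v for k, v in event.items() if k not in PII_FIELDS}
        row["user_id"] = self.pseudonym_for(user_key)
        self.interactions.append(row)
        return row


if __name__ == "__main__":
    store = PseudonymStore()
    store.record("max", {"name": "Max Reboo", "product": "jugs and hooters", "date": "2023-05-01"})
    store.record("max", {"email": "max@example.com", "product": "other title", "date": "2023-05-09"})
    for row in store.interactions:
        print(row)  # no name/email, but both rows share one user_id
```

      Note that the same person always maps to the same pseudonym. That is what keeps the data useful per user, and it is also what lets every row for that person be grouped together, which is the opening the reply below describes.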

      • Primarily0617@kbin.social
        1 year ago

        Create anonymous UUID, store interactions against this in a separate table, ensure PII is removed prior to storing

        which is more or less exactly what netflix did with the netflix prize dataset; the whole thing’s not that hard to find on google

        but you need something that distinguishes users at least a bit, or the data’s no more useful than aggregate sales figures

        you combine that “not-quite-pii” with other independent data sources that have similar “not-quite-pii” and build a complete picture
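        A toy sketch of that kind of linkage, assuming a pseudonymised ratings release and a separate public source where people rate the same titles under their real names. All data below is made up for illustration, and `link`, `max_days` and `min_matches` are hypothetical names. It simply counts ratings that agree on title, score and approximate date. The published attack on the Netflix Prize data, which matched against public IMDb ratings, used a more robust weighted scoring, so treat this as the bare idea only.

```python
from collections import defaultdict
from datetime import date

# Made-up "anonymised" release: random IDs, no names.
released = [
    ("a1f3", "Movie A", 5, date(2005, 3, 1)),
    ("a1f3", "Movie B", 2, date(2005, 3, 4)),
    ("a1f3", "Movie C", 4, date(2005, 4, 10)),
    ("9c07", "Movie A", 3, date(2005, 2, 20)),
    ("9c07", "Movie D", 5, date(2005, 6, 1)),
]

# Made-up independent public source where people post under real names.
public = [
    ("Max Reboo", "Movie A", 5, date(2005, 3, 2)),
    ("Max Reboo", "Movie C", 4, date(2005, 4, 11)),
    ("Someone Else", "Movie D", 4, date(2005, 7, 1)),
]


def link(released, public, max_days=3, min_matches=2):
    """Guess a real name for each pseudonym from overlapping ratings."""
    scores = defaultdict(int)
    for pid, movie, rating, d in released:
        for name, p_movie, p_rating, p_d in public:
            # Same title, same score, dates close together: "not-quite-pii" agreeing.
            if movie == p_movie and rating == p_rating and abs((d - p_d).days) <= max_days:
                scores[(pid, name)] += 1

    # Keep the best-scoring name per pseudonym, if it clears the threshold.
    best = {}
    for (pid, name), s in scores.items():
        if s >= min_matches and s > best.get(pid, (None, 0))[1]:
            best[pid] = (name, s)
    return best


if __name__ == "__main__":
    print(link(released, public))  # {'a1f3': ('Max Reboo', 2)}
```

        Neither dataset contains a name next to the "anonymous" ratings, yet two agreeing ratings are enough to tie `a1f3` to a named profile in this toy example. Real attacks add more sources and better scoring, which is where the treadmill comes from.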

        the treadmill effect comes from active research in this exact area: people trying to de-anonymise data sets keep finding new techniques that get around the old protections