• mannycalavera
    1 year ago

    You absolutely can anonymise data.

    However, it’s also true that if you don’t do it correctly, users can be identified. Sounds like Netflix didn’t do it properly. I don’t know, do you have a link I could look at?

    • Primarily0617@kbin.social
      1 year ago

      anonymising data is a treadmill problem

      what might work now won’t hold up to the de-anonymising techniques of a few years from now

      so no, you can’t really

      • mannycalavera
        1 year ago

        Create an anonymous UUID, store interactions against it in a separate table, and ensure PII is removed prior to storing. So instead of “Max Reboo has purchased a subscription to jugs and hooters” it’s “user 12345678901234576 has purchased jugs and hooters”. How can a future treadmill de-anonymise this? For sure, if the storage is done badly then you can track back to a particular user.
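
        A minimal sketch of that scheme, just to make the idea concrete. The field names, the choice of which columns count as PII, and the `pseudonymise` helper are all assumptions for illustration, not anything a real service is known to use:

```python
import uuid

# Assumption: these are the columns treated as PII in this toy example.
PII_FIELDS = {"name", "email"}

def pseudonymise(events):
    """Swap each user's identity for a random UUID and drop the PII columns."""
    id_map = {}        # real identity -> pseudonym; must be kept secret or discarded
    interactions = []  # the "separate table", keyed only by pseudonym
    for event in events:
        # Reuse the same pseudonym for repeat events from the same user.
        pseudonym = id_map.setdefault(event["email"], str(uuid.uuid4()))
        row = {k: v for k, v in event.items() if k not in PII_FIELDS}
        row["user_id"] = pseudonym
        interactions.append(row)
    return interactions, id_map

events = [
    {"name": "Max Reboo", "email": "max@example.com", "item": "subscription", "date": "2023-01-05"},
    {"name": "Max Reboo", "email": "max@example.com", "item": "renewal", "date": "2024-01-05"},
]
interactions, id_map = pseudonymise(events)
```

        Note that the `item` and `date` columns survive untouched, since they are what make the data worth keeping in the first place.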

        Also, once again, can you link to the Netflix issue you quoted above, please? Thanks.

        • Primarily0617@kbin.social
          1 year ago

          “Create anonymous UUID, store interactions against this in a separate table, ensure PII is removed prior to storing”

          which is more or less exactly what netflix did -> the whole thing’s not that hard to find on google

          but you need something to distinguish users at least a bit or the data’s equivalent to sales figures

          you combine that “not-quite-pii” with other independent data sources that have similar “not-quite-pii” and build a complete picture
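
          a rough sketch of what that combining step can look like, assuming toy data and a made-up `link` helper (this is a simple matching heuristic for illustration, not the method used in any published attack):

```python
from collections import defaultdict

def link(anon_rows, public_rows, min_matches=2):
    """Guess which pseudonym belongs to which name by counting shared (item, date) pairs."""
    # Index the public, named data by the quasi-identifier it shares with the anonymous data.
    public_index = defaultdict(set)
    for row in public_rows:
        public_index[(row["item"], row["date"])].add(row["name"])

    # Count how often each pseudonym coincides with each public name.
    votes = defaultdict(lambda: defaultdict(int))
    for row in anon_rows:
        for name in public_index.get((row["item"], row["date"]), ()):
            votes[row["user_id"]][name] += 1

    linked = {}
    for user_id, counts in votes.items():
        ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
        best_name, best_count = ranked[0]
        # Only accept a match that is strong enough and not tied with a runner-up.
        unambiguous = len(ranked) == 1 or best_count > ranked[1][1]
        if best_count >= min_matches and unambiguous:
            linked[user_id] = best_name
    return linked

# Hypothetical "anonymised" interactions: no names, just a pseudonym per user.
anon = [
    {"user_id": "u1", "item": "Film A", "date": "2024-03-01"},
    {"user_id": "u1", "item": "Film B", "date": "2024-03-04"},
    {"user_id": "u2", "item": "Film A", "date": "2024-03-01"},
]
# Hypothetical independent source with names attached, e.g. public reviews.
public = [
    {"name": "Max Reboo", "item": "Film A", "date": "2024-03-01"},
    {"name": "Max Reboo", "item": "Film B", "date": "2024-03-04"},
]
matches = link(anon, public)
```

          here `u1` gets tied to a name purely because two of its rows line up with the public data, while `u2` has only one overlap and stays unlinked at the default threshold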

          the treadmill effect comes from active research in this exact area: people trying to de-anonymise data sets keep finding new techniques to get around the old protections