• mannycalavera
      1 year ago

      You absolutely can anonymise data.

      However, it’s also true that if you don’t do it correctly, users can be identified. Sounds like Netflix didn’t do it properly. I don’t know, do you have a link I could look at?

      • Primarily0617@kbin.social
        1 year ago

        anonymising data is a treadmill problem

        what might work now won’t hold up to the de-anonymising techniques of a few years from now

        so no, you can’t really

        • mannycalavera
          1 year ago

          Create anonymous UUID, store interactions against this in a separate table, ensure PII is removed prior to storing. So instead of “Max Reboo has purchased a subscription to jugs and hooters”, it’s “user 12345678901234576 has purchased jugs and hooters”. How can a future treadmill de-anonymise this? For sure, if the storage is done badly then you can track back to a particular user.

          Also, once again, can you link to the Netflix issue you quoted above, please? Thanks.
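A minimal sketch of the scheme described in the comment above, assuming in-memory Python structures in place of real database tables. The names `identity_table`, `interaction_table`, `pseudonym_for`, and `record_purchase` are hypothetical and exist only for illustration.

```python
import uuid

# Hypothetical stand-ins for two separate tables (a real system would use a database).
identity_table = {}      # user name -> random ID; this is the only place PII lives
interaction_table = []   # rows of (random ID, item); no names stored here


def pseudonym_for(name):
    """Return a stable random UUID for a user, creating one on first use."""
    if name not in identity_table:
        identity_table[name] = str(uuid.uuid4())
    return identity_table[name]


def record_purchase(name, item):
    """Store the interaction against the random ID only, never the name."""
    interaction_table.append((pseudonym_for(name), item))


record_purchase("Max Reboo", "subscription A")
record_purchase("Max Reboo", "subscription B")

for row in interaction_table:
    print(row)
```

Both rows carry the same random ID, so one person's activity stays linkable across rows even though the name is gone. That linkability is what makes the data useful for analysis, and it is also what the replies below are concerned about.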

          • Primarily0617@kbin.social
            1 year ago

            “Create anonymous UUID, store interactions against this in a separate table, ensure PII is removed prior to storing”

            which is more or less exactly what netflix did -> the whole thing’s not that hard to find on google

            but you need something to distinguish users at least a bit or the data’s equivalent to sales figures

            you combine that “not-quite-pii” with other independent data sources that have similar “not-quite-pii” and build a complete picture

            the treadmill effect comes from active research in this exact area: people trying to de-anonymise data sets keep finding new techniques to get around the old protections
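A toy illustration of the combining step described in the comment above, assuming two made-up data sets: a released set keyed by random IDs and an independent public set keyed by names. All data, names, and the `link` helper are invented for this sketch; it is not a reconstruction of any real attack.

```python
from collections import Counter

# Made-up "anonymised" release: (random ID, item, date). No names.
released = [
    ("u-1", "Film A", "2024-01-03"),
    ("u-1", "Film B", "2024-01-09"),
    ("u-1", "Film C", "2024-02-11"),
    ("u-2", "Film A", "2024-01-20"),
    ("u-2", "Film D", "2024-03-02"),
]

# Made-up independent public source: (name, item, date), e.g. public reviews.
public = [
    ("alice", "Film A", "2024-01-03"),
    ("alice", "Film C", "2024-02-11"),
    ("bob", "Film D", "2024-03-02"),
]


def link(released_rows, public_rows):
    """For each public name, pick the random ID sharing the most (item, date) pairs."""
    ids_by_pair = {}
    for pid, item, date in released_rows:
        ids_by_pair.setdefault((item, date), set()).add(pid)

    votes = {}
    for name, item, date in public_rows:
        for pid in ids_by_pair.get((item, date), ()):
            votes.setdefault(name, Counter())[pid] += 1

    return {name: counts.most_common(1)[0][0] for name, counts in votes.items()}


matches = link(released, public)
print(matches)  # {'alice': 'u-1', 'bob': 'u-2'}
```

Neither data set contains PII next to the full activity history on its own. The item and date pairs act as the “not-quite-pii”, and once a name is tied to a random ID, every other row under that ID is exposed as well (here, alice's "Film B").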