“The implication here is that any code committed to a public repository may be accessible forever as long as there is at least one fork of that repository,” the report’s authors claim.

Am I dumb or is this exactly the purpose of forks? I feel like I’m missing something.

  • Morphit
    link
    fedilink
    English
    arrow-up
    7
    ·
    6 months ago

    I think Github keeps all the commits of forks in a single pool. So if someone commits a secret to one fork, that commit could be looked up in any of them, even if the one that was committed to was private/is deleted/no references exist to the commit.

    The big issue is discovery. If no-one has pulled the leaky commit onto a fork, then the only way to access it is to guess the commit hash. Github makes this easier for you:

    What’s more, Ayrey explained, you don’t even need the full identifying hash to access the commit. “If you know the first four characters of the identifier, GitHub will almost auto-complete the rest of the identifier for you,” he said, noting that with just sixty-five thousand possible combinations for those characters, that’s a small enough number to test all the possibilities.

    I think all GitHub should do is prune orphaned commits from the auto-suggestion list. If someone grabbed the complete commit ID then they probably grabbed the content already anyway.

    • Dave@lemmy.nz
      link
      fedilink
      English
      arrow-up
      2
      ·
      6 months ago

      Thanks, I think that explains it a bit more. It is unexpected to me, as a non-git expert, and I’m sure many others.

      • Morphit
        link
        fedilink
        English
        arrow-up
        2
        ·
        6 months ago

        I guess the funny thing is that each Git commit is internally just a file. Branches and tags are just links to specific commit files and of course commits link to their parents. If a branch gets deleted or jumped back to a previous commit, the orphaned commits are still left in the filesystem. Various Git actions can trigger a garbage collection, but unless you generate huge diffs, they usually stick around for a really long time. Determining if a commit is orphaned is work that Git usually doesn’t bother doing. There’s also a reflog that can let you recover lost commits if you make a mistake.