• jet@hackertalks.com
    link
    fedilink
    English
    arrow-up
    4
    ·
    1 month ago

    I’m afraid as long as you have shared architecture you will always have side channel data leaks. The only true mitigation is dedicated resources per compute item. So dedicated cores, dedicated cache etc

    • tburkhol@lemmy.world
      link
      fedilink
      arrow-up
      11
      ·
      1 month ago

      CPUs have so many cores these days, that seems like a perfectly reasonable option. Declare a process ‘security sensitive,’ give it it’s own core & memory, then wipe it when done.

      • jet@hackertalks.com
        link
        fedilink
        English
        arrow-up
        4
        ·
        1 month ago

        I wish it were that easy, there’s a lot of shared architecture in CPU design. So maybe there’s cache lines that are shared, those have to be disabled.

        Architecturally, maybe memory tagging for cash lines that in addition to looking at the TLB and physical addresses also looks at memory spaces. So if you’re addressing something that’s in the cache Even for another complete processor, you have to take the full hit going out to main memory.

        But even then it’s not perfect, because if you’re invalidating the cache of another core there is going to be some memory penalty, probably infotesimal compared to going to main memory, but it might be measurable. I’m almost certain it would be measurable. So still a side channel attack

        One mitigation that does come to mind, is running each program in a virtual machine, that way it’s guaranteed to have completely different physical address space. This is really heavy-handed, and I have seen some papers about the side channel attacks getting leaked information from co guest VMs in AWS. But it certainly reduces the risk surface

      • wewbull
        link
        fedilink
        English
        arrow-up
        4
        ·
        1 month ago

        The trouble is that “core” is just that. The heart of the processor. There’s a lot of shared state in the caches and the TLBs which is all common to multiple cores.

      • ms.lane@lemmy.world
        link
        fedilink
        English
        arrow-up
        2
        ·
        edit-2
        1 month ago

        The only way to do that is to completely disable Out-of-order execution to begin with and disable any shared caches, which would completely neuter modern CPUs. Not a little bit, that’s going to be around ~30% of the prior performance - not a 30% loss, a 70% loss…


        From ChatGPT- (query: How much performance would a modern Zen 5 or Intel Alder Lake CPU lose if you completely stripped out/disabled SMT, Out of Order Execution and shared caches - operating in-order and only using dedicated (non-shared) caches?)

        Stripping out or disabling key performance-enhancing features like Simultaneous Multithreading (SMT), Out-of-Order Execution (OoOE), and shared caches from a modern CPU based on architectures like AMD’s Zen 5 or Intel’s Alder Lake would result in a significant performance loss. Here’s an overview of the potential impact from disabling each feature:

        1. Simultaneous Multithreading (SMT)

          Impact: SMT allows a single core to execute multiple threads simultaneously, improving CPU throughput, especially in multi-threaded applications. Disabling SMT would reduce the ability to handle multiple threads per core, decreasing performance for multi-threaded workloads. Expected Loss: Performance drop can be around 20-30% in workloads like video encoding, rendering, and heavily threaded applications. However, single-threaded performance would remain relatively unaffected.

        2. Out-of-Order Execution (OoOE)

          Impact: OoOE allows the CPU to execute instructions as resources become available, rather than in strict program order, maximizing utilization of execution units. Disabling OoOE forces the CPU to operate in-order, meaning that it would stall frequently when waiting for data dependencies or slower operations, like memory access. Expected Loss: This could lead to performance drops of 50% or more in general-purpose workloads because modern software is optimized for OoOE processors. Tasks like complex branching, memory latency hiding, and speculative execution would suffer greatly.

        3. Shared Caches (L2, L3)

          Impact: Shared caches (particularly L3 caches) help reduce memory latency by sharing frequently accessed data among multiple cores. Disabling shared caches would increase memory access latency, causing more frequent trips to slower main memory. Expected Loss: Performance could drop by 15-30% depending on the workload, especially for applications that benefit from high cache locality, such as database operations, scientific simulations, and gaming.

        4. Operating In-Order Only with Dedicated Caches

          Overall Impact: Without OoOE and SMT, and with only in-order execution and dedicated caches, the CPU would be much less efficient at handling multiple tasks and hiding latency. Modern CPUs rely heavily on OoOE to keep execution units busy while waiting for slow memory operations, so forcing in-order execution would significantly stall the CPU. Expected Loss: Depending on the workload, the overall performance degradation could be upwards of 70-80%. Some specialized applications that rely on high parallelism and efficient cache usage might perform even worse.

        Summary of Overall Performance Impact:

        • Single-threaded tasks: May see performance drop by 50-70% depending on reliance on OoOE and cache efficiency.

        • Multi-threaded tasks: Could experience a combined drop of 70-80%, as the lack of SMT, OoOE, and shared caches compound the inefficiencies.

        This hypothetical CPU configuration would essentially mimic designs seen in early microprocessors or microcontrollers, sacrificing the massive parallelism, latency hiding, and overall efficiency that modern architectures provide. The performance would be more in line with processors from a couple of decades ago, despite the higher clock speeds and core counts.


        Case in point, it’s not feasible, if you’re looking for that in your own computer, you can do it already. I doubt anyone will follow you though.