Hyper-threading considered harmful?

  1. #1
    Just Another Geek
    Join Date
    Jul 2002
    Location
    Rotterdam, Netherlands
    Posts
    3,401

    Hyper-threading considered harmful?

    At first I thought this affected FreeBSD only. Just another update.. A bit soon after 5.4-RELEASE, but still..

    But the more I read about what's really going on, the more I came to the conclusion that this could have a really big impact. And not just on FreeBSD.. A lot of OSs could be at risk here.. Even Intel was made aware of the problem..

    The paper raises two major issues I wasn't aware of. The first is the concept of covert channels built on timing cache hits/misses (a very, very cool idea in and of itself). The other is using this to recover enough bits of a 512-bit RSA key to make brute-forcing the remainder viable.

    Here's the page. Definitely read his paper: Cache missing for fun and profit.
    Oliver's Law:
    Experience is something you don't get until just after you need it.

  2. #2
    Senior Member
    Join Date
    Oct 2001
    Posts
    786
    I came across this not too long ago, and didn't have time to read it. Now that you brought it up, I'm taking the time to read it and I find it quite interesting.

    I'm actually in the process of learning about things like the AX/BX/CX/DX registers in the x86 architecture and haven't quite made it to how an OS manages the different caches, but I have gone far enough to know a little bit about the data bus and some of the even/odd address issues of the memory subsystem, including wait states and buffer delays.

    So seeing some of this stuff come together in the paper is really cool. Fortunately (perhaps) some web/SQL servers are using dual Xeon systems with HT, presenting 4 logical processors (2 physical + 2 virtual) for execution to happen on, which hopefully disrupts the ability to get information specific to a process. On my own dual-CPU machine, Win2K has a habit of splitting a process across separate CPUs for some odd reason, which sucks for performance, but maybe this shortcoming can be sold as a feature? (j/k)


    Cheers and good read, though I'm not sure how many people would really understand what he is working so hard to explain in those 12 pages.

  3. #3
    Leftie Linux Lover the_JinX's Avatar
    Join Date
    Nov 2001
    Location
    Beverwijk Netherlands
    Posts
    2,534
    There's a thread about this on the linux kernel mailing list too..

    http://kerneltrap.org/node/5120 (condensed)
    ASCII stupid question, get a stupid ANSI.
    When in Russia, pet a PETSCII.

    Get your ass over to SLAYRadio the best station for C64 Remixes !

  4. #4
    Senior since the 3 dot era
    Join Date
    Nov 2001
    Posts
    1,542
    Originally posted here by Tim_axe
    I came across this not too long ago, and didn't have time to read it. Now that you brought it up, I'm taking the time to read it and I find it quite interesting.

    I'm actually in the process of learning about things like the AX/BX/CX/DX registers in the x86 architecture and haven't quite made it to how an OS manages the different caches, but I have gone far enough to know a little bit about the data bus and some of the even/odd address issues of the memory subsystem, including wait states and buffer delays.

    So seeing some of this stuff come together in the paper is really cool. Fortunately (perhaps) some web/SQL servers are using dual Xeon systems with HT, presenting 4 logical processors (2 physical + 2 virtual) for execution to happen on, which hopefully disrupts the ability to get information specific to a process. On my own dual-CPU machine, Win2K has a habit of splitting a process across separate CPUs for some odd reason, which sucks for performance, but maybe this shortcoming can be sold as a feature? (j/k)


    Cheers and good read, though I'm not sure how many people would really understand what he is working so hard to explain in those 12 pages.
    The cool thing is that the same concept works in different contexts (AES, HT, ...); the particular problem discussed in the paper is therefore not unique in its approach but in its use, and that's what makes it so interesting. Another thing to watch for... cache timing...

  5. #5
    Senior Member
    Join Date
    Oct 2001
    Posts
    786
    Yeah, this whole cache-timing business (the 2nd process checking the cache to see which lines were dropped by the 1st process) is interesting stuff. We're probably moving away from shared-cache architectures though, due to the clock-speed barrier NetBurst ran into, so this issue and the code came a bit late. It also takes a fair amount of extra research and time to make the results useful: while the code can determine the cache accesses of nearly any process, to anyone not dissecting the code whose access patterns they're observing, the results are meaningless.

    I can't imagine a countermeasure of ensuring that all cache operations look the same is good news for performance in the meantime, though. If the 1st process is always trying to fill the entire cache to hide which areas are actually in use, and the 2nd process is always dropping lines to determine the cache access patterns of the 1st process, a lot of cycles are going to be wasted just waiting for data to be retrieved from system RAM.


    BTW, I have a couple of questions if you have the time. (I'll probably figure out #2 through some more reading later this week, but #1 is just some curiosity about the paper):

    Does anyone have an idea how much longer operations take to complete when doing something similar to what was explained in the paper? Or is it quick enough that we won't usually notice?

    Is the 2nd process able to actually read the information stored by the 1st process in the cache? Or does the OS prevent this from happening? (I think it is prevented)

  6. #6
    Regal Making Handler
    Join Date
    Jun 2002
    Posts
    1,668
    I may be way out of my depth here, but the clock speed is the clock speed. It's only the word size that equates to a benefit in processing time for a given piece of data?

    So are we not looking at the re-invention of the wheel + a few bits extra??
    What happens if a big asteroid hits the Earth? Judging from realistic simulations involving a sledge hammer and a common laboratory frog, we can assume it will be pretty bad. - Dave Barry

  7. #7
    Senior Member
    Join Date
    Oct 2001
    Posts
    786
    Well, the first idea is that, as Colin (the person who discovered this on the P4 with HT) points out, you can tell which lines in the cache memory were accessed. (I try to explain it in simpler terms below.)

    When a process stores new data in the processor's cache, the processor forces old data out to make room for it. With HyperThreading, the processor can schedule 2 tasks at the same time, and while they are executed independently, they share some resources, such as the processor's cache.

    One process, say our program that uses a private key to decrypt messages, can allocate more than 50% of the processor's cache. The other process, our spy program, can do the same. As a result, these two separate processes could use more cache memory than the processor has onboard, so some data would have to be stored in the much slower L2 cache, or the even slower system RAM, if both wanted to use more than 50% of the available fast cache.

    There is enough difference between these levels that you can measure the delay: the AX/BX/CX/DX registers (which quite simply have no delay and can be accessed in less than a clock cycle) vs the L1 cache (very little delay, 1-3 clock cycles) vs the L2 cache (more delay, 20-30 clock cycles) vs system RAM, which can have a LOT more delay (~45 ns for high-end modules + FSB delays, at least ~150 CPU cycles with a 2.8 GHz CPU).
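    Those gaps are wide enough that a measured load latency points straight at the level it was served from, which is the core trick. A toy sketch of that classification step (the thresholds are illustrative, loosely matching the cycle counts above, not any specific CPU):

```python
# Toy classifier: map a measured load latency (in CPU cycles) to the
# memory-hierarchy level it probably came from. Thresholds are
# illustrative, roughly matching the P4-era numbers quoted above.
def classify_latency(cycles):
    if cycles <= 3:        # L1 hit: 1-3 cycles
        return "L1"
    if cycles <= 30:       # L2 hit: 20-30 cycles
        return "L2"
    return "RAM"           # main memory: ~150+ cycles

print(classify_latency(2), classify_latency(25), classify_latency(200))
# -> L1 L2 RAM
```

    On real hardware the cycle counts would come from something like the x86 rdtsc instruction; the wide spacing between the levels is what makes the classification robust despite measurement noise.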


    Now, suppose our spy program runs first and allocates all of the cache memory on the processor. It then accesses all of it. Since its allocated memory wasn't forced out of the cache, accessing it is a breeze and takes very little time.

    But what happens when we try to decrypt with our private key? Our crypto program allocates some of that memory during decryption, forcing some of the data allocated by the spy program out into system RAM. Suddenly, when our spy process tries to access its data, it notices huge delays on certain lines, because they have to be loaded from the slow system RAM (or possibly the L2 cache) back into the faster processor cache. Colin says that because you can tell how the cache is being used, there are huge security holes. (Which could be true, but from what I understand -- and am asking above -- without some major reverse-engineering you won't know what data is actually stored in the cache, just that some data was forced out.)
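    The fill-then-remeasure sequence described above can be sketched as a tiny simulation. This is not Colin's code, and there is no real hardware timing here; it just models who owns each cache line, with a miss standing in for a "slow" access:

```python
# Toy prime-and-probe simulation: a direct-mapped "cache" with 8 lines,
# shared by a spy and a victim. The spy primes every line, the victim
# touches a few, and the spy's probe reveals which lines the victim used.
NUM_LINES = 8

class ToyCache:
    def __init__(self):
        self.owner = [None] * NUM_LINES   # who last filled each line

    def access(self, who, line):
        """Return True on a hit (line already holds who's data); fill on miss."""
        hit = self.owner[line] == who
        self.owner[line] = who
        return hit

cache = ToyCache()

# 1. Prime: the spy fills every cache line with its own data.
for line in range(NUM_LINES):
    cache.access("spy", line)

# 2. The victim runs and touches lines 2 and 5 (e.g. table lookups whose
#    indices depend on secret data), evicting the spy's data there.
for line in (2, 5):
    cache.access("victim", line)

# 3. Probe: the spy re-reads every line; misses (slow loads on real
#    hardware) mark exactly the lines the victim used.
victim_lines = [line for line in range(NUM_LINES)
                if not cache.access("spy", line)]
print(victim_lines)   # -> [2, 5]
```

    Notice that the spy never reads the victim's data, only the footprint its memory accesses leave behind, which is why memory protection doesn't stop this.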


    The second thing to understand is that this has little to do with processing. It has to do with how long it took to read data from the cache, to determine where it came from (L1 cache vs L2 cache vs system RAM) and what it was replaced with. Here the measurements are fractions of a nanosecond (less than 1 billionth of a second, maybe 35 ten-billionths of a second or less).
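    The arithmetic behind that figure, using the 2.8 GHz clock mentioned above as an example:

```python
# At a 2.8 GHz clock, one cycle lasts 1 / 2.8e9 seconds, so even a
# ~10-cycle gap between cache levels is only a few nanoseconds.
freq_hz = 2.8e9
cycle_ns = 1e9 / freq_hz          # nanoseconds per cycle
print(round(cycle_ns, 3))         # -> 0.357  (about 35 ten-billionths of a second)
print(round(10 * cycle_ns, 2))    # -> 3.57   (ten cycles)
```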

    Re-read #1, and imagine if you knew how the program allocated cache memory, and you knew that certain inputs caused other cache allocation patterns that you could measure. Most people couldn't do this because it takes a lot of reverse-engineering (and good documentation) and more low-level experience with the x86 architecture than most people would want to commit themselves to. But in the 8 months he worked hard on this, he managed to show an example that appears valid. I sure wouldn't want to commit myself to something that long (and I can't afford to either), but it goes to show that there is quite a barrier to most people putting this all together to show it off.



    Thirdly, the example of the covert-channel communications doesn't appear particularly practical to me. It was an interesting exercise he thought out there, but if you wanted to use that to communicate deliberate messages, I'd imagine there are better ways to do it. I guess it could be the newest inter-process communication for the ultimately paranoid... interesting, yes. But practical? I don't think so...
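    For what it's worth, the channel itself is simple once you have the timing primitive: the sender signals a 1 by evicting the receiver's data in a time slot, and a 0 by staying idle. A toy simulation of that encoding (ownership of one shared line stands in for real load latency; not Colin's implementation):

```python
# Toy cache covert channel over one shared cache line. Per time slot the
# receiver primes the line, the sender either evicts it (bit 1) or does
# nothing (bit 0), and the receiver's re-load reveals the bit: a miss
# (slow load on real hardware) means 1, a hit means 0.
class Line:
    def __init__(self):
        self.owner = None

    def load(self, who):
        """Return True on a hit; fill the line (evicting the other) on a miss."""
        hit = self.owner == who
        self.owner = who
        return hit

def transmit(bits):
    line = Line()
    received = []
    for bit in bits:
        line.load("receiver")                 # prime the shared line
        if bit:
            line.load("sender")               # evict -> miss below
        received.append(0 if line.load("receiver") else 1)  # probe
    return received

print(transmit([1, 0, 1, 1, 0, 0, 1, 0]))   # -> [1, 0, 1, 1, 0, 0, 1, 0]
```

    A real implementation would have to agree on slot timing and fight scheduler noise, which is part of why it seems more like a curiosity than a practical channel.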
