Let's Chat about Floating Points and AMD Bulldozer CPUs.

Discussion in 'Discussions' started by OmniaNigrum, Aug 27, 2012.

  1. OmniaNigrum

    OmniaNigrum Member

    No need to be unfriendly anyone. Fanboys need not apply. I just want to know the truth.

    Anyone know of FPs used in games that are bigger than 128 bits? (16 Byte, AKA Quad, or Half)

    As far as I know, no games use larger floats than 8 Bytes without specifically coding the game to use them on the GPU rather than the CPU.

    Bulldozer CPUs use a single shared FP scheduler for a pair of Integer cores. Each core can simultaneously handle 16 Byte Floats through the scheduler.

    (I am going to ask Daynab to move the relevant threads to this area so we can talk freely about it without ruining the CE thread.)
     
  2. delta534

    delta534 Member

    First, what is being shared is not a scheduler; it is the unit that actually does floating point math, and that unit handles everything from standard floats to doubles to SSE instructions. The logic needed to do floating point math is so vastly different from the logic for integer math that it needs its own unit.
     
    OmniNegro likes this.
  3. OmniaNigrum

    OmniaNigrum Member

    You are contradicting what I read last night on the link I posted here:
    http://community.gaslampgames.com/t...k-empires-announcement.4869/page-4#post-56236

    Please read that and reply here. I am not calling you a liar. I just want to know the truth. Please help me understand what I am missing from this story.

    *Edit* Here is a quick quote until Daynab gets his Digglebucks and moves the posts we had about this from the CE thread. (It may not happen. It is his call what is moved.)

    This is from:
    http://www.anandtech.com/show/2881

    They specify that one core of the pair can work on FP while the other works on Integers. So how is it possible that the scheduler is a separate core that does only FP mathematics? If I were being a jerk, I would then imply that this means that a "Pair" of cores that AMD calls "Modules" because... actually has three cores. One that can only do FP work, and two doing Integer work. But that would be untrue if I understand this, and inflammatory, so I will not say that. :D

    I also will not say that "My Bulldozer advertised eight cores, but delta534 says there are actually 12 there! Intel sucks!" or anything similar. :D

    Take my jokes as just that. I mean no offense. But I enjoy discussing such things. Not because I want to be "Right", but because I want to know the little details the marketing teams would want me never to know. Thanks in advance for not biting my head off and especially for helping me pull my head from another part of my anatomy.
     
  4. delta534

    delta534 Member

    I could have been wrong about it; I was running off a simplified diagram of the Piledriver module. But a few things. Compared to Intel's Sandy Bridge and Ivy Bridge, AMD's Bulldozer, Piledriver, and now Steamroller only have half the floating point throughput. When properly optimized, modern floating point work is batched together into a single instruction (things like MMX, SSE, SSE2, AVX, whatever), and the data for this instruction tends to be 128 bits long, or 4 IEEE standard floats. Recent additions such as the AVX instructions have a data structure that is 256 bits wide, which can store either 4 doubles or 8 floats. Improperly optimized code is a lot slower and typically uses the old x87 instructions.
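
    To make the lane counts above concrete, here is a minimal Python sketch. The `lanes` helper is a made-up name for illustration, not anything from AMD or Intel. It only does the arithmetic of how many IEEE floats or doubles fit in a 128-bit SSE register or a 256-bit AVX register; it does not execute any SIMD instructions.

```python
import struct

# Sizes of IEEE 754 single and double precision values, in bits.
FLOAT_BITS = struct.calcsize("<f") * 8   # 4 bytes * 8
DOUBLE_BITS = struct.calcsize("<d") * 8  # 8 bytes * 8


def lanes(register_bits, element_bits):
    """Hypothetical helper: how many elements fit side by side in one register."""
    return register_bits // element_bits


sse_floats = lanes(128, FLOAT_BITS)    # floats per 128-bit SSE register
sse_doubles = lanes(128, DOUBLE_BITS)  # doubles per 128-bit SSE register
avx_floats = lanes(256, FLOAT_BITS)    # floats per 256-bit AVX register
avx_doubles = lanes(256, DOUBLE_BITS)  # doubles per 256-bit AVX register

print(sse_floats, sse_doubles, avx_floats, avx_doubles)
```

    The point is that the register gets wider, not the individual float: one instruction works on several ordinary 4-byte or 8-byte values at once.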

    While AMD says it can run as four threads, it would more depend on the workload, how the code is structured and how smart the scheduler is. For the typical user, a bulldozer based cpu will act as a quad core cpu. Most word processing, running the OS, internet browsing and other low end computing is mostly integer based. Getting into gaming, video editing, photo editing, ray tracing any other thing floating point heavy it will act more like a dual core processor. This also ignores how the OS will schedule the threads.

    It is very much a semantic issue because there is no real definition of core besides something that can run a thread. If you want to call it a quad core processor fine by me, but I won't.

    As an aside, hyperthreading is basically Intel running one thread on the same hardware while another thread waits for its data to come from either the cache or memory, which can be a long time, and is possibly why Intel CPUs benefit less from faster memory than AMD CPUs.

    I've gone through a computer architecture class, so I actually have an understanding of what is going on and I don't mind talking about it. The bad thing about taking the class is that CPU requirements for games are too vague for me now. When it says "x86-compatible processor 1.4 GHz or better", for example, I have to ask: based on what CPU? A 1.4 GHz AMD A8 processor is much faster than a 1.6 GHz Intel Atom processor.
     
    OmniNegro likes this.
  5. OmniaNigrum

    OmniaNigrum Member

    What program would use the CPU for Floats when there is a GPU on upwards of 90% of systems that is designed for those via OpenCL?

    If I ran a Bitcoin miner, it would tell me right off the bat that *ANY* GPU will outperform any CPU every time. (By no less than a factor of ten or more, at that.)

    When games use Floats, they rarely if ever use anything other than 8 Byte Floats. As I understand things, each core of the Bulldozer can easily handle 128 bit Floats by itself. (For those who do not know, 8 Bytes = 128 bits.)

    So can you tell me any examples of applications that even use bigger Floats? (Benchmarks and Mathematic Miners like Bitcoin Miners do not count. They can use the CPU, but no sane person would use one for that.)

    And presuming there are some applications doing that, why on Earth would they be coded to use the CPU? OpenCL is freeware and open-sourced. Both AMD and Nvidia have it integrated in the drivers.

    Thanks for the replies. Please keep them coming. :D
     
  6. OmniaNigrum

    OmniaNigrum Member

    I was going to post that the Intel Atom is not an x86 CPU at all, but upon looking it up, it turns out that the x86-64 instruction set is an extension, and anything using it can run standard x86 code without hindrances. So ignore this post. :D
     
  7. delta534

    delta534 Member

    It is not about processing bigger floats. The largest is the double, which is 8 bytes, which is 64 bits (8 bytes * 8 bits in a byte). It is about processing more floats at once. If a programmer needed more precision or something larger than what a double could hold, they would use a custom data type for the job, typically defined in terms of integers and integer operations.
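
    A small Python sketch of both points, under stated assumptions: Python's built-in `int` and the standard library's `fractions.Fraction` stand in here for the kind of custom integer-based type described above. They are not what any particular game or engine uses.

```python
import struct
from fractions import Fraction

# A double occupies 8 bytes, i.e. 64 bits.
double_bits = len(struct.pack("<d", 1.0)) * 8

# A double has a 53-bit significand, so it cannot tell 2**53 and 2**53 + 1 apart.
big = 2**53
double_loses_precision = float(big + 1) == float(big)

# Integer arithmetic keeps the two values distinct.
integers_stay_exact = (big + 1) != big

# A rational type built from two integers adds tenths exactly; doubles do not.
exact_sum = Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10)
double_sum = (0.1 + 0.2) == 0.3

print(double_bits, double_loses_precision, integers_stay_exact, exact_sum, double_sum)
```

    The trade-off is speed: the integer-based types are exact, but every operation is done in software with several integer instructions instead of one hardware floating point instruction.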

    Floating point is still useful for the cpu when you have a small amount of data, the work that needs to be done is not very parallel, prepping and getting data to the gpu, there is a mix of floating point and integer work being done, those sort of things.

    There has been some movement toward using the GPU for some work. There is a ray tracer that has an OpenCL implementation, there is bitcoin mining, on the more illegal side there is password cracking (which is more about taking advantage of the vast parallel nature of the GPU than the floating point power), and some physics simulations use GPUs.

    Not ignoring the post, anything that can currently run windows is x86.
     
    OmniNegro likes this.
  8. OmniaNigrum

    OmniaNigrum Member

    On that last part, remember this is a very informal forum. You are never required to answer my questions. And I do not fault you for not fully answering every possible question I or another asks. I have yet to answer something in one of your posts above, mostly because I am not sure it was a question or a statement. (The 1.4 A8 or the 1.6 Atom question. Was that a question?)

    Also, because I am a bit crazy, I have to point out that password cracking in and of itself is not illegal. Provided that you are entitled to have the password or the data it unlocks. I have attempted and failed to figure out the password of a Rar archive I made ages ago containing my logins for many sites. And to the best of my knowledge, that can be argued to be illegal in and of itself, but who is the victim? (There are funny laws about such things throughout the world. I live in America, and firmly believe that if I never attempt to harm anyone, there will never be a motive for any authority to crack down on me.)

    Worthy reads for the mathematically inclined. (I am not.)
    https://en.wikipedia.org/wiki/Floating_point
    https://en.wikipedia.org/wiki/Half_precision
    https://en.wikipedia.org/wiki/Double_precision
    https://en.wikipedia.org/wiki/Quad_precision

    If I understand what the article I linked into the other thread and then in post #3 in this thread means, all of these can be done on a single core of the Bulldozer CPU. (And if I understand, it would have no effect on the other core either.) So I am missing something still.
     
  9. delta534

    delta534 Member

    Now let's talk about SSE instructions with a wiki link. http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions

    Basically, most floating point math is done with these instructions, and an FPU scheduler can easily convert old x87 instructions to them if needed. Now another thing: most modern CPUs support what is called out-of-order execution. What this means is the schedulers can reorder a small subset of instructions if it means that they will run faster but have the same desired outcome. For the FPU scheduler this means that if it can combine some old x87 instructions into one single SSE instruction, it will. If it cannot, it will not. If the FPU scheduler sees two independent instructions and it has two free 128 bit units to run an SSE instruction on, the smart thing to do is schedule one instruction on one, and the other instruction on the other. If one or both only take a single clock cycle* then everything is fine, but if they take up more than one clock then we get into a stall; think anything involving reading memory. Once they hit a stall then there is no choice but to wait, which backs everything up.

    Now assume the decoder part of the Bulldozer architecture has a way of telling the FPU scheduler that one thread, from what it has seen, has no floating point math and the other thread can fully use the FPU and whatnot. If the thread that was all integer stuff starts to have floating point, that thread needs to wait until a unit is free, which could be a while in CPU time because the wait could be due to a stall. Now take a workload that has, say, three floating point heavy threads and one integer heavy thread. On an AMD branded quad core, this means that compared to an Intel quad core, two out of those three threads will be running at about half speed, since the Intel processor can run two independent SSE instructions per core.

    I agree with the theory behind the Bulldozer architecture; it is just that without major work it does not quite work in the real world yet.
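
    The "two free units" idea and the way a stall backs everything up can be sketched with a toy model. This is only an illustration under loud assumptions: every instruction is independent, each one goes to whichever unit frees up first, and the latencies are invented numbers. It is not how the real Bulldozer FPU scheduler works, and `cycles_needed` is a hypothetical name.

```python
import heapq


def cycles_needed(latencies, units=2):
    """Toy model: issue independent instructions, in order, to the unit that frees up first.

    latencies: cycles each instruction occupies a unit (made-up numbers).
    Returns the cycle at which the last instruction finishes.
    """
    free_at = [0] * units          # cycle at which each unit becomes free
    heapq.heapify(free_at)
    finish = 0
    for latency in latencies:
        start = heapq.heappop(free_at)   # earliest available unit
        end = start + latency
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


quick = cycles_needed([1, 1, 1, 1], units=2)       # four single-cycle instructions
stalled = cycles_needed([10, 10, 1, 1], units=2)   # both units hit a 10-cycle stall first
one_unit = cycles_needed([1, 1, 1, 1], units=1)    # same quick work on a single unit

print(quick, stalled, one_unit)
```

    In the stalled case the two short instructions cannot start until a unit frees up, which is the "no choice but to wait" situation described above.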

    *this is just an estimate, actual clock time needed may vary.
     
    OmniNegro likes this.
  10. OmniaNigrum

    OmniaNigrum Member

    I think that your example falls short of demonstrating a lack in the AMD 'Dozer chipset. The biggest flaw I see is that *NONE* of the FP instructions used by SSE are larger than 8 bytes. And as I already demonstrated, the FP scheduler can handle 8 byte FP instructions on each Integer core.

    The second is that AMD licensed the rights to use SSE on every CPU they have made since the Athlon XP series in 2001.

    So am I misunderstanding or missing some detail that would explain how the 'Dozer line is worse than the current Intel lines?

    Believe me that I am not a fanboy of either side. I have usually used AMD CPUs simply because they are much cheaper per unit of performance. I am not seeking an argument.

    Can you tell me anything besides a FP benchmark that shows a weakness?

    I am thinking that my next PC will probably feature a Bulldozer line CPU. And it will probably cost about $200 for an 8 core 3.8 Ghz CPU. Getting the equivalent in an Intel build doubles the cost of the CPU. Admittedly there are performance gains. But no more than 10-15%. And the cost of Intel Motherboards is usually about twice that of a comparable AMD motherboard. (Presuming Gigabyte or ASUS brands. I like those two best in terms of price per performance.)

    Like I said, if you must have the best performance regardless of the expense, then an extra $300 or so dollars will get a good Intel build that rates about the same in most regards, and has a minor advantage in some regards. But as far as real-life performance, I am not convinced I would notice.
     
  11. delta534

    delta534 Member

    I could have probably explained it better, but in the grand scheme of things it really does not matter too much. I will try to find a decent floating point benchmark for the Bulldozer but I don't expect to find one. There are too many other factors at play. I would say the Intel i5 2500/2500K has a better price-to-performance ratio right now, especially in single threaded applications. Note that I like AMD and would recommend their products to the average user.
     
    OmniNegro likes this.
  12. OmniaNigrum

    OmniaNigrum Member

    I have been looking around a bit more at PC hardware, and the price difference between the AMD and Intel middle-high end is remarkably close. Forget my $300 mark above. I was way off.

    Also, since I hope this thread will attract PC minded individuals, I have a question that just came up. What happened to PCiE 2.1? I cannot find anyone selling motherboards with PCiE 2.1 slots. I can find many cards that use PCiE, and I know it is backward compatible, but the slots have ceased to exist. Not even the top of the line systems actually have PCiE 2.1 slots.

    Did some problem crop up that has not been resolved? Are there licensing issues in the courts somewhere? This just makes no sense.

    I bought a Radeon 5850 using the PCiE 2.1 standard years ago and have been using it in my motherboard with PCiE 2.0 slots since then. I recently suggested to someone to get a particular card that also uses PCiE 2.1 and they pointed out that the motherboard I suggested to go with it was PCiE 2.0 only. It was an AM3+ motherboard, and a new revision of one at that. I looked all around and could not find a trace of the 2.1 slots anywhere.

    Am I forgetting something that would explain this? Or is it just apathy on the part of manufacturers keeping them from updating until the 3.0 standard is finalized?
     
  13. delta534

    delta534 Member

    I think the issue with PCIe 2.1 was that it did not provide much of a reason to upgrade from PCIe 2.0. The speeds are the same and while 2.1 does allow for more power, another power cable straight from the power supply would do the same thing for the devices that would use the power.

    Also, unless I'm reading Wikipedia wrong, the PCIe 3.0 standard was finalized in late 2010 and is being adopted by motherboard manufacturers and GPU manufacturers.
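
    For a rough sense of scale, here is a hedged Python sketch of the per-lane numbers. The transfer rates and line encodings used (5 GT/s with 8b/10b for PCIe 2.x, 8 GT/s with 128b/130b for PCIe 3.0) are the commonly published figures, not something stated in this thread, and `lane_mb_per_s` is a made-up helper name. Protocol overhead beyond the line encoding is ignored.

```python
def lane_mb_per_s(gigatransfers_per_s, payload_bits, encoded_bits):
    """Hypothetical helper: usable MB/s on one lane after line encoding.

    gigatransfers_per_s: raw signalling rate in GT/s (one bit per transfer).
    payload_bits / encoded_bits: e.g. 8/10 for 8b/10b, 128/130 for 128b/130b.
    """
    raw_bits = gigatransfers_per_s * 1e9
    payload = raw_bits * payload_bits / encoded_bits
    return payload / 8 / 1e6  # bits -> bytes -> megabytes


pcie2_lane = lane_mb_per_s(5.0, 8, 10)      # PCIe 2.0 and 2.1 share this rate
pcie3_lane = lane_mb_per_s(8.0, 128, 130)   # PCIe 3.0

print(f"PCIe 2.x: {pcie2_lane:.0f} MB/s per lane, {pcie2_lane * 16:.0f} MB/s at x16")
print(f"PCIe 3.0: {pcie3_lane:.0f} MB/s per lane, {pcie3_lane * 16:.0f} MB/s at x16")
```

    Under those assumptions 2.0 and 2.1 come out identical, which matches the point that the speeds are the same, while 3.0 roughly doubles the per-lane figure.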
     
    OmniNegro likes this.
  14. OmniaNigrum

    OmniaNigrum Member

    Then I am even more confused. If 3.0 exists, why are we still using 2.0? I see this like SATA 3. SATA 2 has more than enough speed for all but the faster SSDs, yet every modern motherboard at present uses SATA 3 anyway. It costs the same to build either.

    You read the article right. 2010 was when it was finalized. I knew about the speed being the same, but the efficiency is the benefit of 2.1 over 2.0. (And I never plug more than one power cable into a video card. It just makes a path for resistance. It cannot actually take power from multiple rails and combine them. So it just follows whichever cord is a billionth of a meter shorter than the other.)
    https://en.wikipedia.org/wiki/PCI_Express
     
  15. delta534

    delta534 Member

    I have no clue why PCIe 2.0 is still around. Maybe marketing BS aimed at uninformed users about how PCI works, or something.
     
    OmniNegro likes this.
  16. OmniaNigrum

    OmniaNigrum Member

    Marketing BS sounds likely. Anyone know if I am wrong though? (I said that no video card has managed to exceed the speed of PCiE 2.0 yet.)

    In re-reading my previous post, I guess SSDs are the sole reason we are actually using motherboards with SATA 3.0. Otherwise that would only be on expensive boards for bragging rights. No mechanical drive can exceed the speeds of SATA 2.0 yet.

    So I guess PCiE 3.0 is like RAMBUS. It may have been better than the standard RAM of the time by quite a bit, but the hardware it required was prohibitively expensive.

    Oh well. It does not matter really. I was just curious. Thank you delta534 for chatting with me. :)
     
  17. Daynab

    Daynab Community Moderator Staff Member

    And my mobo is using Sata 6 (with a sata 6 hard drive) which is supposedly much faster but I am pretty skeptical of that myself. I didn't pay more for it though so I don't really care.
     
    OmniNegro likes this.
  18. OmniaNigrum

    OmniaNigrum Member

    You mean SATA 6Gbps? (I.E. SATA 3.0)

    Just a technicality. Call me a grammar nazi if you want to. :)

    If it is a mechanical hard drive less than a foot wide, then it is physically impossible for it to exceed the 3 Gigabits per second SATA 2 supports. They put the SATA 3 interface on anyway because it is the same hardware and different firmware. I.E. no real cost to them.

    And if you are one of the handful of people with a hard drive with foot wide platters, you are either building them for a specific purpose, or you have a drive that is antique already. :D

    I hate how the industry calls things what they are not. SATA 2.0 supports 3 Gigabits per second transfers. SATA 3.0 supports 6 Gigabits per second transfers. The marketing goons struck gold when they started interchanging 6 and 3 and every other number used. Half the competent people out there, like yourself, use the wrong term. But I know that you know better. :)
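
    To untangle the 3s and 6s, a short Python sketch of the arithmetic. It assumes the 8b/10b line encoding that SATA is generally documented as using (10 bits on the wire per 8 bits of data); `sata_payload_mb_per_s` is a hypothetical helper name, and real-world throughput will be lower once protocol overhead is counted.

```python
def sata_payload_mb_per_s(line_rate_gbit):
    """Hypothetical helper: usable MB/s for a SATA link, assuming 8b/10b encoding."""
    line_bits = line_rate_gbit * 1_000_000_000
    payload_bits = line_bits * 8 // 10      # strip the 8b/10b encoding overhead
    return payload_bits // 8 / 1_000_000    # bits -> bytes -> megabytes


links = {
    "SATA 2.0 (3 Gbit/s)": sata_payload_mb_per_s(3),
    "SATA 3.0 (6 Gbit/s)": sata_payload_mb_per_s(6),
}

for name, mb in links.items():
    print(f"{name}: about {mb:.0f} MB/s of payload")
```

    So "SATA 3" the revision and "3 Gbit/s" the speed refer to different generations, which is exactly the naming trap described above.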
     
  19. Daynab

    Daynab Community Moderator Staff Member

    Ah yes. It's been a while since I recalled, Sata 6Gbps indeed.
     
    OmniNegro likes this.