The key is this: other than TI's OMAP 5 in the second half of 2012 and Qualcomm's Krait, no one else has announced plans to release a new microarchitecture in the near term. Furthermore, if we only look at the first half of next year, Qualcomm is the only company that's focused on significantly improving per-core performance through a new architecture.

Everyone else is either scaling up in core count (NVIDIA) or clock speeds. As we've seen in the PC industry however, generational performance gaps are hard to overcome - even with more cores or frequency.

Going into 2012, Qualcomm is set for a return to glory as it will be the first to deliver a brand new microprocessor architecture and the first to ship 28nm SoCs in volume.

Qualcomm's next-generation SoCs will also be the first to integrate an LTE modem on-die, which should enable LTE on nearly all high-end devices at much better power levels than current multi-chip 4x-nm solutions. Today we're able to talk a bit about the architecture details and performance expectations of Qualcomm's next-generation SoC due out in the first half of 2012. The execution back-end receives a similar expansion.

Whereas the original Scorpion core only had three ports to its execution units, Krait increases that to seven. Krait can issue up to four instructions in parallel. The additional execution ports simply help prevent any artificial constraints on ILP.

This is another area where Krait will be able to see significant IPC gains. Scorpion was Qualcomm's first Snapdragon CPU architecture. At a high level, it looked very much like an optimized ARM Cortex A8 design although the two had nothing in common outside of instruction set.

Scorpion was a dual-issue, in-order architecture that eventually scaled to dual-core and 1. 4-wide

  • metafor - Friday, October 07, 2011 - link MSM8260 and MSM8660 only have single-channel 32-bit LP-DDR2 memory, not dual.

    Reply

    Aren't you the same person who went on (ranting, obviously) about Krait using HKMG and hitting 2.5GHz next year, in another article? Reply

    I do look forward to seeing what the next generation of GPUs will provide


    Seems like we've stayed in this console generation too long with cell phones having graphics nearly on par with their 200W cousins. Reply

    Windows 8 requires Shader Model 3.0 to be supported by the hardware.

    Whether you call that 10Level9_3 or 9_3, or DX9. From a graphics perspective, it is all just Shader Model 3.0 in the end, whatever you want to call it. All of the Windows 8 launch chipsets from nVidia, TI and Qualcomm, including this MSM8960 will all support Shader Model 3.

    Reply

    The Krait processor is the heart of Qualcomm's second generation Snapdragon and it's the core of all Snapdragon S4 SoCs. Krait takes the aging base of Scorpion and gives it a much needed dose of adrenaline.

  • I think the confusion might come from another (older) Qualcomm SoC working like the article described iirc, but this does not apply to the MSM8x60 AFAIK.

    Reply

    Qualcomm's NEON data paths are still 128-bits wide.

    Anyway, unlike what the article says, the MSM8x60 indeed only has single-channel 32-bit LPDDR2. However there's a twist: Qualcomm offers it in a PoP (Package-on-Package) configuration at up to 266MHz or an 'ISM' (i.e. SiP or System-in-Package) at up to 333MHz. I wouldn't be surprised if many OEMs used the PoP for cost reasons.
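As a rough illustration of what those two clocks mean for a single 32-bit LPDDR2 channel, the sketch below computes theoretical peak bandwidth. It assumes the quoted 266MHz and 333MHz figures are I/O clock rates with two transfers per clock (double data rate); the function name and its parameters are hypothetical, and sustained real-world throughput will be lower.

```python
def peak_bandwidth_gb_s(clock_mhz, bus_bits=32, channels=1):
    """Theoretical peak bandwidth in GB/s for a DDR-style memory interface."""
    # Assumption: two data transfers per clock cycle (double data rate).
    transfers_per_second = clock_mhz * 1e6 * 2
    bytes_per_transfer = bus_bits / 8
    return transfers_per_second * bytes_per_transfer * channels / 1e9


# Single-channel 32-bit LPDDR2 at the two package options mentioned above.
pop_gb_s = peak_bandwidth_gb_s(266)
sip_gb_s = peak_bandwidth_gb_s(333)

print(f"PoP @ 266MHz: {pop_gb_s:.2f} GB/s")
print(f"SiP @ 333MHz: {sip_gb_s:.2f} GB/s")
```

Passing `channels=2` shows what a fully populated dual-channel design would offer on paper, which is the gap being argued about in this thread.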
  • A5 - Friday, October 07, 2011 - link Re: DX 9.

    Krait has been upgraded to support the new virtualization instructions added in Cortex A15. Also like the A15, Krait enables LPAE for 40-bit memory addressing.
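To put the 40-bit figure in perspective, here is a minimal sketch comparing the physical address space reachable with 32-bit and 40-bit addresses. The helper name is hypothetical; the arithmetic is simply two raised to the number of address bits.

```python
def addressable_gib(address_bits):
    """Size of a byte-addressed space in GiB for a given address width."""
    return 2 ** address_bits // 2 ** 30


standard_gib = addressable_gib(32)  # classic 32-bit physical addressing
lpae_gib = addressable_gib(40)      # 40-bit physical addressing with LPAE

print(f"32-bit: {standard_gib} GiB")
print(f"40-bit: {lpae_gib} GiB")
```

The jump is a factor of 256, from 4 GiB to 1 TiB of physical address space.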

    Qualcomm lengthened Krait's integer pipeline slightly from 10 stages in Scorpion to 11 stages in Krait. Load/store operations tack on another two cycles and instructions that go through the Neon/VFP path further lengthen the pipe. ARM's Cortex A15 design by comparison features a 15-stage integer pipeline.

    Qualcomm's design does contain more custom logic than ARM's stock A15, which has typically given it a clock speed advantage. The A15's deeper pipeline should give it a clock speed advantage as well. Whether the two effectively cancel each other out remains to be seen.

    28nm 2 x ARM Cortex A15 @ 2GHz ARM Cortex A9 TI OMAP 4430

  • As it stands there are 6 feature levels: 11, 10_1, 10, 9_3, 9_2, and 9_1. Unfortunately everyone has been lax in their naming standards; DirectX and Direct3D often get thrown around interchangeably, as do periods and underscores in the feature levels (since prior to D3D 11, we'd simply refer to the version of D3D). This is how you end up with DirectX 9. The article has been corrected to be more technically accurate to clear this up.

    Qualcomm's New Snapdragon S4: MSM8960 & Krait Architecture Explored by Brian Klug & Anand Lal Shimpi on October 7, 2011 12:35 PM EST
  • partylikeits1999 - Saturday, October 08, 2011 - link Microsoft made such a mess out of its DirectX nomenclature in the DX9 timeframe that the rest of the industry started to ignore it and invent their own.

    Hardly anybody even bothers to distinguish between Direct3D and DirectX anymore.

    They're used interchangeably, even though the former is a subset of the latter.

    Let's recap the current smartphone/tablet SoC landscape.

    Everything shipping today is built on a 4x-nm process, built either at Global Foundries, Samsung, TSMC or UMC. Next year we'll see a move to 28nm (bringing better performance and power characteristics) but between now and the end of 2012 there will be a myriad of designs available on the market.

    Take care,

    In any case, 9_1 is effectively identical to Direct3D 9. 9_3 is somewhere between D3D 9.

    0c; it implements a bunch of extra features like multiple render targets, but the shader language is 2. 0 Reply

  • ET - Sunday, October 09, 2011 - link Feature level 9_3 isn't the same as Shader Model 3 support.

    3 though, which is quite confusing since it doesn't exist. That said, I agree with your assessment that it means Shader Model 3, and not feature level 9_3. Reply

    Although Scorpion featured a dual-channel LPDDR2 memory controller, in a PoP configuration only one channel was available to any stacked DRAM.


    In order to get access to both 32-bit memory channels the OEM had to implement a DRAM on-package as well as an external DRAM on the PCB. Memory requests could be interleaved between the two DRAM, however Qualcomm seemed to prefer load balancing between the two with CPU/GPU accesses being directed to the lower latency PoP DRAM. Very few OEMs seemed to populate both channels and thus Scorpion based designs were effectively single-channel offerings.

    At a high level, Qualcomm has built a 3-wide, out-of-order engine that feels very much like a modern version of Intel's old P6. Whereas designs from the A8 generation looked like modern Pentiums, Krait takes us into the era of the Pentium II.

    Update: Qualcomm published its whitepaper on the Snapdragon S4.

  • ArunDemeure - Friday, October 07, 2011 - link I suggest you stop embarrassing yourself. Metafor knows what he's talking about, and you clearly don't. I read that previous thread - he was pretty much spot on for everything, as you would expect.

    I honestly don't know why he even bothers here given the reception he's getting.

    This information does come from Qualcomm, although the odd PoP + external DRAM configuration (that no one seems to use) basically means that MSM8x60 is a single-channel architecture (which is why I starred it in the table above). I will ask Qualcomm once more for confirmation that this applies to MSM8x60 as well as the older single core variants.

    The table below encapsulates much of what you can expect over the next 12+ months: 2011/2012 SoC Comparison

    1) I wasn't aware that Microsoft released DirectX 9.

    1? 3-wide?

    Performance of ARM cores has always been characterized by DMIPS (Dhrystone Millions of Instructions per Second).

    An extremely old integer benchmark, Dhrystone was popular in the PC market when I was growing up but was abandoned long ago in favor of more representative benchmarks. You can get a general idea of performance improvements across similar architectures assuming there are no funny compiler tricks at play. The comparison of single-core DMIPS/MHz is below:

  • felixyang - Saturday, October 08, 2011 - link 2) I believe dual channels don't give any advantage due to tegra's system bus.

    Reply

  • dagamer34 - Friday, October 07, 2011 - link Great stuff to look forward to. Some comments:

    Krait's front end is significantly wider. The architecture can fetch and decode three instructions per clock.

    The decoders are equally capable of decoding any ARMv7-A instructions. The wider front end is a significant improvement over the 2-wide Scorpion core. It alone will be responsible for a tangible increase in IPC.

    At 3.3 DMIPS/MHz, Krait should be around 30% faster than a Cortex A9 running at the same frequency.

    At launch Krait will run 25% faster than most A9s on the market today, a gap that will only grow as Qualcomm introduces subsequent versions of the core. It's not unreasonable to expect a 30 - 50% gain in performance over existing smartphone designs. ARM hasn't published DMIPS/MHz numbers for the Cortex A15, although rumors place its performance around 3.
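The per-clock claim above is simple ratio arithmetic. The sketch below works it through, assuming 2.5 DMIPS/MHz for the Cortex A9 (ARM's commonly quoted figure, which is not stated in the text here) against the 3.3 quoted for Krait. The function name is hypothetical, and Dhrystone ratios are only a rough guide to real application performance.

```python
def per_clock_gain(new_dmips_per_mhz, old_dmips_per_mhz):
    """Fractional speedup at equal clock speed, from DMIPS/MHz ratings."""
    return new_dmips_per_mhz / old_dmips_per_mhz - 1.0


KRAIT_DMIPS_PER_MHZ = 3.3      # figure quoted in the article
CORTEX_A9_DMIPS_PER_MHZ = 2.5  # assumption: ARM's commonly quoted A9 rating

gain = per_clock_gain(KRAIT_DMIPS_PER_MHZ, CORTEX_A9_DMIPS_PER_MHZ)
print(f"Krait vs Cortex A9 at the same frequency: {gain:.0%} faster")
```

Under those inputs the ratio comes out a little above 30%, which lines up with the "around 30%" estimate.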

  • z0mb13n3d - Friday, October 07, 2011 - link Please read:

    I can tell you with a modicum of confidence that this is true, at least partially.

    Note that courtesy of the wider front-end and OoO execution engine, Krait should be a higher performance architecture than Intel's Atom.

    That's right, you'll be able to get better performance than some of the very first Centrino notebooks in your smartphones come 2012.

    Anand Reply

    ARM's NEON instruction set is handled by a dedicated unit in all of its designs.

    Qualcomm calls its NEON engine VeNum and has increased its issue capabilities by 50%. Whereas Scorpion could only issue two NEON instructions in parallel, Krait can do three.

    Scorpion was pretty much the CPU architecture of choice in the 2009 - 2010 timeframe.

    Throughout 2011 however, Qualcomm has been very quiet as dual Cortex A9 designs from NVIDIA, Samsung and TI have surpassed it in terms of performance.

  • Ryan Smith - Friday, October 07, 2011 - link It's actually more complex than that. When it comes to programming for Direct3D11, there are a number of different GPU feature level targets.

    The idea is that developers will write their application in DX11, and then have customized render backends to target each feature level they want to hit.

    2) Why is nVidia still using a single LPDDR2 channel when everyone else has gone to dual channel memory?

    Qualcomm has an ARM architecture license enabling it to build its own custom micro architectures that implement the ARM instruction set.

    This is similar to how AMD has an x86 license but designs its own chips rather than just producing clones of Intel processors. Qualcomm remains the only active player in the smartphone/tablet space that uses its architecture license to put out custom designs. The benefit to a custom design is typically better power and performance characteristics compared to the more easily synthesizable designs you get directly from ARM.

    The downside is development time and costs go up tremendously.

    Ilomilo is pretty, but it's not exactly Gears or Battlefield, you know? Reply

    Krait's fetch and decode stages are obviously in-order, but the back-end is entirely out-of-order.

    Qualcomm claims that any instruction can be executed out of order, assuming that doing so doesn't create any new hazards. Instructions are retired in order.

  • Anand Lal Shimpi - Friday, October 07, 2011 - link Arun, 512KB.