Why Intel and AMD will never beat ARM in energy efficiency.

FinalTidus

High Supremacy Member
Joined
Dec 8, 2004
Messages
34,492
Reaction score
598
Good read on CPU architecture. :)

ARM’s secret recipe for power efficient processing.

There are several different companies that design microprocessors. There is Intel, AMD, Imagination (MIPS), and Oracle (Sun SPARC) to name a few. However, none of these companies is known exclusively for their power efficiency. That isn’t to say they don’t have designs aimed at power efficiency, but this isn’t their specialty. One company that does specialize in energy efficient processors is ARM.

While Intel might be making chips needed to break the next speed barrier, ARM has never designed a chip that doesn’t fit into a predefined energy budget. As a result, all of ARM’s designs are energy efficient and ideal for running in smartphones, tablets and other embedded devices. But what is ARM’s secret? What is the magic ingredient that helps ARM to continually produce high-performance processor designs with low power consumption?



In a nutshell, the ARM architecture. Based on RISC (Reduced Instruction Set Computing), the ARM architecture doesn’t need to carry a lot of the baggage that CISC (Complex Instruction Set Computing) processors include to perform their complex instructions. Although companies like Intel have invested heavily in the design of their processors so that today they include advanced superscalar instruction pipelines, all that logic means more transistors on the chip, and more transistors mean more energy usage. The performance of an Intel i7 chip is very impressive, but here is the thing: a high-end i7 processor has a maximum TDP (Thermal Design Power) of 130 watts. The highest-performance ARM-based mobile chip consumes less than four watts, often much less.


And this is why ARM is so special: it doesn’t try to create 130W processors, not even 60W or 20W. The company is only interested in designing low-power processors. Over the years, ARM has increased the performance of its processors by improving the micro-architecture design, but the target power budget has remained basically the same. In very general terms, you can break down the TDP of an ARM SoC (System on a Chip, which includes the CPU, the GPU and the MMU, etc.) as follows: two watts max budget for the multi-core CPU cluster, two watts for the GPU and maybe 0.5 watts for the MMU and the rest of the SoC. If the CPU is a multi-core design, then each core will likely use between 600 and 750 milliwatts.
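
To make those rough numbers concrete, here is a back-of-the-envelope sketch in Python. The wattages are the article's generalized budget figures, not measurements, and the names (`SOC_BUDGET_W`, `cores_within_budget`) are made up for illustration.

```python
# Rough SoC power budget using the article's generalized figures (watts).
# These are budget ceilings quoted in the article, not measured values.
SOC_BUDGET_W = {
    "cpu_cluster": 2.0,   # multi-core CPU cluster, max budget
    "gpu": 2.0,           # GPU
    "mmu_and_rest": 0.5,  # MMU and the rest of the SoC
}
DESKTOP_I7_TDP_W = 130.0  # high-end i7 maximum TDP quoted in the article


def soc_total_w(budget):
    """Sum the per-block budgets into a whole-SoC figure."""
    return sum(budget.values())


def cores_within_budget(cluster_budget_w, per_core_w):
    """How many cores at a given per-core draw fit in the cluster budget."""
    return int(cluster_budget_w // per_core_w)


total = soc_total_w(SOC_BUDGET_W)
print(f"SoC total budget: {total:.1f} W")
print(f"i7 TDP is about {DESKTOP_I7_TDP_W / total:.0f}x the SoC budget")
for per_core in (0.600, 0.750):
    n = cores_within_budget(SOC_BUDGET_W["cpu_cluster"], per_core)
    print(f"{per_core * 1000:.0f} mW per core -> {n} cores at full draw within 2 W")
```

Taken literally, only two or three cores could draw 600 to 750 mW at once inside a 2 W cluster budget, so the per-core figure presumably describes a peak that not every core hits simultaneously. That is an inference, but it fits the article's own caveat that these are very generalized numbers.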

These are all very generalized numbers because each design that ARM has produced has different characteristics. ARM’s first Cortex-A processor was the Cortex-A8. It only worked in single-core configurations, but it is still a popular design and can be found in devices like the BeagleBone Black. Next came the Cortex-A9 processor, which brought speed improvements and the ability for dual-core and quad-core configurations. Then came the Cortex-A5 core, which was actually slower (per core) than the Cortex-A8 and A9 but used less power and was cheaper to make. It was specifically designed for low-end multi-core applications like entry-level smartphones.


At the other end of the performance scale came the Cortex-A15 processor, ARM’s fastest 32-bit design. It was almost twice as fast as the Cortex-A9 processor, but all that extra performance also meant it used a bit more power. In the race to 2.0 GHz and beyond, many of ARM’s partners pushed the Cortex-A15 core design to its limits. As a result, the Cortex-A15 processor does have a bit of a reputation as a battery killer, though this is probably a little unfair. To compensate for the Cortex-A15 processor’s higher power budget, ARM released the Cortex-A7 core and the big.LITTLE architecture.

The Cortex-A7 processor is slower than the Cortex-A9 processor but faster than the Cortex-A processor. However, it has a power budget akin to its low-end brothers. The Cortex-A7 core, when combined with the Cortex-A15 in a big.LITTLE configuration, allows a SoC to use the low-power Cortex-A7 core when it is performing simple tasks and switch to the Cortex-A15 core when some heavy lifting is needed. The result is a design that conserves battery yet still offers peak performance.
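
The switching idea can be sketched as a toy model in Python. This is only an illustration of the concept, not how any real big.LITTLE switcher is implemented; the thresholds and function names are hypothetical.

```python
# Toy model of big.LITTLE cluster switching: run on the LITTLE core until
# load crosses a threshold, then migrate to the big core (and back again).
# Thresholds and names are hypothetical, chosen only for illustration.
UP_THRESHOLD = 0.80    # migrate LITTLE -> big above this load
DOWN_THRESHOLD = 0.30  # migrate big -> LITTLE below this load


def next_core(current, load):
    """Return which core type ('LITTLE' or 'big') should run next.

    `load` is the utilisation of the current core, from 0.0 to 1.0.
    Two separate thresholds give hysteresis so the SoC does not flip-flop.
    """
    if current == "LITTLE" and load > UP_THRESHOLD:
        return "big"
    if current == "big" and load < DOWN_THRESHOLD:
        return "LITTLE"
    return current


def simulate(loads, start="LITTLE"):
    """Replay a sequence of load samples and record the active core."""
    core, history = start, []
    for load in loads:
        core = next_core(core, load)
        history.append(core)
    return history


print(simulate([0.10, 0.50, 0.90, 0.60, 0.20, 0.10]))
# -> ['LITTLE', 'LITTLE', 'big', 'big', 'LITTLE', 'LITTLE']
```

The gap between the two thresholds is a design choice in this sketch: with a single threshold, a load hovering around it would bounce the work between cores on every sample.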

64-bit

ARM also has 64-bit processor designs. The Cortex-A53 is ARM’s power-saving 64-bit design. It won’t have record-breaking performance; however, it is ARM’s most efficient application processor ever. It is also the world’s smallest 64-bit processor. Its bigger brother, the Cortex-A57, is a different beast. It is ARM’s most advanced design and has the highest single-thread performance of all of ARM’s Cortex processors. ARM’s partners will likely be releasing chips based on just the A53, on just the A57, and on the two combined in a big.LITTLE configuration.

One way ARM has managed this migration from 32-bit to 64-bit is that the processor has different modes, a 32-bit mode and a 64-bit mode. The processor can switch between these two modes on the fly, running 32-bit code when necessary and 64-bit code when necessary. This means that the silicon which decodes and starts to execute the 64-bit code is separate (although there is reuse to save area) from the 32-bit silicon. This means the 64-bit logic is isolated, clean and relatively simple. The 64-bit logic doesn’t need to try to understand 32-bit code and work out what is the best thing to do in each situation. That would require a more complex instruction decoder. Greater complexity in these areas generally means more energy is needed.

A very important aspect of ARM’s 64-bit processors is that they don’t use more power than their 32-bit counterparts. ARM has managed to go from 32-bit to 64-bit and yet stay within its self-imposed energy budget. In some scenarios the new range of 64-bit processors will actually be more energy efficient than previous generation 32-bit ARM processors. This is mainly due to the increase in the internal data width (from 32 to 64 bits) and the addition of extra internal registers in the ARMv8 architecture. The fact that a 64-bit core can perform certain tasks quicker means it can power down sooner and hence save battery life.
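
The "finish sooner, power down sooner" argument is just energy = power × time. A minimal sketch in Python, with every wattage and duration invented purely to show the arithmetic (none of these numbers come from ARM or the article):

```python
def energy_joules(active_w, active_s, idle_w, window_s):
    """Energy used over a fixed time window: busy for `active_s` seconds,
    then idle (powered down) for whatever remains of `window_s`."""
    if active_s > window_s:
        raise ValueError("task does not finish inside the window")
    return active_w * active_s + idle_w * (window_s - active_s)


WINDOW_S = 10.0   # compare both cores over the same 10-second window
IDLE_W = 0.05     # hypothetical power-down draw

# Hypothetical figures: the 64-bit core draws slightly more while active
# but completes the same task in 8 s instead of 10 s.
e32 = energy_joules(active_w=0.70, active_s=10.0, idle_w=IDLE_W, window_s=WINDOW_S)
e64 = energy_joules(active_w=0.75, active_s=8.0, idle_w=IDLE_W, window_s=WINDOW_S)

print(f"32-bit core: {e32:.2f} J")
print(f"64-bit core: {e64:.2f} J")
```

With these made-up inputs the faster core comes out ahead (about 6.1 J against 7.0 J) even though its active power is higher. Whether that holds in practice depends entirely on the real speed-up and the real power figures.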


This is where the software also plays a part. big.LITTLE processing technology relies on the operating system understanding that it is a heterogeneous processor. This means the OS needs to understand that some cores are slower than others. This generally hasn’t been the case with processor designs until now. If the OS wanted a task to be performed, it would just farm it out to any core; it didn’t matter (in general), as they all had the same level of performance. That isn’t so with big.LITTLE. Linaro hosts and tests the big.LITTLE MP scheduler, developed by ARM for the Linux kernel, which understands the heterogeneous nature of big.LITTLE processor configurations. In the future, this scheduler could be further optimized to take into account things like the current running temperature of a core or the operating voltages.
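
What "understanding that some cores are slower than others" might look like can be sketched in Python. To be clear, this is not the big.LITTLE MP scheduler or any Linux kernel code; it is a made-up placement rule (smallest core that fits) with hypothetical capacity numbers, just to show the difference from handing work to any core.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    name: str
    capacity: int                 # relative compute capacity (big core = 100)
    used: int = 0                 # capacity already claimed by placed tasks
    tasks: list = field(default_factory=list)

    @property
    def free(self):
        return self.capacity - self.used


def place(task_name, demand, cores):
    """Put a task on the lowest-capacity core that still has room for it.

    If no core has enough headroom, fall back to the core with the most
    free capacity. Returns the name of the chosen core.
    """
    fitting = [c for c in cores if c.free >= demand]
    if fitting:
        target = min(fitting, key=lambda c: c.capacity)
    else:
        target = max(cores, key=lambda c: c.free)
    target.tasks.append(task_name)
    target.used += demand
    return target.name


# Hypothetical 2+2 configuration; a LITTLE core is rated at 40% of a big one.
cores = [Core("LITTLE0", 40), Core("LITTLE1", 40), Core("big0", 100), Core("big1", 100)]
workload = [("email_sync", 10), ("music", 20), ("game", 80), ("browser", 30)]

placements = {name: place(name, demand, cores) for name, demand in workload}
print(placements)
```

Here the light tasks stay on the LITTLE cores and only the demanding one lands on a big core. A scheduler that treats every core as equal has no basis for making that distinction.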

There is also the possibility of more advanced big.LITTLE processor configurations. MediaTek has already proven that the big.LITTLE implementation doesn’t need to be adhered to rigidly. Its current 32-bit octa-core processors use eight Cortex-A7 cores, but split into two clusters. There is nothing to stop chip makers from trying other combinations that include different sizes of LITTLE cores in the big.LITTLE hardware and software infrastructure, effectively delivering big, little and even smaller compute units. For example, 2 to 4 Cortex-A57 cores, two performance-tuned Cortex-A53 cores, and two smaller implementations of the Cortex-A53 CPU tuned towards lowest leakage and dynamic power – effectively resulting in a mix of 6 to 8 cores with 3 levels of performance.

Think of the gears on a bicycle: more gears mean greater granularity. The extra granularity allows the rider to pick the right gear for the right road. Continuing the analogy, the big and LITTLE cores are like the gears on the crank shaft, and the voltage level is like the gears on the back wheel – they work in tandem so the rider can choose the optimum performance level for the terrain.
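
The gear analogy maps onto a simple lookup: each combination of core type and voltage/frequency level is one "gear", and the system picks the cheapest gear that still delivers the performance required. A minimal Python sketch, where the table values and the `pick_gear` name are invented for illustration only:

```python
# Each "gear" is (core type, voltage/frequency level, relative performance,
# power in mW). All numbers are invented for illustration only.
OPERATING_POINTS = [
    ("LITTLE", "low",  10,   50),
    ("LITTLE", "mid",  20,  120),
    ("LITTLE", "high", 30,  250),
    ("big",    "low",  35,  400),
    ("big",    "mid",  60,  900),
    ("big",    "high", 100, 2000),
]


def pick_gear(required_perf):
    """Return the lowest-power operating point that meets the demand.

    If nothing is fast enough, return the fastest point available.
    """
    candidates = [p for p in OPERATING_POINTS if p[2] >= required_perf]
    if not candidates:
        return max(OPERATING_POINTS, key=lambda p: p[2])
    return min(candidates, key=lambda p: p[3])


for demand in (15, 32, 100, 150):
    core, level, perf, power_mw = pick_gear(demand)
    print(f"demand {demand:>3} -> {core} @ {level} ({perf} perf, {power_mw} mW)")
```

More rows in the table mean finer steps between "too slow" and "wasting power", which is the granularity point the analogy is making.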

The future is looking brighter than ever for mobile computing. ARM will continue to optimize and develop its CPUs around a fairly fixed power budget. Manufacturing processes are improving and innovations like big.LITTLE will continue to give us the benefits of peak performance with lower overall power consumption. This isn’t the world of desktops and big cooling fans, this is the world of ARM and its energy efficient architecture.
 

ondooy

Arch-Supremacy Member
Joined
Apr 9, 2012
Messages
10,897
Reaction score
1,147
duh, CISC and RISC... wtf..
apples and oranges again...
 

Maeda_Toshiie

Supremacy Member
Joined
May 12, 2007
Messages
6,311
Reaction score
3

Piezoq

Senior Member
Joined
Mar 31, 2010
Messages
1,568
Reaction score
0
Where is the comparison? It's just a description of ARM architecture.

The word efficiency implies a certain ratio of resource to work. There is ZERO comparison here, other than that 4W vs 130W TDP soundbite.

What is conveniently left out is the fact that Intel has new x86 Broadwell M chips at 4.5W. Or that the A8X has 3 billion transistors, more than double that of a Haswell i7.

Now, let us try to run OS X or Windows 8 on an Apple A8X.

Not meaningful at all.
 

Felfirez

Master Member
Joined
Mar 25, 2009
Messages
3,546
Reaction score
0
Energy efficiency from which perspective? Each of these architectures is targeted at different purposes. It's like comparing a hybrid car with a sports car over the same distance. If the hybrid scales up to the power of the sports car, will it still be efficient?

I'm willing to bet that if both were to scale up or down they will eventually meet at similar efficiency somewhere in the middle.
 

xkiller213

Senior Member
Joined
Oct 21, 2012
Messages
647
Reaction score
0

Smartypants

Arch-Supremacy Member
Joined
Apr 19, 2011
Messages
12,218
Reaction score
0
Part of the article sounds like it was written by ARM marketing department :s22:
 

MoneyFace =p

High Supremacy Member
Joined
Dec 17, 2010
Messages
34,965
Reaction score
94
But performance wise ARM's still way behind x86 front end. Both sides still monopolise in their own niche. Intel for performance, ARM in power efficiency.

I'm only interested in how far ARM can reach on single-thread performance and how efficient it is at that extreme.
 

SSDLover

Banned
Joined
May 28, 2012
Messages
6,211
Reaction score
0
Intel has the technology and money to go head to head with ARM. AMD?? Wait long long.... yet to see any of their low power solutions in smartphone/tablet segment. Their x86 is even worse, is like on par with Intel equivalent, dated few generations back!!
 

DarkBlue

Arch-Supremacy Member
Joined
Jan 1, 2000
Messages
14,339
Reaction score
1,099
WTB: Phone that can provide 5 days battery life, with avg. 1-2 hours/day 3D games. Thanks.
 

scsim002

Banned
Joined
Oct 17, 2014
Messages
996
Reaction score
0
SSDLover said:
Intel has the technology and money to go head to head with ARM. AMD?? Wait long long.... yet to see any of their low power solutions in smartphone/tablet segment. Their x86 is even worse, is like on par with Intel equivalent, dated few generations back!!

AMD is not competing with ARM, AMD is embracing ARM in fact. They are releasing ARM CPUs.

AMD Announces the Availability of 64-bit ARM Opteron Developer Kits

http://www.hardwarezone.com.sg/tech...-based-opteron-a1100-series-server-processors

AMD is also creating ARM/x86 hybrid CPUs which allows you to run both ARM and x86.
 

Dr.Vijay

Administrator
Joined
Jan 1, 2000
Messages
26,991
Reaction score
1,244
The debate aside, which many have pointed out is not really going anywhere, kindly practice these habits:-

1) If you are quoting or copying content from elsewhere, please attribute the source.

2) If you must copy, do not copy the content wholesale; select a portion or a para.

This is basic web etiquette that's applicable anywhere.

I do get a number of messages from content owners to amend or remove what forum people post because many don't follow these basics. So please be considerate.
 

haylui

High Supremacy Member
Joined
Jul 18, 2006
Messages
29,806
Reaction score
92
Anyone who knows about computer micro-architecture will know you can design the instruction set for whatever perf/power you like.

The culprit will always be the compiler support and optimization.
 

commach

Great Supremacy Member
Deluxe Member
Joined
Apr 11, 2001
Messages
53,997
Reaction score
15
The article is a POS written by ARM fan.

There is no room for comparison between CISC and "RISC". These processors do not run the same instruction sets and are designed with totally different architectures. Traditionally, CISC/Windows (Intel here) processing is heavily coded to run with a cache architecture, whereas the "RISC" (ARM here) is designed to keep its data in registers.

Comparing a totally different architecture and coding is no different from comparing the efficiency of a 4WD on the rough sea vs a speed boat on the mountain terrain, a good comparison indeed.
 

scsim002

Banned
Joined
Oct 17, 2014
Messages
996
Reaction score
0
Intel's CISC is a RISC at heart... x86 - Why does Intel hide internal RISC core in their processors? - Stack Overflow (of course, the difference is that Intel's RISC is hidden...)

Actually, Intel has been using their "RISC" core as early as Pentium Pro. If I am not wrong, AMD started theirs for their K5.

Both the P-Pro and K5 have the ability to decode simple x86 instructions into RISC-like instructions (proprietary) to be executed more efficiently. This bypasses the microcode ROM and makes execution a lot faster.

But programmers still have to use x86 instructions. They cannot bypass the decoder and use those RISC instructions.
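
To picture what that decoding step does, here is a toy Python sketch. The real internal instruction formats are proprietary, so everything here (the tuple layout, the `load`/`store` names, the `tmp` registers) is invented for illustration; it only shows the general idea of splitting a memory-operand instruction into simpler register-only steps.

```python
def is_memory(operand):
    """Treat bracketed operands such as '[rbx]' as memory references."""
    return operand.startswith("[")


def decode(instruction):
    """Split a toy x86-style (op, dst, src) instruction into RISC-like steps.

    Register-only instructions pass through as a single step. Memory
    operands are broken out into explicit load and store steps, so the
    arithmetic step itself only ever touches registers.
    """
    op, dst, src = instruction
    uops = []
    if is_memory(src):
        uops.append(("load", "tmp0", src))    # fetch the source operand
        src = "tmp0"
    if is_memory(dst):
        uops.append(("load", "tmp1", dst))    # fetch the destination value
        uops.append((op, "tmp1", src))        # do the work in a register
        uops.append(("store", dst, "tmp1"))   # write the result back
    else:
        uops.append((op, dst, src))
    return uops


print(decode(("add", "eax", "ebx")))
print(decode(("add", "[rbx]", "eax")))
```

The second call expands to three steps (load, add, store) while the first stays as one. The programmer only ever writes the single x86 instruction, which matches the point above that the internal instructions cannot be used directly.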
 