Understanding ARM big.LITTLE MultiProcessing with MediaTek X20


The MediaTek Helio X20 SoC has a very interesting CPU inside it. It is not especially interesting from a performance point of view; what stands out is the way it implements the ARM big.LITTLE heterogeneous computing architecture.
Most SoC manufacturers and designers implement a two-cluster architecture: one cluster dedicated to heavy tasks (the big cluster, made up of performance cores like the Cortex-A72 and A73) and a second cluster aimed at power efficiency, made up of cores such as the Cortex-A53 and A7.
However, the Helio X20 has a tri-cluster architecture with powerful, balanced and power-efficient core clusters, the balanced cluster simply being a higher-clocked version of the power-efficient cluster.
Before we go further into the video, we must take a quick look at the different task scheduling methods implemented in the big.LITTLE multiprocessing architecture.

The first one being cluster switching:
The clustered model approach is the first and simplest implementation, arranging the processor into identically sized clusters of “Big” or “Little” cores. The operating system scheduler can only see one cluster at a time; when the load on the whole processor changes between low and high, the system transitions to the other cluster. All relevant data is then passed through the common L2 cache, the first core cluster is powered off and the other one is activated. A Cache Coherent Interconnect (CCI) is used to keep data coherent between the two clusters.
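To make the cluster switching idea concrete, here is a minimal toy model in Python. It is only a sketch: the class name, the function name and the two load thresholds are made up for illustration and do not come from any real kernel governor.

```python
# Toy model of cluster switching: the scheduler sees only one cluster at a time.
# The thresholds below are illustrative assumptions, not real governor values.

class ClusterSwitcher:
    def __init__(self, up_threshold=0.75, down_threshold=0.30):
        self.active = "LITTLE"            # start on the power-efficient cluster
        self.up_threshold = up_threshold
        self.down_threshold = down_threshold

    def update(self, total_load):
        """total_load is the load on the whole processor, from 0.0 to 1.0."""
        if self.active == "LITTLE" and total_load > self.up_threshold:
            # In hardware: hand data over, power off LITTLE, power on big.
            self.active = "big"
        elif self.active == "big" and total_load < self.down_threshold:
            self.active = "LITTLE"
        return self.active


def simulate_cluster_switching(loads):
    """Return which cluster is active after each load sample."""
    switcher = ClusterSwitcher()
    return [switcher.update(load) for load in loads]
```

Using two different thresholds means a load hovering around a single cut-off does not make the model flip back and forth between clusters on every sample.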

The second one being In-kernel switcher (CPU migration):
CPU migration via the in-kernel switcher (IKS) involves pairing up a ‘Big’ core with a ‘Little’ core, with possibly many identical pairs in one chip. Each pair operates as one virtual core, and only one real core is (fully) powered up and running at a time. The ‘Big’ core is used when the demand is high and the ‘Little’ core is employed when demand is low. When demand on the virtual core changes (between high and low), the incoming core is powered up, running state is transferred, the outgoing core is shut down, and processing continues on the new core. Switching is done via the cpufreq framework.
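The pairing described above can be sketched the same way. This is a simplified illustration, not the actual in-kernel switcher: `VirtualCore`, `iks_step` and the demand threshold are hypothetical names and values chosen for the example.

```python
# Toy model of the in-kernel switcher: each big/LITTLE pair is one virtual core,
# and only one physical core of the pair runs at any time.
# The threshold is an illustrative assumption.

class VirtualCore:
    def __init__(self, threshold=0.6):
        self.threshold = threshold
        self.running = "LITTLE"           # physical core currently powered up

    def set_demand(self, demand):
        """demand is the load on this virtual core, from 0.0 to 1.0."""
        target = "big" if demand > self.threshold else "LITTLE"
        if target != self.running:
            # In hardware: power up the incoming core, transfer the running
            # state, then shut down the outgoing core.
            self.running = target
        return self.running


def iks_step(virtual_cores, demands):
    """Apply one demand sample to each pair and report which core runs."""
    return [vc.set_demand(d) for vc, d in zip(virtual_cores, demands)]
```

Unlike cluster switching, each pair decides on its own, so one virtual core can sit on its big core while its neighbours stay on their LITTLE cores.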

And finally, heterogeneous multi-processing (global task scheduling):
The most powerful use model of the big.LITTLE architecture is heterogeneous multi-processing (HMP), which enables the use of all physical cores at the same time. Threads with high priority or computational intensity can in this case be allocated to the “Big” cores while threads with less priority or less computational intensity, such as background tasks, can be performed by the “Little” cores.
This is the model used by most modern processors that follow the big.LITTLE architecture, including the Helio X20.
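As a rough sketch of what global task scheduling does, the snippet below places tasks on big or LITTLE cores depending on how heavy they are. The function name, the core names, the load numbers and the threshold are all assumptions made for illustration; a real scheduler tracks load over time and is far more involved.

```python
# Toy model of heterogeneous multi-processing: every core is visible at once,
# heavy tasks go to big cores and light tasks go to LITTLE cores.
# Core names and the load threshold are illustrative assumptions.

def place_tasks(tasks, big_cores, little_cores, load_threshold=0.5):
    """tasks maps a task name to its load (0.0 to 1.0).

    Returns a dict mapping each core name to the list of tasks placed on it.
    """
    placement = {core: [] for core in big_cores + little_cores}
    big_i = little_i = 0
    # Handle the heaviest tasks first so they get the first pick of big cores.
    for name, load in sorted(tasks.items(), key=lambda kv: kv[1], reverse=True):
        if load >= load_threshold:
            core = big_cores[big_i % len(big_cores)]        # round-robin on big
            big_i += 1
        else:
            core = little_cores[little_i % len(little_cores)]  # round-robin on LITTLE
            little_i += 1
        placement[core].append(name)
    return placement
```

For example, calling `place_tasks({"benchmark": 0.95, "sync": 0.1}, ["cpu8"], ["cpu0"])` puts the benchmark on the big core and the background sync on the LITTLE core.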

The preferred way to make heterogeneous task scheduling effective at saving power is to disable individual cores that aren’t being used. That is done using the CPU hotplugging feature, which allows the OS to turn off CPU cores individually. Luckily, the X20 has the hotplugging feature exposed and we can manually override it, as discovered by a forum member on the 96Boards forum.

So I thought that manually enabling all the cores would give us a much better score on Geekbench multi-core. Let’s take a look at what happens…

So before we go ahead and enable all the cores, let’s look at what the status is by default.
First we have the system monitor app, which shows some of the cores as online and the others as offline; this is also reflected in the cat output of the status of all the cores.
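For anyone who wants to check the same status from a script instead of cat, here is a small Python sketch. It assumes the standard Linux sysfs layout, where each hotpluggable core has a file at /sys/devices/system/cpu/cpuN/online containing 1 or 0; the function name is my own, and the exact commands used in the video are not reproduced here.

```python
import glob
import os
import re


def read_core_status(base="/sys/devices/system/cpu"):
    """Return a dict like {"cpu0": True, "cpu1": False} (True means online)."""
    status = {}
    for path in glob.glob(os.path.join(base, "cpu[0-9]*")):
        name = os.path.basename(path)
        if not re.fullmatch(r"cpu\d+", name):
            continue                      # skip anything that is not cpuN
        try:
            with open(os.path.join(path, "online")) as f:
                status[name] = f.read().strip() == "1"
        except FileNotFoundError:
            # Assumption: a core with no 'online' file cannot be hotplugged,
            # so it is treated as always online.
            status[name] = True
    return status
```

Calling `read_core_status()` on the board should list the same cores as online and offline that the system monitor app shows.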

Next we’ll go ahead and disable automatic CPU hotplug and enable all the cores, which can then be seen reflected in the system monitor app.
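The same step could be scripted roughly as follows. This is a sketch under assumptions, not the exact procedure from the forum post: it relies on the standard Linux cpuN/online sysfs files, it needs root, and the switch that turns off the vendor's automatic hotplug is passed in as `hotplug_switch` because its path is specific to the kernel in use and is not given in this post. Writing 0 to that switch to disable it is also an assumption.

```python
import glob
import os


def bring_cores_online(base="/sys/devices/system/cpu", hotplug_switch=None):
    """Force every hotpluggable core online; return the cores that were changed.

    hotplug_switch is a hypothetical parameter: the path of the vendor file
    that controls automatic hotplug, if there is one.
    """
    if hotplug_switch is not None:
        # Assumption: writing "0" disables the automatic hotplug policy.
        with open(hotplug_switch, "w") as f:
            f.write("0")

    changed = []
    pattern = os.path.join(base, "cpu[0-9]*", "online")
    for online_file in sorted(glob.glob(pattern)):
        with open(online_file) as f:
            if f.read().strip() == "1":
                continue                  # already online, nothing to do
        with open(online_file, "w") as f:
            f.write("1")                  # standard sysfs request to bring a core up
        changed.append(os.path.basename(os.path.dirname(online_file)))
    return changed
```

If the automatic hotplug policy is left enabled, it will most likely take the cores offline again shortly after they are brought up, which is why it is disabled first.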

Now let’s take a look at the benchmark.
Not what I expected at all. There is a difference; in fact, the multi-core score is higher than the HiKey 960’s. But this is only because the extra cores that I enabled meant that background tasks were now being scheduled onto these smaller cores instead of taking CPU time on the performance cores, which meant that the benchmark could run more efficiently on the bigger cores, giving a slightly better result.
But it was still only running on the bigger cores, and not on all the cores as I expected it to. The global task scheduler only uses the hotplug feature as a means to turn off CPUs that aren’t in use and save power, thereby increasing efficiency, and hence enabling those cores had next to no effect on the benchmark.

So, in the end, the big.LITTLE multi-processing architecture is geared more toward power savings than toward pure performance. When you see an octa- or deca-core ARM CPU in a device, its peak performance is roughly that of its big cores, while its peak power efficiency is that of its LITTLE cores…

And finally, a huge shoutout to Seeed Studio for providing me with the MediaTek X20.
