Choosing a processor for a build farm

Overview

Build farms are network clusters of nodes with high CPU performance which are dedicated to building software. The general approach consists in running tools like "distcc" on the developer's workstation, which delegate the compilation jobs to all available nodes. The CPU architecture of the nodes is irrelevant here since cross-compilation is involved for any target platform anyway.
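
As a minimal sketch of this approach (the node addresses and job counts below are assumptions, not part of any measurement here), distcc is pointed at the farm nodes and driven by a parallel make:

  # Illustrative only: replace the node addresses and slot counts with real ones.
  export DISTCC_HOSTS="localhost/4 10.0.0.11/8 10.0.0.12/8"
  # Each compilation unit is handed to distcc, which forwards it to a free node.
  make -j20 CC="distcc gcc"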

Build farms are only interesting if they can build faster than any commonly available, cheaper solution, starting with the developer's workstation. Note that developers' workstations are commonly very powerful, so in order to provide any benefit, a cluster aiming at being faster than such a workstation still needs to remain affordable.

In terms of compilation performance, the metrics are lines of code per second per dollar (performance) and lines of code per joule (efficiency). A number of measurements were run on various hardware, and this research is still ongoing. To sum up the observations so far, the most interesting solutions are in the middle range: devices that are too cheap have CPUs or RAM bandwidth that are too small, while devices that are too expensive optimise for areas irrelevant to build speed, or have a pricing model that grows exponentially with performance.
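
As a purely illustrative calculation with made-up numbers: a node that compiles 200,000 lines of code in 100 seconds, draws 20 W while doing so and costs 200 dollars delivers 2,000 lines per second, hence 10 lines per second per dollar, and 200,000 lines / (20 W x 100 s) = 100 lines per joule.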

Hardware considerations

Nowadays, most processors are optimised for higher graphics performance. Unfortunately, it's still not possible to run GCC on the GPU, so we're wasting transistors, die area, power and thermal budget on a part that is totally unused in a build farm. Similarly, we don't need a floating point unit in a build farm. In fact, if some code to be built uses a lot of floating point operations, the compiler will have to use floating point as well to deal with constants and some optimisations, but such code is usually marginal in a whole program, let alone in a whole distribution (except maybe in HPC environments).

Thus, any CPU with many cores at a high frequency and high memory bandwidth may be eligible for testing, even if it has neither an FPU nor a GPU.

Methodology

First, software changes a lot, which implies that comparing numbers between machines is not always easy. It would be possible to insist on building an outdated piece of code with an outdated compiler, but that would be pointless; it is better to build the modern code that the developer needs to build right now, with the compiler they want to use. As a consequence, a single benchmark is useless: it always needs to be compared to one run on another system with the same compiler and code. After all, the purpose of building a build farm is to offload the developer's system, so it makes sense to use this system as a reference and compare the same version of the toolchain on the device being evaluated.

Porting a compiler to another machine

The most common operation here is what is called a Canadian build. It consists in building, on machine A, a compiler aimed at running on machine B to produce code for machine C. For example, a developer using an x86-64 system could build an ARMv7 compiler producing code for a MIPS platform. Canadian builds sometimes fail because of bugs in the compiler's build system, which sometimes mixes up variables between build, host and target. In its defense, the principle is complex, and detecting unwanted sharing there is even more difficult than detecting similar issues in more common cross-compilation operations.

In case of failure, it can be easier to proceed in two steps:

  • a Canadian build, on the developer's system, of a compiler that runs on the device under test and also targets it. This results in a compiler that runs natively on the test device.
  • a native build, on the test device, of a cross-compiler for the final target system, using the previously built native compiler.

Since that's a somewhat painful (or at least annoying) task, it makes sense to back up the resulting compilers and to simply copy them onto future devices to be tested, provided they use the same architecture.
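
As a rough sketch of the two-step approach, assuming an x86-64 workstation, an ARMv7 device under test and an i386 final target (the triplets, version and installation paths are illustrative, and binutils, sysroot and library details are omitted):

  # Step 1: on the x86-64 workstation, build a gcc that runs on the ARMv7 device
  # and produces ARMv7 code (build != host, host == target).
  ../gcc-4.7.4/configure --build=x86_64-pc-linux-gnu --host=arm-linux-gnueabi \
      --target=arm-linux-gnueabi --prefix=/opt/arm-native-gcc
  make && make install

  # Step 2: on the ARMv7 device itself, use that native gcc to build a
  # cross-compiler producing i386 code, which the build farm will actually run.
  ../gcc-4.7.4/configure --build=arm-linux-gnueabi --host=arm-linux-gnueabi \
      --target=i386-pc-linux-gnu --prefix=/opt/i386-cross-gcc
  make && make install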

Tests

2014/08/05

This test consisted of building haproxy-latest with gcc-4.7.4, producing code for i386. In all tests, no I/O operations were involved because the compiler, the sources and the resulting binaries were all placed in a RAM disk.
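
A run on a given machine looked roughly like the following sketch (the paths are assumptions, the TARGET value is the usual one for haproxy builds of that era, and only the elapsed time was recorded):

  # Everything is placed in a RAM disk (tmpfs) so that disk I/O does not affect the timing.
  mkdir -p /dev/shm/build && cd /dev/shm/build
  tar xf haproxy-latest.tar.gz && cd haproxy-*      # sources copied there beforehand
  # The i386 gcc-4.7.4 is assumed to live in the RAM disk too; -j is varied per run.
  time make -j4 CC=/dev/shm/toolchain/bin/i386-linux-gcc TARGET=linux2628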

Machines involved in this test were 32- and 64-bit x86 as well as 32-bit ARMv7 platforms:

Machine type     | CPU family | CPU model        | CPU freq (nom/max) | CPU cores | CPU threads | RAM
ThinkPad t430s   | x86-64     | Core i5-3320M    | 2.6/3.3 GHz        | 2         | 4           | 8 GB DDR3-1600
C2Q              | i686       | Core2 Quad Q8300 | 3.0 GHz            | 4         | 4           | 8 GB DDR3-1066
PC-Engines apu1c | x86-64     | AMD T40-E        | 1.0/1.0 GHz        | 2         | 2           | 2 GB DDR3-1066
Asus EEE PC      | i686       | Atom N2600       | 1.86/1.86 GHz      | 2         | 4           | 2 GB DDR2
Marvell XP-GP    | armv7      | mv78460          | 1.6/1.6 GHz        | 4         | 4           | 4 GB DDR3-1866
OpenBlocks AX3   | armv7      | mv78260          | 1.33/1.33 GHz      | 2         | 2           | 2 GB DDR3-1333

The results are presented below as build times for various levels of build parallelism.

Machine | Processes | Time (seconds) | Observations
apu1c   | 1         | 116.3          |
apu1c   | 2         | 59.4           | CPU gets very hot
apu1c   | 4         | 64.0           | Expected: more processes than cores
t430s   | 1         | 19.3           | 1 core at 3.3 GHz
t430s   | 2         | 10.9           | 2 cores at 3.1 GHz
t430s   | 4         | 9.1            | 2 cores at 3.1 GHz
AX3     | 2         | 93.5           |
XP-GP   | 2         | 74.7           |
XP-GP   | 4         | 39.75          |
EEE PC  | 2         | 61.0           |
EEE PC  | 4         | 46.6           |
C2Q     | 1         | 36.6           |
C2Q     | 2         | 18.9           | 2 cores on the same die
C2Q     | 4         | 10.8           | L2 cache not shared between the 2 dies