Ant-Computing

Revision as of 21:44, 7 December 2014 by Willy (Talk | contribs) (Main Page moved to Ant-Computing:About)

Overview

Ant-computing is a collaborative project started in 2000 with the aim of improving computer efficiency by optimizing machines for specific tasks. At the time, computers were observed to be large and power-hungry, especially in datacenters, where the largest pieces of hardware were useless and some under-utilized, failure-prone parts (hard drives, power supplies) could be shared between machines, allowing the remaining parts to reach higher densities. Compute nodes would then become more easily interchangeable, and failures would no longer be a problem. The principle of packing together many low-value computers, each dedicated to working hard on a specific task, quickly sparked the analogy with ants and gave its name to the project ("ant-computing"), to the first server ("formicus", from the Latin for ant), and to the Linux-based distribution ("formilux").

Below you'll find links to the various sub-parts of the project.

Contents

The project covers several areas, detailed in the sections below:

  • software : care is taken to centralize development efforts in order to limit divergence between users, given that the project makes slow progress
  • hardware : since 2000, a significant amount of hardware has been evaluated and sometimes used. Publishing observations and test results can be useful for other people seeking specific features, as well as for hardware makers trying to improve their designs when that can be done at no cost
  • research : a permanent quest for improvement often leaves a taste of something unachieved. It is better to document and share the current state of findings in various areas than to delay them forever
  • applications : over the years, various parts of the project have been used in several areas, ranging from network equipment to walking robots

Software

  • formilux : this is the Linux-based distribution, which was developed from scratch. Its original purpose was to bring a system up and running in a matter of minutes to replace any faulty production server, without spending hours reinstalling. This was made possible by card-sized CDs carrying a small preinstalled system in which many services were pre-configured and ready to start. In order to keep the image small (a few MB), no packages were used, and dependencies were handled at the file level. This ease of deployment led some servers to run on this system for their whole life, so the build and update process had to be redesigned to permit upgrades. The system was designed for high reliability (e.g. no power-down sequence, simply cut the power) and automatic recovery (e.g. no user prompt on FS errors). It was made smaller, faster and more secure by running entirely on a read-only file system, with updates distributed as software images. The arrival of non-x86 architectures moved the focus to cross-compilation and a more flexible build process.
  • flx : this utility was initially used only to sign a file system and detect changes between a file system and its backed up signature. It was convenient for upgrades because the signature would tell what package a binary came from. Now that images are most often read-only, the utility is used during the image build process to build package signatures and to detect config changes at run time.
  • toolchain : this is the suite of tools needed to produce binaries for a different architecture and/or system than yours. Initially the project was built on the developers' systems with their own libc, but after system upgrades that started to cause trouble. A first version of the toolchain was built to make it possible to continue to build packages with the original libc (2.1.3, then 2.2.5) regardless of the local one. The recent version also supports multiple architectures. Currently, i386, x86_64, ARMv5, ARMv7 and MIPS are being used with success.
  • compat : even with a toolchain, some breakage is regularly to be expected when upgrading a developer's workstation. GNU Make is certainly the tool causing the most grief because, in addition to introducing bugs in maintenance versions, it does not even always issue a new version when these regressions are fixed! It is very common to need multiple versions of GNU Make on a system in order to build a full distro. Other components such as bash are known for breaking compatibility as well. And sometimes a package will require a more recent version than the one the developer has. This "compat" component provides build scripts that bring all expected versions of known unstable tools to a developer's system and improve compatibility between hosts.
  • boot : this is a bootloader image which supports two images and always records which image it tries to boot from. It is very small, and contains a shell and menus to reinstall the software. It even supports flashing from the serial port. It contributes significantly to the reliability of embedded systems running on Formilux.
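
The signature idea described for flx above can be illustrated with a minimal Python sketch. This is not flx's actual algorithm or file format, which this page does not describe: the function names sign_tree and diff_signatures, the choice of SHA-256, and the decision to skip symlinks are all assumptions made for illustration only.

```python
import hashlib
import os


def sign_tree(root):
    """Return {relative_path: sha256_hex} for every regular file under root."""
    signature = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Assumption for this sketch: symlinks and special files are skipped.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            digest = hashlib.sha256()
            with open(full, "rb") as handle:
                # Read in chunks so large files do not need to fit in memory.
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            signature[os.path.relpath(full, root)] = digest.hexdigest()
    return signature


def diff_signatures(saved, current):
    """Compare two signatures; return sorted lists (added, removed, modified)."""
    added = sorted(set(current) - set(saved))
    removed = sorted(set(saved) - set(current))
    modified = sorted(
        path for path in set(saved) & set(current) if saved[path] != current[path]
    )
    return added, removed, modified
```

A saved signature taken at image build time could then be compared with sign_tree() of the running system to list files that were added, removed or modified, which is the kind of config change detection mentioned above.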

Hardware

An initial goal was to be able to put two rackable 1U servers side by side in a normal 19" rack, in order to easily double server density in datacenter racks. That goal was pursued with both the Formicus v1 and Formicus v2 servers.

  • Formicus v1 : it was a fanless and diskless 120 MHz 486 DX4 with 16 MB of RAM and 16 MB of flash, running a stripped-down Slackware Linux 7.0 distro. It hosted the ant-computing website for several months. Its large, nice-looking LCD attracted a lot of people who asked us how to make one, and it was even featured on Slashdot.
  • Formicus v2 : it was the first attempt at something a bit more professional and usable. It used a 400 MHz Celeron (with a fan) and had a smaller LCD, a keypad and a smartcard reader. Its higher heat output made it harder to build, and it never went further than a prototype, which raised many important questions for future designs.
  • various platforms that have been evaluated for Formilux : Here you'll find test results of various machines for various usages. Some of them have been used or are still being used.
  • power supplies : since Formicus v1, selecting a power supply has remained one of the most difficult tasks. Small power supplies tend to run hotter than larger ones and are more fragile. Large models often have fans. Open-frame models run cooler but are dangerous to place close to low-voltage parts. It is also often difficult to get the expected voltages on the output, especially when input voltages are close to the output voltage (e.g. DC-to-DC conversion). Such issues are discussed there.
  • enclosures : this has always been one of the most difficult steps in the Formicus projects. Version 1 was open on its sides, which made it dangerous. Version 2 was a real half-width 1U box, but it was difficult to drill or cut and to fit parts into. Newer devices are much smaller and do not always come with a handy enclosure. NAS motherboards are often the best example: they are very small but come with a very large enclosure meant to carry hard drives.

Research

  • performance vs efficiency : performance and efficiency rarely come together. While performance can be measured in operations per second, efficiency is better measured in operations per joule (equivalent to operations per second per watt). Optimized code can improve overall efficiency. Lower idle power usage also improves overall efficiency on all machines that are not used at full load all the time. The trade-off between the two is not always easy to find. For example, a vision-capable robot needs both performance, to analyse images quickly, and high efficiency, so as not to drain too much power from its batteries.
  • power usage reduction : power requirements are often at odds with efficiency. They can be measured in joules per operation at various performance levels. Sometimes a faster but less efficient processor leads to a more efficient overall system, because it helps the system complete its task faster and go back to idle earlier. For example, a NAS server takes much longer to run backups over a 100 Mbps link than over a 1 Gbps link, which prevents the hard drive from going to standby and consumes more energy in total.
  • power converters : multiplying power supplies when using many devices is often inefficient and takes space. Using fewer power supplies and sharing them between devices is often desirable, but requires the power to be converted from the shared voltage to the voltage of each target device. This comes with power losses, and it becomes increasingly difficult when input and output voltages are very close, which is a common issue when running on batteries. Many experiments were made; some of them are reported here and more will follow.
  • hexapod walk : one application of Formilux and the principles above was to build a hexapod robot. Making a hexapod robot walk is an interesting challenge. The first step was to make it work simply, but after many observations of insects and arachnids, a lot of research went into a six-legged walk based on the position of the center of gravity instead of the robot-looking tripod walk. Findings are explained here.
  • build farms : one area that is still being researched is build farms. Build farms are only interesting if they can build faster than any commonly available, cheaper solution. The units here are lines of code per second per dollar (performance) and lines of code per joule (efficiency). A number of measurements were run on various hardware, and this research is still ongoing. To sum up the observations, the most interesting solutions are in the middle range. Devices that are too cheap have CPUs that are too small or too little RAM bandwidth, and devices that are too expensive optimise for areas irrelevant to build speed, or have pricing that grows exponentially with performance.
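
The relation between operations per joule and operations per second per watt, and the NAS backup example above, can be worked through with a small Python sketch. The function names are hypothetical, the model ignores protocol overhead and standby transition costs, and any power figures passed in are the reader's own measurements or assumptions, not results from this project.

```python
def ops_per_joule(ops_per_second, watts):
    """Efficiency in operations per joule, i.e. operations per second per watt."""
    return ops_per_second / watts


def transfer_seconds(size_bytes, link_bits_per_second):
    """Time needed to move size_bytes over a link, ignoring protocol overhead."""
    return size_bytes * 8 / link_bits_per_second


def energy_over_window(active_seconds, window_seconds, active_watts, idle_watts):
    """Energy in joules over a fixed window: busy first, then idle for the rest."""
    if active_seconds > window_seconds:
        raise ValueError("the task does not fit in the window")
    idle_seconds = window_seconds - active_seconds
    return active_watts * active_seconds + idle_watts * idle_seconds
```

As an illustration with purely made-up figures: a 90 GB backup takes 7200 s at 100 Mbps and 720 s at 1 Gbps. Over the same 7200 s window, a system assumed to draw 10 W while busy on the slow link uses 72,000 J, whereas one assumed to draw 12 W while busy on the fast link and 2 W once the drive is in standby uses 21,600 J, even though its active power is higher.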

Applications

  • toolbox : this is the oldest use of the Formilux distribution. It consists of a small bootable system for many purposes, such as starting a service in an emergency, recovering data from a disk after a filesystem crash, taking network captures or running network performance tests. It used to be provided in CD format, then in CompactFlash format, and is now provided as bootable USB sticks or PXE images. It also works fine on small and cheap devices like GL.iNet (MIPS), Guruplug/Dockstar (ARMv5) or Mirabox/OpenBlocks (ARMv7).
  • server chroot : on servers, isolating services in chroots is often recommended for increased security. But that only works if the chroots are almost empty, and read-only. Formilux's file-based packaging model allows very minimalistic chroots to be built just as a regular system image that can be either installed in a directory or mounted into a directory. And since the system is designed to run read-only, consequences of an intrusion on a vulnerable service are very limited.
  • Sentineo : this is Exosec's network edge security node. It was the first wide-scale deployment of Formilux and directly benefits from the concepts above for both software and hardware stability, and extreme availability (some systems placed behind windows in direct sun were found with all dead capacitors, causing several reboots a day, and despite this it was possible to SSH into these machines between reboots to diagnose them or to remotely connect to other systems).
  • Aloha : this is HAProxy Technologies (formerly Exceliance)'s load balancing appliance. Just like Exosec's Sentineo it uses Formilux and concepts above to achieve very high reliability and very good system security. The system is packaged as very small firmware images that customers can easily download and upgrade.
  • Hexapod : this is a vision-equipped walking hexapod robot. The modest specs of suitable boards make them a good fit for Formilux, particularly because power can be cut at any instant when the batteries fall below a threshold, so writes must be very limited or non-existent to protect the hardware's flash memory. Several boards have been tried, with various DC-DC power converters. Running a very limited number of services is also critical to keep latency low.
  • build farm : this is the project of assembling a build farm that could be used to build kernels or Formilux images. Various hardware has been benchmarked. Many sub-$100 devices have been evaluated for their ability to rival more expensive Atom and Core i3 machines at this task. Multiple cores with high memory bandwidth are needed, as well as a low-latency network. Some concepts are still being experimented with (e.g. daisy-chaining build nodes instead of connecting them to a switch).
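
Comparing candidate build nodes with the two units given in the Research section (lines of code per second per dollar, and lines of code per joule) could be sketched as follows in Python. The BuildNode class, its field names and the rank helper are hypothetical and only show how the metrics are derived from a benchmark run; no measurement from the project is reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildNode:
    """One benchmarked device; all values are supplied by whoever measures it."""
    name: str
    price_dollars: float
    lines_built: int       # lines of code compiled during the benchmark
    build_seconds: float   # wall-clock duration of the benchmark
    average_watts: float   # average power draw during the benchmark

    def loc_per_second_per_dollar(self):
        """Performance metric: lines of code per second per dollar."""
        return self.lines_built / self.build_seconds / self.price_dollars

    def loc_per_joule(self):
        """Efficiency metric: lines of code per joule (watts x seconds = joules)."""
        return self.lines_built / (self.build_seconds * self.average_watts)


def rank(nodes, metric):
    """Return node names sorted from best to worst for the given metric."""
    return [node.name for node in sorted(nodes, key=metric, reverse=True)]
```

Calling rank(nodes, BuildNode.loc_per_second_per_dollar) and rank(nodes, BuildNode.loc_per_joule) on the same list gives the two orderings, which need not agree.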

Getting started