Ant-Computing
Ant-Computing (work in progress)
Overview
Ant-computing is a collaborative project born in 2000 with the aim of improving computer efficiency by optimizing computers for specific tasks. At that time, computers were large and power-hungry, especially in datacenters, where the largest pieces of hardware were often unnecessary and some under-utilized parts prone to frequent failures (hard drives, power supplies) could be shared between machines, allowing the remaining parts to reach higher densities. Compute nodes would then become easily interchangeable, and a node failure would no longer be a real problem. The principle of packing many low-value computers, each dedicated to working hard on a specific task, quickly sparked the analogy with ants and gave its name to the project ("ant-computing"), to the first server ("formicus" = ant in Latin), and to the Linux-based distribution ("formilux").
Below you'll find links to the various sub-parts of the project.
Contents
The project covers several areas, detailed in the sections below :
- software : some care is taken to centralize development effort and limit divergence between users, since the project progresses slowly
- hardware : since 2000, a significant amount of hardware has been evaluated and sometimes used; publishing observations and test results can help other people looking for specific features, as well as hardware makers wishing to improve their designs when that can be done at no cost
- research : a permanent quest for improvement often leaves a taste of something unfinished. It is better to document and share the current state of findings in various areas than to delay publishing them forever
- applications : over the years, various parts of the project have been used in several areas ranging from network equipment to walking robots
software
- formilux : this is the Linux-based distribution, developed from scratch. Its original purpose was to bring up a running system within minutes to replace any faulty production server, without spending hours reinstalling. This was made possible by card-sized CDs running a small preinstalled system with many services preconfigured and ready to start. To keep the image small (a few MB), no packages were used and dependencies were handled at the file level. This ease of deployment led some servers to run on this system for their entire life, so the build and update process had to be redesigned to permit upgrades. The system was made smaller, faster and more secure by running entirely from a read-only file system, with updates distributed as software images. The arrival of non-x86 architectures shifted the focus to cross-compilation and a more flexible build process.
- flx : this utility was initially used only to sign a file system and to detect changes between a file system and a previously saved signature of it. It was convenient for upgrades because the signature would tell which package a binary came from. Now that images are most often read-only, the utility is used during the image build process to build package signatures, and at run time to detect configuration changes.
- toolchain : this is the suite of tools needed to produce binaries for a different architecture and/or system than the build host. Initially the project was built on the developers' own systems with their own libc, but this started to cause trouble after system upgrades. A first version of the toolchain was built so that packages could still be built against the original libc (2.1.3, then 2.2.5) regardless of the local one. The recent version also supports multiple architectures; i386, x86_64, ARMv5, ARMv7 and MIPS are currently used successfully.
- compat : even with a toolchain, some breakage is regularly to be expected when upgrading a developer's workstation. GNU Make is certainly the tool causing the most grief: in addition to introducing bugs in maintenance versions, it does not always issue a new version when these regressions are fixed! It is very common to need multiple versions of GNU Make on a system to build a full distro. Other components such as bash are known for breaking compatibility as well, and sometimes a package requires a more recent version than the developer's. The "compat" component provides build scripts that install all expected versions of known unstable tools on a developer's system, improving compatibility between hosts.
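The signature-and-compare approach described for flx can be illustrated with a minimal Python sketch. It does not reproduce flx's actual signature format or command line, which are not described here; the function names `build_signature` and `compare` are hypothetical, and using SHA-256 plus permission bits and size as the per-file signature is an assumption made for illustration.

```python
import hashlib
import os
import stat
from typing import Dict, List, Tuple

# Hypothetical per-file signature: (permission bits, size, SHA-256 hex digest).
Signature = Dict[str, Tuple[int, int, str]]


def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of a regular file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_signature(root: str) -> Signature:
    """Map each regular file under root (relative path) to its signature."""
    sig: Signature = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if not stat.S_ISREG(st.st_mode):
                continue  # this sketch ignores symlinks, devices, etc.
            rel = os.path.relpath(full, root)
            sig[rel] = (stat.S_IMODE(st.st_mode), st.st_size, file_digest(full))
    return sig


def compare(old: Signature, new: Signature) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, changed) relative paths between two signatures."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, changed
```

For example, a signature of a configuration directory taken at image build time could later be compared with one taken at run time to list added, removed and modified files. Symlinks, ownership and device files are deliberately left out to keep the sketch short.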
hardware
An initial goal was to be able to put two 1U rackable servers side by side in a standard 19" rack, in order to easily double server density in datacenter racks. Both the Formicus v1 and Formicus v2 servers pursued that goal.
- Formicus v1 : it was a fanless and diskless 486 DX4-120 MHz with 16 MB of RAM and 16 MB of flash, running a stripped-down Slackware Linux 7.0 distro. It hosted the ant-computing website for several months. Its large, nice-looking LCD attracted a lot of people who asked us how to build one, and it was even featured on Slashdot.
- Formicus v2 : it was the first attempt at building something more professional and usable. It used a Celeron 400 MHz (with a fan), and had a smaller LCD, a keypad and a smartcard reader. Its higher heat output made it harder to build, and it never went beyond a prototype, though it raised many important questions for future designs.
- various platforms evaluated for Formilux : here you'll find test results of various machines for various usages. Some of them have been used or are still in use.
- power supplies : since Formicus v1, selecting a power supply has remained one of the most difficult tasks. Small power supplies tend to heat more than larger ones and are more fragile. Large models often have fans. Open-frame models heat less but are dangerous to place close to low-voltage parts. It is also often difficult to get the expected voltages on the output, especially when input voltages are close to the output voltage (e.g. DC-to-DC conversion). Such issues are discussed there.
- enclosures : this has always been one of the most difficult steps in the Formicus projects. Version 1 was open on its sides, which made it dangerous. Version 2 was a real half-width 1U box, but it was difficult to drill/cut and to fit parts into. Newer devices are much smaller and do not always come with a handy enclosure. NAS motherboards are a typical example: they are very small but come with a very large enclosure designed to hold hard drives.
research
- performance vs efficiency : performance and efficiency rarely come together. Performance can be measured in operations per second, while efficiency is rather measured in operations per joule (equivalent to operations per second per watt). Optimized code can improve overall efficiency. Lower idle power usage also improves overall efficiency on any machine that is not used at full load all the time. The trade-off between the two is not always easy. For example, a vision-capable robot needs both performance to quickly analyse images, and high efficiency so as not to drain its batteries too fast.
- power usage reduction : energy requirements are the inverse of efficiency; they can be measured in joules per operation at various performance levels. Sometimes a faster but less efficient processor leads to a more efficient global system, because it lets the system complete its task sooner and return to idle earlier. For example, a NAS server would take a lot more time running backups over a 100 Mbps link than over a 1 Gbps link, preventing the hard drive from going into standby and consuming more energy in total.
- power converters : multiplying power supplies when using many devices is often inefficient and takes space. Using fewer power supplies and sharing them between devices is often desirable, but requires converting power from the shared voltage to each device's target voltage. This comes with power losses, and it becomes increasingly difficult when input and output voltages are very close, which is a common issue when running on batteries. Many experiments were made; some of them are reported here and more will be.
- hexapod walk : one application of Formilux and of the principles above was to build a hexapod robot. Making a hexapod robot walk is an interesting challenge. The first step was simply to make it work, but after many observations of insects and arachnids, a lot of research was done on a six-legged walk based on the position of the center of gravity instead of the robot-looking tripod walk. Findings are explained here.
- build farms : one area still being researched is build farms. Build farms are only interesting if they can build faster than any commonly available, cheaper solution. The units used here are lines of code per second per dollar (performance) and lines of code per joule (efficiency). A number of measurements were run on various hardware, and this research is still ongoing. To sum up observations, the most interesting solutions are in the middle range: devices that are too cheap have too small CPUs or too little RAM bandwidth, while too expensive devices optimise for areas irrelevant to build speed, or have a pricing model where price grows exponentially with performance.
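To make the "finish sooner, then idle" argument from the power usage reduction item concrete, the Python sketch below computes the energy spent over a fixed time window by a system that stays active until a transfer completes, then drops to standby. All power figures in the example (12 W and 13 W active, 3 W standby) and the 50 GB backup size are hypothetical values chosen for illustration, not measurements from the project; protocol overhead and disk spin-up/spin-down energy are ignored.

```python
from dataclasses import dataclass


@dataclass
class Platform:
    """Hypothetical system profile; all figures are illustrative assumptions."""
    name: str
    throughput_bps: float  # sustained transfer rate, bits per second
    active_w: float        # whole-system power while transferring, watts
    standby_w: float       # whole-system power with the disk in standby, watts


def window_energy_j(p: Platform, payload_bits: float, window_s: float) -> float:
    """Energy in joules over a fixed window: active until the transfer ends, then standby."""
    busy_s = payload_bits / p.throughput_bps
    if busy_s > window_s:
        raise ValueError("transfer does not fit in the window")
    return p.active_w * busy_s + p.standby_w * (window_s - busy_s)


def ops_per_joule(ops_per_s: float, watts: float) -> float:
    """Efficiency: operations per joule, i.e. (operations per second) per watt."""
    return ops_per_s / watts


if __name__ == "__main__":
    backup_bits = 50e9 * 8  # hypothetical 50 GB backup
    slow = Platform("100 Mbps", 100e6, active_w=12.0, standby_w=3.0)
    fast = Platform("1 Gbps", 1e9, active_w=13.0, standby_w=3.0)
    for p in (slow, fast):
        # prints "100 Mbps 57600.0" then "1 Gbps 25600.0"
        print(p.name, window_energy_j(p, backup_bits, window_s=7200))
```

With these assumed figures, the 1 Gbps system uses less than half the energy over the same two hours even though it draws slightly more power while active, because the disk spends most of the window in standby.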
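A common way to express a walk based on the center of gravity is a static stability check: the ground projection of the center of gravity must stay inside the support polygon formed by the feet currently on the ground, and its distance to the nearest edge gives a stability margin. The Python sketch below implements that textbook criterion; it is not necessarily the method used for the project's hexapod, and coordinates are assumed to be 2D ground-plane positions in any consistent unit.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z component of (a - o) x (b - o); > 0 means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def stability_margin(cog: Point, feet_on_ground: List[Point]) -> float:
    """Signed distance from the CoG projection to the nearest support-polygon edge.

    Positive: statically stable; zero or negative: the robot would tip over.
    """
    hull = convex_hull(feet_on_ground)
    if len(hull) < 3:
        return float("-inf")  # fewer than 3 non-collinear contacts: no support area
    margin = float("inf")
    for i in range(len(hull)):
        a, b = hull[i], hull[(i + 1) % len(hull)]
        edge_len = ((b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2) ** 0.5
        # For a counter-clockwise polygon, cross(a, b, cog) / |ab| is the signed
        # distance from cog to the edge line, positive on the inner (left) side.
        margin = min(margin, cross(a, b, cog) / edge_len)
    return margin
```

A gait planner could, for instance, lift a leg only when the margin computed from the remaining feet stays above some threshold, which allows gaits other than the tripod walk.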
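The two build-farm metrics above can be computed from a benchmark build as follows. This Python sketch assumes a build run is characterized by the number of lines of code compiled, the wall-clock time, the purchase price and the average power draw; the `BuildRun` class and `rank` helper are hypothetical names, and no measurement data from the project is included.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BuildRun:
    """One benchmark build on one machine (hypothetical structure)."""
    machine: str
    loc: int           # lines of code compiled by the benchmark build
    seconds: float     # wall-clock build time
    price_usd: float   # purchase price of the machine
    avg_watts: float   # average wall power during the build

    def loc_per_s_per_usd(self) -> float:
        """Cost-normalized performance: (lines of code / second) / dollar."""
        return self.loc / self.seconds / self.price_usd

    def loc_per_joule(self) -> float:
        """Energy efficiency: lines of code per joule (joules = watts * seconds)."""
        return self.loc / (self.avg_watts * self.seconds)


def rank(runs: List[BuildRun], metric: Callable[[BuildRun], float]) -> List[str]:
    """Return machine names sorted from best to worst for the given metric."""
    return [r.machine for r in sorted(runs, key=metric, reverse=True)]
```

Ranking the same set of machines separately with `rank(runs, BuildRun.loc_per_s_per_usd)` and `rank(runs, BuildRun.loc_per_joule)` shows whether a candidate wins on cost, on energy, or on neither.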
applications
- toolbox
- server chroot
- sentineo
- aloha
- hexapod
- build farm