When developing an embedded Linux system, there are a large number of choices available, not just in terms of hardware components, but also software. A Linux system is not just about the kernel, but also the user-space software layer that sits above it. Some of the most commonly chosen options are:

  • Custom/Hand-built - in some ways this is probably the most common. It allows the most flexibility, and generally results in the tightest integration. It does however require the most developer effort, and the most on-going maintenance.
  • OpenEmbedded - this is a source-based Linux distribution system. In some ways it is a meta-distribution, allowing for your own customised distribution to be made. It requires a reasonable amount of initial setup effort, but once configured additional packages can be trivially added. As its name implies, this system is targeted towards embedded systems, and a lot of effort has been made to ensure it is easy to make small installations using it.
  • T2 - this is another meta-distribution, although it is not solely targeted towards the embedded market, or even only Linux. As with OpenEmbedded, it allows for a large amount of flexibility.
  • Embedded Ubuntu/Embedded Fedora/EmDebian - several of the larger Linux distributions have started to target the embedded market. They have the advantage of a huge existing installation and package base, but also have the disadvantage of a large amount of desktop/server baggage that generally isn't required on an embedded system.
Each of these systems has its own merits, and there is definitely no correct answer for all situations. At Bluewater, we have made the decision to go with a mixture of a custom and OpenEmbedded solution. We start with the BusyBox minimal suite of tools, which provides almost all of the standard Linux command line utilities. Using just BusyBox and the libraries from the GCC compiler toolchain, we are able to get what we call our minimal root filesystem into around 3MB. This can be shrunk even further by using a smaller C library and a cut-down selection of BusyBox's utilities. From this base, we then add packages using the OpenEmbedded distribution. These are easily added, and once we have built up our core repository of packages we can easily make custom filesystems for different projects.

As with any development system, care must be taken that the solution suits not only the product, but also the developers using it. The wrong choice can result in a nightmarish mix of package versions and build environments, where only the grandest Linux guru can find their way out. The correct choice, however, results in a system as easy to manage as a desktop PC, where packages can be added and removed trivially, meaning developers are free to choose the packages that best solve their problems, rather than those that are the easiest to install.

I don't have a Blu-Ray player yet. People tell me the best option at the moment is to get a Playstation 3, but I don't have a TV to do it justice, so have held off. I need the current TV to break first (accidentally, of course). Having said that I notice that Blu-Ray discs are starting to make an appearance at the video store, so perhaps that will push me over the edge. But at the moment, Blu-Ray is an 'early' technology, yet to hit the massive volumes of the consumer mainstream. The Cortex-R4 is in a similar position in terms of its visibility in the microcontroller market. While there are at least a dozen licensees, I believe only TI has announced a part based around it - presumably TI had a hand in the conception of the device. The Cortex-R4 is living up to its 'deeply embedded' name. Some of the features of the Cortex-R4 are:

  • 8-stage pipeline (partly superscalar)
  • Performance 400MHz (although 600MHz was subsequently announced)
  • Resulting 600 DMIPS at 400MHz, and presumably 900 DMIPS at 600MHz
  • Architecture ARMv7, including very low interrupt latency features
  • Floating point unit (FPU)
  • Branch prediction and prefetch
  • Tightly coupled memory (TCM) aka internal single cycle SRAM, as well as caches
  • Memory protection unit (MPU)
  • Advanced profiling support, hardware divide, CoreSight debug and ECC memory support
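The 'presumably 900 DMIPS' figure above is just the quoted numbers scaled up. As a rough sketch, assuming Dhrystone throughput scales linearly with clock speed (an assumption, not something ARM's figures guarantee), the arithmetic looks like this. The function names are made up for illustration.

```python
def dmips_per_mhz(dmips: float, clock_mhz: float) -> float:
    """Return Dhrystone throughput per MHz of clock."""
    return dmips / clock_mhz


def projected_dmips(ratio: float, clock_mhz: float) -> float:
    """Project DMIPS at another clock, ASSUMING linear scaling with frequency."""
    return ratio * clock_mhz


# Figures quoted in the feature list: 600 DMIPS at 400MHz.
ratio = dmips_per_mhz(600, 400)
estimate = projected_dmips(ratio, 600)
print(f"{ratio} DMIPS/MHz, about {estimate:.0f} DMIPS at 600MHz")
```

In practice memory speed rarely scales with the core clock, so a real part may fall short of the linear estimate.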
The Cortex-R4 is ridiculously configurable, for example allowing three different MPU options, three TCM port options (with different sizes as well), a selectable number of breakpoints and watchpoints, and an optional floating point unit. One can only imagine what sort of unit volumes and cost pressures might drive such extreme configurability. Not content with a single core, TI's TMS570 range includes two cores running together, although the second can be a Cortex-M3 instead. This is a serious amount of processing power for automotive applications.

But ARM has aimed the chip at more than just the automotive market. It is also targeting disk drives, both magnetic and optical, ink jet and laser printers, and even 3G modems. Which brings us back to Blu-Ray. Broadcom has announced that it has selected Cortex-R4 for its next generation of Blu-Ray player chips. One point noted in the PR is that the Cortex-R4 allows the TCMs to be powered while the CPU itself is stopped. It isn't clear what this is used for - perhaps to allow data transfer to happen under DMA while processing is suspended. Broadcom's new chip is presumably a replacement for the MIPS-based BCM7440 single chip solution. The level of technology in these devices is a wonder to behold. Let's hope ARM announces a Japanese licensee this year, and perhaps someone in the disk drive market. In the meantime, I predict that the first company to put this new Cortex-R4 part into a Blu-Ray player will at last drag me into the high definition world.

Without hardware support for screen rotation the programmer is often left to rotate the content in software before it is placed into the frame buffer. This has a considerable impact on the performance of the product. An alternative on some devices is to use DMA to do the screen rotation. This requires the DMA engine to increment the SDRAM row between pixel reads and, while it frees up the processor, it uses the SDRAM very inefficiently and can have a performance impact similar to doing the entire job in software.

The OMAP3530 contains hardware support for screen rotation using a rotation engine called the Virtual Rotated Frame Buffer (VRFB). This is embedded into the SDRAM Controller and can be configured to issue multiple requests to the SDRAM, ensuring that the maximum number of consecutive accesses is performed. By tuning the VRFB to the architecture of the SDRAM, the impact of page-miss penalties can be decreased and memory access performance improved accordingly. For the programmer the VRFB provides four virtual frame buffers, one each for 0, 90, 180 and 270 degrees. Normally the display controller is programmed to read from the unrotated location and the content to be displayed is written to the virtual address of the required rotation.

We have recently implemented this feature in our Snapper-DV product. By using the combination of hardware rotation and hardware scaling the customer can pick and choose how their content is displayed on a variety of screens.
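To give a feel for the software rotation that hardware like the VRFB avoids, here is a minimal sketch in plain Python. It is not OMAP or Snapper code: the function name `rotate_buffer`, the flat row-major pixel list and the choice of clockwise as the positive direction are all assumptions made for illustration.

```python
def rotate_buffer(src, width, height, degrees):
    """Rotate a row-major pixel buffer clockwise by 0, 90, 180 or 270 degrees.

    Returns (pixels, new_width, new_height). Hypothetical helper for
    illustration only; clockwise is an arbitrary convention here.
    """
    if len(src) != width * height:
        raise ValueError("buffer size does not match width * height")
    if degrees == 0:
        return list(src), width, height
    if degrees == 180:
        # A 180 degree turn is simply the buffer read backwards.
        return list(reversed(src)), width, height
    if degrees not in (90, 270):
        raise ValueError("degrees must be 0, 90, 180 or 270")

    dst = [0] * (width * height)
    for y in range(height):
        for x in range(width):
            pixel = src[y * width + x]  # sequential read along the row
            if degrees == 90:
                # Consecutive source pixels land `height` entries apart.
                dst[x * height + (height - 1 - y)] = pixel
            else:
                dst[(width - 1 - x) * height + y] = pixel
    # Width and height swap for quarter turns.
    return dst, height, width


# A 3x2 image:   1 2 3
#                4 5 6
rotated, new_w, new_h = rotate_buffer([1, 2, 3, 4, 5, 6], 3, 2, 90)
print(rotated, new_w, new_h)
```

Note that for the quarter turns either the reads or the writes must jump through memory with a large stride. On real hardware that stride is what causes the SDRAM page misses described above, whether the copying is done by the CPU or by DMA.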

During development of our new Snapper 9260 module, one of the requirements was for boot times (power-on to prompt) of less than 6s. Since the (rather feature-rich) development system boot time was in the order of 45s, there was quite a bit of scope for improvement:

  • Enable MMU and data cache in U-Boot: by default, U-Boot does fairly minimal initialisation of hardware. We extended main board support and network driver code to support making full use of the instruction cache (easy) and data cache (a bit more interesting). Time saving: ~3s.
  • Use on-board storage for kernel and root filesystem, instead of NFS. Time saving: ~2s.
  • Rationalise U-Boot environment: remove the boot delay, flatten out dynamically-generated bootup scripts, use an uncompressed image (or gzip-compressed image if space is a constraint) instead of a self-decompressing image, and disable redundant image verification. Time saving: ~2s.
  • Strip unneeded drivers out of Linux kernel: by removing unneeded systems (e.g. networking, unneeded filesystems, debugging facilities, splash logo), we reduced the Linux kernel size from 3.6MB to 1.8MB. Time saving: ~1.5s, plus reduced kernel boot times.
  • Add 'quiet' to Linux boot arguments, suppressing synchronous printk output. Unlike removing printk support entirely, the bootup messages are still available via dmesg, supporting field debugging. Time saving: ~2s.
  • Preset loops-per-jiffy setting for Linux kernel. Time saving: ~0.25s.
  • Strip filesystem to avoid unneeded subsystems and slow init scripts, and to minimise footprint. Time saving: ~30s.
Overall, these changes allowed us to reduce the boot times for the unit to 5.2s - well within the 6s target - without sacrificing support for interactive access and field upgrades. We hope to gain another 1.2s back between power-on and start of execution by using a new revision of the AT91SAM9260 chip (this addresses an erratum on the 'A' revision silicon), which should bring overall time below 5s - not bad for a full-featured Linux device!
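As a quick sanity check, the savings quoted above can be tallied against the starting point. This is only a back-of-the-envelope sketch: the step labels are shortened, every figure is the approximate one given in the list, and the 45s baseline was itself only 'in the order of' 45s, so the estimate should be expected to land near the measured 5.2s rather than on it.

```python
# (step, approximate saving in seconds), as quoted in the list above
SAVINGS = [
    ("Enable MMU and data cache in U-Boot", 3.0),
    ("On-board storage instead of NFS", 2.0),
    ("Rationalise U-Boot environment", 2.0),
    ("Strip unneeded kernel drivers", 1.5),
    ("Add 'quiet' to boot arguments", 2.0),
    ("Preset loops-per-jiffy", 0.25),
    ("Strip filesystem", 30.0),
]
BASELINE_S = 45.0  # approximate development system boot time
MEASURED_S = 5.2   # boot time reported after the changes

total_saved = sum(seconds for _, seconds in SAVINGS)
estimated_boot = BASELINE_S - total_saved

# List the steps from biggest to smallest contribution.
for step, seconds in sorted(SAVINGS, key=lambda item: item[1], reverse=True):
    share = 100 * seconds / total_saved
    print(f"{step:<40} {seconds:>6.2f}s {share:5.1f}%")

print(f"Total saved:    {total_saved:.2f}s")
print(f"Estimated boot: {estimated_boot:.2f}s (measured: {MEASURED_S}s)")
```

The tally makes the priorities obvious: trimming the filesystem accounts for roughly three quarters of the total, so that is the place to start before fine-tuning the bootloader and kernel.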

In 2004 the first Snapper Single Board Computer Module was designed by Bluewater Systems. Snapper 255 is a powerful module which incorporates an FPGA, Ethernet, USB OTG, LCD with touchscreen, 64MB SDRAM and 512MB NAND Flash, all on a board smaller than a business card. Snapper 255 has been used in various designs since then but sadly is now coming closer to the end of its life. The Intel PXA255 microcontroller (now under the ownership of Marvell) that was originally used is unfortunately nearing the last time buy stage with no pin-compatible replacement available. Marvell will sell stock as long as it is available, but the last time buy will be in the next six months. Grey market stock will still be available after this date but will prove to be more expensive. Bluewater will continue to sell Snapper 255s and liaise with existing customers to forecast future usage and ensure that sufficient stock is available. Snapper 255 is a great building block for embedded designs, and a great tool for testing product software before the final product is made. That said, we have numerous other Snapper modules to take its place.