Throughput of an ADSL2+ Connection

In our apartment we have an Adam Internet Naked ADSL2+ connection. Our Billion BiPAC 7401VGPR3 modem reports a sync speed of 21.302Mbit/s downstream and 1.176Mbit/s upstream. The high sync speed is probably attributable to the 150m, as the crow flies, between us and the exchange. Adam’s control panel reports a 5.1dB SNR, and the modem roughly agrees at 5.5dB. We are using the Adventurous line profile, which provides 24Mbit/s with interleaving. There is one higher speed line profile that turns off interleaving, but this increases the likelihood of transmission errors.

Enough of the details; this post is about collecting some real-world throughput data. The idea comes from Vincent’s blog, which I read through Planet Debian. He is trialing a high speed internet service, and used curl to perform some throughput testing. In his case the servers tested were not up to his 50Mbit/s connection, which is something to think about with our 1000Mbit/s National Broadband Network on the way.

Using Vincent’s method of scraping curl’s output, which provides a throughput figure every second, and plotting it with gnuplot, I collected a number of data sets. I tested using the file servers of Adam Internet (mirror.filearena.net), as well as those of Internode, another local ISP that peers with Adam Internet.
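For anyone who would rather not do the scraping in bash, here is a rough Python sketch of the same idea. It is not the script used for these graphs. It assumes curl's usual 12-column progress meter, where the last column is the current speed, and it assumes the k/M/G suffixes are 1024-based. The function names and the sample line are made up for illustration.

```python
import re

# Multipliers for the unit suffixes curl appends to speeds (assumed 1024-based).
_UNITS = {"": 1, "k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_speed(field):
    """Convert a speed field such as '2050k' or '1.9M' to bytes per second."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([kMG]?)", field)
    if match is None:
        raise ValueError(f"unrecognised speed field: {field!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


def current_speed(line):
    """Return the last column of a progress line in bytes/s, or None."""
    fields = line.split()
    # A data row of the progress meter has 12 whitespace-separated fields.
    if len(fields) != 12:
        return None
    try:
        return parse_speed(fields[-1])
    except ValueError:
        return None  # header rows and other non-data lines


def to_mbit(bytes_per_second):
    """Convert bytes per second to megabits per second (10^6 bits)."""
    return bytes_per_second * 8 / 1e6


# Illustrative line only, not taken from the measurements in this post.
sample = "  5  689M    5 39.5M    0     0  2010k      0  0:05:51  0:00:20  0:05:31 2050k"
speed = current_speed(sample)
print(f"{to_mbit(speed):.2f} Mbit/s")
```

Feeding each progress line through current_speed and writing out one value per second gives a column of numbers that gnuplot can plot directly.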

The bash and gnuplot scripts used to produce the data, as well as the source data, are here. If you’re interested in running the tests yourself, use a large file from your ISP. Whilst my ISPs have test files, I instead chose an Ubuntu ISO as the file size is consistent across mirrors. Credit goes to Vincent for both snippets.

Late night and Early Morning

The first graph shows the difference between night time and weekday morning throughput. Adam have an off-peak period that allows downloading from midnight until 8am, meaning many users queue up large downloads to complete overnight. On the other hand, during business hours there are more people awake and using the net, so it is hard to draw any conclusions without looking at the ISP’s data.

Friday Evening

I also took a look at the difference between downloading the file over my 100BASE-TX switched wired network and over the 802.11g WiFi network. A laptop-to-server file transfer over WiFi is faster than the 2Mbit/s observed on the DSL connection, so the throughput of the WiFi itself should not be the limiting factor. I was interested to see if the extra 0.5ms of latency that the WiFi network brings would make a difference. The answer is yes, by a small margin.

Friday Evening: WiFi and Wired

The most interesting result of these tests was that Internode’s mirror gave higher throughput than the mirror provided by my own ISP. This came as a surprise, and I do not have any suggestions for why it is the case. If you have a theory as to the discrepancy in throughput, please leave a comment.

Audio Output with the ML605 FPGA

In my final year project I used the Xilinx ML605 Virtex-6 development board. Whilst it has a large number of I/O options, it lacks any audio output. It does have seven 5V CMOS I/O pins plus voltage rails, used by a socketed 24-character LCD board, which were appropriated for this project.

ML605 LCD Header (J41), p33 of xtp052_ml605_schematics.pdf

The chosen audio Digital-to-Analog Converter (DAC) was the Cirrus Logic CS4344, as it is easy to source, handles 5V logic, and requires few pins to interface with. The CS4344 takes I2S (Integrated Interchip Sound), a 4-wire bus protocol for serially transmitting sound data to a DAC. A search found Eric Brombaugh’s site, where he provides Verilog for producing I2S samples. Using this already written and tested code saved the time of writing our own, so thanks Eric.

The CS4344 datasheet [PDF] provides an example circuit appropriate for our application. The schematic is reproduced below.

CS4344 Audio Schematic

The CS4344 comes in a TSSOP10 surface mount package. For prototyping, it was attached to a DIP socket and placed on a breadboard.

Audio Breakout Prototype

To interface with the FPGA systems already built, the I2S generator was wrapped in a VHDL state machine that takes input from an FSL slave. FSL was chosen as only a small amount of VHDL is needed to produce a port, and the interface is appropriate for feeding sequential sound samples. To integrate with Xilinx’s XPS toolflow, the following directory structure is required:


fsl_i2s_v1_00_a/
|-- data
|   |-- fsl_i2s_v2_1_0.mpd
|   `-- fsl_i2s_v2_1_0.pao
`-- hdl
    |-- verilog
    |   |-- clkgen.v
    |   `-- i2s_out.v
    `-- vhdl
        `-- fsl_i2s.vhd

The .pao file lists the HDL code present in the module. The .mpd file indicates which buses are used, the name and type of the I/O pins to expose to XPS, and any configuration parameters. The version numbers in these file names correspond to Xilinx’s requirements, not the version of the IP.

Xilinx XPS GUI: IP Catalog tab, showing custom peripherals

HDL files are placed under their own directory, with a subdirectory for each language used. The entire lot is placed in the pcores directory at the root of a project. When XPS is launched, the module will now show up as usable in the IP Catalog tab.

The frequency of the I2S clocks can be selected from a number of options depending on the desired output frequency; Table 1 in the CS4344 data sheet describes the options. A 44.1kHz output rate was chosen, meaning an 88.2kHz LRCLK and an 11.2896MHz MCLK. Due to the properties of the ML605’s MMCM (Mixed-Mode Clock Manager), the exact frequency generated was 11.290323MHz, an error of 64ppm. A python script used to select the clock and calculate the error can be downloaded here.
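The error figure is easy to check by hand. The snippet below is not the script mentioned above, just a minimal sketch of the arithmetic. The ratio of 256 is simply 11.2896MHz divided by 44.1kHz, and the function name is my own.

```python
def ppm_error(actual_hz, target_hz):
    """Relative frequency error in parts per million."""
    return (actual_hz - target_hz) / target_hz * 1e6


SAMPLE_RATE_HZ = 44_100
MCLK_RATIO = 256  # 11.2896MHz / 44.1kHz

target_mclk = SAMPLE_RATE_HZ * MCLK_RATIO  # 11,289,600 Hz
actual_mclk = 11_290_323                   # what the MMCM produces

error = ppm_error(actual_mclk, target_mclk)
print(f"{error:.1f} ppm")
```

The generated clock runs slightly fast, so the output sample rate ends up a few hertz above 44.1kHz.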

By connecting the FSL slave of the nes_fsl to a Microblaze and supplying the appropriate clock, sound can be output by performing 32-bit writes to the slave. The upper half-word is the left sample, and the lower half-word the right. Samples are 16-bit signed big endian, the native endianness of the Microblaze. The use of the swapix function can be removed from fsl_i2s.vhd if it is more convenient to use little endian data.
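To make the word layout concrete, here is a small Python sketch of how a left and right sample would be packed into the 32-bit value written to the slave. The function names are hypothetical and this only models the half-word layout described above; it does not deal with byte order within each half-word.

```python
def pack_stereo(left, right):
    """Pack two signed 16-bit samples into one 32-bit word.

    The left sample goes in the upper half-word, the right in the lower.
    """
    for sample in (left, right):
        if not -32768 <= sample <= 32767:
            raise ValueError(f"sample out of 16-bit range: {sample}")
    # Masking converts negative values to their two's complement form.
    return ((left & 0xFFFF) << 16) | (right & 0xFFFF)


def unpack_stereo(word):
    """Inverse of pack_stereo: return (left, right) as signed integers."""
    def to_signed(half):
        return half - 0x10000 if half & 0x8000 else half

    return to_signed((word >> 16) & 0xFFFF), to_signed(word & 0xFFFF)


print(hex(pack_stereo(1, -1)))
```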

A number of enhancements were made to the base audio output peripheral:

  • It will intentionally cause the output to ‘pop’ if samples are not received when the state machine tries to fetch them. This highlights the importance of meeting real-time requirements in audio systems.
  • A deadline counter, providing an 8-bit register that counts logarithmically up whenever a deadline miss occurs, and down whenever a sample is successfully retrieved. This can be hooked up to the 8 LEDs on the ML605 to make the counter externally visible.
  • A second FSL slave was added to provide a hardware mixer. This was not completed.

The peripheral can be downloaded here. To use it, unzip it into the pcores directory of an XPS project and add it to your system. The clock port needs to be connected to an 11.2896MHz clock, and the SFSL interface connected to the slave port of an FSL. It would be great to hear from you if you happen to use it as part of a design.

There’s Something on my ARM

There’s something on my ARM is the name of a talk I first gave at the Open Source Developers Conference in Brisbane, Australia, in December 2009. Since drafting this post I have given an updated version of the talk to LinuxSA, my local Linux Users Group.

In the talk, I gave an introduction to the software used in Chromium and some of the neater features present in the browser, followed by some information about the code I wrote as part of my Google Summer of Code project working on Chromium’s Linux port. Finally, I spoke about the ARM port of Chromium that I maintain and showed some power usage data I collected comparing Chromium to Firefox running on the BeagleBoard, the core of which is the OMAP3530, an ARM-based system-on-chip. This page goes into some detail about the power usage setup.

Power usage

Method

Initially I was using a bench power supply with a digital ammeter borrowed from local open source hardware hacker David Rowe. This method did not allow the automatic collection of data, relying instead on reading and noting down the ammeter’s display by eye. I also had to return the bench supply, so I wanted a better method.

After perusing the BeagleBoard System Reference Manual, I discovered the board has a 0.1Ω sense resistor (R6) on the 5V input power rail. The two pins of the resistor can be accessed via the pads at J2. See section 8.2.5 of the BBSRM for more information.

It should be noted that the resistance of the sense resistor has an error factor of 5% on the early revC models (C2), and I presume that it is a similar case for the revB, making the absolute numbers potentially inaccurate. The relative differences are still valid.

As described by a thread on the BeagleBoard mailing list, the revC version has this resistor connected to the TWL4030 IC, allowing it to be measured via I²C. As I only had a revB board available, I had to use an alternate method.

I removed a jumper from an old PCI card and attached it to J2. Using a cable from the reset switch of an old PC case, I attached it to an Arduino board.

The Arduino Diecimila contains an ATmega168 8-bit microcontroller. It also has a 10-bit analog input suitable for measuring the voltage drop across the Beagle’s sense resistor. The analog input can use a 5V or an internal 1.1V reference; I chose the 1.1V reference as the measured voltage ranges from 0 to 50mV.
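The reason the 1.1V reference matters becomes clear when the raw readings are converted to current. The following is a minimal sketch of that conversion, not the code I ran. It assumes the nominal 0.1Ω sense resistor and a nominal 5V rail, and the function names are made up for illustration.

```python
ADC_STEPS = 1024   # 10-bit converter
V_REF = 1.1        # internal reference, volts
R_SENSE = 0.1      # nominal value of sense resistor R6, ohms
V_SUPPLY = 5.0     # nominal input rail, volts (assumed, not measured)


def adc_to_volts(reading, v_ref=V_REF):
    """Voltage represented by a raw ADC reading."""
    return reading * v_ref / ADC_STEPS


def adc_to_amps(reading, v_ref=V_REF):
    """Current through the sense resistor, by Ohm's law."""
    return adc_to_volts(reading, v_ref) / R_SENSE


def adc_to_watts(reading, v_ref=V_REF):
    """Approximate board power, assuming the rail sits at V_SUPPLY."""
    return adc_to_amps(reading, v_ref) * V_SUPPLY


# Current represented by a single ADC step under each reference.
print(f"1.1V reference: {adc_to_amps(1) * 1000:.1f} mA per step")
print(f"5V reference:   {adc_to_amps(1, v_ref=5.0) * 1000:.1f} mA per step")
```

With the 5V reference a single step is several times coarser, which would leave only a handful of distinct values across a 0 to 50mV range.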

The Arduino was programmed to use the 1.1V reference and read from analog pin 0 every 15ms, and send the value down the serial port to a host PC. A python script, grabserial, was used to capture this output and write it along with a timestamp. This output was post-processed using pylab to smooth the data, and graphed using matplotlib.
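As an illustration of the smoothing step, here is a plain-Python trailing moving average. This is an assumption on my part about what a reasonable smoother looks like, not necessarily the filter applied by the pylab script, and the function name and window size are hypothetical.

```python
def moving_average(samples, window):
    """Trailing moving average over at most `window` samples.

    The first few outputs average over however many samples are available.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    total = 0.0
    for i, sample in enumerate(samples):
        total += sample
        if i >= window:
            # Drop the sample that has just left the window.
            total -= samples[i - window]
        smoothed.append(total / min(i + 1, window))
    return smoothed


print(moving_average([1, 2, 3, 4], 2))
```

At one reading every 15ms, a window of around 67 samples averages over roughly one second.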

Software

Firefox was installed from the Ubuntu Karmic repository, version 3.5.5. Chromium was built from top-of-tree using the Code Sourcery 2009q3 cross compiler, and linked against an Ubuntu Karmic rootfs. It was built with the following gcc options:

-march=armv7-a -mtune=cortex-a8 -mfpu=neon -mthumb -mfloat-abi=softfp

For more information on cross compiling Chromium, see the page I wrote at LinuxChromiumArm. The same rootfs as used for linking was placed on a class 4 SD card and booted using a 2.6.32-omap kernel.

Results

Cold start

The system must make do with 128MB of RAM and a small swap file on a very slow root file system. Chromium starts faster than Firefox, even in the low resource environment present on the BeagleBoard. Its peak power draw is higher; however, the area where Chromium’s draw is above Firefox’s is much smaller than the area where Firefox’s draw is above Chromium’s.
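Comparing areas under the power curves amounts to comparing energy used. I only compared the graphs by eye, but a sketch of how the areas could be computed from timestamped power readings, using the trapezoidal rule, might look like this. The function name is hypothetical.

```python
def energy_joules(times_s, power_w):
    """Integrate power over time with the trapezoidal rule.

    times_s: sample timestamps in seconds, in increasing order
    power_w: power readings in watts, one per timestamp
    """
    if len(times_s) != len(power_w):
        raise ValueError("need one power reading per timestamp")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        # Area of the trapezoid between two consecutive samples.
        total += dt * (power_w[i] + power_w[i - 1]) / 2
    return total


# A constant 1W draw for 2 seconds is 2 joules.
print(energy_joules([0, 1, 2], [1, 1, 1]))
```

Running this over the same time span for each browser would turn the visual comparison into two numbers.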

Warm start

Chromium starts much faster than Firefox on warm start, and manages to get close to idle power draw before Firefox has reached its peak draw.

The steady-state draw, where both are displaying their static home page, is similar for both browsers. The peaks that can be seen around the 40 second mark are suspected to be due to a timer expiring, causing the kernel to flush its page cache to disk.

Gmail load and idle start

When displaying a dynamic page, Chromium does a better job of staying idle. Again the peak power draw is higher for Chromium, but it drops to a steady state and stays in that state for a long time. Firefox does not achieve an idle state after two minutes of zero user interaction.

Conclusion

These tests show that Chromium has the ability to get to idle quickly, and stay there whilst running a large web application. This is useful information beyond the current data we have on load time and memory usage.

Further work

In my Summer of Code proposal I outlined the creation of a power aware buildbot, a system that would run a subset of Chromium’s page load tests, as well as an idle test, and graph the results so that improvements and regressions can be tracked as the code base is worked on. I intend to prototype this system, using the BeagleBoard along with an OLPC XO-1.5 pre-production machine. There is some evidence that ChromeOS engineers at Google have a setup doing power instrumentation.

A goal would be to show a shift in power usage, either up or down, over time, that can be attributed to code changes in the browser. A use of this information could be to correlate power usage against memory, other I/O, and CPU usage, to see which of these metrics affects power usage the most.

The OMAP3 system on chip consists of many small components that can be individually powered down. In my tests above, the kernel was not aware of these features, meaning there is potential for lower power usage, especially when idle. Re-running the tests with a power management enabled kernel may offer new insights into the system’s behaviour.

Links

Some similar articles that I have found since doing my work:

The script I used to post-process and create the graphs. I used python and matplotlib:

I originally wrote this up on my wiki: