These Debian packages are built to use the floating point hardware of the
Cirrus Logic MaverickCrunch unit present in their EP93xx series of devices,
using my own modified version of GCC which performs floating point
arithmetic in the FPU.

To enable them on your armel-lenny system, add a line

    deb http://simplemachines.it/debian armel-lenny+crunch/

to your /etc/apt/sources.list file and then run

    # apt-get update; apt-get upgrade

These versions will then be installed (or upgraded to) in preference to the
ones in the standard repositories, but they will in turn be replaced if the
standard version is updated (e.g. due to security bugfixes).

The rest of this file describes the build method in general terms, followed
by details for the individual source packages in alphabetical order.

HOW THEY WERE COMPILED
======================

My compiler installs to /usr/local/bin/gcc-4.3-crunch
(Download from http://simplemachines.it/tools
 Description at http://martinwguy.co.uk/martin/crunch)

In each package, I edit debian/changelog and insert a few lines at the top
to fiddle with the version number. e.g., for lame, I prefix the six lines:
------------------
lame (3.98.2-0.3+crunch) unstable; urgency=low

  * Rebuilt to use MaverickCrunch floating point hardware

 -- Martin Guy Wed, 11 Mar 2009 00:00:00 +0000

------------------

There are two main methods for diverting the compilation procedure without
(or with minimal) modifications to the build scripts:

1. ENVIRONMENT VARIABLE METHOD

Some packages respond to variables in the environment, so sometimes this
works:

    $ CC=gcc-4.3-crunch CXX=$CC \
      CFLAGS="-mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp -fno-signed-zeros -O2 -g" \
      CXXFLAGS="$CFLAGS" \
      dpkg-buildpackage -rfakeroot -B

2. FAKE GCC METHOD

This method works with just about everything.
Create ~/fake/gcc, containing an executable shell script:
------------------
#! /bin/sh
# gcc: Wrapper to force MaverickCrunch FPU code generation
exec /usr/local/bin/gcc-4.3-crunch -mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp -fno-signed-zeros "$@"
------------------
and since some packages call the compiler by different names:

    $ ln -s gcc ~/fake/gcc-4.3
    $ ln -s gcc ~/fake/cc
    $ ln -s gcc ~/fake/arm-linux-gnueabi-gcc

then in each Debian source directory use

    $ PATH=~/fake:$PATH dpkg-buildpackage -rfakeroot -B

TERMINOLOGY
===========

By "cfdensity" I mean the number of MaverickCrunch floating point
instructions in a Debian package divided by the total number of machine
instructions in it, expressed in parts per million. For example:

cfdensity
    192908 libmp3lame0_3.98.2-0.3+crunch_armel.deb

means that 19% of the machine instructions in that library are
MaverickCrunch instructions, roughly indicative of the importance of using
an accelerated version of that library to improve system performance.

"aeabidensities", instead, measure the number of calls to softfloat math
functions in an unaccelerated Debian package or library file (again divided
by the total number of instructions, and expressed in parts per million).
I use this to select candidate libraries for optimization.

Some scripts analyze object files and produce these figures:

cfdensity       Measure the cfdensity of one or more executables or
                libraries (works on *.so* or *.a)
aeabidensity    Ditto for calls to __aeabi_* (the soft-float library
                functions) in an executable or static (*.a) library
dpkg-cfdensity  Measure the cfdensity of the executables and libraries in
                a Debian package

The top ten densest libraries installed on a sample ARM EABI system are:

    174484 /usr/lib/libfftw3f.a
    147398 /usr/lib/liblua5.1-bit.a   (a small bitfields library)
    138089 /usr/lib/libm.a
    112050 /usr/lib/libfftw3.a
    109604 /usr/lib/libartsflow.a     aRts is in C++.
    103206 /usr/lib/libvorbisenc.a
    101482 /usr/lib/libnifticdf.a     Medical imaging system
     94924 /usr/lib/libsmpeg.a
     91275 /usr/lib/libproj.a
     83662 /usr/lib/libsamplerate.a

PACKAGE-SPECIFIC NOTES
======================

asterisk
--------
Uses various libraries whose aeabidensities (of the /usr/lib/lib*.a files)
are:

      2341 libasound2
     66771 libgsm1
         0 libogg0
         0 libpri1.0       Primary rate ISDN specification library
     60903 libtonezone1    [from zaptel]
     64799 libvorbis0a
    103206 libvorbisenc2   [broken on armel; fixed here]
     16512 libvorbisfile   (not used by asterisk)
       146 zlib1g
       149 libopenh323
       237 libopenh323

Using fake gcc method...

checking for ptlib-config... no
./configure: line 27632: --pwlibdir: command not found
./configure: line 27640: --ldflags: command not found
checking if PWLib version 1.10.10 is compatible with chan_h323... yes
configure: WARNING: "CPU arm not recognized - proceed with caution!"
checking PWLib installation validity... no
configure: ***
configure: *** The PWLIB installation on this system appears to be broken.
configure: *** Either correct the installation, or run configure
configure: *** including --without-pwlib
make: *** [config.status] Error 1

The unstable buildlog for the same version of asterisk with the same
version of libpt-dev (asterisk_1:1.4.21.2~dfsg-3 libpt-dev_1.10.10-2) has:

checking for ptlib-config... /usr/share/pwlib/make/ptlib-config
checking if PWLib version 1.10.10 is compatible with chan_h323... yes
configure: WARNING: "CPU arm not recognized - proceed with caution!"
checking PWLib installation validity... yes
checking /usr/share/openh323//version.h usability... yes

ptlib-dev does seem screwed.
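The cfdensity and aeabidensity figures quoted in these notes are all ratios
in parts per million. As a rough illustration of that arithmetic only, here
is a minimal sketch. It is not the cfdensity script itself: density_ppm is a
hypothetical helper name, and it assumes that disassembly arrives on stdin
in the tab-separated layout printed by "objdump -d" (address, hex encoding,
mnemonic, operands) and that MaverickCrunch mnemonics can be recognised by
their "cf" prefix.

```shell
#!/bin/sh
# density_ppm: hypothetical helper, NOT the author's cfdensity script.
# Reads "objdump -d" style text on stdin and prints, in parts per million,
# the share of instruction lines whose mnemonic matches the regex in $1.
density_ppm() {
    awk -F'\t' -v pat="$1" '
        # Assumed layout: "  addr:<TAB>hexcode<TAB>mnemonic<TAB>operands"
        /^ *[0-9a-f]+:\t/ && NF >= 3 {
            total++
            if ($3 ~ pat) hits++
        }
        END {
            # Integer ppm, truncated; print 0 if no instructions were seen
            if (total > 0) printf "%d\n", hits * 1000000 / total
            else print 0
        }'
}
```

Under those assumptions it could be used as

    objdump -d /usr/lib/libm.a | density_ppm '^cf'

Data words embedded in the code section are counted as instructions here,
so the result is only approximate.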
fftw (fftw2)
------------
Fake gcc method

Debian short tests:
    ./tests/fftw_test -t -e -v -p 1024 -x 1
    ./tests/rfftw_test -t -e -v -p 1024 -x 1
succeed for single precision;
Long testsuite "make -C tests check" succeeds for double

cfdensity
    509020 fftw2_2.1.3-22+crunch_armel.deb
    576767 sfftw2_2.1.3-22+crunch_armel.deb

fftw3
-----
Fake gcc method
Compiled on n2100, finished on ts7250.
Running check.pl -v -a using lt-bench.

cfdensity
    503466 libfftw3-3_3.1.2-3.1+crunch_armel.deb
     51940 libfftw3-dev_3.1.2-3.1+crunch_armel.deb  (the fftw-wisdom progs)

lame (from debian-multimedia.org)
---------------------------------
debian/rules defaults CC to "ccache cc" so we add "cc" to our fake
directory.

GCC bug http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39501 seems not to bite
on Maverick. However, when encoding a WAV file with 11050 samples per
second, it segfaults in its sample rate conversion code (converting to
11025).

    -O0                                                       works
    -O1                                                       works
    -O1 -ffast-math                                           works
    -O1 -ffinite-math-only                                    works
    -O1 -ffast-math -fschedule-insns                          dumps core
    -O1 -ffast-math -fschedule-insns2                         dumps core
    -O2                                                       dumps core
    -O2 -ffast-math                                           dumps core
    -O2 -ffast-math -fno-schedule-insns -fno-schedule-insns2  works

    CFLAGS="-ffinite-math-only -fno-schedule-insns -fno-schedule-insns2" \
    PATH=~/fake:$PATH dpkg-buildpackage -rfakeroot -B

cfdensity:
    169302 lame_3.98.2-0.3+crunch_armel.deb
    187255 libmp3lame0_3.98.2-0.3+crunch_armel.deb

Time to encode a 30-second CD-quality wav file with default settings on a
200MHz unit:

                          normal          -V0 (fixed)
                          Real    CPU     Real    CPU
    Standard version      6m02    6m01    5m28    5m25
    Maverick version      2m25    2m23    2m01    2m00
    Maverick version -O   2m34    2m33    2m14    2m13
    CFLAGS version        2m34    2m34    2m11    2m10

libgsm
------
Fake GCC method.
libgsm does use single-precision floating point and compiles with -O2
Output from x86, softfloat and crunch are bitwise identical.
                                     softfloat        maverick
                                     real    user     real    user
    toast -c intro.l > /dev/null     11.14   11.09    2.86    2.80
                                     11.16   11.09    2.87    2.79
                                     11.13   11.09    2.86    2.79
    tcat -l intro.gsm > /dev/null     0.86    0.81    0.95    0.90
                                      0.87    0.79    0.96    0.89
                                      0.86    0.81    0.95    0.90

cfdensity:
    173111 libgsm1_1.0.12-1+crunch_armel.deb
         0 libgsm1-dev_1.0.12-1+crunch_armel.deb   (needs libgsm = version)
         0 libgsm-tools_1.0.12-1+crunch_armel.deb  (standard one is ok with >= version)

Speedup: 4 times as fast when encoding. 10% slower decoding (?!?)

liboil
------
It contains VFP assembler code that is included whatever ARM CPU you
compile for but which is only enabled at runtime if "vfp" is present in
/proc/cpuinfo's "Features" section. It also checks for "edsp" but that
doesn't seem to enable anything.

Building the unmodified package for Maverick using the fake gcc method and
    dpkg-buildpackage -B -d
(-d to avoid the pointless gtk-doc-tools dependency) fails, saying:

cc -g -O2 -g -Wall -O2 -o .libs/build_prototypes build_prototypes-build_prototypes.o ../liboil/.libs/liboil-0.3.so -lm -lrt
../liboil/.libs/liboil-0.3.so: undefined reference to `vfp_negative_f32'
...

This is because it fails to pass "-mfpu=vfp" to the
    cc -g -O2 -g -Wall -O2 -c -o math_vfp_asm.lo math_vfp_asm.S
lines. A generic solution would be to add -mfpu=vfp to ASFLAGS for
math_vfp_asm, but we do not want VFP code at all, so we patch it to drop
VFP support.

The build must be run on an EP93xx host because it insists on running the
testsuite even if DEB_BUILD_OPTIONS=notest.
It can be started on a faster host and completed on the EP93xx with
    dpkg-buildpackage -rfakeroot -B -nc

libsamplerate
-------------
Fake GCC method - uses arm-linux-gnueabi-gcc

./throughput_test

Softfloat:
    Converter                        Duration      Throughput
    -----------------------------------------------------------
    ZOH Interpolator                     3.31           98007
    Linear Interpolator                  3.02           85935
    Fastest Sinc Interpolator            8.31            7807
    Medium Sinc Interpolator            19.56            3316
    Best Sinc Interpolator              69.02             940

Crunch:
    Converter                        Duration      Throughput
    -----------------------------------------------------------
    ZOH Interpolator                     3.29          157765
    Linear Interpolator                  3.41          152213
    Fastest Sinc Interpolator            3.56           36449
    Medium Sinc Interpolator             4.40           14745
    Best Sinc Interpolator              23.56            2753

Duration is in seconds. Throughput is in samples/sec.

libsamplerate-0.1.4 passed all tests.
Signal-to-Noise ratios in all tests are identical for crunch and softfloat.

cfdensity
         0 libsamplerate0-dev_0.1.4-1+crunch_armel.deb
    186798 libsamplerate0_0.1.4-1+crunch_armel.deb
     43633 samplerate-programs_0.1.4-1+crunch_armel.deb

libvorbis
---------
libvorbis works but libvorbisenc is broken on armel.
See http://bugs.debian.org/515949
See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39501

The workaround for softfloat of adding "-fno-finite-math-only" to CFLAGS is
not enough, and removing -ffast-math completely doesn't fix 22050 and 11025.
Dropping to -O makes the testsuite succeed at all sample rates.

Edit debian/rules and change
    CFLAGS += -O2
to
    CFLAGS += -O
then use fake gcc method.
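The debian/rules edit for libvorbis can also be scripted rather than done by
hand. This is a minimal sketch, assuming the rules file has a line that
consists only of "CFLAGS += -O2" (possibly indented); lower_opt is a
hypothetical helper name, not something shipped with the package.

```shell
#!/bin/sh
# lower_opt: hypothetical helper. Rewrites a line that consists only of
# "CFLAGS += -O2" (optionally indented) to "CFLAGS += -O" in the file $1.
lower_opt() {
    rules=$1
    tmp=$(mktemp) || return 1
    # Write the edited text to a temporary file, then copy it back with
    # cat so that the original file keeps its mode (debian/rules must
    # stay executable).
    sed 's/^\([[:space:]]*CFLAGS[[:space:]]*+=[[:space:]]*\)-O2[[:space:]]*$/\1-O/' \
        "$rules" > "$tmp" && cat "$tmp" > "$rules"
    status=$?
    rm -f "$tmp"
    return $status
}
```

It would be run from the top of the unpacked source tree as

    lower_opt debian/rules

Lines that set other flags are left alone, so check the result if the
package's rules file is laid out differently.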
cfdensity:
    167278 libvorbis0a_1.2.0.dfsg-3.1+crunch_armel.deb
    227391 libvorbisenc2_1.2.0.dfsg-3.1+crunch_armel.deb
     29586 libvorbisfile3_1.2.0.dfsg-3.1+crunch_armel.deb

                                                  real    user
    oggdec -o /dev/null Happy.ogg                 36.4    31.4  (vmstat says 90% user time, 10% system time)
                                                  36.1    31.1
                                                  36.1    31.1
    With crunch libvorbis0a                       16.7    12.6
                                                  16.5    12.3
                                                  16.7    12.4
    and crunch libvorbisfile3                     15.0    10.2
                                                  15.0    10.3
                                                  14.9    10.1
    and crunch vorbis-tools                       14.9    10.2
                                                  14.9    10.1
                                                  14.9    10.1

    oggenc with no-finite-math libvorbis*         4m33    4m32  (to a file)
                                                  4m32    4m31  (to /dev/null)
    oggenc with crunch no-finite-math libvorbis*  1m11    1m10  44100 bad
    oggenc with crunch no-fast-math libvorbis*    1m26    1m25  44100 good
    oggenc -O1 -ffast-math                        1m30    1m29  all output good

mpg123
------
debian/rules configures to use fixed-point math on ARM using:
    CONF_arm:=--with-cpu=generic_nofpu
so change this line to
    CONF_arm:=--with-cpu=generic_fpu
and use the fake gcc method

cfdensity
     13471 mpg123_1.4.3-4+crunch_armel.deb
    142133 libmpg123-0_1.4.3-4+crunch_armel.deb

Time to decode the MP3 file encoded above:
Standard version
Maverick version

openssl
-------
Standard Debian armel running "openssl speed md5"

    type        16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    md5          794.06k     2780.44k     8027.55k    15228.78k    20593.93k

python2.5
---------
Fake gcc method
Compiled on bb. Waiting for ts7250.
With top running, taking 4.3% CPU:

cd /home/martin/crunch/debian/source/python2.5-2.5.2/build-static && ./python ../Lib/test/pystone.py
Pystone(1.1) time for 50000 passes = 46.94
This machine benchmarks at 1065.19 pystones/second
Pystone(1.1) time for 50000 passes = 46.82
This machine benchmarks at 1067.92 pystones/second

(while the standard softfloat version says:
Pystone(1.1) time for 50000 passes = 46.89
This machine benchmarks at 1066.33 pystones/second
Pystone(1.1) time for 50000 passes = 46.69
This machine benchmarks at 1070.89 pystones/second)

cfdensity:
    6154 python2.5_2.5.2-15+crunch_armel.deb
    2590 python2.5-dbg_2.5.2-15+crunch_armel.deb
    6934 python2.5-minimal_2.5.2-15+crunch_armel.deb

0.6% FP... and it's unlikely to accelerate anything much, since
interpreters tend to spend most of their time interpreting rather than
calculating. However, it used to be one of the packages whose binary dumped
core, but now it builds, runs and passes its testsuite, which is what I was
really interested in.

speex
-----
Edit debian/rules to change

ifeq ($(DEB_HOST_ARCH_CPU),arm)
  objdir = $(objdir_fixedpoint)
  EXTRA_CONFIG_FLAGS = --enable-arm4-asm
endif

to

ifeq ($(DEB_HOST_ARCH_CPU),arm)
  EXTRA_CONFIG_FLAGS =
endif

The resulting Maverick executable is slower than the optimised ARM
assembler used in the standard package:

Standard speex, encoding Happy.wav: 1m00, decoding Happy.spx: 0m12 (fixed point)
Maverick speex, encoding Happy.wav: 1m32, decoding Happy.spx: 0m15 (floating)

so we won't bother with this one...

timidity
--------
Has some double float FFT code that is used for some operations.
Fake GCC method works fine, but note that it needs to compile and run
a program locally to generate a lookup table.
On a fast non-EP93xx host, this will fail saying:

./calcnewt > newton_table.c
/bin/sh: line 1: 9440 Illegal instruction ./calcnewt > newton_table.c
make[3]: *** [newton_table.c] Error 132
make[3]: Leaving directory `/.../timidity-2.13.2/timidity'

You can just run that one command on an EP93xx host then restart the
dpkg-buildpackage with the -nc flag, and it runs to completion.

cfdensity
    28335 timidity_2.13.2-20+crunch_armel.deb
     5352 timidity-interfaces-extra_2.13.2-20+crunch_armel.deb

vorbis-tools (oggenc, oggdec, ogg123 etc)
-----------------------------------------
Fake GCC method. Seems to work.

cfdensity
    26541 vorbis-tools_1.2.0-5+crunch_armel.deb

zaptel
------
Provides libtonezone for asterisk.

cfdensity
     25755 zaptel_1.4.11~dfsg-3+crunch_armel.deb
    124834 libtonezone1_1.4.11~dfsg-3+crunch_armel.deb

TODO
====
Speed tests on mpg123