These Debian packages are built to use the floating point hardware of the
Cirrus Logic MaverickCrunch unit present in their EP93xx series of devices
using my own modified version of GCC which performs floating point arithmetic
in the FPU.

To enable them on your armel-lenny system, add a line

   deb http://simplemachines.it/debian armel-lenny+crunch/

to your /etc/apt/sources.list file and then run

   # apt-get update; apt-get upgrade

These versions will then be installed (or upgraded to) in preference to the
ones in the standard repositories, but they will in turn be replaced if the
standard version is updated (e.g. due to security bugfixes).
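
The step of editing sources.list can be scripted so that the line is only
added once. This is a minimal sketch rather than part of the original setup:
the function name add_crunch_repo is made up for illustration, and it takes
the file to edit as an optional argument so that it can be tried on a scratch
copy first.

```shell
#!/bin/sh
# add_crunch_repo [FILE]: append the crunch repository line to FILE
# (default /etc/apt/sources.list) unless it is already present.
# The function name is hypothetical; the repository line is the one above.
add_crunch_repo() {
    list=${1:-/etc/apt/sources.list}
    line='deb http://simplemachines.it/debian armel-lenny+crunch/'
    # -x matches whole lines only, -F treats the pattern as a fixed string
    if ! grep -qxF "$line" "$list" 2>/dev/null; then
        printf '%s\n' "$line" >> "$list"
    fi
}
```

Run as root, "add_crunch_repo && apt-get update && apt-get upgrade" would then
have the same effect as the manual steps.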

The rest of this file describes the build method in general terms, followed by
details for the individual source packages in alphabetical order.


HOW THEY WERE COMPILED
======================
My compiler installs to /usr/local/bin/gcc-4.3-crunch
(Download from http://simplemachines.it/tools
Description at http://martinwguy.co.uk/martin/crunch)

In each package, I edit debian/changelog and insert a few lines at the top to
change the version number. For example, for lame, I prepend these six lines:
------------------
lame (3.98.2-0.3+crunch) unstable; urgency=low

  * Rebuilt to use MaverickCrunch floating point hardware

 -- Martin Guy <martinwguy@yahoo.it>  Wed, 11 Mar 2009 00:00:00 +0000

------------------
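
The changelog edit above could also be done by a small helper. The following
is only a sketch under some assumptions: the function name
prepend_crunch_entry is invented here, it assumes that the first line of the
changelog has the usual "package (version) ..." form, and it takes the
maintainer name and address from the DEBFULLNAME and DEBEMAIL environment
variables, falling back to placeholder values.

```shell
#!/bin/sh
# prepend_crunch_entry FILE: insert a "+crunch" stanza above the newest
# entry in a Debian changelog. Hypothetical helper, not the author's script.
prepend_crunch_entry() {
    changelog=$1
    # The first line looks like: lame (3.98.2-0.3) unstable; urgency=low
    pkg=$(sed -n '1s/^\([^ ]*\) (.*/\1/p' "$changelog")
    ver=$(sed -n '1s/^[^ ]* (\([^)]*\)).*/\1/p' "$changelog")
    [ -n "$pkg" ] && [ -n "$ver" ] || return 1
    who="${DEBFULLNAME:-Your Name} <${DEBEMAIL:-you@example.org}>"
    tmp=$(mktemp) || return 1
    {
        printf '%s (%s+crunch) unstable; urgency=low\n\n' "$pkg" "$ver"
        printf '  * Rebuilt to use MaverickCrunch floating point hardware\n\n'
        # date -R prints an RFC 2822 date, as the changelog format expects
        printf ' -- %s  %s\n\n' "$who" "$(date -R)"
        cat "$changelog"
    } > "$tmp"
    # Copy back rather than mv, to keep the original file's permissions
    cat "$tmp" > "$changelog"
    rm -f "$tmp"
}
```

It would be run as "prepend_crunch_entry debian/changelog" from the top of the
unpacked source tree.
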
There are two main methods for diverting the compilation procedure without
(or with minimal) modifications to the build scripts:

1. ENVIRONMENT VARIABLE METHOD

Some packages respond to variables in the environment, so sometimes this works:

$ CC=gcc-4.3-crunch CXX=$CC CFLAGS="-mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp -fno-signed-zeros -O2 -g" CXXFLAGS="$CFLAGS" dpkg-buildpackage -rfakeroot -B

2. FAKE GCC METHOD

This method works with just about everything.

Create ~/fake/gcc, containing an executable shell script:
------------------
#! /bin/sh

# gcc: Wrapper to force MaverickCrunch FPU code generation

exec /usr/local/bin/gcc-4.3-crunch -mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp -fno-signed-zeros "$@"
------------------
and since some packages call the compiler by different names:
$ ln -s gcc ~/fake/gcc-4.3
$ ln -s gcc ~/fake/cc
$ ln -s gcc ~/fake/arm-linux-gnueabi-gcc

then in each Debian source directory use
$ PATH=~/fake:$PATH dpkg-buildpackage -rfakeroot -B
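
The setup of the fake directory described above can be collected into one
function. This is a sketch: the function name make_fake_gcc is made up, and
the optional directory argument is only there so that it can be tried
somewhere other than ~/fake. The wrapper text and the alias names are the ones
given above.

```shell
#!/bin/sh
# make_fake_gcc [DIR]: create the wrapper script and its aliases in DIR
# (default ~/fake). Hypothetical helper name.
make_fake_gcc() {
    dir=${1:-$HOME/fake}
    mkdir -p "$dir" || return 1
    # Single quotes keep "$@" literal in the generated script.
    printf '%s\n' \
        '#! /bin/sh' \
        '' \
        '# gcc: Wrapper to force MaverickCrunch FPU code generation' \
        '' \
        'exec /usr/local/bin/gcc-4.3-crunch -mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp -fno-signed-zeros "$@"' \
        > "$dir/gcc" || return 1
    chmod +x "$dir/gcc" || return 1
    # Some packages call the compiler by other names.
    for name in gcc-4.3 cc arm-linux-gnueabi-gcc; do
        ln -sf gcc "$dir/$name" || return 1
    done
}
```

After running make_fake_gcc once, the PATH=~/fake:$PATH command above picks up
the wrapper under any of those names.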

TERMINOLOGY
===========
By "cfdensity" I mean the number of MaverickCrunch floating point instructions
in a Debian package divided by the total number of machine instructions in it,
expressed in parts per million. For example:

cfdensity
192908  libmp3lame0_3.98.2-0.3+crunch_armel.deb

means that 19% of the machine instructions in that library are MaverickCrunch
instructions, roughly indicative of the importance of using an accelerated
version of that library to improve system performance.

An "aeabidensity", by contrast, measures the number of calls to softfloat math
functions in an unaccelerated Debian package or library file (again divided
by the total number of instructions, and expressed in parts per million).
I use this to select candidate libraries for optimization.

Some scripts analyze object files and produce these figures:
cfdensity	Measure the cfdensity of one or more executables or libraries
		(works on *.so* or *.a)
aeabidensity	Ditto for calls to __aeabi_* (the soft-float library functions)
		in an executable or static (*.a) library
dpkg-cfdensity	Measure the cfdensity of the executables and libraries
		in a Debian package
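
The scripts themselves are not reproduced here, but the cfdensity calculation
can be illustrated as follows. This is a rough sketch and not the actual
script: it assumes that the input is the text produced by "objdump -d", that
every line with an address and an encoding is one instruction (so literal
data in the code is counted too), and that MaverickCrunch instructions are
exactly those whose mnemonic begins with "cf". The function name is invented.

```shell
#!/bin/sh
# cfdensity_from_disasm: read "objdump -d" output on stdin and print the
# number of MaverickCrunch instructions per million instructions.
# Assumption: Crunch mnemonics are the ones that start with "cf".
cfdensity_from_disasm() {
    awk -F'\t' '
        # Instruction lines look like "    8000:<TAB>e1a0c00d <TAB>mov<TAB>ip, sp"
        $1 ~ /^ *[0-9a-f]+:$/ && NF >= 3 {
            total++
            if ($3 ~ /^cf/) crunch++
        }
        END {
            if (total > 0) printf "%d\n", int(crunch * 1000000 / total)
            else print 0
        }'
}
```

For example, "objdump -d some-library.so | cfdensity_from_disasm" prints a
single parts-per-million figure for that file.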

The top ten densest libraries installed on a sample ARM EABI system are:
174484  /usr/lib/libfftw3f.a
147398  /usr/lib/liblua5.1-bit.a	(a small bitfields library)
138089  /usr/lib/libm.a
112050  /usr/lib/libfftw3.a
109604  /usr/lib/libartsflow.a		aRts is in C++.
103206  /usr/lib/libvorbisenc.a
101482  /usr/lib/libnifticdf.a		Medical imaging system
 94924  /usr/lib/libsmpeg.a
 91275  /usr/lib/libproj.a
 83662  /usr/lib/libsamplerate.a


PACKAGE-SPECIFIC NOTES
======================

asterisk
--------
Uses various libraries whose aeabidensities (of the /usr/lib/lib*.a files) are:
   2341	libasound2
  66771	libgsm1
      0	libogg0
      0	libpri1.0	Primary rate ISDN specification library 
  60903	libtonezone1	[from zaptel]
  64799	libvorbis0a
 103206	libvorbisenc2	[broken on armel; fixed here]
  16512	libvorbisfile (not used by asterisk)
    146	zlib1g
    149 libopenh323
    237 libopenh323

Using the fake gcc method, configure fails:

 checking for ptlib-config... no
 ./configure: line 27632: --pwlibdir: command not found
 ./configure: line 27640: --ldflags: command not found
 checking if PWLib version 1.10.10 is compatible with chan_h323... yes
 configure: WARNING: "CPU arm not recognized - proceed with caution!"
 checking PWLib installation validity... no
 configure: ***
 configure: *** The PWLIB installation on this system appears to be broken.
 configure: *** Either correct the installation, or run configure
 configure: *** including --without-pwlib
 make: *** [config.status] Error 1

The unstable buildlog for the same version of asterisk with the same version
of libpt-dev (asterisk_1:1.4.21.2~dfsg-3 libpt-dev_1.10.10-2) has:

 checking for ptlib-config... /usr/share/pwlib/make/ptlib-config
 checking if PWLib version 1.10.10 is compatible with chan_h323... yes
 configure: WARNING: "CPU arm not recognized - proceed with caution!"
 checking PWLib installation validity... yes
 checking /usr/share/openh323//version.h usability... yes

So the libpt-dev installation here does seem to be broken: ptlib-config is
not found.

fftw (fftw2)
------------
Fake gcc method
Debian short tests:
        ./tests/fftw_test  -t -e -v -p 1024 -x 1
        ./tests/rfftw_test -t -e -v -p 1024 -x 1
succeed for single precision.
The long testsuite "make -C tests check" succeeds for double precision.

cfdensity
509020	fftw2_2.1.3-22+crunch_armel.deb
576767	sfftw2_2.1.3-22+crunch_armel.deb

fftw3
-----
Fake gcc method
Compiled on n2100, finished on ts7250.
Running check.pl -v -a using lt-bench.

cfdensity
503466	libfftw3-3_3.1.2-3.1+crunch_armel.deb
 51940	libfftw3-dev_3.1.2-3.1+crunch_armel.deb (the fftw-wisdom progs)

lame (from debian-multimedia.org)
---------------------------------
debian/rules defaults CC to "ccache cc" so we add "cc" to our fake directory.

GCC bug http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39501 seems not to bite on
Maverick.  However, when encoding a WAV file with 11050 samples per second,
lame segfaults in its sample rate conversion code (converting to 11025).

-O0			works
-O1			works
-O1 -ffast-math		works
-O1 -ffinite-math-only	works
-O1 -ffast-math -fschedule-insns
			dumps core
-O1 -ffast-math -fschedule-insns2
			dumps core
-O2			dumps core
-O2 -ffast-math		dumps core
-O2 -ffast-math -fno-schedule-insns -fno-schedule-insns2
			works

CFLAGS="-ffinite-math-only -fno-schedule-insns -fno-schedule-insns2" \
	PATH=~/fake:$PATH dpkg-buildpackage -rfakeroot -B
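
A table like the one above can be produced by running the same build-and-test
step under each set of flags in turn. The sketch below shows one way of doing
that; it is not the procedure actually used. The function name try_flag_sets
is invented, and the command passed as its first argument is assumed to be
something that rebuilds the program using $CFLAGS, runs the failing test, and
exits non-zero if the result crashes.

```shell
#!/bin/sh
# try_flag_sets COMMAND FLAGS...: run COMMAND once for each FLAGS string,
# with CFLAGS set to that string, and report whether it succeeded.
# Hypothetical helper; COMMAND must build and test using $CFLAGS itself.
try_flag_sets() {
    cmd=$1
    shift
    for flags in "$@"; do
        if CFLAGS=$flags sh -c "$cmd" >/dev/null 2>&1; then
            printf '%s\tworks\n' "$flags"
        else
            printf '%s\tfails\n' "$flags"
        fi
    done
}
```

A call such as try_flag_sets ./rebuild-and-test.sh "-O1" "-O1 -ffast-math"
"-O2" prints one line per flag set, where rebuild-and-test.sh is a
hypothetical script of your own.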

cfdensity:
169302	lame_3.98.2-0.3+crunch_armel.deb
187255	libmp3lame0_3.98.2-0.3+crunch_armel.deb

Time to encode a 30-second CD-quality wav file with default settings
on a 200MHz unit:
			normal		-V0 (fixed)
			Real	CPU	Real	CPU
Standard version	6m02	6m01	5m28	5m25
Maverick version 	2m25	2m23	2m01	2m00
Maverick version -O	2m34	2m33	2m14	2m13
CFLAGS version		2m34	2m34	2m11	2m10

libgsm
------
Fake GCC method.
libgsm does use single-precision floating point and compiles with -O2
Output from x86, softfloat and crunch are bitwise identical.

				  softfloat	  maverick
				real	user	real	user
toast -c intro.l > /dev/null	11.14	11.09	2.86	2.80
				11.16	11.09	2.87	2.79
				11.13	11.09	2.86	2.79
tcat -l intro.gsm > /dev/null	 0.86	 0.81	0.95	0.90
				 0.87	 0.79	0.96	0.89
				 0.86	 0.81	0.95	0.90

cfdensity:
173111  libgsm1_1.0.12-1+crunch_armel.deb
0       libgsm1-dev_1.0.12-1+crunch_armel.deb (needs libgsm = version)
0       libgsm-tools_1.0.12-1+crunch_armel.deb (standard one is ok with >= version)

Speedup: 4 times as fast when encoding.  10% slower decoding (?!?)
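
The speedup figures are simply the ratio of the softfloat time to the
maverick time. As a small worked sketch (the function name speedup is made up
for this example):

```shell
#!/bin/sh
# speedup BEFORE AFTER: print BEFORE/AFTER to two decimal places.
# Values above 1 mean the second measurement is faster.
speedup() {
    awk -v before="$1" -v after="$2" 'BEGIN {
        if (after + 0 == 0) exit 1      # avoid dividing by zero
        printf "%.2f\n", before / after
    }'
}
```

With the real times above, "speedup 11.14 2.86" gives roughly 3.9 for
encoding and "speedup 0.86 0.95" gives roughly 0.9 for decoding.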

liboil
------
It contains VFP assembler code that is included whatever ARM CPU you compile for
but which is only enabled at runtime if "vfp" is present in /proc/cpuinfo's
"Features" section. It also checks for "edsp" but that doesn't seem to enable
anything.
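
To see what that runtime check would find on a given board, the "Features"
line can be inspected from the shell. This is only an illustration of the
idea, not liboil's own code; the function name has_cpu_feature is invented,
and the optional second argument lets it read a saved copy of /proc/cpuinfo
instead of the live one.

```shell
#!/bin/sh
# has_cpu_feature NAME [FILE]: succeed if NAME is listed as a whole word on
# the "Features" line of FILE (default /proc/cpuinfo). Hypothetical helper.
has_cpu_feature() {
    feature=$1
    cpuinfo=${2:-/proc/cpuinfo}
    awk -F: -v want="$feature" '
        /^Features/ {
            n = split($2, f, " ")
            for (i = 1; i <= n; i++)
                if (f[i] == want) found = 1
        }
        END { exit !found }' "$cpuinfo"
}
```

For example, "has_cpu_feature vfp || echo no VFP here" shows whether the VFP
code paths would be enabled on the machine it is run on.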

Building the unmodified source for Maverick, using the fake gcc method and
dpkg-buildpackage -B -d (to avoid the pointless gtk-doc-tools dependency),
fails saying:
cc -g -O2 -g -Wall -O2 -o .libs/build_prototypes build_prototypes-build_prototypes.o  ../liboil/.libs/liboil-0.3.so -lm -lrt 
../liboil/.libs/liboil-0.3.so: undefined reference to `vfp_negative_f32'
...

This is because it fails to pass "-mfpu=vfp" to the
 cc  -g -O2 -g -Wall -O2 -c -o math_vfp_asm.lo math_vfp_asm.S
lines. A generic solution would be to add -mfpu=vfp to ASFLAGS for math_vfp_asm,
but we do not want VFP code at all, so we patch it to drop VFP support.

The build must be run on an EP93xx host because it insists on running the
testsuite even if DEB_BUILD_OPTIONS=notest. It can be started on a faster host
and completed on the EP93xx with dpkg-buildpackage -rfakeroot -B -nc

libsamplerate
-------------
Fake GCC method -  uses arm-linux-gnueabi-gcc

./throughput_test

Softfloat:
    Converter                        Duration        Throughput
    -----------------------------------------------------------
    ZOH Interpolator                   3.31               98007
    Linear Interpolator                3.02               85935
    Fastest Sinc Interpolator          8.31                7807
    Medium Sinc Interpolator          19.56                3316
    Best Sinc Interpolator            69.02                 940

Crunch:
    Converter                        Duration        Throughput
    -----------------------------------------------------------
    ZOH Interpolator                   3.29              157765
    Linear Interpolator                3.41              152213
    Fastest Sinc Interpolator          3.56               36449
    Medium Sinc Interpolator           4.40               14745
    Best Sinc Interpolator            23.56                2753

            Duration is in seconds.
            Throughput is in samples/sec.

  libsamplerate-0.1.4 passed all tests.

  Signal-to-Noise ratios in all tests are identical for crunch and softfloat.

0       libsamplerate0-dev_0.1.4-1+crunch_armel.deb
186798  libsamplerate0_0.1.4-1+crunch_armel.deb
43633   samplerate-programs_0.1.4-1+crunch_armel.deb

libvorbis
---------
libvorbis works but libvorbisenc is broken on armel.
See http://bugs.debian.org/515949
See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39501

The workaround for softfloat of adding "-fno-finite-math-only" to CFLAGS is
not enough, and removing -ffast-math completely doesn't fix 22050 and 11025.
Dropping to -O makes the testsuite succeed at all sample rates.

Edit debian/rules and change
    CFLAGS += -O2
to
    CFLAGS += -O
then use fake gcc method.

cfdensity:
167278	libvorbis0a_1.2.0.dfsg-3.1+crunch_armel.deb
227391	libvorbisenc2_1.2.0.dfsg-3.1+crunch_armel.deb
 29586	libvorbisfile3_1.2.0.dfsg-3.1+crunch_armel.deb

						real	user
oggdec -o /dev/null Happy.ogg			36.4	31.4
(vmstat says 90% user time, 10% system time)	36.1	31.1
						36.1	31.1
With crunch libvorbis0a			  	16.7	12.6
						16.5	12.3
						16.7	12.4
and crunch libvorbisfile3			15.0	10.2
						15.0	10.3
						14.9	10.1
and crunch vorbis-tools				14.9	10.2
						14.9	10.1
						14.9	10.1

oggenc with no-finite-math libvorbis*		4m33	4m32	(to a file)
						4m32	4m31	(to /dev/null)
oggenc with crunch no-finite-math libvorbis*	1m11	1m10	44100 bad 
oggenc with crunch no-fast-math libvorbis*	1m26	1m25	44100 good
oggenc -O1 -ffast-math				1m30	1m29	all output good

mpg123
------
debian/rules configures to use fixed-point math on ARM using:
CONF_arm:=--with-cpu=generic_nofpu
so change this line to
CONF_arm:=--with-cpu=generic_fpu
and use the fake gcc method
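
The one-line change to debian/rules can be made with sed instead of an editor.
A minimal sketch, assuming the line appears exactly as quoted above; the
function name use_generic_fpu is made up:

```shell
#!/bin/sh
# use_generic_fpu [FILE]: switch the ARM configure option in FILE
# (default debian/rules) from generic_nofpu to generic_fpu.
# Hypothetical helper; other lines are left untouched.
use_generic_fpu() {
    rules=${1:-debian/rules}
    tmp=$(mktemp) || return 1
    sed 's/^\(CONF_arm:=--with-cpu=\)generic_nofpu$/\1generic_fpu/' \
        "$rules" > "$tmp" &&
        cat "$tmp" > "$rules"   # copy back, keeping the file's permissions
    rm -f "$tmp"
}
```

Run it from the top of the mpg123 source tree before starting the build.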

cfdensity
  13471 mpg123_1.4.3-4+crunch_armel.deb
 142133 libmpg123-0_1.4.3-4+crunch_armel.deb

Time to decode the MP3 file encoded above:
Standard version
Maverick version 

openssl
-------
Standard Debian armel running "openssl speed md5"
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5                794.06k     2780.44k     8027.55k    15228.78k    20593.93k

python2.5
---------
Fake gcc method
Compiled on bb. Waiting for ts7250.
With top running, taking 4.3% CPU:
    cd /home/martin/crunch/debian/source/python2.5-2.5.2/build-static && ./python ../Lib/test/pystone.py
    Pystone(1.1) time for 50000 passes = 46.94
    This machine benchmarks at 1065.19 pystones/second
    Pystone(1.1) time for 50000 passes = 46.82
    This machine benchmarks at 1067.92 pystones/second
(while the standard softfloat version says:
Pystone(1.1) time for 50000 passes = 46.89
This machine benchmarks at 1066.33 pystones/second
Pystone(1.1) time for 50000 passes = 46.69
This machine benchmarks at 1070.89 pystones/second)

cfdensity:
6154    python2.5_2.5.2-15+crunch_armel.deb
2590    python2.5-dbg_2.5.2-15+crunch_armel.deb
6934    python2.5-minimal_2.5.2-15+crunch_armel.deb
0.6% FP... and it's unlikely to accelerate anything much, since interpreters
tend to spend most of their time interpreting rather than calculating.
However, it used to be one of the packages whose binary dumped core, but
now it builds, runs and passes its testsuite, which is what I was really
interested in.

speex
-----
Edit debian/rules to change
    ifeq ($(DEB_HOST_ARCH_CPU),arm)
    objdir             = $(objdir_fixedpoint)
    EXTRA_CONFIG_FLAGS = --enable-arm4-asm
    endif
to
    ifeq ($(DEB_HOST_ARCH_CPU),arm)
    EXTRA_CONFIG_FLAGS = 
    endif

The resulting Maverick executable is slower than the optimised ARM assembler
used in the standard package:
Standard speex, encoding Happy.wav: 1m00, decoding Happy.spx: 0m12 (fixed point)
Maverick speex, encoding Happy.wav: 1m32, decoding Happy.spx: 0m15 (floating)
so we won't bother with this one...

timidity
--------
Has some double float FFT code that is used for some operations.

Fake GCC method works fine, but note that it needs to compile and run a
program locally to generate a lookup table. On a fast non-EP93xx host, this
will fail saying:

./calcnewt > newton_table.c
/bin/sh: line 1:  9440 Illegal instruction     ./calcnewt > newton_table.c
make[3]: *** [newton_table.c] Error 132
make[3]: Leaving directory `/.../timidity-2.13.2/timidity'

You can just run that one command on an EP93xx host, then restart
dpkg-buildpackage with the -nc flag, and it runs to completion.

28335   timidity_2.13.2-20+crunch_armel.deb
5352    timidity-interfaces-extra_2.13.2-20+crunch_armel.deb

vorbis-tools (oggenc, oggdec, ogg123 etc)
-----------------------------------------
Fake GCC method. Seems to work.

cfdensity
 26541	vorbis-tools_1.2.0-5+crunch_armel.deb

zaptel
------
Provides libtonezone for asterisk.
cfdensity
  25755 zaptel_1.4.11~dfsg-3+crunch_armel.deb
 124834 libtonezone1_1.4.11~dfsg-3+crunch_armel.deb

TODO
====
Speed tests on mpg123