Explaining Fast Fourier Transformations, Part 2; The Code
or Botting Dancing Unicorns
17 Dec 2019 - atrodo - Song: Battle Dancing Unicorns by Five Iron Frenzy
This is part 2 of a 2 part series about Discrete and Fast Fourier Transformations. The first part is about the theory and how DFTs work while the second part is about the practical application and code execution to get the frequency data from a sound sample.
While in the last post I went over some very theoretical things, this post will be much more pragmatic and have a lot of code in it.
Common
The code below has been lightly modified and heavily commented compared to what
I have running now. For the core of the FFT algorithm I have two versions to
compare; the perl
version is designed to be easier to read while combining
some very high level concepts into a small amount of space. The C
version will
more optimized for speed and will map the high level concepts in perl
down to
discrete steps. I’ve used the same variable names between versions so it will
be easier to reference what’s happening between the two.
Since there is a lot of lookups used with Cooley-Tukey, I start by defining some
constants and shared lookup tables. I’ll calculate these once and use them for
both the perl
and C
versions.
my $blk_bits = 12;
my $blk_size = 1 << $blk_bits; # 4096
my @cos = map { cos( 2 * pi * $_ / $blk_size ) } 0 .. $blk_size;
my @sin = map { sin( 2 * pi * $_ / $blk_size ) } 0 .. $blk_size;
my @rbit = map { oct( "0b" . unpack( "b[$blk_bits]", pack "S", $_ ) ) } 0 .. $blk_size;
Since my code really only concerns itself with one block size of FFT, we can statically define its size. Since we are using a traditional Cooley-Tukey, it must be a power of 2. That makes the first two statements simple.
The @cos
and @sin
arrays are generated for each element in a $blk_size
sized array. These will be used as the angle step amounts to progress around the
circle during each round.
Finally, the @rbit
array is a lookup table so that I can easily translate
from the original array item to the reversed bit index. This is important
with Cooley-Tukey because we must rearrange the data before we do the FFT
calculation.
The code is very dense, so I think it’d be useful to break it down and heavily
comment it. The sequence is a map and series of function calls. In order to
facilitate the explanation, I’m going to work from the inside, out. Each ...
will replace the previous explanation’s sequence. This will help demonstrate the
discrete steps that perl
takes to generate the final array.
# Take the integer in $_ and translate it into an "unsigned short (16-bit) value"
pack( "S", $_ )
# We then unpack that 16-bit value into a bit string in ascending bit order
# This is actually where the reversing of the bits takes place, because
# lower-case b in unpack starts with the least significant bit when outputting
# them left-to-right
unpack( "b[$blk_bits]", ... )
# This will construct a bit string for conversion, '0b' starts a binary sequence
"0b" . ...
# oct translates from octal to a number, but can also translate binary or hex
oct(...)
# map each value from 0 to $blk_size (will contain 1 extra index)
map {...} 0 .. $blk_size
This is all thanks to just a few built in perl
functions: pack, unpack, oct, and map.
perl
Since perl
is going to be more concise and easier to see from a high level, it
seems natural to start with there.
I have avoided using the terms real and imaginary in this code in order to try
and come up with something more conceptually descriptive: xsum
and ysum
.
Because that is what both arrays represent: the sum of the waveform across each
directions (x
and y
). If, however, you’re trying to connect this code with
another FFT tutorial, then xsum
is equivalent to the real part and ysum
is
equivalent to the imaginary part.
The $stride
and $half
variables needs some special attention. Each group is
divided into an even half (the first half) and an odd half (the last half). Both
halves play an important role in the calculation, and we only visit each pair
once per iteration. $stride
defines the size of each group for each round, so
the first round has a group size of 2, so $stride = 2
. Likewise, $half
represents the number of elements between the even, lower half and the odd,
upper half; so when $stride = 2
, then $half = 1
.
The last item to pay special attention to is $lu_step
, the look-up step. We
only generated 1 set of sine and cosine tables, but depending on the size of
each group, we can still look up the relevant values by stepping through it so
that it starts at 0 and end at $blk_size
. That will provide us with the
correct value for sine and cosine for that group.
For instance, when $stride = 2
, $lu_step
will be blk_size / 2
or 2048.
This will give us the sine and cosine for 180°. On round 3, or when $stride =
8
, $lu_step
will be blk_size / 8
. This will give us the sine and cosine for
the multiples of 45°.
sub ex_pp
{
my @input = @_;
# Initalize ysum to 0 and xsum to the input values in bit-reversed order
my @xsum = map { $input[ $rbit[$_] ] } 0 .. $blk_size - 1;
my @ysum = (0) x $blk_size;
# We have as many rounds as we have bits in the calculation size
# Start at 1, not 0, because we start with 2 elements in each group and the
# group size must be an even number
foreach my $bits ( 1 .. $blk_bits )
{
my $stride = 1 << $bits; # Size of each group
my $half = $stride >> 1; # Distance in the array between each half
my $lu_step = $blk_size / $stride; # Step between each @sin and @cos lookup
foreach my $idx ( 0 .. $blk_size )
{
# We need to know which element inside the group we are at
my $i = $idx % $stride;
# Since we're just iterating over each item in the input array, we
# skip half of the items with this conditional
next
if $i >= $half;
my $even = $idx; # The even element index we are working with
my $odd = $even + $half; # The odd element index we are working with
my $lu = $i * $lu_step; # The lookup index
# This is the actual calculation meat. This is where we calculate the
# "Twiddle Factor" or Root-of-Unity to rotate the existing sums by the
# requisite degrees.
my $x = $xsum[$odd] * $cos[$lu] + $ysum[$odd] * $sin[$lu];
my $y = -$xsum[$odd] * $sin[$lu] + $ysum[$odd] * $cos[$lu];
$xsum[$odd] = $xsum[$even] - $x;
$ysum[$odd] = $ysum[$even] - $y;
$xsum[$even] += $x;
$ysum[$even] += $y;
}
}
# Finally, we calculate the magnitude of each FFT item using Pythagorean
# theorem and only return the first half of the array.
return [ map { sqrt( $xsum[$_]**2 + $ysum[$_]**2 ) } 0 .. $blk_size >> 1 ];
}
C
On the opposite side, the C
version of the exact same transformation was done
with much more optimization. I say that, however there is not much difference
between the C
and perl
version, primarily in the looping construct.
The big thing you’ll notice right away is that I start with a bit of perl
code. I’m using
Inline::C
to load this code into one source file. This has the advantage of being able to
calculate all the tables concisely in perl
once, and reuse them in C
by
packing them into a format C
has an easy time understanding. The overhead for
this is negligible since it’s only done once and stored with a state
.
sub ex_cc
{
my $input = \@_;
state $rbit = pack 's*', @rbit;
state $cos = pack 'd*', @cos;
state $sin = pack 'd*', @sin;
my $sub = \&_ex_cc;
my $result = $sub->(
pack( 'd*', @$input ),
$blk_size,
$blk_bits,
$rbit,
$cos,
$sin
);
return [ unpack 'd*', $result ];
}
use Config;
use Inline C => Config => ccflags => $Config{ccflags} . ' -std=c99';
use Inline C => <<'EOC';
// C code below goes here
EOC
The main difference to notice in the C
version is the looping construct I
used. In perl
, I used a single foreach
loop and used next
to skip the
“odd” elements. In the C
version, I used an inner and outer loop to do the
skipping. The speedup I received from doing it that way was significant enough
that I decided that the slightly more complex code was worth the speed increase.
#include <math.h>
#include <stdint.h>
SV* _ex_cc(char *cinput, int blk_size, int blk_bits, char *crbit, char *ccos, char *csin)
{
// Cast all of our inputs into proper types. Inline::C only handles a couple
// of types, so I just use char* to pass it in and cast to the correct type
int16_t *rbit = (int16_t *) crbit;
double *input = (double *) cinput;
double *cos = (double *) ccos;
double *sin = (double *) csin;
double xsum[blk_size];
double ysum[blk_size];
// Initialize our arrays
for ( int i = 0; i < blk_size; i++ )
{
xsum[i] = (double) input[ rbit[i] ];
}
memset(ysum, 0, sizeof ysum);
for ( int bits = 1; bits <= blk_bits; bits++ )
{
int size = 1 << bits;
int stride = size >> 1;
int lu_step = blk_size / size;
// Here is the major difference in C, we do two loops instead of just
// skipping the indexes that don't belong. It's a small optimization, but
// significant enough that I did it instead of the perl version's more
// clear version
// base is for the base index, which is the index of the first element of
// each group. The first round, base will be every even index. In the
// second round, base will be every fourth index, etc.
for ( int base = 0; base < blk_size ; base += size )
{
// idx is the index inside each group, only covering half of the group
// since it will process two items, idx and idx+stride, at the same time
for ( int idx = 0; idx < stride; idx++ )
{
int even = idx + base;
int odd = even + stride;
int lu = idx * lu_step;
double x = xsum[odd] * cos[lu] + ysum[odd] * sin[lu];
double y = -xsum[odd] * sin[lu] + ysum[odd] * cos[lu];
xsum[odd] = xsum[even] - x;
ysum[odd] = ysum[even] - y;
xsum[even] += x;
ysum[even] += y;
}
}
}
// Prepare the output by producing the mangitude of each element
double output[blk_size];
for ( int i = 0; i < blk_size >> 1; i++ )
{
output[i] = sqrt(xsum[i]*xsum[i] + ysum[i]*ysum[i]);
}
// Finally return it all in a new SV (perl's scalar type)
return ( newSVpv((double *)output, sizeof output) );
}
And that’s it, that’s how to implement Cooley-Tukey in both perl
and C
. I
hope that this has been informative and that the lessons I’ve learned can be of
use for you. One last piece of advice, if you’re not interested in how FFT works
or want your own to use or learn from, I’d highly recommend just using
FFTW instead. While this has been educational and
rewarding to do myself, my goals in doing this were mroe than just having a
working FFT . If your only goal is just to use a FFT, I don’t recommend you
write and maintain one yourself.
If you do decide to use FFTW, please remember that it doesn’t give you the magnitude of each output frequency, you have to calculate that yourself. That is a lesson that took me entirely too long to learn.