Christof Paar, ``Efficient VLSI Architectures for Bit Parallel Computation in Galois Fields,'' Ph.D., Institute for Experimental Mathematics, University of Essen, Germany, November 1994.
Advisor: Han Vinck
The thesis describes various efficient architectures for
arithmetic in Galois fields of the type
. ``Efficient'' refers to
the fact that the architectures require a small number of elementary gates
that are logical AND and exclusive OR. All architectures
are bit parallel. The work focuses on the basic operations: multiplication,
constant multiplication, and inversion. The architectures are based on
algorithms which make extensive use of the decomposition of fields
into
, the latter of which will be called composite fields.
Two efficient algorithms which are related to composite fields are
developed. One algorithm finds matrices which map binary field
representations to composite field representations. The second algorithms
performs a fast test to determine whether a polynomial over
is
primitive.
First, previous bit parallel architectures for fields
and composite
fields are reviewed. We comment on some of the previous architectures. A
suboptimum algorithm for constant multiplication with a reduced number of
gates is introduced.
A general architecture for multiplication in composite fields is developed
based on the Karatsuba-Ofman algorithm. The algorithm is closely
investigated with respect to a parallel hardware implementation. It is
shown that multiplication of two polynomials of degree less than m over
is of order
. Through an exhaustive
search, primitive polynomials are determined which perform modulo reduction
with low complexity. Detailed descriptions of efficient parallel multipliers
for field orders
are given.
It is shown that for certain field orders a combined optimization of the polynomial multiplication and modulo reduction further improves the gate count and the delay of multiplier architectures. The gate counts achieved for some field orders are the lowest ones reported in technical literature.
A comparative synthesis maps four different types of multiplier architectures to the gate-array library of the TC 160G family. It is found that the improved theoretical gate count of composite field multipliers results in netlists with an improved number of gate equivalences if gate-arrays are used. A performance estimation of the composite field multipliers results in a data throughput of up to 3.8 Gbit/sec.
An algorithm from Itoh and Tsujii for inversion over composite fields is
applied to elements in standard base representation. A relationship between
this algorithm and an architecture over tower fields proposed by Morii and
Kasahara is developed. For the fields
and
architectures for parallel inverters with moderate gate count are provided.
A new combined software/hardware approach for systems involving finite field
arithmetic is proposed. As an application, a composite field multiplier over
is attached to a 16 bit DSP. We implemented a shortened (10,8)
Reed-Solomon code which allows decoding at a speed of up to 1.9 Mbit/sec.
For more information contact the author at: ECE Department, Worcester Polytechnic Institute, Worcester, MA 01609, USA, email: christof@ece.wpi.edu