Thursday, December 24, 2009

Error Correction Coding Receiver - Part 3

For coding schemes that can correct more than one error, things get more complicated. Instead of a single syndrome value, there is a system of 2t syndrome equations, where t is the number of correctable errors. A method called the Berlekamp-Massey algorithm is used to solve this system of equations and produce an error locator polynomial.

Next, the roots of this polynomial must be found; they indicate the locations of up to t errors. In hardware this is usually done with a Chien search, which simply tries each field element in turn (a sketch appears at the end of this post). Once the locations are known, the corrupted bits can be fixed by inverting them.

A good description and example can be found here: [pdf]

I will attempt to write some code to implement the (31,16) BCH code that can correct up to 3 errors. There are 16 message bits and 15 parity bits, which together make up a 31-bit codeword. It is built on the primitive polynomial x^5 + x^2 + 1, which generates a field of 32 elements. See the table starting on page 12 of the above link.

To implement this in hardware, a couple of math functions are needed. One is multiplication of field elements. This can be done by representing the elements as exponents of the primitive element alpha; two elements are then multiplied by adding their exponents. For example, to multiply alpha^5 by alpha^11, add 5 and 11. The addition must be done mod 31 for this code, since alpha^31 = 1, so any higher exponent is equivalent to that exponent mod 31. It is not very efficient to do a mod 31 function in general-purpose logic, and in any case it is often not synthesizable with the mod operator (% symbol in Verilog.) A lookup table that handles adding and subtracting mod 31 is fairly efficient: it fits in 22 slices on a Xilinx Virtex4 device and returns a result in one clock cycle.
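
As an illustration, here is a sketch of my own of such an add/subtract block (module and port names are mine). Instead of a stored table, it uses the observation that a single conditional subtraction of 31 completes the reduction; a LUT-based table as described above is an equivalent one-cycle alternative.

module gf32_exp_addsub (
    input            clk,
    input            sub,    // 0 = add exponents (multiply), 1 = subtract (divide)
    input      [4:0] a,      // exponent of alpha, 0 to 30
    input      [4:0] b,
    output reg [4:0] result  // (a +/- b) mod 31, registered, one-cycle latency
);

// a + b is at most 60, and a + 31 - b stays between 1 and 61,
// so a single conditional subtraction of 31 completes the mod
wire [5:0] raw = sub ? ({1'b0, a} + 6'd31 - {1'b0, b})
                     : ({1'b0, a} + {1'b0, b});

always @(posedge clk)
    result <= (raw >= 6'd31) ? (raw - 6'd31) : raw[4:0];

endmodule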

Another operation is addition, which is done bitwise modulo 2. This is a simple XOR, but it must be performed on the tuple form of the elements. It is therefore useful to have another lookup table that converts between the exponent and tuple forms; this can be realized in 11 slices of a Virtex4 device. For example, alpha^7 is equivalent to alpha^4 + alpha^2 and can be represented as binary 10100.
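
Here is one way such a conversion table could be built (again a sketch of my own; module and port names are assumptions). Both directions are generated at elaboration time from the recurrence alpha^(i+1) = alpha * alpha^i, reducing by x^5 + x^2 + 1:

module gf32_convert (
    input  [4:0] exp_in,  // exponent i of alpha^i, 0 to 30
    input  [4:0] tup_in,  // tuple form, must be nonzero for a valid result
    output [4:0] tup_out, // tuple form of alpha^exp_in
    output [4:0] exp_out  // exponent e such that alpha^e equals tup_in
);

reg [4:0] exp2tup [0:30];
reg [4:0] tup2exp [0:31];
integer i;
reg [5:0] t;

initial begin
    tup2exp[0] = 5'd0;  // log of zero is undefined; placeholder entry
    t = 6'b000001;      // alpha^0 = 1
    for (i = 0; i <= 30; i = i + 1) begin
        exp2tup[i]      = t[4:0];
        tup2exp[t[4:0]] = i[4:0];
        t = t << 1;                    // multiply by alpha
        if (t[5]) t = t ^ 6'b100101;   // reduce by x^5 + x^2 + 1
    end
end

assign tup_out = exp2tup[exp_in];
assign exp_out = tup2exp[tup_in];

endmodule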
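
With the exponent arithmetic and the tuple form in place, the root search mentioned at the top of this post can also be sketched. The Chien search below is hypothetical code of my own: it assumes the Berlekamp-Massey step has already produced a degree-3 error locator polynomial L(x) = 1 + l1*x + l2*x^2 + l3*x^3 with coefficients in tuple form, and the mapping from a root index i to a codeword bit position depends on the indexing convention chosen.

module chien31 (
    input            clk,
    input            start,        // load coefficients and restart the scan
    input      [4:0] l1, l2, l3,   // locator coefficients in 5-bit tuple form
    output reg [4:0] pos,          // current candidate index i (0 to 30)
    output           err_here      // high when L(alpha^i) == 0
);

// multiply a tuple-form element by alpha, reducing x^5 to x^2 + 1
function [4:0] mul_a(input [4:0] x);
    mul_a = {x[3:0], 1'b0} ^ (x[4] ? 5'b00101 : 5'b00000);
endfunction

reg [4:0] q1, q2, q3; // q_j holds l_j * alpha^(j*i) for the current i

always @(posedge clk)
    if (start) begin
        q1 <= l1; q2 <= l2; q3 <= l3;
        pos <= 5'd0;
    end
    else begin
        q1 <= mul_a(q1);                // advance by alpha
        q2 <= mul_a(mul_a(q2));         // advance by alpha^2
        q3 <= mul_a(mul_a(mul_a(q3)));  // advance by alpha^3
        pos <= pos + 5'd1;              // a full scan takes 31 cycles
    end

// L(alpha^i) = 1 + q1 + q2 + q3; the tuple form of 1 is 5'b00001
assign err_here = ((5'b00001 ^ q1 ^ q2 ^ q3) == 5'b00000);

endmodule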

Wednesday, December 23, 2009

Error Correction Coding Receiver - Part 2

The first thing needed to decode a linear block code in hardware is a Linear Feedback Shift Register (LFSR). To implement this in programmable logic, a generic block such as the one shown below can be written. It is easily customizable for different generator polynomials by adding registers as needed, and by changing the parameter that describes the polynomial g, with the most-significant degree on the left. This example shows the polynomial

x^5 + x^4 + x^3 + 1.

The term x^5 is implied in the generic parameter since the length of the register is 5. Messages are shifted in and out most-significant-bit first (where the highest bit is the coefficient of the highest degree term in the polynomial.)

This code block has a mode with feedback enabled, and a shift-only mode for shifting out the result.


module lfsr (
    input  rst_n,        // asynchronous active-low reset
    input  clk,
    input  d,            // serial data in, most-significant bit first
    input  en,           // shift enable
    input  calc_shft_n,  // 1 = calculate with feedback, 0 = shift out result
    output q             // serial data out
);

parameter g = 5'b11001; // feedback polynomial x^5 + x^4 + x^3 + 1;
                        // the x^5 term is implied by the register length

reg [4:0] shift;

assign q = shift[4];

always @(posedge clk or negedge rst_n)
    if (!rst_n)
        shift <= 5'b00000;
    else if (en) begin
        if (calc_shft_n) begin
            // divide by g: feed the register MSB back into the tapped stages
            shift[0] <= d ^ shift[4];
            shift[1] <= shift[0] ^ (shift[4] & g[1]);
            shift[2] <= shift[1] ^ (shift[4] & g[2]);
            shift[3] <= shift[2] ^ (shift[4] & g[3]);
            shift[4] <= shift[3] ^ (shift[4] & g[4]);
        end
        else begin
            // plain shift register: move the remainder out through q
            shift[0] <= d;
            shift[1] <= shift[0];
            shift[2] <= shift[1];
            shift[3] <= shift[2];
            shift[4] <= shift[3];
        end
    end

endmodule
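
For simulation, a quick testbench along these lines exercises both modes (this is a sketch of my own; the message value is arbitrary):

module lfsr_tb;

reg        rst_n = 0;
reg        clk = 0;
reg        d = 0;
reg        en = 0;
reg        calc_shft_n = 1;
wire       q;
reg  [9:0] msg = 10'b1011001110; // arbitrary example message
integer    i;

lfsr dut (.rst_n(rst_n), .clk(clk), .d(d), .en(en),
          .calc_shft_n(calc_shft_n), .q(q));

always #5 clk = ~clk;

initial begin
    repeat (2) @(negedge clk);  // hold reset for a couple of cycles
    rst_n = 1;
    en    = 1;
    for (i = 9; i >= 0; i = i - 1) begin
        d = msg[i];             // present each bit, MSB first
        @(negedge clk);         // one rising edge captures it
    end
    calc_shft_n = 0;            // feedback off: the register holds the remainder
    d = 0;
    for (i = 4; i >= 0; i = i - 1) begin
        $display("remainder bit %0d = %b", i, q);
        @(negedge clk);         // shift the next remainder bit to q
    end
    $finish;
end

endmodule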

Error Correction Coding Receiver - Part 1

Error correction coding is a method of encoding a message for transmission over a noisy channel so that the message can be correctly received even if it has been corrupted by up to some number of errors. A good introduction can be found in a text on digital communications, such as Digital Communications: Fundamentals and Applications by Bernard Sklar, which is available on Scribd. This text introduces the overall idea of error correction coding and some important concepts such as Hamming distance, block codes, and generator polynomials. But it only scratches the surface: when it comes to a practical implementation, a more rigorous algebraic description is needed.

A Hamming code is fairly simple to decode. It is structured to correct one error per message block. The encoded message is decoded by running it through a linear feedback shift register (LFSR) described by a certain generator polynomial. The remainder, which is the final contents of the shift register, is called the syndrome. The syndrome maps to the location in the encoded message where a bit error occurred; if no errors were detected, the syndrome will be zero. In hardware, this can be realized with a Meggitt decoder. A good description of such a decoder can be found here [pdf].
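
To make the syndrome-to-location mapping concrete, here is a small hypothetical sketch for the cyclic (7,4) Hamming code with generator polynomial x^3 + x + 1. It computes the syndrome with parallel XOR trees rather than serially as a Meggitt decoder would, but the lookup idea is the same (module and port names are mine):

module hamming74_fix (
    input  [6:0] r,        // received word; bit i is the coefficient of x^i
    output [6:0] corrected
);

// syndrome s = r(x) mod g(x); each bit collects the residues x^i mod g(x)
wire [2:0] s;
assign s[0] = r[0] ^ r[3] ^ r[5] ^ r[6];
assign s[1] = r[1] ^ r[3] ^ r[4] ^ r[5];
assign s[2] = r[2] ^ r[4] ^ r[5] ^ r[6];

// each nonzero syndrome equals x^i mod g(x) for exactly one position i
reg [6:0] e;
always @* begin
    case (s)
        3'b001:  e = 7'b0000001; // error at bit 0
        3'b010:  e = 7'b0000010; // bit 1
        3'b100:  e = 7'b0000100; // bit 2
        3'b011:  e = 7'b0001000; // bit 3
        3'b110:  e = 7'b0010000; // bit 4
        3'b111:  e = 7'b0100000; // bit 5
        3'b101:  e = 7'b1000000; // bit 6
        default: e = 7'b0000000; // zero syndrome: no error
    endcase
end

assign corrected = r ^ e; // flip the bad bit, if any

endmodule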

More challenging is a decoder that can correct multiple errors, such as for a BCH code. The texts on this subject delve deeply into abstract algebra, which presents a challenge to someone who is more engineer than mathematician. Fortunately the problem has been reduced to an algorithm that should work for any BCH code, and this can be realized in hardware. Stay tuned for Part 2.

Monday, November 9, 2009

CRC Calculators

Here are some useful links on CRC (cyclic redundancy check). The online calculators are handy for checking your logic design, and there are also some good descriptions of how CRC works.

http://www.zorc.breitbandkatze.de/crc.html
A calculator that includes CRC-CCITT, CRC16, and CRC32 presets, and supports arbitrary polynomial lengths, initial values, and final XOR values.

http://www.lammertbies.nl/comm/info/crc-calculation.html
Good mathematical explanation and calculator with outputs using several popular methods.
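
As a sketch to check against those calculators, here is a minimal serial (one bit per clock) CRC module of my own, assuming the CRC-CCITT polynomial x^16 + x^12 + x^5 + 1 and an initial value of 0xFFFF, which is one common convention; match the calculator settings to whichever variant you use:

module crc_ccitt_serial (
    input         clk,
    input         rst_n,  // async reset loads the initial value
    input         en,     // process one message bit per enabled clock
    input         d,      // message bit, most-significant bit first
    output [15:0] crc
);

reg [15:0] r;
wire fb = d ^ r[15];  // feedback bit: input XOR register MSB

always @(posedge clk or negedge rst_n)
    if (!rst_n)
        r <= 16'hFFFF;  // assumed initial value; some variants use 0x0000
    else if (en)
        r <= {r[14:0], 1'b0} ^ (fb ? 16'h1021 : 16'h0000);

assign crc = r;

endmodule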

Monday, April 27, 2009

Tricky state machines

You have a programmable logic design, and you have tested it on the bench. Everything works. You have run functional simulation backwards and forwards. Then you re-spin the design to change or add something, and...something locks up. A part of the circuit no longer operates. Or a subblock that you did not touch starts giving sporadic malfunctions. Sometimes this malfunction is hard to recreate. It only shows up after a lot of testing. What is going on?

There is a good chance you are having issues with synchronization. You either have an asynchronous signal going into your synchronous logic, or a signal from one clock domain going into logic in another clock domain. Most designers are somewhat aware of these problems, but in a large, complex design it is easy to overlook things, or to take shortcuts and assume that things will work.

Finite state machines are very susceptible to this kind of problem. At each active clock edge, the inputs to the flip-flops that comprise the state machine are sampled to determine the next state. The inputs must be stable throughout the setup-and-hold window around this clock edge. When you run the place-and-route tool, it will run timing analysis to ensure that this requirement is met, but normally it does this only for signals in the same clock domain.

It is easy to overlook an asynchronous input, or a signal from another clock domain which is essentially asynchronous to the clock that runs the state machine. There is no guaranteed timing relationship between this input and the setup-hold window of the clock. Input signals can change state right in this setup-hold window. If this signal goes to the input logic of more than one flip-flop, the data path delay is likely to be slightly different for each flip-flop, and the new input state will be latched by some flip-flops on the current clock edge, but not others. Depending on how the state machine is coded, you could throw it either into an unexpected but valid state, or into an undefined state (especially if it is one-hot encoded.) This will cause the state machine to malfunction or hang.

By the way, a counter is a state machine. It might be tempting to take an asynchronous signal and use it as a "synchronous" clear signal that only has to reset the counter to zero. The thought is that the counter reset will be active for a few clocks, and even if the signal initially only gets to some of the registers, it will get to all of them on the next clock. However, this reset is going to be released at some point, and if it de-asserts too close to a clock edge, you can get unexpected results, because the clear signal is combined with counting logic that is trying to increment the counter when not in reset.

Even a single flip-flop can show unexpected results. If the input consists of logic that can either set or reset the flip-flop, and the "set" logic is synchronized properly but the "reset" logic is asynchronous, you can still have problems. I have seen cases where the "set" logic remains inactive and the asynchronous "reset" logic activates, yet the flip-flop ends up set. From an RTL point of view, this makes no sense, but at the gate level something else happens that produces the opposite of the expected result. The only safe option is to synchronize all signals into the correct clock domain.
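
For a single-bit signal, the standard fix is a two-flip-flop synchronizer, sketched below (module and port names are mine). Note that a multi-bit bus cannot be synchronized this way; it needs a handshake, a Gray-coded counter, or an asynchronous FIFO.

module sync2 (
    input  clk,      // destination clock domain
    input  rst_n,
    input  async_in, // signal from another domain, or truly asynchronous
    output sync_out  // safe to feed into state machine logic in this domain
);

reg [1:0] meta; // first stage may go metastable; second stage filters it

always @(posedge clk or negedge rst_n)
    if (!rst_n)
        meta <= 2'b00;
    else
        meta <= {meta[0], async_in};

assign sync_out = meta[1];

endmodule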