# Systolic Architecture of a Viterbi Equalizer for Optical Communications

Ioannis Stamoulias and Spyridon Blionas Dept. Informatics & Telecommunications University of Peloponnese Tripoli, Greece jstamoulias@gmail.com, sbli@uop.gr

**Abstract**—This paper presents the architecture of an equalizer for optical communications, for achieving equalization of data at 10Gbps. It is a Fractionally Spaced Clustering Based Sequence Equalizer, one dimensional Euclidean, for NRZ-OOK modulation. The architecture is parametric as far as the parallelism is concerned and the final value of this parameter will define the performance of each architecture element and the number of processing elements needed to achieve real time performance.

Keywords—Electronic equalization; Fractionally Spaced Clustering Based Sequence Equalizer; Viterbi; Maximum Likelihood Estimation, Systolic architecture

# I. INTRODUCTION

Nowadays optical communications system products should cover functionalities for adapting to variations of the routes of data, between transmitter and receiver. Electronic equalization is a flexible solution that could adapt to these variations and offer high performance similar to the adapters based on optical fibers that are no adaptive to any changes. Electronic equalization in combination with adaptive optics may also allow the upgrade of existing channels from few Gb/s to several hundreds Gb/s without changing the transmission infrastructure and ensure low cost optical networks. The electronic equalizer may improve the quality of individual channels when the adaptive optics systems improve the quality for individual channels and multichannel systems by modifying the interactions between channels.

The design of a high speed Fractionally Spaced Clustering Based Sequence Equalizer, one dimensional Euclidean, using the Viterbi algorithm [1], is presented here. Simulations of this system for an optical communication channel, show that it improves the quality of the received data by extending the possible distance between transmitter and receiver (BER $\leq 10^{-3}$ ), up to 400 Km with a 16 states equalizer, up to 700 Km with a 64 states equalizer and up to 1.000Km with a 1024 states equalizer. The algorithmic suites for the case study were implemented with matlab in floating point and then the proposed architecture was simulated with SIMULINK in fixed point. Fixed point analysis defined the minimum number of bits for the various units of the architecture without affecting the BER performance. The matlab receiver model of the case study included both the digital equalizer and the analogue components of the receiver as the fiber model, the adaptive optics systems and the A/D. The equalizer part of the model that counters the impairments of the transmission channel, is based on the Viterbi algorithm using for the metrics calculations the derived cluster representatives from the received training data symbols. The proposed parallel architecture is targeted to achieve real time performance.

Section II of the paper presents the proposed systolic architecture as well as the performance analysis of it, based on MATLAB and SIMULINK simulations, when section III is the conclusions.

### II. SYSTOLIC EQUALIZER

The basic adopted model for the optical path for developing the algorithms, is shown in **Fig. 1**.



### Fig. 1 : Optical path model

The transmitted data I(n) are modulated by the laser source transmitted through the optical fiber and at the receiver they are amplified by an optical transmitter. Transmitted optical pulses are distorted by the transmission channel (fiber) and by the spontaneous emission noise n(t) of the optical amplifier. The received optical signal is converted to electrical signal by a photodiode that adds additional noise e(t). Then the resulting analogue electrical signal is sampled by an ADC (Analog to Digital Converter), with a sampling rate two times the data transmission rate. The produced data are the input of the equalizer. The equations of the system in **Fig. 1** are the following:

$$u(t) = St(I(n),I(n-1),...,I(n-L)), L=System memory$$
(1)

$$y(t) = |u(t) + n(t)|^{2} + e(t)$$
(2)

Here, the proposed equalizer architecture can balance all added distortions (as shown in Fig. 1) to the transmitted optical pulses. The distortions considered for this paper, are linear distortions (Chromatic dispersion -CD- and Polarization Mode Dispersion -PMD-), and also, non linear distortions from the photodiode (self-phase-modulation, -SMP- and cross-phasemodulation, -XPM-, in case of WDM). Many optical and optoelectronic techniques have been proposed for dealing with the aforementioned distortions and have been studied before deciding which to implement.

## A. Description of the proposed Equalizer

Many alternatives exist for electronic equalization. Supervised mode and blind mode equalizer algorithms were studied. Exhaustive simulations and architecture complexity estimations were carried out considering an optical system with Non Return to Zero On-Off keyed (NRZ-OOK) modulation, symbol rate 10 GS/s, laser 1mW CW at 1550 nm, using an external modulator with extinction rate 25dB. At the receiver there were considered a 3<sup>rd</sup> order Gaussian response optical filter and a pin photodiode with sensitivity 0,83/W and a 4<sup>th</sup> order Bessel electrical filter. The received optical signal it was assumed to be optically amplified such that the power to be equal to the transmitted power. The simulations in matlab environment for all the studied alternative algorithms for the equalization, used as data 100\*2<sup>14</sup> symbols (bits), of which 200.000 were used for training. The settings of the matlab realization of Fig. 1 were: Dispersion parameter D=17 ps/(nm\*km), and fiber lengths of m=400km up to 1.000km, that result Dispersion measure (residual CD) 6.800ps/nm, up to 17.000ps/nm. The output of the sampled optical signal before equalization, it was considered to be 7 bits. The benchmarking showed that Fractionally Spaced Clustering Based Sequence Equalizer one dimensional Euclidean (FsCBSE 1d Euclidean), using the Viterbi algorithm has a BER $\leq 10^{-3}$  and needs less hardware than all other algorithms [1]-[3]. Thus the equalizer adopted here was the FsCBSE 1d Euclidean.

The Viterbi algorithm for maximum likelihood estimation [1]-[13], used for the chosen equalizer, assumes that the transmission channel (fiber) is a finite state machine with a number of states equal to  $2^{L}$  possible sequences of data (L= channel memory i.e. the number of bits into the transmitting channel). It is assumed as a trellis diagram, where nodes represent the possible states of the system (the L bits into the transmitting channel), at every step, and arrows represent the possible transitions from a node to other nodes that may happen between two received symbols (steps). Every arrow (transition) has a weight that represents the "distance" of this specific transition, between the received sample at the current step and the expected for this transition (estimated cluster representative). This weight (Branch Metric), is used by the destination node, together with the sum of the weights of the previous resulted transitions (trace-back chain of transitions). At every step the transition with the minimum "Total distance" is selected at every node. The Viterbi algorithm after a number of n steps (convergence length), decides about a specific bit in the channel, transmitted n steps earlier.

At every step of the Viterbi algorithm, the system stores the "Total distance" of the chain of transitions for every node, and also the branch of every state (node) from which every "winning" transition comes at every step, together with its "history" of previously selected transitions. The storing of the bit resulting from the winning transition, for every state, at every step, is done for a total of n+1 consecutive steps, at a matrix of n+1 columns (1<sup>st</sup> column stores the results of n+1 steps back, (n+1)<sup>th</sup> column the results of the current step), and  $2^{L}$  rows. The aforementioned matrix is called survivor paths memory and it will have identical bits at every row of the first column at the

beginning of the matrix. This identical bit of the first column is the bit detected by the equalizer that was transmitted n symbols-steps earlier. If the number of data to be estimated, is a block of size K, then an input sequence of total length K+n symbols has to be input to the equalizer. The n last symbols of this input sequence are called tail. Simulations showed that n has to be greater or equal than  $2^*(L+1)$ , and K could be of any size.

Viterbi algorithm could be performed alternatively at every two steps, for 2 symbols concurrently (without affecting the algorithmic core) and in this case (followed by the proposed architecture), it is called "Radix 4" approach. It is shown in [15] that following "Radix 4" achieves throughput increase with linear scaling of the hardware requirements.

BER measurements for various OSNR-distances in simulations, showed that the chosen equalizer, could achieve always a BER $\leq 10^{-3}$ . Matlab simulations showed specifically that for a fiber of 400 Km with 16 states the chosen equalizer performs a BER  $10^{-4}$ . For a fiber of 700Km with 64 states it performs a BER of  $0,77*10^{-3}$ . Finally for a fiber 1.000Km and with 1024 states the equalizer performs a BER of  $0,9*10^{-3}$ .

### B. Description of the Systolic architecture.

In [14]-[24] there are presented various systolic architectures for implementing the Viterbi algorithm, all of them based on the detection of a Block of data (Sliding Block Viterbi Decoder - SBVD). Further optimizations of the SBVD architecture, called minimized SBVD, were presented by [17] and [18]. The SBVD architecture is of similar complexity with that of the minimized SBVD, claiming to be less area demanding. SBVD architectures are based on a parallel approach by inputting a block of data (Sliding Block) at every "sample" clock of the equalizer architecture. The output of the architecture will be the decision for this block of data and is expected to be available after a number of "sample" clocks equal to the size of the block. Nevertheless after this initial latency, it is expected that at every "sample" clock, a block of data will be available at the output of the architecture in parallel.

As shown in Fig. 2 the proposed architecture is sliding Block and systolic and it has a number of processors equal to the block size plus the tail, similarly to [16].



Fig. 2 : Systolic architecture



Fig. 3 : Viterbi Processor described in SIMULINK

Every processor of the architecture, has three different units as shown in Fig. 3. The BMU (Branch Metric Unit) that calculates, at every step, the Euclidean distances (branch metrics) for all the states, for the two symbols processed in parallel ("Radix 4" Viterbi was chosen), using both sampling instances of each symbol (sampling rate two times the data transmission rate) and all the estimated cluster representatives. The ACSU (Add Compare Select Unit) that calculates the total metric for both branches of every state. ACSU selects at every two steps, for all the states, the double transition with the minimum "Total distance". Both these units are achieving in SIMULINK implementation, the same BER performance as matlab simulations, using fixed point arithmetic processing elements with 8 bits fractional part and 2 bits integer (including the sign). The third unit SPU (Survivor Paths Unit) shown in Fig. 2 and Fig. 3 is keeping the survivor paths memory updated at every two steps of the algorithm, also because of the "Radix 4" choice.

Some estimations concerning the size of the proposed architecture, for a real time performance equalizer. A future implementation expected to have an operating clock frequency of 250MHz and be able to achieve a targeted 10Gbps throughput. The needed size of the systolic architecture for the chosen equalizer is (K+n)/2 processors (because of "Radix 4" every processor process 2 symbols concurrently). K is defined as the ratio of the required symbol rate and the hardware maximum frequency (K=10Gbps/250MHz=40). The tail n is the convergence length of the Viterbi algorithm and as mentioned above n=2\*(L+1). Each processor unit includes a "Radix 4" BMU that requires  $2^{\frac{1}{2}^{L+1}}$  Euclidean distance calculations (-BM-, 2<sup>L+1</sup> calculations per symbol) and a "Radix 4" ACSU that requires 2<sup>L</sup> ACS processes. Consequently the size for the proposed systolic architecture of the chosen equalizer with BER $\leq 10^{-3}$  is as follows: 25 processors with a total of 1600 BM and 400 ACS for 400Km (L=4, n=10), 27 processors with a total of 6912 BM and 1728 ACS for 700Km (L=6, n=14), 30 processors with a total of 122880 BM and 30720 ACS for 1.000 Km (L=10, n=22). It is expected that the first two of the above cases of the proposed architecture, would fit in a large size FPGA, when the last one for the 1000Km would only be possible to be implemented in an ASIC. Finally, the important issue of how to input and output the entire block of data, at every clock cycle, into the proposed systolic architecture of Fig. 2, for a targeted throughput of 10Gbps, is shown in Fig. 4. An array of 7 shift Registers of 40+n bits with a clock of 10 GHz, are used to input the two sampling instances of every received symbol into the systolic architecture. When this shift registers array is filled with data, it feeds the systolic architecture every 40 clock pulses of 10GHz frequency. For the output, a 40 bits shift register, loading in parallel and shifting out at 10GHz the detected data, is sufficient.



#### Fig. 4 : Equalizer interfaces

# III. CONCLUSIONS

A systolic architecture for an electronic equalizer for optical communications was presented. The BER performance of the proposed electronic equalizer architecture was validated at its algorithmic level, in matlab environment and in SIMULINK (fixed point arithmetic processing elements). The size of the proposed architecture was estimated for a future FPGA implementation (at 250 MHZ clock frequency), for the equalization of transmitted data with 10 Gbps throughput and distorted from fibers up to 700Km length.

#### ACKNOWLEDGMENTS

The authors would like to thank Dr. Christos Matrakidis for providing the code for the simulation of the optical communications link and Dr. Kristina Georgoulakis for providing the code for the implementation of the FsCBSE equalizer.

This research was funded by the Operational Program "Education and Lifelong Learning" of the Greek National Strategic Reference Framework (NSRF) Research Funding Program: THALES PROTOMI, grant number MIS 377322.

#### REFERENCES

- [1] K. Georgoulakis, C. Matrakidis, G. O. Glentis, and A. Stavdas, "Fractionally spaced clustering based equalizer for optical channels," in OSA SPPCom Karlsruhe Germany, June 21 - 24 2010.
- W. Chen, F. Buchali, X. Yi, W. Shieh, J. S. Evans, and R. S. Tucker, [2] "Chromatic Dispersion and PMD Mitigation at 10 gb/s using Viterbi Optics Equalization for DPSK and DQPSK Modulation Formats.' express, vol. 15, no. 9, pp. 5271-6, Apr. 2007.
- [3] K. Georgoulakis and G. O. Glentis, "Performance evaluation of the Clustering Based Sequence Equalizer in Direct Detection Optical Communication Links," in PHOTOPTICS Berlin Germany, March 12 -14 2015.
- [4] Chunmin Xia and W. Rosenkranz, "Nonlinear electrical equalization for different modulation formats with optical filtering," J. Lightwave Technol., Vol. 25, No. 4, (2007), pp. 996-1001. 5
- K. Georgoulakis, S. Theodoridis "Clustering based sequence equalization of TCM signals in hostile environments" IEEE Trans. [5] Signal Processing, Vol.47, No.6, (1999), 1783-1787.
- H.Bulow, F.Buchali, A.Klekamp, "Electronic [6] Dispersion Compensation", IEEE Journal of Lightwave Tech nology, 26(1), 2008, 158-167.
- [7] Viterbi, Andrew J., "Error bounds for convolution codes and an asymptotically optimum decoding algorithm", IEEE Transactions on Information Theory, April 1967, Vol. 13, 2.
- Forney, G.D. Jr., "The viterbi algorithm", Proceeding of the IEEE. 1973, [8] Vol. 61, 3.
- O. Agazzi, M. Hueda, H. Carrer, and D. Crivelli, "Maximum likelihood sequence estimation in dispersive optical channels," J. Lightwave Tech., vol. 23, no. 2, pp. 479-762, 2005.
- [10] T. Foggi, G. Colavolpe, E. Forestieri, and G. Prati, "Channel Estimation Algorithms for MLSD in Optical Communication Systems," IEEE Photonics Technology Letters, vol. 18, no. 19, pp. 1984-1986, Oct. 2006.
- [11] T. Foggi, E. Forestieri, G. Colavolpe, and G. Prati, "Maximumlikelihood sequence detection with closed-form metrics in OOK optical systems impaired by GVD and PMD," Journal of Lightwave Technology, vol. 24, no. 8, pp. 3073-3087, Aug. 2006.
- [12] G. Bosco and P. Poggiolini, "Long-distance effectiveness of MLSE IMDD receivers," IEEE Photonics Technology Letters, vol. 18, no. 9, pp. 1037-1039, May 2006.
- [13] M. R. Hueda, D. E. Crivelli, H. S. Carrer, and O. E. Agazzi, "Parametric Estimation of IM/DD Optical Channels Using New Closed-Form Approximations of the Signal PDF," Journal of Lightwave Technology, vol. 25, no. 3, pp. 957-975, March 2007.

- [14] Tzou, Kou-Hu and Dunham, J., "Sliding Block Decoding of Convolutional Codes", IEEE Transactions on Communications, 1981, Vol. 29, 9.
- [15] Black, P. J. and Meng, T. H., "A 140-Mb/s, 32-state, radix-4 Viterbi decoder", IEEE Journal of Solid-State Circuits, 1992, Vol. 27, 12.
- [16] Black, Peter J. and Meng, Teresa H.-Y., "A 1-Gb/s, four-state, sliding block viterbi decoder", IEEE Journal of Solid-State Circuits, 1997, Vol. 32, 6.
- [17] Fettweis, G. and Meyr, H., "Parallel Viterbi algorithm implementation: Breaking the ACS bottleneck", IEEE Transactions on Communications, 1989, Vol. 37, 8.
- [18] Fettweis, G., Dawid, H. and Meyr, H., "Minimized method Viterbi decoding, 600 Mbit/s per chip", IEEE Global Telecommunications Conference and Exhibition, San Diego : s.n., 1990. Vol. 3.
- [19] N. C. McCinty, R. Kennedy, and P. Hoeher, "Parallel trellis Viterbi algorithm for sparse channels", IEEE Communi. Lett., vol. 2, pp. 143– 145, 1998.
- [20] G. Bai, J. B. Ashbrook, P. Jinki, N. R. Shanbhag, A. C. Singer and S. Chopra, "An MLSE Receiver for Electronic Dispersion Compensation of OC-192 Fiber Links," Solid-State Circuits, IEEE Journal of, vol. 41, no. 11, pp. 2541–2554, Nov. 2006.

- [21] T. Kupfer, C. Dorschky, M. Ene and S. Langenbach, "Measurement of the Performance of 16-States MLSE Digital Equalizer with Different Optical Modulation Formats," in Optical Fiber Communication Conference/ National Fiber Optic Engineers Conference, San Diego California, Feb. 24 - 28, 2008.
- [22] Fritzsche, D.; Breuer, D.; Schurer, L.; Ehrhardt, Armin; Oeruen, H.; Schaffer, C.G., "Experimental Investigation of Real Time 10 Gbit/s MLSE Equalizer Using 4-States and 16-States Viterbi Detector," Global Telecommunications Conference, 2009. GLOBECOM 2009. IEEE, vol., no., pp.1,5, Nov. 30 2009-Dec. 4 2009
- [23] T. Veigel, M. Groizing, M. Berroth, F. Buchali, Design of a Viterbi Equalizer Circuit for Data Rates up to 43 Gb/s, ESSCIRC Fringe, Athens, Greece, Sept. 2009
- [24] Veigel, T.; Alpert, T.; Lang, F.; Grozing, M.; Berroth, M., "A Viterbi Equalizer Chip for 40 Gb/s optical communication links," Microwave Integrated Circuits Conference (EuMIC), 2013 European , vol., no., pp.49,52, 6-8 Oct. 2013