UNIVERSITY OF CALIFORNIA
IRVINE

CENTER FOR PERVERSIVE COMMUNICATIONS AND COMPUTING

GRADUATE FELLOWSHIP PROJECTS PROGRESS REPORTS
FALL 2006
# Projects

*Alphabetized According to Student Lastname*

<table>
<thead>
<tr>
<th>Student Name</th>
<th>Project Title</th>
<th>Advisor</th>
</tr>
</thead>
<tbody>
<tr>
<td>Jun Ho Bahn</td>
<td>A Multi-Processor Architecture for Multimode Mobile Digital TV Platforms</td>
<td>Nader Bagherzadeh</td>
</tr>
<tr>
<td>Karim El Defrawy</td>
<td>Filter Allocation against DDoS Attacks</td>
<td>Athina Markopoulou</td>
</tr>
<tr>
<td>Ferzad Etemadi</td>
<td>Joint Source-Channel Coding and Quantized Feedback for Quasi-Static Fading Channels</td>
<td>Hamid Jafarkhani</td>
</tr>
<tr>
<td>Fatemeh Fazel</td>
<td>Quasi-Orthogonal Space-Frequency Block Codes for MIMO-OFDM Channels</td>
<td>Hamid Jafarkhani</td>
</tr>
<tr>
<td>Alfred Grau</td>
<td>Design of MEMS-Reconfigurable Decoupling Networks for Closely-Spaced Multi-Element Antennas in MIMO Systems</td>
<td>Franco De Flaviis</td>
</tr>
<tr>
<td>Krishna Srikant Gomadam</td>
<td>Multi-User Parallel AF Network with Noise Correlation</td>
<td>Syed A. Jafar</td>
</tr>
<tr>
<td>Inanc Inan</td>
<td>Performance Analysis of the IEEE 802.11e Enhanced Distributed Coordination Function using Cycle Time Approach</td>
<td>Ender Ayanoğlu</td>
</tr>
<tr>
<td>Mahyar Kargar</td>
<td>High-Speed Adaptive Analog Decision Feedback Equalizer for Multimode Fiber Channels in 0.18µm CMOS</td>
<td>Michael Green</td>
</tr>
<tr>
<td>Amin Khajeh Djahromi</td>
<td>Aggressive Power Management Utilizing Fault Tolerant Adaptation for Wireless Systems</td>
<td>Ahmed Eltawil</td>
</tr>
<tr>
<td>Mohammad A. Makhzan</td>
<td>Iterative and Fault Tolerant Error Concealment JPEG2000 Codec/A Low Power Fault Tolerant Cache Capable of Voltage Scaling</td>
<td>Fadi Kurda hi</td>
</tr>
<tr>
<td>Sudeep Pasricha</td>
<td>SOC Power Optimization Framework</td>
<td>Nikil Dutt</td>
</tr>
<tr>
<td>Chitaranjan Pelur Sukumar</td>
<td>Information Rate Maximization in Wireless Systems Employing Channel Feedback under Affine Precoding</td>
<td>Ahmed Eltawil</td>
</tr>
<tr>
<td>Hulya Seferoglu</td>
<td>Content-Aware Streaming from Multiple Cameras</td>
<td>Athina Markopoulou</td>
</tr>
<tr>
<td>Ersin Sengul</td>
<td>Bit-Interleaved Coded Multiple Beamforming with Limited CSIT Feedback</td>
<td>Ender Ayanoğlu</td>
</tr>
<tr>
<td>Sudhir Srinivasa</td>
<td>Cognitive Radio – Opportunistic and Reconfigurable Communication with Distributed Side Information</td>
<td>Syed A. Jafar</td>
</tr>
<tr>
<td>Lei Zhou</td>
<td>Design of Fast Synchronizer for DS-UWB Transceiver</td>
<td>Payam Heydari</td>
</tr>
</tbody>
</table>
Introduction: In Digital Television Terrestrial Broadcasting (DTTB), DVB-T/H [1], ISDB-T [2], and T-DMB [3] are standardized and support digital video broadcasting for terrestrial as well as handheld devices. These standards have common signal processing features such as OFDM-based modulation, time-interleaving, and frequency-interleaving with different parameters. In order to handle these computation-intensive DSP applications, there are two fundamental problems to be solved. One is how to reduce the required computation intensity. The other is how to overcome the excessive communication bandwidth which is caused by data-intensive operations.

As a communication methodology in System-on-Chip (SoC) or MP-SoC, the 2-D mesh based Network-on-Chip (NoC) architecture has been proposed [4]. The feasibility of real implementation as well as its communication performance has been evaluated. Additionally with unique management of virtual channel, the performance is enhanced. Both simulation and analytical model have been developed. Specifically in the analytical model, the message length and buffer depth are coped with simultaneously which is the first approach in this area.

As a base platform for NoC based system architecture, our NePA (Network-based Processor Array) which are evolved from the previous MaRS (Macro-pipelined Reconfigurable System) will be used. Different from the original MaRS [5][5], the developed NoC architecture is applied and heterogeneous EU model which can be either programmable IPs or specific IPs is included. To support multiple standards in a single platform, the reconfigurability is an inevitable feature for the base platform.

Summary of Accomplishments: In the Fall Quarter 2006, as a base platform for evaluating several standards for mobile digital TV applications [1][2][3], a cycle accurate SystemC™ simulator of NePA was integrated by combining the previous NoC SystemC™ models of various routers, a generic NI (network interface), and a EU (execution unit). In the initial NePA system, the homogeneous EU model is considered. Depending on the performance evaluation of the selected standards, various heterogeneous EU models for acceleration of computation intensive parts will be adopted. As for various router models, from a basic simple buffer router to a complicated VC (virtual channel) enhanced router can be assigned by changing the compilation flags. Regarding the NI model, minimal functions of generic NI between current EU and router are defined and its SystemC™ model is developed. Maintaining generality of NI, additional functionality will be defined and improved for several different IP cores to be connected as EU models. Finally for the current homogeneous NePA system, OpenRISC core is chosen and simplified to be fit in MP-SoC environment. From the original OpenRISC core [6], several blocks for general purpose processor like caches, MMUs for instruction and data are omitted. And for interfacing with NI, some specific interfaces are added. Based on the provided tool chain of software development of OpenRISC core, a programming model and related utilities for mapping generated codes to each OpenRISC core are developed.

On going work: In order to verify the basic functionality of NePA SystemC™ simulator, basic applications such as fast Fourier transform (FFT) or some encryption algorithms like AES and DES will be tested. Throughout mapping these applications, the requirements of software APIs will be defined. And with previously developed RTL model of each element such as router, NI, and OpenRISC core, the integrated RTL model of NePA will be constructed. With this NePA RTL model, its hardware complexity and system-related parameters like power, area, and maximum operating clock will be estimated throughout synthesis and P&R. On the other hand, for evaluating the performance of proposed mobile digital TV standards, a high-level model for one of standards will be developed using C/C++ or Matlab™ environment.

Reference:
[1] Digital Video Broadcasting (DVB); Framing structure, channel coding and modulation for digital terrestrial television, European Standard (EN) 300 744 v1.5.1, European Telecommunication Standards Institute (ETSI), November 2004.
Abstract—Distributed Denial-of-Service (DDoS) attacks are a major problem in the Internet today. We consider filtering as a defense mechanism, and we study how to optimally use it to minimize the damage inflicted by the DDoS attack, subject to various resource constraints.

I. INTRODUCTION

Distributed Denial-of-Service Attacks (DDoS) attacks are an inherent weakness in the Internet any-to-any communication paradigm. During a DDoS flooding attack, a large number of compromised hosts coordinate and send unwanted traffic to a victim, typically a public access web-server, thus exhausting the victim’s access bandwidth and preventing it from serving its legitimate clients. Filtering (using access control lists) allows routers to selectively block unwanted traffic and thus to protect the victim from the flooding attack [3], [4]. It can be used by itself or in conjunction with other mechanisms. However, filters are an expensive resource and there are fewer filters in the routers today than attack sources. Therefore, filtering becomes a resource allocation problem.

We considered a single router, typically the victim’s gateway, which has a limited number of available filters. We studied the problem of optimal filter allocation to attackers or groups of attackers, so as to maximize the amount of good traffic preserved, subject to constraints in the number of filters and in the access bandwidth.

II. CONTRIBUTIONS

Single-Tier Problem. First, we considered filtering of entire prefixes/domains of attackers. At this granularity, there are typically enough filters, but there is also collateral damage due to filtering out legitimate traffic co-located with attack traffic. We formulated filtering of attack domains as an knapsack problem, which can be solved using a greedy algorithm.

Two-Tier Problem. Ideally, one would like to filter out individual attackers to eliminate the collateral damage; however, there are typically less filters than attackers. Therefore, we consider filtering at the granularity of both individual attackers and prefixes/domains. We formulated the problem of optimal two-tier filter allocation so as to maximize the preserved goodput, subject to constraints in the number of filters and in the access bandwidth, as a non-linear optimization problem. We computed the optimal solution using dynamic programming.

Heuristics. We also designed efficient heuristics that achieve near-optimal performance (acceptable defense levels using a small number of filters) at low complexity.

Performance Evaluation. We constructed realistic DDoS attack scenarios based on recent reports from worm propagation data, and we simulated the optimal solution and the heuristics for these scenarios. We showed that our filtering algorithms preserve significantly more good traffic preserved than commonly used filtering policies.

A detailed description of the formulations, algorithms and simulation results can be found in [1], [2].

III. PUBLICATIONS

The work during this quarter led to a workshop paper [1], which focuses on optimal filter allocation. A longer version, including the optimal allocation and heuristics, can be found in the technical report [2]. Both papers are available at our websites.

IV. FUTURE WORK

We continue this work in several directions. First, we currently study the structural properties of the optimal solution and we improve our heuristics. Second, we plan to study the filter allocation problem without assuming perfect knowledge about the good and bad traffic. Finally, we got interested in investigating novel attack mechanisms that could potentially be used to launch DDoS attacks.

REFERENCES

Joint Source-Channel Coding and Quantized Feedback for Quasi-Static Fading Channels

Farzad Etemadi, Hamid Jafarkhani
Center for Pervasive Communications and Computing
University of California, Irvine, CA 92697
fetemadi@uci.edu, hamidj@uci.edu

I. SUPERPOSITION CODING

Our research is motivated by the recent demand for multimedia content delivery over wireless channels. Unlike data transmission systems where the objective is to minimize the bit errors, the design objective in multimedia transmission systems is to reduce the expected distortion of the received signal. In this context, the Shannon’s separation theorem states that under an infinite block length assumption, the source and channel coding tasks can be performed separately. The latter assumption is easily violated in delay-sensitive and power-limited applications such as the case for slowly fading channels. Thus, we study joint source-channel coding approaches to this problem. We consider a scenario where only the receiver has knowledge about the channel, and in this context, superposition coding with successive decoding is optimal. The previous work on this problem has been limited to the asymptotically high-SNR regimes. We formulated the joint source-channel coding problem for finite SNRs, and proposed an efficient algorithm to allocate the transmission rate and powers of different layers of such a scheme. Fig. 1 shows the numerical results for 1x1 and 4x1 systems and different number of layers. We showed that a small number of layers is usually sufficient to achieve all the layering gain, and that equal rate allocation is optimal for a wide range of operating conditions.

II. UNRELIABLE QUANTIZED FEEDBACK

It is known that the knowledge of the channel state information at the transmitter (CSIT) can significantly improve the performance. The CSIT, however, requires a feedback link from the receiver to the transmitter. A power-limited mobile device cannot allocate much resources to the feedback link and as a result, a very limited amount of information is fed back to the transmitter. Thus, the channel state information is usually quantized and sent to the transmitter. The existing work usually assumes a noiseless feedback link that is not realistic in our power-limited scenario. In our work, we assume that the feedback channel behaves similarly to the forward link, and then formulate the rate maximization problem in the presence of fading in the feedback channel. The effect of the errors in the feedback link are minimized by using a channel optimized scaler quantizer (COSQ) instead of a regular quantizer. For a high quality feedback channel, the proposed COSQ performs close to the noiseless feedback case, while its performance converges to the no-feedback scenario as the feedback channel quality degrades. In our work, we considered systems with and without power adaptation at the transmitter. In Fig. 2 we have shown the achievable rates, the COSQ Voronoi regions, and the power adaptation gains. In this work, the design criteria is to maximize the expected transmission rate that is not optimal if the goal is to transmit an analog source. Since the main thrust of our research is multimedia transmission, the next step in our work is to consider the joint source-channel coding problem with unreliable feedback.

REFERENCES

I. PROJECT MOTIVATION AND OVERVIEW

The combination of MIMO (Multiple-Input Multiple-Output) and OFDM (Orthogonal Frequency Division Multiplexing) is a promising technique for high data rate broadband wireless systems. A frequency selective channel offers an additional degree of diversity known as multipath or frequency diversity. Space-Frequency (SF) and Space-Time-Frequency (STF) codes have been designed to achieve some levels of space and multipath diversity. Reference [1] proposes a repetition mapping technique to transform the existing space-time codes, designed for quasi-static flat fading channels, to construct full-diversity codes in frequency selective fading channels. Their proposed method provides a tradeoff between diversity and symbol rate. Later on, the authors proposed a rate-one, full-diversity space-frequency block code in [2]. Their proposed scheme can obtain a target diversity gain but the decoding complexity grows exponentially with the desired diversity. We use their design as a reference to compare our proposed structure in terms of performance and complexity. Quasi-orthogonal space-time block code (QOSTBC) structures for quasi-static channels were first introduced in [3]. Original quasi-orthogonal designs provide rate-one codes and pairwise Maximum Likelihood (ML) decoding but fail to achieve full-diversity. Later on, improved quasi-orthogonal codes were proposed through constellation rotation. A rotated QOSTBC provides both full diversity and rate one and performs better in low and high SNR, compared to OSTBC. These benefits together with the simple decoding capabilities of rotated QOSTBCs, motivate us to design Space-Frequency codes based on quasi-orthogonal structures.

II. PROGRESS REPORT

During the course of this project, based on generalizing the QOSTBCs, we provide a systematic method of designing rate-one, full-diversity space-frequency codes for two transmit antennas. As the simulation results suggest, the proposed codes have a better performance than the existing codes in the literature. We also design a class of quasi-orthogonal space-time-frequency block codes. Our proposed STF codes, in addition to the space and frequency diversity gains, are able to exploit the temporal diversity gains of the channel as well, thus achieving the maximum possible diversity level. If the channel is quasi-static over $B$ OFDM symbol durations, there are no temporal diversity gains offered by the channel. In this case, we propose to use the STF structure to reduce the decoding complexity. Both SF and STF code structures provide full symbol rate (one symbol per frequency tone per time slot) and achieve any desired multipath (frequency) diversity available in the frequency selective fading channel. In Figure 1, the performances of the QOSF and QOSTF codes are compared with the SF code of [2] for an exponential decay channel model with an rms delay spread of $5 \mu s$ and the maximum delay spread is set to be 10 times the rms delay spread. As the simulation result suggests, the proposed QOSF code demonstrates a superior performance over that of [2].

REFERENCES

Multi-user Parallel AF relay Network with Noise Correlation

Krishna S. Gomadam, Advisor: Prof. Syed Ali Jafar
Center for Pervasive Communications and Computing
University of California, Irvine, CA 92697-2625

I. INTRODUCTION

Wireless mesh networks where information is transferred via multiple hops and routes provide significant throughput enhancement and have been the focus of much recent work [1], [2]. Various relay strategies have been studied in literature. Among these strategies, amplify and forward (AF) has been found to be highly suitable for parallel relay networks for its ability to pass on soft information. For an AF relay network, the relay design involves optimizing the relay amplification factors to maximize performance.

Most work in the literature assumes independent noise at the relay terminals. However, noise correlation between nodes occurs in wireless relay networks due to the following reasons:

1) Common Interference
   Due to the broadcast nature of wireless networks the relays are exposed to a set of common interferers resulting in correlated noise at the nodes.

2) Noise propagation with multi-hop AF relaying
   For example, in a three-hop AF network, every relay in the second hop observes a linear combination of noise from the relays of the preceding stage. This clearly results in correlated noise at the second-hop relays.

The natural question is whether the relays can exploit the correlation knowledge to improve performance. In practice, learning correlation may result in network overheads. Whether such overheads are justified depends on the potential advantage of learning correlation. Thus, our goal in this work is to estimate the improvement in performance when perfect correlation knowledge is available at the relays. The optimal relay design in this case will have the following two objectives:

1) Desired signals to add up in phase at the destination.
2) Noise terms to be canceled.

In general, these objectives cannot be satisfied simultaneously and the relay design can be expected to be a trade-off between the two objectives.

II. SUMMARY OF WORK

We consider a multi-source parallel relay network as shown in Fig. 1. In this model, $L$ source nodes wish to communicate to a common destination through a set of $N$ relays. Here each source has power $P_k$ and the relays have a sum power constraint of $P_R$. Due to the half-duplex nature of the relays, communication takes place in two slots. The relay received symbols during the first time slot is given by

$$r = \sum_{k=1}^{L} f_k x_k + n_R$$

where the elements of $n_R$ are AWGN with the covariance matrix given by

$$K = \mathbb{E}[n_R n_R^H]$$.

In the second slot, the relays amplify their received signal subject to a sum power constraint of the relays. The received signal at the destination is

$$y = \sum_{k=1}^{L} d^H G_k x_k + d^H G n_R + n_D$$

where $G = \text{diag}(g)$.

We seek to maximize the sum rate of the network subject to the sum power constraint at the relays. The following theorem provides the optimal relay design and maximum achievable sum rate.

**Theorem 1:** The maximum achievable sum rate of a two-hop multi-source parallel relay network consisting of half-duplex AF relays with correlated noise is given by

$$R = \log_2(1 + \text{SNR}^*)$$

$$\text{SNR}^* = \frac{P P_R \lambda_{\text{max}}}{\sum_{k=1}^{L} (f_k \otimes g)^2}$$

![Fig. 1. Two Hop Multi-User Parallel Relay Network](image-url)
Fig. 2. Average sum rate of a single user system as a function of total relay power $P_R$.

Fig. 3. Average sum rate of a single user system as a function of the number of relays for $P=P_I=10$ dB when $P_R \to \infty$.

where

$$A = K \otimes gg^H P_R + \sum_{k=1}^{L} f_k^H f_k P_k \otimes I + I.$$

Please refer [3]

Fig. 2 shows the sum rate of the system with and without correlation knowledge as a function of the total relay power. Note that the performance improvement is significant and the difference increases with relay power. Fig. 3 plots the average sum rate of the system versus the number of relays. It can be observed that the performance gain increases with the number of relays. In the next report we will characterize the asymptotic performance of the schemes with and without correlation knowledge.

Future work will involve relay optimization for a three-hop AF network by utilizing the result in (3). We are also interested in a practical scheme for learning correlation that can achieve most of the gains promised by Theorem 1.

REFERENCES


Alfred Grau*, and Franco De Flaviis*
*University of California at Irvine, USA

I. INTRODUCTION

In today’s wireless communication applications, compact and efficient antennas/arrays are required to satisfy the spacing requirements arising from multi-element antenna architectures (beamforming switching, multiple input multiple output, etc) and for integration purposes with other radio modules. Closely-spaced antennas suffer from mutual coupling which is a detrimental electromagnetic phenomenon that reduces the efficiency of the antennas. External decoupling networks can be used to compensate the mutual coupling among the antennas such that a good efficiency can be achieved over a reasonable frequency band.

As seen by the author, for a given antenna array geometry, the design of the decoupling networks is not unique. This opens the possibility to build MEMS-reconfigurable decoupling networks that while decoupling closely-spaced antennas, they are also simultaneously able to produce a set of uncorrelated radiation patterns/modes. This reconfigurable capability can be used in a MIMO [1] system to further enhance its performance.

II. ACHIEVEMENTS

At this stage of the project, the first goal of the project has already been achieved, that is, a methodology for the design of specific decoupling networks for different antenna array topologies that are suitable to be made reconfigurable through MEMS technology, has been developed. This methodology is based on the Takagi Decomposition of the measured Scattering Parameters matrices of the arrays. For the seek of knowledge, a Scattering Matrix is a matrix that relates the voltage wave incident on the ports to those reflected from the ports, and is normally measured directly with a vector network analyzer [2][3]. As shown by the authors, the Takagi factorization, provides a methodology to design decoupling networks that are reciprocal, that is, physically realizable, for any MEA structure. The multiplicity of the eigenvalues of the factorization is an indicator of the number of different decoupling networks that exist for a given array configuration.

We are currently focusing on a circular array of four closely-spaced antennas. Using the aforementioned methodology, we have generated two different decoupling networks. The first one uses four 180° hybrids and the second one uses three 180° hybrids. Fig. 2 shows the schematic and the scattering representation of these two networks. Due to the fact that these networks are different, also the resultant radiation patterns are different and orthogonal. This is a novel capability that can be used to improve the performance of current MIMO systems. Fig. 3 shows the simulated resultant radiation patterns when the 3-hybrid network is connected to the circular array. We are currently focusing on overlapping these two networks into one reconfigurable network integrating MEMS switches.

III. FUTURE WORK

In the next month we plan to finalize the design of the proposed reconfigurable decoupling network for the circular array of four antennas, and therefore conduct measurements on the radiation pattern and scattering parameters of the antenna array when connected to the proposed decoupling networks. Those measurements will be obtained using the laboratory equipment, such as: HP 8510 2-50GHz network analyzer, and a conical compact anechoic chamber. Finally we will conduct system level simulations using MATLAB to find out the capacity and BER of the new proposed system, as well as specific experiments to obtain the system capacity and diversity order of the system.

REFERENCES

Fig. 1. Network model of the proposed decoupling networks generated using the Tagaki factorization (decoupling + matching network approach).

Fig. 2. Schematic of the 4-hybrid and 3-hybrid decoupling networks, associated with a circular array of four antennas.

Fig. 3. Simulated radiation pattern associated with the 3-hybrid decoupling network.
I. INTRODUCTION

The IEEE 802.11e standard [1] specifies the Hybrid Coordination Function (HCF) which enables prioritized and parameterized Quality-of-Service (QoS) services at the MAC layer. The HCF combines a distributed contention-based channel access mechanism, referred to as Enhanced Distributed Channel Access (EDCA), and a centralized polling-based channel access mechanism, referred to as HCF Controlled Channel Access (HCCA). We confine our analysis to the EDCA scheme, which uses Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA) and slotted Binary Exponential Backoff (BEB) mechanism as the basic access method. The EDCA defines multiple Access Categories (AC) with AC-specific Contention Window (CW) sizes, Arbitration Interframe Space (AIFS) values, and Transmit Opportunity (TXOP) limits to support MAC-level QoS and prioritization.

A cycle time is the duration in which an arbitrary tagged user successfully transmits one packet on average [2]. This quarter, we derived the explicit mathematical expression of the AC-specific EDCA cycle time. The derivation considers the AIFS and CW differentiation by employing an average collision probability analysis. The EDCA cycle time is used to predict the first moments of the saturation throughput, the service time, and the packet loss probability. We show that the results obtained using the cycle time model closely follow the accurate predictions of the previously proposed Discrete-Time Markov Chain (DTMC) model [3]. It can serve as a simple and practical alternative model for the EDCA saturation throughput analysis.

II. EDCA CYCLE TIME ANALYSIS

The difference in AIFS of each AC in EDCA creates the so-called contention zones as shown in Fig. 1 [4]. In each contention zone, the number of contending stations may vary. Therefore, the collision probability cannot simply be assumed to be constant among all ACs. To maintain simplicity of the cycle time analysis, we employ an average analysis on the AC-specific collision probability rather than calculating it separately for different AIFS and backoff slots as in [3]. We calculate the AC-specific collision probability according to the long term occupancy of AIFS and backoff slots by generalizing the approach of [4, Section IV-D] for four ACs.

We can define $p_{c_{i,x}}$ as the conditional probability that AC $i$ experiences either an external or an internal collision given that it has observed the medium idle for $AIFS_x$ and transmits in the current slot (note $AIFS_x \geq AIFS_i$ should hold). For the following, in order to be consistent with the notation of [1], we assume $AIFS_0 \geq AIFS_1 \geq AIFS_2 \geq AIFS_3$. Let $d_i = AIFS_i - AIFS_3$. Following the slot homogeneity assumption of [5], assume that each AC $i$ transmits with constant probability, $\tau_i$. Also, let the total number AC $i$ flows be $N_i$. Then, for the heterogeneous scenario in which each station has only one AC

$$p_{c_{i,x}} = 1 - \prod_{i': d_i' \leq d_x} (1 - \tau_{i'})^{N_{i'}} \frac{(1 - \tau_{i})^{N_{i}}}{(1 - \tau_{i})}.$$  \hfill (1)

We use the Markov chain shown in Fig. 2 to find the long term occupancy of contention zones. Each state represents the $n^{th}$ backoff slot after completion of the $AIFS_3$ idle interval following a transmission period. The Markov chain model uses the fact that a backoff slot is reached if and only if no transmission occurs in the previous slot. Moreover, the number of states is limited by the maximum idle time between two successive transmissions which is $W_{\text{min}} = \min(CW_{i,max})$ for a saturated scenario. The probability that at least one transmission occurs in a backoff slot in contention zone $x$ is

$$p_B^{(x)} = 1 - \prod_{i': d_i' \leq d_x} (1 - \tau_{i'})^{N_{i'}}.$$ \hfill (2)

Given the state transition probabilities as in Fig. 2, the long term occupancy of the backoff slots $b'_n$ can be obtained from...
the steady-state solution of the Markov chain. Then, the AC-specific average collision probability $p_{ci}$ is found by weighing zone specific collision probabilities $p_{ci,x}$ according to the long term occupancy of contention zones (thus backoff slots)

$$p_{ci} = \frac{\sum_{n=d_i+1}^{W_{min}} b'_n p_{si,n}}{\sum_{n=d_i+1}^{W_{min}} b'_n}$$ \hspace{1cm} (3)

where $x = \max \left( y \mid d_y = \max (d_z \mid d_z \leq n) \right)$ which shows $x$ is assigned the highest index value within a set of AIDs that have AIFS smaller or equal to $n + AIFS_{i}$. This ensures that at backoff slot $n$, $AC_{i}$ has observed the medium idle for AIFS$_{i}$. Therefore, the calculation in (3) fits into the definition of $p_{ci,x}$.

Intuitively, it can be seen that each user transmitting at the same AC has equal cycle time, while the cycle time may differ among AIDs. Therefore, let $E_{i}[t_{cyc}]$ be average cycle time for a tagged $AC_{i}$ user. $E_{i}[t_{cyc}]$ can be calculated as the sum of average duration for $i)$ the successful transmissions, $E_{i}[t_{suc}]$, $ii)$ the collisions, $E_{i}[t_{coll}]$, and $iii)$ the idle slots, $E_{i}[t_{idle}]$ in one cycle of $AC_{i}$.

In order to calculate average time spent on successful transmissions during an $AC_{i}$ cycle time, we should find the expected number of total successful transmissions between two successful transmissions of $AC_{i}$. Let $Q_{i}$ represent this random variable. Also, let $\gamma_{i}$ be the probability that the transmitted packet belongs to an arbitrary user from $AC_{i}$ given that the transmission is successful. Then,

$$\gamma_{i} = \frac{\sum_{n=d_i+1}^{W_{min}} b'_n p_{si,n}/N_{i}}{\sum_{n=d_i+1}^{W_{min}} b'_n}$$ \hspace{1cm} (4)

$$p_{si,n} = \begin{cases} 
\frac{N_{i} \tau_{i}}{(1-\tau_{i})} \prod_{i',d_{i'},i' \leq n-1} (1-\tau_{i'})^{N_{i'}}, & \text{if } n \geq d_{i} + 1 \\
0, & \text{if } n < d_{i} + 1.
\end{cases}$$ \hspace{1cm} (5)

Then, the Probability Mass Function (PMF) of $Q_{i}$ is

$$\Pr\{Q_{i} = k\} = \gamma_{i}(1-\gamma_{i})^{k}, \hspace{1cm} k \geq 0. \hspace{1cm} (6)$$

We can calculate expected number of successful transmissions of any $AC_{j}$ during the cycle time of $AC_{i}$, $ST_{j,i}$, as

$$ST_{j,i} = N_{j} E[Q_{i}] \frac{\gamma_{j}}{1-\gamma_{i}}. \hspace{1cm} (7)$$

As it can easily be seen, (7) confirms the intuition that each user from $AC_{i}$ can transmit successfully once on average during the cycle time of another $AC_{i}$ user. Including the own successful packet transmission time of tagged $AC_{i}$ user in $E_{i}[t_{suc}]$, we find

$$E_{i}[t_{suc}] = \sum_{\forall j} ST_{j,i} T_{s,i}.$$ \hspace{1cm} (8)

where $T_{s,i}$ is defined as the time required for a successful packet exchange sequence and depends on the medium access method and packet size for $AC_{j}$ [3],[5].

To obtain $E_{i}[t_{coll}]$, we need to calculate average number of users that involve in a collision, $N_{c}$, at the $n^{th}$ slot after the last busy time for given $N_{i}$ and $\tau_{i}, \forall i$. Let the total number of users transmitting at the $n^{th}$ slot after last busy time be denoted as $Y_{n}$. We see that $Y_{n}$ is the sum of random variables, $Binomial(N_{i},\tau_{i}), \forall i : d_{i} \leq n - 1$. Employing simple probability theory, we can calculate $N_{c,n} = E[Y_{n}]$ if $N_{c}$ is greater than 2. If we let the average number of users that involve in a collision at an arbitrary backoff slot be $N_{c}$, then

$$N_{c} = \sum_{\forall n} b'_n N_{c,n}.$$ \hspace{1cm} (9)

We can also calculate the expected number of collisions that an $AC_{j}$ user experiences during the cycle time of an $AC_{i}$, $CT_{j,i}$, as

$$CT_{j,i} = \frac{p_{ci}}{1-p_{ci}} ST_{j,i}.$$ \hspace{1cm} (10)

Then, defining $T_{c,j}$ as $T_{s,i}$ in (8)

$$E_{i}[t_{coll}] = \frac{1}{N_{c}} \sum_{\forall j} CT_{j,i} T_{c,j}.$$ \hspace{1cm} (11)

Given $p_{ci}$, we can calculate the expected number of backoff slots $E_{i}[t_{bo}]$ that $AC_{i}$ waits before attempting a transmission. Let $W_{i,k}$ be the CW size of $AC_{i}$ at backoff stage $k$ [3]. Note that, when the retry limit $\tau_{i}$ is reached, any packet is discarded. Therefore, another $E_{i}[t_{bo}]$ passes between two transmissions with probability $p_{ci}^{r}$.

$$E_{i}[t_{bo}] = \frac{1}{1-p_{ci}^{r}} \sum_{k=1}^{r} p_{ci}^{k-1} (1-p_{ci}) \frac{W_{i,k}}{2}$$ \hspace{1cm} (12)

Noticing that between two successful transmissions, $AC_{i}$ also experiences $CT_{i,i}$ collisions,

$$E_{i}[t_{idle}] = E_{i}[t_{bo}] (CT_{i,i}/N_{i} + 1) t_{slot}.$$ \hspace{1cm} (13)

An approximate method for calculating $\tau_{i}$ is using the average backoff time [6].

$$\tau_{i} = \frac{1}{E_{i}[t_{bo}] + 1}.$$ \hspace{1cm} (14)

After solving the fixed-point equations numerically for $\tau_{i}$ and $p_{ci}, \forall i$, we can calculate each component of the average cycle time for $AC_{i}$, $\forall i$. Let $T_{p}$ be the average packet duration. Then, the normalized throughput for $AC_{i}$ is

$$S_{i} = \frac{N_{i} T_{p}}{E_{i}[t_{suc}] + E_{i}[t_{coll}] + E_{i}[t_{idle}]}.$$ \hspace{1cm} (15)

Simply, the average packet drop probability is

$$p_{i,drop} = p_{ci}^{r}.$$ \hspace{1cm} (16)
The AC-specific cycle time is directly related but not equal to the mean protocol service time. By definition, the cycle time is the duration between successful transmissions. We define the average protocol service time such that it also considers the service time of packets which are dropped due to retry limit. On the average, \( 1/p_{i,drop} \) service intervals correspond to \( 1/(p_{i,drop} - 1) \) cycles. Therefore, the mean service time can be approximated from the cycle time as
\[
\mu_i = \frac{1}{p_{i,drop}} E_i[t_{cyc}]. \tag{17}
\]

### III. Numerical and Simulation Results

We validate the accuracy of the numerical results calculated via the proposed EDCA models by comparing them to the simulation results obtained from ns-2 [7]. For the simulations, we employ the IEEE 802.11e HCF MAC simulation model for ns-2.28 that we developed [8]. This module implements all the EDCA and HCCA functionalities stated in [1].

As all the work on the subject in the literature, the simulations consider ACs that transmit fixed-size User Datagram Protocol (UDP) packets. Each station runs only one AC. In simulations, we consider two ACs, one high priority (AC3) and one low priority (AC1). Each AC has always buffered packets that are ready for transmission. We set \( AIFS_{N_1} = 3, AIFS_{N_3} = 2, CW_{1, min} = 31, CW_{3, min} = 15, m_1 = m_3 = 3, r_1 = r_3 = 7 \). For both ACs, the payload size is 1000 bytes. Again, as in most of the work on the subject, the simulation results are reported for the wireless channel which is assumed to be not prone to any errors during transmission. The errored channel case is left for future study.

All the stations have 802.11g Physical Layer (PHY) using 54 Mbps and 6 Mbps as the data and basic rate respectively \( (T_{slot} = 9 \mu s, SIFS = 10 \mu s) \) [9]. The simulation runtime is 100 seconds.

Fig. 3 shows the normalized throughput of each AC when both \( N_1 \) and \( N_3 \) are varied from 5 to 30 and equal to each other for the cycle time analysis. Analytical results of our DTMC model [3] and the model in [10] are also added for comparison.

### III. Numerical and Simulation Results

We validate the accuracy of the numerical results calculated via the proposed EDCA models by comparing them to the simulation results obtained from ns-2 [7]. For the simulations, we employ the IEEE 802.11e HCF MAC simulation model for ns-2.28 that we developed [8]. This module implements all the EDCA and HCCA functionalities stated in [1].

As all the work on the subject in the literature, the simulations consider ACs that transmit fixed-size User Datagram Protocol (UDP) packets. Each station runs only one AC. In simulations, we consider two ACs, one high priority (AC3) and one low priority (AC1). Each AC has always buffered packets that are ready for transmission. We set \( AIFS_{N_1} = 3, AIFS_{N_3} = 2, CW_{1, min} = 31, CW_{3, min} = 15, m_1 = m_3 = 3, r_1 = r_3 = 7 \). For both ACs, the payload size is 1000 bytes. Again, as in most of the work on the subject, the simulation results are reported for the wireless channel which is assumed to be not prone to any errors during transmission. The errored channel case is left for future study.

All the stations have 802.11g Physical Layer (PHY) using 54 Mbps and 6 Mbps as the data and basic rate respectively \( (T_{slot} = 9 \mu s, SIFS = 10 \mu s) \) [9]. The simulation runtime is 100 seconds.

Fig. 3 shows the normalized throughput of each AC when both \( N_1 \) and \( N_3 \) are varied from 5 to 30 and equal to each other. As the comparison with our DTMC model [3] and the simulation results reveal, the cycle time analysis can predict saturation throughput with marginal errors. We also highlight the importance of accurate collision probability treatment by including results from an analytical model that does not consider varying collision probability among different AIFS and backoff slots [10]. The saturation throughput prediction error for the model presented in [10] is significant even for a scenario with only 2 ACs with AIFSN difference of 1.

Fig. 4 and Fig. 5 display the mean protocol service time and packet drop probability respectively for the same scenario of Fig. 3. As comparison with our DTMC model [3] and the simulation results show, both performance measures can accurately be predicted by the cycle time model with marginal error.
REFERENCES

**Project Name:** High-Speed Adaptive Analog Decision Feedback Equalizer for Multimode Fiber Channels in 0.18\( \mu \text{m} \) CMOS

**CPCC Affiliate Professor:** Michael Green

**Mailing Address:** 544 Engineering Tower, Irvine, CA, 92697-2625

**Phone:** (949) 824-1656  **E-mail:** mgreen@uci.edu

**Student Fellowship Recipient:** Mahyar Kargar

**Introduction:**

The goal of the next generation high-speed Ethernet is to make data communications 10 times faster over the already installed Multi mode Fiber (MMF) now being used up to 1 Gb/s data rates. The main challenge of 10 Gb Ethernet over MMF is the Inter Symbol Interference (ISI) caused by the modal dispersion of the MMF channel. Using adaptive equalizers is known to make the data communications over short ranges of the MMF possible. In this project a high-speed adaptive equalizer is designed to combat the ISI caused by the band-limited MMF channel. The block diagram of the system is shown in the following figure. A 3-tap T-spaced Feedforward and 3-tap T-spaced feedback equalizer are designed. The feedback path is realized by high speed flip flops. The slicer is realized by using very high speed differential pairs loaded with inductors to achieve the high bandwidth required for slicer operation. The slicer output is applied to a clock and data recovery (CDR) circuit (not shown in the figure) to extract the clock and feed it back to the D flip-flops used as delay elements in the feedback path. The Least Mean-Square (LMS) algorithm will be used to adapt the coefficients in the feedback path.

**Summary of Accomplishments:**

The design of high speed delay cells, the slicer, D flip-flops and summing circuits have been completed. The designed blocks have been simulated over all temperature corners. A delay locked loop (DLL) has been designed which will force the delay of each feedforward delay cell to exactly 100ps (T-space) over temperature corners. The top level DLL operation has been verified over all temperature corners. Moreover a gain control loop has been realized to make the gain of each Feedforward delay cell exactly 0dB which creates the exact delayed replica of the input signal at the output of the delay cells. The delay cell has been laid out and post-layout extracted simulations have been done. To realize the LMS algorithm integrators and analog multipliers have been realized. Matlab simulations have been performed to simulate the non-idealities of the integrator such as DC gain and GBW and their effect on the LMS convergence.

**Ongoing Work:**

1. Completing the design of the LMS circuit and verifying its convergence properties using transistor level simulations.
2. Design and top-level simulation of the CDR block including phase detector, Gm cell, loop filter and 10GHz VCO.
3. Top-level simulation of the feedback path including the D flip-flop delays, the slicer and the summing circuit. Also other top level simulations like CDR and FFE.
4. Layout and post-layout extraction of the remaining blocks.
5. Matlab simulations to find the system behavior and reach for a family of MMF channels.

Amin Khajeh Djahromi, Advisor: Professor Ahmed Eltawil
Center for Pervasive Communications and Computing
Department of Electrical Engineering and Computer Science
The Henry Samueli School of Engineering
University of California, Irvine
Irvine, California 92697-2625
Email: akhajehd@uci.edu

Abstract—This report presents a new approach to co-designing communication systems and their respective hardware architectures. We show that by taking into account the specific needs and assumptions of the algorithms running on hardware, the circuit specifications targeted by ASIC engineers can be relaxed. This in turn leads to optimal designs in both power consumption and robustness. A case-study of a complete WCDMA modem incorporating this approach shows a savings of 23% in embedded memory power consumption and a total of 13% power savings for the whole system.

I. INTRODUCTION

In this report, we present an alternative approach to designing systems that have built in inherent redundancy. This redundancy is used to relax the 100% constrain on the underlying computational architecture. Communication systems in general and wireless systems in particular are a perfect fit due to the high level of redundancy introduced at the system design time. To do so, during Fall 2006 we examined the relationship between (a) the constituent components of an architecture and their vulnerability in terms of power consumption and reliability as a function of the operating conditions, and (b) the needs, assumptions and requirements of the algorithms executing on the design. Clearly, if these needs are ignored, then the architecture is required to provide the correct result 100% of the time, and the circuit designer will then have to use correcting techniques [1] in order to ensure meeting these requirements. If the algorithm implemented has the ability to accept and possibly even correct hardware induced errors, it becomes possible to co-design the algorithm and the hardware simultaneously and thereby reduce the burden the circuit designer faces with technology scaling.

II. POWER AWARE DESIGNS

Dynamic and leakage power are the two major power components of every integrated circuit. Although leakage power was insignificant for technology nodes down to 180nm, it is becoming a crucial power component for smaller technology nodes. The exponential relation of the leakage power with the supply voltage [3] and the square relation of the dynamic power with V\text{dd} [2], point out the importance of voltage scaling as one of the most effective means of reducing power consumption in leakage and dynamic mode. In most cases embedded memories are the limiters, since they exhibit the most vulnerability to supply changes as compared to logic [4].

By utilizing voltage islands, logic can be run at a very low voltage while still meeting throughput requirements, whereas memories are operated at a “safe” voltage. However, it is also important to note that a significant (sometimes even dominant) fraction of embedded metrics (power, performance, area) is related to memories. Furthermore, in the case of communication transceivers, a large portion of the design area is consumed by data buffering memories. These memories store raw soft bit values that have multiple levels of redundancy as follows:

1. At the algorithmic level, coding redundancy is inserted to protect against channel errors.
2. At the implementation levels, soft bit representations are available prior to any decisions being made.
3. The current approach to design specifies the architecture assuming operation at the minimum sensitivity of the device. However, the signal will be close to sensitivity only a small fraction of the time. The majority of the time, the receiver will be experiencing a relatively better SNR. At high SNR’s an optimal design should target these extra “algorithmic dBs” to provide runtime flexibility to minimize power consumption.

Obviously, some memories are untouchable, such as program memories or control memories. However a majority of on-chip memories are used for data storage and buffering operation, such as frame buffering, time interleaving and de-interleaving etc. These memories contain inherently redundant data and can be a target of aggressive voltage scaling.

III. MEMORY FAILURE ANALYSIS RESULTS

We have analyzed the total memory cell failure probabilities including read, write and destructive read probabilities for different V\text{dd} (and performances between 100MHz to 450MHz). We used Hspice to simulate the cell and Matlab to analyze the results. The results are given in Figure 2. This figure illustrates that the designers can tradeoff error tolerance versus supply voltage for a fixed performance, and that the decision should be made during runtime, contrary to the accepted method of assuming worst case conditions at design time and operating continuously assuming that mode. In other words, to achieve a low power solution, the designer must first decide on the acceptable level of error tolerance that is permissible by the application and the overall system design while still maintaining the required performance. Given that level, and a required performance level (the cell speed), the appropriate V\text{dd} can be selected based on Figure 2. For example, a wireless receiver may tolerate different error levels when carrying video data versus audio or image data. Another situation arises when the battery level becomes low. The system may (again by design) decide to increase the tolerated error level and suffer a degradation in quality but allow an even more aggressive V\text{dd} scaling and as a result battery lifetime is increased.
IV. ALGORITHMIC TOLERANCE

As described in the previous sections, tolerating a certain degree of errors in memories will yield considerable power savings. The question now is, given an application such as wireless communications, what system parameters can be manipulated to allow a limited amount of hardware errors to occur in memory? The obvious choice of system redundancy is added by channel coding techniques such as Turbo and Viterbi codes. Turbo codes are especially suited to this application due to the iterative nature of their decoders. Furthermore, it provides the system engineer with degrees of freedom that can be changed to compensate for varying degrees of data corruption, namely a) the depth of the interleaving memory and b) the number of iterations required to converge to a target error performance. We set up a simulation for WCDMA system and we showed that by doubling the trace back depth and introducing 0.1% errors in the buffer memories of the system, the faulty system performs almost the same as the system with no errors in terms of BER. The result is given figure 2.

![Figure 2(a), Compensating memory errors in WCDMA system](image)

![Figure 2(b), Probability of error for 70nm 6T SRAM cell for different Vdd's.](image)

V. MEMORY FAILURE ANALYSIS RESULTS

A. Partitioning the Design

In order to apply the fault adaptive technique to the WCDMA the area and power metrics of the system have to be partitioned into logic and memory categories. Furthermore, the memory section has to be divided into fault tolerant memories and non-fault tolerant memories. A defect tolerant memory is a memory that is used primarily to buffer data and thus can be a target of aggressive power management.

It is interesting to note that the data buffering and de-interleaving memories (defect tolerant candidates) consume approximately 50% of the overall memory required for the entire modem as shown in Figure 3. Furthermore, two instances of the fault tolerant memories alone account for 45% of the total on-chip memory. These two instances are used to buffer the received data right after RAKE combining and to store the de-interleaved symbols prior to rate matching and processing by the decoder. Figure 3 also illustrates how the power consumption would scale by migrating from 180nm to 70nm technologies. Note the increase in memory power consumption as compared to logic when scaling to lower geometries due to the dominance of leakage currents in memory at these technology nodes (unless circuit techniques are used to mitigate that).

B. Power Saving Calculations

As discussed in section IV, redundancy in the form of channel coding gain can be utilized to offset the impact of circuit errors. At low SNR, the architecture can run at the highest allowable voltage thus nullifying the contribution of hardware to the noise floor and allowing the full coding gain to be used for channel errors. At higher SNRs, the voltage can be reduced to save power and the coding gain can be used to compensate for these power management induced failures. For example, referring back to Figure 2, one can identify that a 25% reduction in Vdd results in a $10^{-5}$ BER from memory. This reduction in Vdd results in a saving of 60.68% in leakage power and 44% in dynamic power for the 70nm technology. These results are based on a simulation model as indicated in section III. Extending the results to a 180 nm technology the saving become 46% saving in leakage and 44% saving in dynamic power. These numbers are for the defect tolerant memory, which account for 50% of the total memory on chip. Figure 4 illustrates that for a 180 nm technology chip, an overall reduction of 13% in power consumption is possible. These gains reach close to 20% when migrating to a 70 nm based technology. It is important to note that these saving are independent of any other method (i.e. reducing the frequency, etc.). Furthermore, they can be applied in conjunction with other conventional techniques such as clock gating methods

![Figure 3, Power and area comparison for 180nm and 70nm](image)

![Figure 4, Power savings by utilizing the proposed method](image)

REFERENCES

Project 1 Name: Iterative and Fault Tolerant Error Concealment JPEG2000 codec
Project 2 Name: A Low Power Fault Tolerant cache capable of voltage scaling.

Graduate Student: Mohammad A Makhzan EECS (on CPCC Fellowship for WQ 2006)
CPCC Affiliate Professors: Fadi J. Kurdahi

Project 1 overview:
In this project we are developing an iterative error concealment to deal with parametric errors introduced at the memory of JPEG2000 SOC encoder. This concealment methodology allows usage of defective memories in the structure of JPEG encoder SOC and/or reducing encoder memory supply voltage. Using partially defected memory chips reduces manufacturing cost. Reducing Voltage helps with lowering the power consumption but increasing in the probability of failure of memory cell. This methodology detects and corrects the errors using wavelet characteristic used in DWT step in the flow of encoding. It could be used when decoder has no information about the percentage and location of the errors injected at the encoding time. This study includes a review the characteristic of errors introduced in the coefficient domain of the encoder, analyzing the visual defects in the result of using defected memory at the decoding image. Developing a detection schema and finally developing an iterative methodology to detect and conceal the defects.

Progress:
The JPEG2000 encoder with defective memory is modeled and the effect of using defective memory in the JPEG2000 is analyzed. Experimental and analytical curves relating the PSNR to error percentage is developed. Experimental and analytical curves relating the PSNR to voltage level are obtained. Decoder specific detection filters were developed and a correction methodology for defective coefficients is proposed. Finally, an Iterative Error Concealment (IEC) methodology for decoder is developed. An analysis on the complexity of the IEC is developed.


Project 2 overview:
In his project we are developing new low power cache architecture. Our new architecture enables the memory sub systems to scale the voltage down in L1 instruction cache and L2 unified, Instruction or data cache without loss in correctness and insignificant or no loss in performance. Reduction in memory supply voltage comes with the expense of increase in SRAM cell probability of failure. Therefore in this research we formulate the trend of increased probability of failure in the SRAM cells as a function of voltage. In order to tolerate defect in our cache, we are designing a fault tolerating cache architecture that is optimized for low power performance we expect. The new architecture has two new components in addition to a traditional cache. First addition is an off cache defect map, the Bit Lock Block (BLB) which is a part of remapping circuit sitting next to the cache and is accessed in parallel, and second is an Instruction Defect Cache. This cache is a small direct or associative cache that is word addressed and is a place holder for defective words in the cache. The BLB and IDC combination are supposed to reduce the miss rate dramatically and at the same help with reduction in total power consumption.

Progress:
We did an analytical and experimental curves relating probability of memory cell failure to the memory supply voltage is derived. (for two different technology: 32nm and 65nm). Next, the new architecture is purposed and modeled. A trace driven cache (Dinero IV) is modified to model our architecture and to calculate energy consumption per bench mark. Many Simulations are performed and for each bench mark the energy consumption as a function of voltage and access time is plotted.

Publications: We are working on the first paper on this subject. It will be ready in 20 days.
Project overview

The long term goal of the proposed project is to develop a system level methodology for power optimization for SoCs. In the immediate term, the proposed project will investigate techniques for efficient power modeling of SOC bus architectures, as well as of system-level IP blocks, and their use in the architectural exploration of IP-based SOC designs. On the basis of such power models of the system, we will be able to explore the architectural design space and evaluate various scheduling schemes. Meanwhile, the model is designed to be easily refined as the design process goes through the design flow, providing more accurate estimating of system performance and power consumption. The major purpose of the model is to provide a vehicle for researches on power optimization, such as the HAIM/DOSE (Hierarchically Abstracted IP modeling by Data Organization Space Exploration) exploration flow proposed earlier. The model and exploration flow are based on the COMMEX transaction-level communication architectural framework, on which we will study the H.264 application (the latest video coding standard), and JPEG2000 (the latest still image coding standard).

Progress

During Fall 06, we engaged three students in this project and made steady progress. The student supported on CPCC Fellowship was Sudeep Pasricha, who focused on power estimation for on-chip communication architectures, the overall SystemC modeling framework design as well as a case study of the H.264 decoder. Young-Hwan Park worked on gate-level power data extraction for communication architectures, and the H.264 RTL integration effort. Hooman worked on processor modeling and software simulation. During the fall quarter, we accomplished the following:

- We refined our energy macro-model methodology for system-level power estimation for on-chip communication architectures, specifically the bus matrix communication architecture. We enabled power estimation for multiple technology libraries, from 180 nm down to 65 nm, and verified that our estimation approach is within 5% accuracy of gate-level estimates using tools such as Synopsys Primepower. We conducted several experiments on power estimation of the bus matrix, for various technology libraries, to understand how power consumption of the bus matrix is distributed among its various components such as wires, buffers, arbiters, decoders and MUXs. We also conducted several experiments on how power consumption of the communication architecture varies with shrinking process technologies, and over different technology library variants.
- We initiated worked on developing power estimation schemes for hardware ASIC blocks early in the design flow, at the system level.
- We also initiated work on processor modeling and power estimation for on-chip microprocessors.
- We continued our work on integrating the various components of the H.264 decoder at the RTL level with an AMBA AHB-based communication backbone.

Going forward, our goal for the beginning of 2007 is to explore system level power estimation for the communication architectures, under process variations, which is an extremely important problem facing designers today. We will also focus our efforts on developing power estimation models for other components such as memories, processors and ASIC blocks. We then plan to apply the methodology to study system level power optimization techniques on the H.264 and JPEG2000 application drivers.

Publications
Information Rate Maximization in Wireless Systems Employing Channel Feedback under Affine Precoding

Chitaranjan Pelur Sukumar (Advisor: Ahmed M. Eltawil)

Abstract—The search for optimality in the design of channel precoders and training symbols in block processing communication systems has recently become subject of extensive research. However, very little work has been done in trying to find the best tradeoff in terms of power distribution between information and pilot symbols, for the case where channel estimate information is available via feedback. We intend to solve the problem of maximizing channel capacity in a block fading scenario within a recently introduced affine precoding scheme, when an error-free channel feedback path is available to the transmitter. This novel approach will find the optimal precoders and training vectors based on previously obtained estimates of channel and data symbols.

I. INTRODUCTION

It has been shown [1] that affine precoders are more efficient than conventional linear precoders in that they introduce less redundancy in the transmitted signal to facilitate symbol recovery at the receiver. Our goal in this research is to design the symbol precoder and the training vector in the affine precoding framework to maximize mutual information. This is done on a block by block basis, where the channel information estimated in one block is used to determine the precoder and training matrices for the next block, via an error-free feedback path from the receiver.

Before continuing, we shall briefly comment on some relevant facts, regarding optimality in channel estimation based block equalization schemes. Optimal training has been considered in a number of previous papers. In [2], the authors state that if non-superimposed data and pilots are considered, then it is optimal in the Cramer-Rao Bound (CRB) sense to put data and pilots with higher power in the middle of the data packet. The authors in [3], working with the affine precoding framework and least-squares receivers conclude that equi-spaced pilots have the best mean square error performance. CRB-optimal training sequences under the affine precoding framework has been recently pursued in [4] by considering two distinct strategies: 1) Decoupled channel and symbol estimation. It has been shown that when the power for training is fixed, minimization of the CRB leads to simultaneous maximization of a lower bound on the symbol mutual information. As a result, this requires orthogonality between the precoder and the training vector. A minimum-mean-square-error (MMSE) equalizer can thus be used to enforce this condition. 2) Jointly design of symbol precoder and training vector. In this case, minimizing the CRB with respect to both symbol and channel as one estimated parameter, leads to smaller MSE when compared to strategy 1. Unfortunately, no receiver that estimates channel and symbols jointly has been proved feasible.

II. OUR CONTRIBUTION

Our contribution will differ from the aforementioned approaches in the following ways. First, we maximize the mutual information subject to the total power available for symbols and training, assuming that channel feedback is available. We do not assume fixed separate power allocations for data and training. Secondly, unlike prior art, our approach is based on the optimization of the precoder and training matrices within an algorithm. That is, the precoder and training are optimized based on prior estimation of the channel, as well as its corresponding estimation error. This may lead to different optimization schemes, depending on how the channel is modeled, or equivalently, how the covariance of the estimation error vector in both channel and symbol estimation is computed. The algorithm may run inside one block in a turbo fashion, as well as throughout the blocks. Third, orthogonality between the precoder and the training vector is not necessary when channel and symbols are being recursively estimated i.e., for instance, in OFDM, we do not need separate frequency bins for training and data symbols. We consider a $N$-tap discrete single-input-single-output (SISO) channel, described via a the $P \times P$
pseudocirculant matrix
\[
\bar{H}(z) \triangleq \bar{H}_0 + \bar{H}_1 z^{-1} ,
\]
and assume that interblock interference (IBI) caused by the term in \( \bar{H}_1 \) has been previously removed according to a zero-padding (TZ) or a leading-zeros (LZ) scheme. The remaining effect of the channel is then represented by \( \bar{H}_0 \), which can be an \( M \times P \) matrix (in the LZ case) or a full rank matrix, where \( P = N + M - 1 \) is the minimum length required for the transmitted block. For consistency with standard OFDM schemes, we consider the former case, where a cyclic prefix (CP) is inserted to the transmitted vector. Let \( s \) be the \( M \times 1 \) information data vector. Under the affine precoding scheme, the \( P \times 1 \) transmitted vector \( x \) is given by
\[
x = As + b
\]
where \( A \) is a \( P \times M \) precoder matrix, and \( b \) is the superimposed \( P \times 1 \) training vector used for channel estimation within each block transmission. The resulting channel model after discarding the CP is given by
\[
y = HAs + Ht + v
\]
where \( H \) is now a circulant matrix, with first row given by \([h(0) \ 0 \ \cdots \ h(N - 1) \ \cdots \ h(1)]\). Moreover, we assume the gauss-markov channel model:
\[
h_i = \alpha h_{i-1} + u_i
\]
where \( \alpha \) is a function of the Doppler, \( u_i \) is a vector whose elements are zero mean complex gaussian random variables with variance \((1 - \alpha^2)\sigma_h^2\) and \( i \) is the time index. We are currently working on solving the following optimization problem
\[
I = \max_{A,T} \log |I + \sigma^2 \hat{H}_i A_i A_i^* \hat{H}_i^* R_n^{-1}| \\
\text{s.t. } \sigma^2 tr(A_i A_i^*) + \|t_i\|^2 = P
\]
where \( \sigma^2 \) is the variance of the data symbols, \( P \) is the total transmit power available, \( \hat{H}_i, A_i \) and \( t_i \) are the estimated channel, precoder and training matrices at time instant \( i \) and \( R_n \) is the variance of the effective noise at the input to the symbol estimator. It is worth noting that \( R_n \) is a function of channel noise, the precoder as well as the training matrices.

REFERENCES

**Introduction:** Our research interests are in the general area of video streaming over wired and wireless networks. During fall 2006, we did preliminary research and literature review in two areas: (i) network coding for video streaming over wireless and (ii) streaming of multi-view video. As a result, we identified specific research questions, which we plan to further explore in the next quarters.

**Network Coding for Video:** Network coding is an emerging area that started with [1,2] in the context of multicast networks. In [1,2] it was shown that if intermediate nodes are allowed to perform simple operations on incoming packets, then the network can achieve the min-cut throughput to each receiver. This breakthrough idea has spawned a significant effort in applying network coding to other topologies, in developing practical algorithms and finding applications [3]. In particular, the recent work in [4] used network coding at 802.11 wireless access points, to exploit the broadcast nature of the wireless medium and increase the downlink throughput. During Fall 2006, we reviewed the literature in the above areas and we realized that network coding can further improve the performance of video streaming over a wireless LAN, compared to [4], by selecting network codes at the access point so as to optimize not only the throughput but also the video quality. During the winter quarter 2007, we focused on this problem and further investigated it.

**Multi-View Video Streaming:** Multi-view video streaming problems arise in the context of applications that use multiple cameras, including surveillance, sports broadcasting, and teleconferencing applications. We are interested in the design of multi-view video systems that provide high video quality and user interactivity [5]. For example, consider the broadcasting of a soccer game; there may be several viewers interested in different views of the game (one user may be interested in cameras that show his favorite player, while another viewer may prefer an overview of the field). During fall 2006, we surveyed the problem space of multi-view video streaming, and we came up with a taxonomy of research problems, which is shown in Fig.1.

![Figure 1: Taxonomy of multi-view video streaming research problems](image)

Depending on the application, there can be different number of viewers and/or camera views. We considered several options that available to the user for view selection. The terms “view switching” and “view sweeping” refer to user-defined selection of viewpoint in an arbitrary or in a sequential order, respectively. The terms ‘object-based’ and ‘semantics-based’ refer to view selection based on objects (e.g. a viewer’s favorite soccer player) or events (e.g. highlights, such as goals), respectively. We also considered several possibilities for coding the video streams from different cameras: (i) we could code different views independently from each other (ii) we could use the emerging standard of multi-view coding, [6], to exploit the correlation between multiple views of the same scene for better compression or (iii) we could use object-based coding to facilitate object-based view selection.
Fig. 1 summarizes the problem space and identifies interesting combinations/problems (indicated by squares and circles). During fall quarter 2006, we also formulated some of these problems as optimization problems (the formulations are omitted from this report, for brevity). In the next quarters, we plan to study some of the identified problems in depth.

References:
I. INTRODUCTION

In slow fading environments, single beamforming (i.e., transmit beamforming) techniques are well known to boost the performance of multi-antenna systems, maximizing the signal-to-noise ratio (SNR) at the receiver. The optimal transmit beamforming vector is channel dependent and the transmitter needs to know the beamforming vector perfectly prior to the transmission. Consequently, a robust feedback link will be needed to provide the transmitter side with the channel estimates (or beamforming vector), especially for FDD systems. Limited feedback techniques for single beamforming are well-studied in the literature. However, for the multiple beamforming case, also known as precoded spatial multiplexing, where multiple data streams are transmitted simultaneously using multiple beamforming vectors, the number of papers in the literature is very limited. This quarter our aim was to address the performance of bit-interleaved coded multiple beamforming with limited feedback. Basically we consider the systems we analyzed in [1], [2] where we assume perfect channel state information (CSI) is available to the receiver, on the other hand transmitter has only partial CSI through a channel state information (CSI) is available to the receiver, on the other hand transmitter has only partial CSI through a feedback link. In this scenario with 2 streams using 2 bit and 4 bit feedback. Thus, the number of codewords in the codebook are 4 and 16, respectively. To implement VQ, we used our proposed distortion metric, and a suboptimal centroid condition to improve the codebook iteratively. Due to the local optimality properties of VQ, ten different initial codebooks are generated, and 30 iterations are employed to find a good codebook. Fig. 4 shows that up to 1.5 dB performance improvement can be achieved using VQ for the 2 x 2 case.

In Fig. 5 compares the RVQ and GM methods. GM is an alternative method which has lower complexity compared to matrix codebook approach. As seen from the figure, GM requires 14 bit feedback to achieve the same performance of RVQ with 6 bit feedback.

In Fig. 6 two selection criteria are compared when RVQ is employed. As expected the proposed selection criterion outperforms the one optimized for uncoded precoded spatial multiplexing.

II. SIMULATIONS

In the simulations below, the industry standard 64 states 1/2 rate (133,171) d_{free} = 10 convolutional code is used. The channel is assumed to be quasi-static and flat fading.

**Codeword selection methods:**

1) Random Vector quantization (RVQ): RVQ is a simple approach to codebook design that generates the vectors independently from a uniform distribution on the space of unitary matrices for each channel use.

2) Vector quantization (VQ): VQ takes an initial codebook, according to the proposed distortion metric it iteratively optimizes the codebook.

3) Givens Method (GM): GM exploits the geometry of a unitary matrix and and parameterizes into rotation angles [4].

**Results:**

Figs. 1-3 show the simulation results for 2 x 2, 2 x 3 and 3 x 2, respectively. In all three scenarios, 2 spatial streams are transmitted simultaneously and random codebook approach along with the proposed selection criterion is used. The main observation is that achieving a performance close to the full CSI case requires much larger amounts of feedback compared to the uncoded case. For the uncoded case, 6 bit feedback is usually achieved a satisfactory performance for both single and multiple beamforming case. As seen from Figs. 1-3, there is a significant diversity loss compared to the full CSI case for all feedback scenarios. This loss reaches its maximum when the number of transmit and receive antennas (see Fig. 1). For that case, basically, the diversity order for all limited feedback scenarios reduces to one, which is the diversity order of 2 x 2 uncoded multiple beamforming.

Fig. 4 compares the RVQ and VQ approaches for a 2 x 2 scenario with 2 streams using 2 bit and 4 bit feedback. Thus, the number of codewords in the codebook are 4 and 16, respectively. To implement VQ, we used our proposed distortion metric, and a suboptimal centroid condition to improve the codebook iteratively. Due to the local optimality properties of VQ, ten different initial codebooks are generated, and 30 iterations are employed to find a good codebook. Fig. 4 shows that up to 1.5 dB performance improvement can be achieved using VQ for the 2 x 2 case.

In Fig. 5 compares the RVQ and GM methods. GM is an alternative method which has lower complexity compared to matrix codebook approach. As seen from the figure, GM requires 14 bit feedback to achieve the same performance of RVQ with 6 bit feedback.

In Fig. 6 two selection criteria are compared when RVQ is employed. As expected the proposed selection criterion outperforms the one optimized for uncoded precoded spatial multiplexing.

**REFERENCES**


Fig. 1. Results for the $2 \times 2$ system with 2 streams over flat fading channel using random codebook approach.

Fig. 4. Results for the $2 \times 2$ system with 2 streams over flat fading channel: Comparison of vector quantization and random codebook approach.

Fig. 2. Results for the $2 \times 3$ system with 2 streams over flat fading channel using random codebook approach.

Fig. 5. Results for the $2 \times 2$ system with 2 streams over flat fading channel: Comparison of Givens method and random codebook approach.

Fig. 3. Results for the $3 \times 2$ system with 2 streams over flat fading channel using random codebook approach.

Fig. 6. Results for the $3 \times 2$ system with 2 streams over flat fading channel: Comparison of two selection criteria, $\lambda_{\text{min}}$ vs. proposed.
Introduction:
In a broad sense, cognitive radio encompasses all the solutions to the problem of overcrowded and inefficient spectrum. Recent research has proposed different interpretations of cognitive radio that underlay, overlay and interweave the transmissions of the cognitive user with those of the licensed users. The most common interpretation is the interweave approach, wherein cognitive radio dynamically adapts to spectrum availability by periodically monitoring the radio spectrum, intelligently detecting primary (licensed) user occupancy in the spectrum and then opportunistically communicating over unoccupied channels (spectrum holes) with minimal interference to the primary users.

Summary of Accomplishments:
Over the past quarter, we have been working on a magazine article and a conference paper [1][2]. [1] has been accepted for publication in the May 2007 issue of Communication Magazine (Cognitive Radios for Dynamic Spectrum Access). [2] is still in preparation and is to be submitted to the IEEE Global Telecommunications Conference.

In [1], we explore the throughput potential of cognitive radio technology as revealed by several recent research efforts. In particular, we focus on the overlay and interweave interpretations of cognitive radio and describe conceptual frameworks for secondary communication based on these techniques. In the overlay approach, the secondary transmitter uses the knowledge of the primary message and distributes its power between relaying the primary signal and transmitting its own signal. Further, sophisticated coding techniques are used to deal with interference at the secondary receiver. In the interweave approach, simultaneous primary and secondary transmissions are not allowed - the secondary user communications only when the primary system is inactive. Numerical results comparing the theoretical throughput limits of the secondary users in the overlay and interweave approaches are also provided. We find that the performance improvement of the overlay approach over the interweave approach is contingent on the availability of sufficient side information and disappears as the distance between the primary and secondary transmitters increases.

In [2], we explore the fundamental tradeoff governing the coexistence of heterogeneous wireless devices - the one between regulation and autonomy. Too much regulation/licensing and the system becomes inefficient, too much opportunistic behavior makes the system self-disruptive. We consider a unit radius circular disc containing a given number of transmitter-receiver pairs. The users are separated into two groups - we divide the channel resource into identical partitions and license it to the users in the first group (primary users). Whenever any licensed user has data to transmit, it communicates on the frequency band assigned to it. The remaining users (secondary users) communicate on the channel opportunistically, i.e., they monitor all the frequency bands for primary users and communicate over the spectral holes. The primary user sensing is not perfect - the secondary users can only detect primary users only if they are within a certain radius from the secondary transmitter/receiver. We assume identical traffic models for the primary and secondary users. The key question here is: What is the optimal amount of licensing (the optimal number of primary users)? When the sum rate is plotted as a function of the number of primary users, we find that the optimal fraction of primary users is equal to the traffic duty cycle and lies between the two extremes of fully licensed and fully opportunistic operation.

References:
Abstract—In the DS-UWB receiver, a pulse generator is needed to correlate the input pulse for synchronization. A flexible pulse synthesis is examined and simulated for the implementation of the pulse generator. In the simulation, low band and high band pulse can be generated by changing the pulse width of the baseband signal and frequency of the carrier. To meet FCC mask, a baseband filter is needed to get a dual band operation.

I. INTRODUCTION

In DS-UWB system [1], in order to avoid interference with available IEEE 802.11a system at 5GHz, the 7.5GHz bandwidth is divided into two bands: higher band (6GHz–10GHz) and lower band (3GHz–5GHz). One challenge with the specification of spectrum in DS-UWB is to design a pulse signal which exactly has a dual band characteristic. Normally, current available designs [2–3] generate the pulse which only works on lower band, because of the difficulty to go higher band. In addition, most of the pulse generators are designed based on filtering a wideband signal. Therefore, they are not suitable for flexible design, which is another reason why only one band is chosen. In our recent work, a flexible pulse generator was proposed to be suitable for both lower band and higher band separately.

II. METHOD OF PULSE SYNTHESIS

In order to design a flexible pulse which can switch between lower band and high band of UWB spectrum, we re-use the idea to get a bandpass signal from narrow-band system. In traditional narrow band wireless communication system, a baseband signal with spectrum located around DC is modulated with a single tone sinusoidal signal to be able transmitted in the channel. Since the bandwidth of a baseband signal is inverse proportional to the width of the pulse, the DS-UWB signal has a low duty cycle characteristic because the pulse width is much less than one period of the data. To get a bandpass signal, the low duty cycle pulse is modulated with a sinusoidal carrier signal. Due to the symmetric baseband signal in spectrum, two carriers with 4GHz and 8GHz need to be generated. For circuit implementation, it is better to design one PLL to generate 8GHz carrier and divide by 2 to get the 4GHz carrier instead of two PLLs. The most challenging part is how to get a low duty cycle pulse. According to the bandwidth requirement, the shortest pulse width is 250ps. One more concern is the power consumption of the pulse generator. Because the pulse is on only at certain time in one period, it is better if the circuit can be switched off during off state to save power. Therefore, digital logic circuit is preferred to implement such a pulse generator and the switch transistors need to be fast enough to get a 250ps narrow pulse.

III. SIMULATION RESULTS

Before going details in circuit design, a block level simulation of pulse synthesis was ran under Advanced Design System (ADS). Fig. 1 shows the spectrum of two kinds of DS-UWB signals. The upper one is a pulse which is designed for lower band and the lower one is a higher band pulse. In the simulation, a rectangular low duty cycle pulse is used and out of band signal is also strong because of harmonics of baseband rectangular pulse. In addition, the strong harmonic of the

![Fig. 1. PSD of two DS-UWB pulses](image-url)
baseband pulse will increase difficulty to use both bands of DS-UWB to boost data rate. The overall spectrum with both low band and high band pulse separated in time is drawn in Fig. 2.

![Graph showing spectrum with x-axis as frequency (Hz) and y-axis as PSD (dBm/MHz)](image)

**Fig. 2.** Spectrum of both low and high band DS-UWB pulses

Harmonics at low frequency of two band signals add up and cause the out of band signal (less than 3.1GHz) stronger than the FCC mask (-61dBm/MHz). Therefore, pulse shaping is needed after short pulse is generated, which will be the future work for improving the pulse synthesizer.

In order to get an idea why pulse generator is needed in the synchronizer, a block diagram is shown in Fig. 3. This pulse generator will be used to generate LO pulses, which correlate the input UWB pulse for synchronization. Previous work on wideband delay circuit which is also one part of the synchronizer was published [4].

![Block diagram of pulse generation](image)

**Fig. 3.** Block diagram of pulse generation

REFERENCES


