UNIVERSITY OF CALIFORNIA
IRVINE

CENTER FOR PERVASIVE COMMUNICATIONS AND COMPUTING

GRADUATE FELLOWSHIP PROJECTS
PROGRESS REPORTS
FALL 2004
# Projects

## Alphabetized According to Student Last Name

<table>
<thead>
<tr>
<th>Project No</th>
<th>Student Name</th>
<th>Project Title</th>
<th>Advisor</th>
</tr>
</thead>
<tbody>
<tr>
<td>25</td>
<td>Enis Akay</td>
<td>MIMO Physical Layer (PHY) Alternatives for 802.11 Wireless LANs</td>
<td>Ender Ayanoglu</td>
</tr>
<tr>
<td>7</td>
<td>Xinyu Chen</td>
<td>Design of Equalizers for High-Speed Wireline Communications Systems</td>
<td>Michael Green</td>
</tr>
<tr>
<td>4</td>
<td>Liang Cheng</td>
<td>Video Coding and Power Consumption for Mobile Handheld Devices</td>
<td>Magda El Zarki</td>
</tr>
<tr>
<td>24</td>
<td>Inanc Inan</td>
<td>MAC Algorithms for 802.11 Wireless LANs</td>
<td>Ender Ayanoglu</td>
</tr>
<tr>
<td>2</td>
<td>Byong-Moo Lee</td>
<td>Adaptive Pre-Distorters for Linearization and Increased Power Efficiency for OFDM-Based Mobile Wireless Communications</td>
<td>Rui de Figueiredo</td>
</tr>
<tr>
<td>15</td>
<td>Yun Long</td>
<td>SOC Power Optimization Framework</td>
<td>Fadi Kurdahi</td>
</tr>
<tr>
<td></td>
<td>Sudeep Pasricha</td>
<td></td>
<td>Nikil Dutt</td>
</tr>
</tbody>
</table>

I. INTRODUCTION

Our earlier work focused on achieving the maximum diversity order in wireless systems [1], [2]. Reference [2] combines bit interleaved coded modulation (BICM) with space time block codes (STBC). At the beginning of the fall quarter we worked on an easy-to-implement decoding scheme for BICM STBC systems. The results will be published at VTC Spring 2005 [3].

We also investigated beamforming techniques in the context of next generation WLANs. We analyzed the performance of single- and multiple-beamforming techniques. Our findings will be presented at VTC Spring 2005 [4], and another paper has been accepted for presentation at ICC 2005 [5].

Later in the quarter we decided to focus more on increasing the data rate of wireless broadband systems by deploying multiple transmit and receive antennas. We implemented a MIMO-OFDM system with BICM at the transmitter end. At the receiver end, we used an easy-to-implement zero forcing (ZF) technique along with low complexity bit metrics. The initial results showed a high performance system with very low complexity while doubling the data rate. We also compared the new scheme with an MMSE receiver and with a multiple-beamforming technique. Simulation results showed that ZF and MMSE achieve the same performance. For a small number of transmit and receive antennas, beamforming (using full feedback) and the ZF receiver have similar performance. However, when the number of transmit antennas increases, beamforming provides a substantial gain.

II. LOW COMPLEXITY DECODING OF BICM STBC

It was shown earlier that the combination of bit interleaved coded modulation (BICM) with space time block codes (STBC) for OFDM systems leads to the maximum diversity order in space and frequency [2]. In [3] we presented low complexity decoding of BICM-STBC systems. First, the maximum likelihood (ML) bit metric calculations are simplified from metrics on the complex plane to metrics on the real line. Furthermore, we presented some log-likelihood approximations that yield easier-to-implement soft decision bit metrics. Simulation results showed that the proposed low complexity bit metrics have the same performance as the original ones.

III. DIVERSITY SPATIAL MULTIPLEXING TRADEOFF

In [5] we first formally showed that single beamforming can achieve the maximum spatial diversity order. Simulation results showed significant coding gains (also known as array gains) for beamforming as compared to space time block codes where channel state information is not required at the transmitter.

We investigated beamforming in the context of high speed broadband wireless communications. In this context, we combined beamforming with BICM-OFDM (which is widely used in existing WLANs). We call the resulting system BBO. We provided an analysis showing that BBO can achieve the maximum diversity order in space and frequency by using an appropriate convolutional code. In other words, for \(N\) transmit and \(M\) receive antennas, BBO can achieve a diversity order of \(NML\) for \(L\)-tap frequency selective channels.

In addition to having a substantial diversity order, simulation results showed that BBO introduces significant coding gains when compared to other systems based on STBC with the same diversity order. As the number of transmit antennas increases, the coding gains increase as well, since there is no full-rate complex STBC for more than 2 antennas.

A way to implement a low complexity decoding operation for BBO is also described in [5]. As a result, BBO provides a very high-performance (maximum diversity in space and frequency, and high coding gain) and easy-to-decode system for broadband wireless communications.

In [4], we studied a single-user multi-antenna system with perfect channel state information (CSI) at both the transmitter and the receiver. We extended our analytical results of [5] to multiple beamforming (i.e., sending more than one symbol simultaneously). We showed that if \(K\) subchannels are used for an \(N \times M\) system, the diversity order is given by \((N - K + 1)(M - K + 1)\).
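As a quick numerical illustration of the two diversity-order expressions quoted in this section, the following Python sketch evaluates \(NML\) for BBO and \((N - K + 1)(M - K + 1)\) for multiple beamforming. The function names, the range check on \(K\), and the example antenna configurations are illustrative assumptions and are not taken from [4] or [5].

```python
def bbo_diversity_order(n_tx: int, n_rx: int, n_taps: int) -> int:
    """Diversity order N*M*L quoted for BBO over an L-tap channel."""
    return n_tx * n_rx * n_taps


def multiple_beamforming_diversity_order(n_tx: int, n_rx: int, k: int) -> int:
    """Diversity order (N-K+1)(M-K+1) when K subchannels are used."""
    # Assumption: K cannot exceed the smaller antenna count.
    if not 1 <= k <= min(n_tx, n_rx):
        raise ValueError("K must satisfy 1 <= K <= min(N, M)")
    return (n_tx - k + 1) * (n_rx - k + 1)


if __name__ == "__main__":
    # Hypothetical example configurations.
    print("BBO, N=2, M=2, L=3:", bbo_diversity_order(2, 2, 3))
    for k in (1, 2):
        print("2x2, K =", k, "->", multiple_beamforming_diversity_order(2, 2, k))
```

With \(K = 1\) the second expression reduces to \(NM\), the spatial diversity order of single beamforming; each additional subchannel trades diversity for rate.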

IV. HIGH DATA RATE SYSTEMS

We implemented a MIMO-OFDM system using BICM at the transmitter. The coded bits are interleaved and mapped onto symbols. The symbols are first grouped in vectors of length \( K \) (where \( K \) is the number of subcarriers used in one OFDM symbol). Vectors of \( K \) symbols are then multiplexed to \( N \) transmit antennas. By grouping the symbols in vectors, BICM-OFDM is guaranteed to achieve full frequency diversity while using the simple bit interleaver of 802.11a.

At the receiver end we implemented different techniques. Simulation results on IEEE channel models showed that ZF and MMSE receivers achieve the same performance. We used very low complexity bit metrics for decoding of BICM. The bit metrics used are based on a distance on the real line.

We compared the performance of the ZF MIMO-OFDM system with that of MIMO-OFDM using beamforming with full feedback. Simulation results show that for \( 2 \times 3 \) and \( 2 \times 4 \) systems over IEEE channels, ZF achieves the same high performance as beamforming. However, when the number of transmit antennas increases, beamforming shows a substantial gain over the ZF MIMO-OFDM system.
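To make the zero forcing step concrete, here is a minimal Python sketch for one subcarrier. It is deliberately simplified: it assumes a square \( 2 \times 2 \) channel (not the \( 2 \times 3 \) or \( 2 \times 4 \) configurations simulated above), no noise, and a made-up channel matrix. The function names are hypothetical. After equalization, the real and imaginary parts of each estimate can be examined separately, which is the sense in which bit metrics reduce to distances on the real line.

```python
def apply_channel_2x2(h, x):
    """Noise-free received vector y = H x for a 2x2 complex channel."""
    return [
        h[0][0] * x[0] + h[0][1] * x[1],
        h[1][0] * x[0] + h[1][1] * x[1],
    ]


def zero_forcing_2x2(h, y):
    """Zero forcing estimate x_hat = H^{-1} y for a 2x2 complex channel."""
    (a, b), (c, d) = h
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("channel matrix is (nearly) singular")
    # Closed-form inverse: H^{-1} = (1/det) * [[d, -b], [-c, a]]
    return [
        (d * y[0] - b * y[1]) / det,
        (-c * y[0] + a * y[1]) / det,
    ]


if __name__ == "__main__":
    # Hypothetical channel gains and QPSK symbols, for illustration only.
    channel = [[0.9 + 0.2j, -0.3 + 0.5j], [0.1 - 0.7j, 1.1 + 0.4j]]
    sent = [1 + 1j, -1 + 1j]
    estimate = zero_forcing_2x2(channel, apply_channel_2x2(channel, sent))
    for s in estimate:
        # Per-dimension view: real and imaginary parts are sliced separately.
        print(round(s.real, 6), round(s.imag, 6))
```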

REFERENCES

For multi-gigabit transceiver design, the main challenges are the loss and distortion that high-frequency broadband data suffers in the communication channel. In particular, the loss in a copper cable longer than a few meters can severely attenuate the high-frequency components of the transmitted data signal, degrading performance or even causing bit errors. In an optical communication channel, dispersion in single-mode fiber over lengths on the order of 100 km (or in multi-mode fiber over lengths on the order of 100 m) can severely distort the data signal by the time it reaches the receiver. In all cases, equalization and clock/data recovery (CDR) circuits are essential for restoring the data signal and synchronizing the data into a form that the data processor can process correctly. Our research investigates new design techniques for CMOS broadband equalizers that compensate for loss and dispersion in copper cable and fiber optic media, and for CMOS CDR circuits with strong jitter tolerance. The bit rates considered are 10 Gb/s and higher. Adaptation to different media characteristics is also considered.

Our Fall 2004 research is organized into the following parts:

1. Investigate the design of a variable threshold equalizer. This circuit is in effect a one-tap fractionally spaced equalizer whose tap spacing is continuously adjustable. The variable threshold is introduced to ease the slicer design challenge caused by the inherently low transconductance of MOSFETs. Instead of using a fixed threshold, the slicer threshold is continuously controlled by the previous input data: if the previous bit was 0, the threshold is adjusted to a lower level; if the previous bit was 1, the threshold is adjusted to a higher level.

2. Investigate the design of an eye-tracking CDR circuit with strong jitter tolerance and a low jitter transfer characteristic. According to linear PLL theory, achieving a high jitter tolerance requires a high loop bandwidth. However, a high loop bandwidth transfers too much jitter to the recovered data and clock. The proposed eye-tracking CDR circuit has a nonlinear jitter transfer characteristic whose jitter transfer gain approaches zero near the transition edges of the incoming data. It thus allows the loop to adopt a very high loop bandwidth without transferring much jitter into the recovered data and clock. A theoretical analysis of the eye-tracking CDR loop dynamics was also conducted in MATLAB. A preliminary Simulink model has been built, and further investigation is ongoing.

3. Work on integrating the equalizer with the eye-tracking CDR circuit. The full-chip circuit design has been finished. Pre-layout simulation results have demonstrated the functionality of the proposed variable threshold equalizer and the eye-tracking CDR circuit. Chip layout and post-layout verification will be the next research focus.
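The variable threshold rule in part 1 can be sketched behaviorally in Python. This is only a software illustration of the decision rule, not a model of the CMOS circuit: the nominal threshold, the offset, and the sample values (chosen to mimic strong inter-symbol interference from the previous bit) are all made-up assumptions.

```python
def variable_threshold_slicer(samples, nominal=0.5, offset=0.25, initial_bit=0):
    """Slice samples into bits using a threshold steered by the previous decision."""
    bits = []
    previous = initial_bit
    for sample in samples:
        # Previous bit 1 -> higher threshold; previous bit 0 -> lower threshold.
        threshold = nominal + offset if previous == 1 else nominal - offset
        previous = 1 if sample > threshold else 0
        bits.append(previous)
    return bits


if __name__ == "__main__":
    # Hypothetical received samples for the bit pattern 1, 0, 0, 1, 1, 0 over a
    # channel in which the previous bit leaks strongly into the current sample.
    received = [0.45, 0.55, 0.0, 0.45, 1.0, 0.55]
    print("variable threshold:", variable_threshold_slicer(received))
    print("fixed threshold:   ", variable_threshold_slicer(received, offset=0.0))
```

With these example samples a fixed threshold of 0.5 misreads the first two bits, while the steered threshold recovers the intended pattern.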
Abstract

For a typical portable handheld device, the backlight accounts for a significant percentage of the total energy consumption (e.g., around 30% for a Compaq iPAQ 3650). Substantial energy savings can be achieved by dynamically adapting backlight intensity levels on such low-power portable devices. In this research work, we analyze the characteristics of video streaming services and propose an adaptive scheme called Quality Adapted Backlight Scaling (QABS), to achieve backlight energy savings for video playback applications on handheld devices. Specifically, we present a fast algorithm to optimize backlight dimming while keeping the degradation in image quality to a minimum so that the overall service quality is close to a specified threshold. Additionally, we propose two effective techniques to prevent frequent backlight switching, which negatively affects user perception of video. Our initial experimental results indicate that the energy used for backlight is significantly reduced, while the desired quality is satisfied. The proposed algorithms can be realized in real time.

1 Introduction

With the widespread availability of 3G cellular networks, mobile hand-held devices are increasingly being designed to support streaming video content. These devices have stringent power constraints because they use batteries with finite lifetime. On the other hand, multimedia services are known to be very resource intensive and tend to exhaust battery resources quickly. Therefore, conserving power to prolong battery life is an important research problem that needs to be addressed, specifically for video streaming applications on mobile handheld devices.

Most hand-held devices are equipped with a TFT (Thin-Film Transistor) LCD (Liquid Crystal Display). For these devices, the display unit is driven by the illumination of backlight. The backlight consumes a considerable percentage of the total energy usage of the handheld device; it consumes 20%-40% of the total system power (for Compaq iPAQ) [1].

Dynamically dimming the backlight, while scaling up the pixel luminance to compensate for the dimmed backlight, is considered an effective method to save energy [1, 2, 3]. The luminance scaling, however, tends to saturate the bright part of the picture, thereby affecting the fidelity of the video.

In [2], a dynamic backlight luminance scaling (DLS) scheme is proposed. Based on different scenarios, three compensation strategies are discussed, i.e., brightness compensation, image enhancement, and context processing. In particular, for the luminance compensation, the distortion function is defined in Equation (7) of their paper. However, their calculation of the distortion does not consider the fact that the clipped pixel values do not contribute equally to the quality distortion.

In [3], a similar method, named concurrent brightness and contrast scaling (CBCS), is proposed. CBCS aims at conserving power by reducing the backlight illumination while retaining the image fidelity through preservation of the image contrast. The image distortion is defined as the loss of luminance contrast. Hence, it is proposed to compensate for the backlight dimming while maintaining the contrast, and both the undershot and overshot regions are clipped, as is illustrated in Equation (7) of their paper. Their distortion definition and proposed compensation technique may be good for static image based applications, such as the graphic user interface (GUI) and maps, but might not be suitable for streaming video scenarios, because their contrast compensation further compromises the fidelity of the images. In addition, neither [2] nor [3] solves the problem of frequent backlight switching, which can be quite distracting to the end user.

In this research work, we explicitly incorporate video quality into the backlight switching strategy and propose a Quality Adapted Backlight Scaling (QABS) scheme. The backlight dimming affects the brightness of the video. Therefore, we only consider the luminance compensation such that the lost brightness can be restored. The luminance compensation, however, inevitably results in quality distortion. For the video streaming application, the quality is normally defined as the resemblance between the original and processed video. Hence, for the sake of simplicity and without loss of generality, we define the quality distortion function as the mean square error (MSE) (see Equation (1)) and the quality function as the peak signal to noise ratio (PSNR) (see Equation (2)), both of which are well accepted objective video quality measurements.

\[
MSE = \frac{1}{M} \sum_{i=1}^{M} (x_i - y_i)^2 \qquad (1)
\]

\[
PSNR(dB) = 10 \log_{10} \frac{255^2}{\frac{1}{M} \sum_{i=1}^{M} (x_i - y_i)^2} = 10 \log_{10} \frac{255^2}{MSE} \qquad (2)
\]

where \(x_i\) and \(y_i\) are the original pixel value and the reconstructed pixel value, respectively. \(M\) is the number of pixels per frame.
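For reference, the two metrics can be computed as in the following Python sketch. It uses the standard definitions, with PSNR taken as \(10 \log_{10}(255^2 / MSE)\) for 8-bit pixels. The function names and the handling of identical frames (returning infinity) are our own illustrative choices.

```python
import math


def mse(original, reconstructed):
    """Mean square error between two equally sized sequences of pixel values."""
    if not original or len(original) != len(reconstructed):
        raise ValueError("frames must be non-empty and of equal size")
    total = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    return total / len(original)


def psnr_db(original, reconstructed, peak=255):
    """Peak signal to noise ratio in dB; infinite when the frames are identical."""
    error = mse(original, reconstructed)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)


if __name__ == "__main__":
    frame = [10, 50, 200, 255]
    clipped = [min(p, 156) for p in frame]  # hypothetical clipping at 156
    print(mse(frame, clipped), psnr_db(frame, clipped))
```

Inverting the PSNR definition gives the MSE budget for a target quality: \(MSE = 255^2 / 10^{PSNR/10}\).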

Although MSE and PSNR are not always correlated to the subjective quality [4][5], they are still widely used to assess video quality. The detailed discussion of the human visual system and the corresponding perceptual quality is beyond the scope of this research work. It is to be noted that any improved quality metrics may be adopted to replace the MSE/PSNR metrics used here without affecting the validity of our proposed scheme.

As is mentioned in [3], for video applications, the continuous change in the backlight factor will introduce inter-frame brightness distortion to the observer. In our experiments, we find that the “unnecessary” backlight changes fall into two categories: (1) small continuous changes over adjacent frames; (2) abrupt huge changes over a short period. Therefore, we propose to quantize the calculated backlight to eliminate the small continuous change and use a low-pass digital filter to smooth the abrupt changes.

The rest of the report is organized as follows. In Section 2, we introduce the principle of the LCD display - experimental results show that backlight dimming saves energy while the pixel luminance compensation results in minimal overhead. In Section 3, we present our QABS scheme, which includes determining the backlight dimming factor and two supplementary methods to avoid excessive backlight switching. Section 4 shows our prototype implementation, experimental methodology and simulation results. We conclude our work in Section 5.

2 Characteristics of LCD

In this section, we outline the characteristics of the LCD unit from two perspectives, the LCD display mechanism and the LCD power consumption, both of which form the basis for our system design.

2.1 LCD display

The LCD panel does not illuminate itself, but displays by filtering the light source from the back of the LCD panel [2][3]. There are three kinds of TFT LCD panels: transmissive LCD, reflective LCD, and transflective LCD. In transmissive LCD, the LCD pixels are illuminated from behind (i.e., opposite the viewer) using a backlight. The transmissive LCD offers a high quality display and is widely used in laptop personal computers. However, it is not legible with backlight turned off. The reflective LCD has a reflector on the back, which reflects the ambient environment light or uses a frontlight. Compared to transmissive LCD, reflective LCD uses modest amounts of power for illumination. Hence, most of the handheld devices, such as PDAs and cell phones use reflective LCD. Transflective LCD combines both transmissive and reflective mechanisms. It is not as common as the other two types.

We focus in this report on the reflective LCD [2][3], since it is the most commonly used LCD for handheld devices. Henceforth, when we mention LCD, we refer to reflective LCD, and we refer to both backlight and frontlight as backlight. As will be shown, our idea is generic to any backlight based LCD.

The perceptual luminance intensity of the LCD display is determined by two components: backlight brightness and the pixel luminance. The pixel luminance can be adjusted by controlling the light passing through the TFT array substrate. Users may detect a change in the display luminance intensity if either of these two components is adjusted. That is, the backlight brightness and the pixel luminance can compensate each other. In Section 2.2, we will show that the pixel luminance does not have a noticeable impact on the energy consumption, whereas the backlight illumination results in high energy consumption. Hence, in general, dimming backlight level while compensating the pixel luminance is an effective way to conserve battery power in hand-held devices.

Let the backlight brightness level and the pixel luminance value be $L$ and $Y$, respectively, and the perceived display luminance intensity $I$. We may denote $I$ using Equation (3).

$$I = \rho \times L \times Y \qquad (3)$$

where $\rho$ is a constant ratio, denoting the transmittance attribute of the LCD panel, and as such $\rho \times Y$ is the transmittance of the pixel luminance.

We may reduce the backlight level to $L'$ by multiplying $L$ with a dimming factor $\alpha$, i.e., $L' = L \times \alpha$, $0 < \alpha < 1$. To maintain the overall display luminance $I$ invariable, we need to boost the luminance of the pixel to $Y'$. Since the pixel luminance value is normally restricted by the number of bits that represent it (denoted as $n$), $Y'$ may be clipped if the original value of $Y$ is too high or the $\alpha$ is too low. The compensation of the backlight is described in Equation (4).

$$Y' = \begin{cases} 
\frac{Y}{\alpha}, & \text{if } Y < \alpha \times 2^n \\
2^n, & \text{if } Y \geq \alpha \times 2^n 
\end{cases} \qquad (4)$$

Combining Equation (4) and Equation (3), we have

$$I' = \begin{cases} 
I, & \text{if } Y < \alpha \times 2^n \\
\rho \times L \times \alpha \times 2^n, & \text{if } Y \geq \alpha \times 2^n 
\end{cases} \qquad (5)$$

Equation (5) clearly shows that the perceived display intensity may not be fully recovered. Instead, it is clipped to $\rho \times L \times \alpha \times 2^n$ if $Y \geq \alpha \times 2^n$. In Figure 2, we illustrate the clipping effect of the display luminance.
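The compensation and clipping described by Equations (4) and (5) can be sketched in Python as follows. The function names, the default $\rho = 1$, and the example values are illustrative assumptions. The clipping ceiling is kept at $2^n$ to follow the equations as written.

```python
def compensate_pixel(y, alpha, n_bits=8):
    """Equation (4): boost pixel luminance by 1/alpha, clipping at 2**n_bits."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    ceiling = 2 ** n_bits  # ceiling as written in Equation (4)
    return y / alpha if y < alpha * ceiling else ceiling


def perceived_intensity(y, alpha, backlight, rho=1.0, n_bits=8):
    """Equation (5): I' = rho * (alpha * L) * Y' after dimming and compensation."""
    return rho * (alpha * backlight) * compensate_pixel(y, alpha, n_bits)


if __name__ == "__main__":
    backlight = 1.0  # hypothetical normalized backlight level L
    for y in (100, 200):
        original = 1.0 * backlight * y  # Equation (3) with rho = 1
        dimmed = perceived_intensity(y, alpha=0.5, backlight=backlight)
        print(y, original, dimmed)
```

With $\alpha = 0.5$ and $n = 8$, a pixel value of 100 is fully restored, whereas a pixel value of 200 exceeds $\alpha \times 2^n = 128$ and its perceived intensity saturates.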

In Figure 1-a and Figure 1-c, we show an image and its luminance histogram. This image is the first frame of a typical news video clip ("ABC eye witness news") captured from a broadcast TV signal. Figure 1-b and Figure 1-d illustrate the image and its luminance histogram after backlight dimming and pixel luminance compensation. Figure 1-d shows that the pixels with luminance higher than 156 are all clipped to 156. This clipping effect eliminates the variety in the bright areas, which is subjectively perceived as luminance saturation and is objectively assessed as a quality of 30dB with reference to the original image shown in Figure 1-a.

2.2 LCD power model

In our experiments, we observe that backlight dimming saves energy, whereas the compensation process, i.e., scaling up the luminance of the pixels, has a negligible energy overhead. We measure the energy saving as the difference between the total system power consumption with the backlight set to a given level and that with the backlight set to the maximum (brightest) level. Figure 3 plots the backlight levels against the corresponding energy consumption for a Compaq iPAQ 3650 running Linux. Our experimental setup is described in more detail in Section 4. The backlight energy saving is almost linear in the backlight level and can be estimated using Equation (6).

\[ y = a_1 \times x + a_2 \qquad (6) \]

where \( y \) is the energy saving in watts; \( x \) denotes the backlight level; \( a_1 \) and \( a_2 \) are coefficients. We apply the curve fitting function of MATLAB and obtain \( a_1 = -0.0029567 \) and \( a_2 = 0.73757 \), with a largest residual fitting error of 0.085731.

Figure 1: Original image, its luminance histogram, compensated image and its luminance histogram

Figure 2: Clipping effect.

Figure 3: Power saving versus backlight level.
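As a small illustration, the fitted linear model of Equation (6) can be evaluated as below. The coefficients are the ones reported above; the function name and the assumption that backlight levels run from 0 to 255 are ours.

```python
# Fitted coefficients reported for Equation (6).
A1 = -0.0029567
A2 = 0.73757


def backlight_power_saving(level):
    """Estimated saving in watts at a given backlight level, y = a1 * x + a2."""
    # Assumption: the 256 programmable levels are numbered 0..255.
    if not 0 <= level <= 255:
        raise ValueError("backlight level must be in 0..255")
    return A1 * level + A2


if __name__ == "__main__":
    for level in (0, 64, 128, 192, 255):
        print(level, round(backlight_power_saving(level), 4))
```

Because \( a_1 \) is negative, the estimated saving shrinks as the backlight level rises. Near the maximum level the estimate can dip slightly below zero, which is within the reported residual fitting error.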

Contrary to the backlight switching, the pixel luminance scaling is uncorrelated to the energy consumption. In Figure 4, we show that for one specified backlight level (BL) the system energy consumption basically remains stable and is independent of the luminance scaling.

Figure 3 and Figure 4 justify the validity of the generic backlight power conservation approach, i.e., dimming the backlight while enhancing the pixel luminance value. In the next section, we apply this method to the video streaming scenario, discussing a practical scheme to optimize the backlight dimming while taking into consideration the effect on video distortion.

3 Adaptive Backlight Scaling

As explained in Equation (5), the backlight scaling with the luminance compensation may result in quality distortion. The amount of backlight dimming, therefore, has to be restricted such that the video fidelity will not be seriously affected.

3.1 Optimized Backlight Dimming

We define the optimized backlight dimming factor as the one whose induced distortion is closest to a specified threshold. Henceforth, we replace the factor \( \alpha \) with the actual backlight level \( Alfa = N \times \alpha \), where \( N \) is the number of backlight levels (256 for Linux on the iPAQ), and we denote the optimized backlight level by \( Alfa^* \).

In Figure 5, we illustrate the image quality distortion in terms of MSE over different backlight levels. (Note that we use the image shown in Figure 1-a.) We see that as \( Alfa \) increases, the induced video quality distortion due to the brightness saturation decreases monotonically. Hence, for a given distortion threshold, we can find a unique \( Alfa (= Alfa^*) \) for each image. In video applications, for a given distortion, different frames may have distinct \( Alfa^* \), depending on the luminance histogram of each frame. However, it is hard to obtain an accurate analytical expression for the quality distortion as a function of \( Alfa \). We therefore adopt a search-based approach, in which we calculate the MSE distortion for different values of \( Alfa \) until the specified distortion threshold is met. The results of our scheme are accurate and can be used as a benchmark for the design of other analytical methods.

Figure 6 shows the exhaustive search algorithm for finding \( Alfa^* \) for one image. \( FindAlfa(th) \) takes the distortion threshold (th) as input and returns \( Alfa^* \) as output. Note that \( MSE(Alfa) \) calculates the MSE with the specified \( Alfa \) for one frame.

However, the complexity of the exhaustive search shown in Figure 6 is too high. As shown in Equation (1), the per-frame MSE calculation consists of \( M \) multiplications and \( 2M \) additions, where \( M \) is the number of pixels in one frame, e.g., \( M = 25344 \) for QCIF format video. We regard the per-frame MSE as the basic unit of complexity. We assume that the optimized backlight level is uniformly distributed in \([0, N]\), and thus the complexity of the algorithm in Figure 6 is \( O(N) \). In our test, \( N = 256 \). Consequently, the optimized backlight dimming factor can hardly be calculated in real time.

Therefore, we apply a faster bisection method [6] to find \( Alfa^* \). Since we can easily find an upper bound (denoted as \( u \)) and a lower bound (denoted as \( d \)) on the backlight levels, bisection yields as good an approximation as we want. We assume that \( u > d \), let \( \epsilon \) be the desired precision, and present the algorithm in Figure 7.

By using the bisection method, we achieve a worst-case complexity of \( O(\log_2 N) \). For instance, for \( N = 256 \) and \( \epsilon = 1 \), we need to calculate the per-frame MSE at most eight times, which is fast enough for real-time processing.
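The difference between the exhaustive search and the bisection search can be sketched in Python as follows. This is not a transcription of Figures 6 and 7. It rests on several assumptions of ours: the distortion of a frame at level \( Alfa \) is modeled as the MSE caused by clipping every pixel to \( (Alfa / N) \times 255 \); \( Alfa^* \) is taken to be the lowest level whose MSE does not exceed the threshold; and the search bounds are \( d = 0 \) and \( u = N \). All function names are hypothetical.

```python
def clipping_mse(frame, alfa, n_levels=256, peak=255):
    """Assumed distortion model: MSE from clipping pixels to (alfa/n_levels)*peak."""
    ceiling = (alfa / n_levels) * peak
    return sum((y - min(y, ceiling)) ** 2 for y in frame) / len(frame)


def find_alfa_exhaustive(frame, threshold, n_levels=256):
    """Scan levels upward; return (lowest level meeting threshold, MSE evaluations)."""
    evaluations = 0
    for alfa in range(1, n_levels + 1):
        evaluations += 1
        if clipping_mse(frame, alfa, n_levels) <= threshold:
            return alfa, evaluations
    return n_levels, evaluations


def find_alfa_bisection(frame, threshold, n_levels=256):
    """Bisection over (0, n_levels]; relies on MSE not increasing with alfa."""
    low, high = 0, n_levels  # invariant: `high` always meets the threshold
    evaluations = 0
    while high - low > 1:  # stop at precision epsilon = 1
        mid = (low + high) // 2
        evaluations += 1
        if clipping_mse(frame, mid, n_levels) <= threshold:
            high = mid
        else:
            low = mid
    return high, evaluations


if __name__ == "__main__":
    frame = list(range(256))  # hypothetical frame with a flat luminance histogram
    threshold = 255 ** 2 / 10 ** (30 / 10)  # MSE budget for a 30 dB PSNR target
    print(find_alfa_exhaustive(frame, threshold))
    print(find_alfa_bisection(frame, threshold))
```

Both functions return the same level under the monotonicity assumption, but the bisection needs only \( \log_2 256 = 8 \) MSE evaluations per frame.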

3.2 Smoothing the backlight switching

It has been discussed in [3] that the backlight dimming factor may change significantly across consecutive frames for most video applications. The frequent switching of the backlight may introduce an inter-frame brightness distortion to the observer. Hence, it is necessary to reduce frequent backlight switching.

In our study, we observe that the calculated $Alfa^*$, although based on an individual image, does not fluctuate widely during a video scene, i.e., a group of frames that are characterized by similar content. In fact, the redundancy among adjacent frames constitutes the major difference between video and static image applications and has long been utilized to achieve higher compression efficiency. Hence, the backlight switching should be smoothed out within a scene and should preferably happen only at the boundaries of video scenes.

We propose two supplementary methods to smooth the acquired $Alfa^*$ within the same video scene. First, we apply a low-pass digital filter to eliminate any abrupt backlight switching that is caused by an unexpected sharp luminance change. In order to lower the system complexity and avoid the oscillation caused by signal feedback, we choose a finite impulse response (FIR) filter. The passband frequency is determined by the subjective perception of the "flicker moment" and the frame display rate. For example, if we assume 2 seconds as the "flicker moment" and the frame display rate is 10 frames per second, we have a sampling frequency $f_s = 10\text{Hz}$ and a cutoff frequency $f_c = 0.5\text{Hz}$.

Second, we propose to quantize the backlight levels, i.e., any backlight level between two quantization values is quantized to the closest one, which prevents needless backlight switching for small luminance fluctuations during one scene. For example, for an iPAQ 3650, there are as many as 256 backlight levels. However, it is unnecessary to consider all of these levels as candidates for backlight scaling, as the difference in power saving between adjacent levels is negligible. Instead, we may quantize these levels to “N” levels (e.g. we use N=5 in our study). We switch the backlight level only if the calculated $Alfa^*$ changes drastically enough that it falls into another quantized level.
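The two smoothing steps can be sketched in Python as follows. The filter shown is a stand-in and not the filter used in our prototype: it is a 20-tap moving average, which is a simple FIR low-pass filter whose first spectral null falls at $f_s / 20 = 0.5\text{Hz}$ for $f_s = 10\text{Hz}$. The shortened start-up window, the evenly spaced quantization levels, and the function names are likewise assumptions made for illustration.

```python
def moving_average_fir(levels, n_taps=20):
    """Causal moving-average FIR filter; start-up averages the samples seen so far."""
    smoothed = []
    for i in range(len(levels)):
        window = levels[max(0, i - n_taps + 1): i + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def quantize_level(level, n_quant=5, max_level=255):
    """Map a backlight level to the nearest of n_quant evenly spaced levels."""
    step = max_level / (n_quant - 1)
    index = min(n_quant - 1, max(0, int(level / step + 0.5)))
    return int(index * step + 0.5)


def smooth_backlight(levels, n_taps=20, n_quant=5):
    """Low-pass filter the per-frame levels, then quantize each result."""
    return [quantize_level(v, n_quant) for v in moving_average_fir(levels, n_taps)]


if __name__ == "__main__":
    # Hypothetical per-frame levels: a steady scene with a one-frame spike.
    per_frame = [128] * 10 + [255] + [128] * 10
    print(smooth_backlight(per_frame))
```

In this example the one-frame spike is absorbed by the filter and the quantizer, so the backlight level never switches within the scene.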

4 Performance Evaluation

In this section, we introduce our prototype implementation, the methodology of our measurement and the performance of the proposed algorithm.

4.1 Prototype Implementation

We have implemented a prototype of a video streaming system that incorporates our QABS based adaptations. Figure 8 shows a high level representation of our prototype system. Our implementation of the video streaming system consists of a video server, a proxy server and a mobile client. We assume that all communication between the server and the mobile client is routed through a proxy server typically located in proximity to the client.
The video server is responsible for streaming compressed video to the client using RTP [7] over UDP [8]; the proxy server transcodes the received stream, adds the appropriate control information, and relays the newly formed stream to the mobile client (Compaq iPAQ 3650 in our case). For the sake of simplicity and without loss of generality, in our initial prototype implementation, we use the proxy server to also double as our video server, i.e., we use the video data stored on the proxy server for our testing purposes.

The proxy server includes four primary components: the video transcoder, the proposed QABS module, the signal multiplexor, and the communication manager. The transcoder decompresses the original video stream and provides the pixel luminance information to the QABS module; the QABS module calculates the optimized backlight dimming factor based on the user quality preference feedback received from the client (user). The multiplexor is used to multiplex the optimized backlight dimming information with the video stream. The communication manager is used to send this aggregated stream to the client. Note that after quantization, the backlight level can be represented by 3 bits, i.e., 30 bits/s for a display rate of 10 fps, which is a negligible bandwidth overhead.

On the mobile client, the demultiplexor is used to recover the original video stream and the encoded backlight information from the received stream. The LCD control module renders the decoded image onto the LCD display, which is perceptible after the backlight illumination. The backlight information is fed to the “Backlight Adjustment Module”, which concurrently sets the backlight value for the LCD. In particular, users may send a quality request to the proxy when requesting the video, based on their quality preference as well as their concern for battery consumption.

4.2 Measurement Methodology

For video quality and power measurements, we use the setup shown in Figure 9. The proxy in our experiments is a Linux desktop with a 1GHz processor and 512MB of RAM. All our measurements are made on a Compaq iPAQ 3650 (running the Linux 2.4.18-rmk3 kernel) with a 200MHz Intel StrongARM processor, 16MB of ROM, and 32MB of SDRAM. The iPAQ 3650 uses a reflective LCD, whose backlight is 256-level programmable. The iPAQ uses a Cisco 350 Series Aironet 11Mbps wireless PCMCIA network interface card for communication. The batteries are removed from the iPAQ during the experiment. We use a National Instruments PCI DAQ board to sample the voltage drops across a resistor and across the iPAQ at 200K samples/sec. We calculate the instantaneous and average power consumption of the iPAQ using the formula $P_{iPAQ} = \frac{V_{R}}{R} \times V_{iPAQ}$ (Figure 9).
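The power formula can be applied to the sampled voltages as in the following Python sketch. The function names, the resistor value, and the sample voltages are made up for illustration; only the formula $P_{iPAQ} = \frac{V_{R}}{R} \times V_{iPAQ}$ comes from the setup described above.

```python
def instantaneous_power(v_r, v_ipaq, r_ohms):
    """Power in watts: current through the sense resistor times the iPAQ voltage."""
    return (v_r / r_ohms) * v_ipaq


def average_power(v_r_samples, v_ipaq_samples, r_ohms):
    """Mean of the instantaneous power over paired voltage samples."""
    if not v_r_samples or len(v_r_samples) != len(v_ipaq_samples):
        raise ValueError("sample lists must be non-empty and of equal length")
    powers = [
        instantaneous_power(v_r, v_ipaq, r_ohms)
        for v_r, v_ipaq in zip(v_r_samples, v_ipaq_samples)
    ]
    return sum(powers) / len(powers)


if __name__ == "__main__":
    # Hypothetical samples: a 0.25 ohm sense resistor and two readings.
    print(average_power([0.5, 0.25], [4.0, 4.0], r_ohms=0.25))
```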

4.3 Experimental Results

In our simulation, we use a video sequence captured from a broadcast “news” program, whose first frame is shown in Figure 1-a. We choose this video as representative of a typical usage of a PDA. This sequence is composed of a set of scenes, which can be roughly characterized by Figure 10. In Figure 10, we show the basic statistics (i.e., the mean and the variance of luminance per frame) of ABC_news. The capturing rate is 10 frames per second, and the whole sequence lasts 1750 frames.

We assume that the users are given three quality options, fair, good, and excellent, which respectively correspond to the PSNR value of 30dB, 35dB, and 40dB. After applying the algorithm “Proc: FastFindAlfa∗”, we obtain the adapted Alfa∗ for these three quality preferences, as is shown in Figure 11. It can be seen that higher video quality needs higher backlight level on average.

Figure 10: Basic statistics of abc_news.

Figure 11: Alfa* adapted to three given quality thresholds.

Figure 12: Alfa* before and after filtering and quantization.

Figure 13: Quality before and after Alfa smoothing.

In Figure 12, we show Alfa∗ before and after the backlight smoothing process. The small variations and abrupt changes in backlight switching are largely eliminated by the filtering and quantization. In addition, as we expected, backlight switching mostly happens at the boundaries of major scenes.
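The smoothing step can be illustrated with a small sketch. This section does not specify the filter or the quantization step, so the moving-average low-pass filter, the window of 5 frames, and the step of 16 backlight levels below are all assumptions chosen for illustration; only the 256-level backlight range comes from the text.

```python
def moving_average(values, window=5):
    """Centered moving average; the window shrinks at the sequence edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def quantize(values, step=16, max_level=255):
    """Snap each value to the nearest multiple of `step`, capped at max_level."""
    return [min(max_level, int(round(v / step)) * step) for v in values]


def smooth_backlight(alfa, window=5, step=16):
    """Low-pass filter the per-frame backlight levels, then quantize them."""
    return quantize(moving_average(alfa, window), step)


def count_switches(levels):
    """Number of frame-to-frame changes in the backlight level."""
    return sum(1 for a, b in zip(levels, levels[1:]) if a != b)
```

Comparing `count_switches` on a per-frame sequence before and after `smooth_backlight` gives a rough measure of how much switching the smoothing removes.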

In Table 1, we summarize the results of our QABS. The mean Alfa∗ for each quality preference produces an average quality very close to the pre-determined quality threshold. Different quality requirements result in different power savings: a higher quality preference must be paid for with more backlight energy. Nevertheless, we can still save 29% of the energy that would otherwise be consumed by the backlight unit if we set the quality preference to “Excellent”. Note that the “Excellent” quality, i.e., 40dB in the PSNR metric, means nearly unnoticeable quality distortion.

<table>
<thead>
<tr>
<th></th>
<th>Fair</th>
<th>Good</th>
<th>Excellent</th>
</tr>
</thead>
<tbody>
<tr>
<td>Mean Alfa∗</td>
<td>149</td>
<td>162</td>
<td>186</td>
</tr>
<tr>
<td>Quality (dB)</td>
<td>30.17</td>
<td>34.28</td>
<td>42.31</td>
</tr>
<tr>
<td>Power Saving (%)</td>
<td>41.8%</td>
<td>36.7%</td>
<td>27.3%</td>
</tr>
</tbody>
</table>

In Figure 13, we show that the filtering and quantization process may lead to instantaneous quality fluctuation, in contrast to the consistent quality before backlight smoothing. Nevertheless, we observe that the quality fluctuates around the designated quality threshold and that the fluctuation mostly happens at scene changes. In follow-up studies, we will investigate the perceptual influence of these factors, i.e., the luminance saturation and frequent backlight changes.

5 Conclusion and Future Work

In this report, we apply a backlight scaling technique to video streaming applications, and explicitly associate backlight switching with the perceptual video quality in terms of PSNR. The proposed adaptive algorithm is fast and effective at reducing energy consumption while maintaining the designated video quality. To reduce the frequency of backlight switching, we propose two supplementary schemes that smooth the backlight switching process so that the user's perception of the video stream can be substantially improved. In addition, our proposed algorithms are tested on a prototype system.

In our future work, we plan to work on several ideas that can potentially improve our current approach. We enumerate some of these areas below: (1) we find that PSNR and MSE are not satisfactory indicators of the distortion caused by brightness saturation, and they should be replaced by a metric that incorporates luminance contrast; (2) the low-pass filter is essentially heuristic and should be replaced by a more accurate scene detection technique; (3) the video decoding process in the proxy server results in extra overhead, which may be eliminated if we can find a method to conduct QABS within the compressed domain. Additionally, we plan to incorporate the backlight saving algorithm into an overall system power management framework and design a unified cross-layer power solution for hand-held devices with LCD displays.

6 Acknowledgement

We would like to thank Michael Philpott, who helped us with the experiment setup and the power measurements.


Design and Implementation of an IEEE 802.11e HCF Simulation Model in ns-2.27

Inanc Inan (Advisor: Ender Ayanoglu)
Center for Pervasive Communications and Computing
Department of Electrical Engineering and Computer Science
The Henry Samueli School of Engineering
University of California, Irvine
Irvine, California 92697-2625
Email: iinan@uci.edu

I. INTRODUCTION

The IEEE 802.11 WLAN standard [1] can be considered a wireless version of Ethernet, which supports best-effort service at the MAC layer. However, the widespread use of real-time voice, audio, and video applications makes Quality of Service (QoS) support a pressing concern. Therefore, the IEEE 802.11 working group is developing a QoS-enhanced standard, called IEEE 802.11e [2].

The Network Simulator, ns-2, is a discrete-event simulator [3]. It is an open source project written in C++, with an OTcl interpreter as a frontend. The simulator provides substantial support for wired and wireless networking research. The current distribution of ns-2 includes only the 802.11 Distributed Coordination Function (DCF) model in the MAC layer.

This quarter, with my colleague Feyza Keceli, we constructed a functioning IEEE 802.11e Hybrid Coordination Function (HCF) simulation model for ns-2.27. The model includes both the contention-based Enhanced Distributed Channel Access (EDCA) and the HCF Controlled Channel Access (HCCA) functionalities. All HCF QoS frame exchange sequences defined in the draft 802.11e standard document [2] are supported, as are optional dynamic link setup and the immediate block ACK policy.

II. HCF SIMULATION MODEL IN NS-2.27

Several patches for ns-2 that provide PCF or EDCA functionality in the MAC layer are available on the World Wide Web. To our knowledge, except for the ns-2.26 EDCA extension of Wietholter [4], these extensions are built on daily snapshots of earlier ns-2.1x versions, which are no longer available.

First, we verified that Wietholter’s implementation works as stated in the draft standard document [2]. We reproduced the simulation results given in [4] using the ns-2.26 environment. Next, we improved the code and ported it to ns-2.27. The ported EDCA extension converts the legacy ns-2.27 MAC into a multi-dimensional MAC in order to handle access-category-based transmission. Our improved implementation extends the 802.11 MAC header in ns-2.27 by two bytes for the additional QoS control field defined in the 802.11e MAC header. It also enables multiple frame transmissions within an EDCA TXOP, with RTS/CTS frame exchange functionality on top.

The ns-2.27 MAC layer supports assigning a station as a Hybrid Coordinator (HC). On top of the EDCA model, we have built the HCCA model, which provides the HC with the previously missing centrally controlled access mechanism. All QoS frame types and QoS frame exchange sequences of the 802.11e draft are implemented. The HC transmits beacon frames, starts contention-free periods (CFPs), and controls transmissions during a CFP according to a generated polling list. The polling and traffic specification (TSPEC) lists are configured manually through the Tcl simulation script. Any polled station may use its assigned TXOP for any traffic category. Any unused portion of a TXOP is always returned to the HC. Unlike the retransmission schemes defined for the EDCA model, recovery from an unexpected reception in HCCA uses the PHY-CCA.indication flag, which determines only who controls the medium, not whether the transmission succeeded. CAP generation within the CP is also supported. Moreover, optional dynamic link setup and the immediate block ACK policy, which increase MAC efficiency, are included in the HCCA model.
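The polling behavior described above, including the return of unused TXOP time to the HC, can be pictured with a toy model. This is only an illustrative sketch in Python, not the ns-2.27 C++/OTcl code; the class names, the microsecond airtime bookkeeping, and the `run_cfp` method are hypothetical and ignore frame formats, inter-frame spaces, and error recovery.

```python
from dataclasses import dataclass, field


@dataclass
class Station:
    name: str
    queued_us: int  # airtime (microseconds) the station needs for its queued frames


@dataclass
class HybridCoordinator:
    polling_list: list  # entries are (Station, txop_limit_us) pairs
    log: list = field(default_factory=list)

    def run_cfp(self, cfp_budget_us):
        """Poll each station in order and return the airtime left in the CFP."""
        remaining = cfp_budget_us
        for station, txop_limit_us in self.polling_list:
            if remaining <= 0:
                break
            granted = min(txop_limit_us, remaining)
            used = min(granted, station.queued_us)
            station.queued_us -= used
            returned = granted - used
            # Only the used airtime is charged; the rest goes back to the HC.
            remaining -= used
            self.log.append((station.name, granted, used, returned))
        return remaining
```

In this model a station that needs less airtime than its TXOP grant leaves the difference available to the stations polled after it.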

Our extension also includes an abstract model of IEEE 802.11g PHY [5] which achieves data rates up to 54 Mbps.

III. CONCLUSIONS AND FUTURE WORK

Our HCF model in ns-2.27 provides an invaluable simulation environment for analyzing and improving 802.11e MAC layer performance. In the very short term, we will verify through extensive simulations that the HCF model works properly. IEEE 802.11n PHY parameters, channel models, and MIMO modes will be added to the ns-2.27 code for a more realistic simulation environment.

REFERENCES

Introduction: In this report we summarize the progress made in our research on the analysis, design, and implementation of adaptive pre-distorters for linearization and increased power efficiency of Solid State Power Amplifiers (SSPAs) and Traveling Wave Tube Amplifiers (TWTAs). These pre-distorters mitigate the adverse effects of the high peak-to-average power ratio (PAPR) in orthogonal frequency division multiplexing (OFDM).

Summary of accomplishments: Our accomplishments in Fall 04 fall into two parts. The first part is algorithm development for both SSPAs and TWTAs, and the second part is hardware implementation for SSPAs. We have one journal paper ready to submit for the first part and one conference paper for the second part.

In the first part, we designed new pre-distorters for both TWTAs and SSPAs, which significantly mitigate the above problem by pre-compensating the distortion caused by the high PAPR. The design is based on accurate analytical representations of the amplifier and pre-distorter characteristics. These representations use very few parameters, which, due to their sparseness, can be captured and tracked in real time. The tracking ability and the improvement in system performance using our pre-distorter have been verified by both floating-point and fixed-point simulations, where the latter include the distortion effects from the hardware. The bit-widths for the OFDM signals, ADC, and DAC have been evaluated, and a bit-width of 10 is shown to be sufficient for the hardware design. A brief description of these results follows.

Figure 1 shows a simple schematic diagram of an OFDM transmitter. The OFDM base-band (BB) signal is passed through the pre-distorter (PD) to compensate for the degradation that occurs later at the high power amplifier (HPA). From the PD the signal is sent through a digital-to-analog converter (DAC) to a modulator, where it is transferred to the pass-band and finally passed to the HPA. Our analytical models describe the HPA and PD accurately with only two parameters (SSPA) or four parameters (TWTA). Our simulation results show very good performance of our PD-based system in capturing and tracking the unknown parameter values, and also with respect to the BER vs. SNR curve.
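To make the idea of a four-parameter amplifier model and its pre-distorter concrete, the sketch below uses the well-known Saleh TWTA model, whose AM/AM and AM/PM curves each have two parameters. The report does not name its analytical model, so the choice of the Saleh model is an assumption, the coefficient values are illustrative, and the sketch shows only a static inverse with known parameters, not the adaptive tracking described above.

```python
import cmath
import math

# Illustrative Saleh-model coefficients (assumed values, not from the report).
ALPHA_A, BETA_A = 2.0, 1.0           # AM/AM parameters
ALPHA_P, BETA_P = math.pi / 3, 1.0   # AM/PM parameters


def am_am(r):
    """Output amplitude for input amplitude r."""
    return ALPHA_A * r / (1.0 + BETA_A * r * r)


def am_pm(r):
    """Phase shift in radians added for input amplitude r."""
    return ALPHA_P * r * r / (1.0 + BETA_P * r * r)


def hpa(x):
    """Apply the amplifier nonlinearity to one complex base-band sample."""
    r = abs(x)
    return am_am(r) * cmath.exp(1j * (cmath.phase(x) + am_pm(r)))


def predistort(x):
    """Static inverse of hpa(); targets above saturation are clipped."""
    y = abs(x)
    if y == 0.0:
        return 0j
    y_sat = ALPHA_A / (2.0 * math.sqrt(BETA_A))  # peak of the AM/AM curve
    y = min(y, y_sat)
    # Smaller root of BETA_A*y*r^2 - ALPHA_A*r + y = 0, i.e. am_am(r) = y.
    disc = max(0.0, ALPHA_A ** 2 - 4.0 * BETA_A * y * y)
    r = (ALPHA_A - math.sqrt(disc)) / (2.0 * BETA_A * y)
    # Pre-rotate by the phase shift the amplifier will add at amplitude r.
    return r * cmath.exp(1j * (cmath.phase(x) - am_pm(r)))
```

With these coefficients, `hpa(predistort(x))` reproduces `x` for amplitudes up to the saturation level of 1.0 and clips the amplitude beyond it.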

Figure 1: Diagram of pre-distorter-based OFDM

Since the bit-widths of the OFDM base-band (OFDM BB) signal and of the DAC/ADC are limited in real systems by cost and design constraints, we performed simulations to investigate the performance improvement obtained with the pre-distorter and to study the distortion effects caused by saturation, overflow, and quantization at different bit-widths. A bit-width of 10 is shown to be sufficient for the hardware design. Specifically, we performed algorithm-level and fixed-point hardware-level simulations. The fixed-point simulations include all hardware distortion effects, such as round-off error and coefficient quantization, so hardware performance will align with that of the bit-matched simulations.
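The effect of a finite bit-width can be illustrated with a simple quantizer. This is a minimal sketch, assuming a signed fixed-point format with saturation and a full-scale range of ±1.0; the function names and the format are illustrative and are not taken from the fixed-point simulations described above.

```python
import math


def to_fixed(x, bits=10, full_scale=1.0):
    """Round x to a signed `bits`-wide fixed-point grid, saturating on overflow."""
    levels = 1 << (bits - 1)          # number of positive codes
    step = full_scale / levels        # quantization step size
    code = round(x / step)
    code = max(-levels, min(levels - 1, code))
    return code * step


def sqnr_db(signal, bits, full_scale=1.0):
    """Signal-to-quantization-noise ratio in dB for a sequence of samples."""
    noise = sum((s - to_fixed(s, bits, full_scale)) ** 2 for s in signal)
    power = sum(s * s for s in signal)
    return 10.0 * math.log10(power / noise)
```

Sweeping `bits` in `sqnr_db` over a test signal is one simple way to see how quantization noise falls as the bit-width grows.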

Future work: In the future we will analyze, design, and implement Pre-Distorters for nonlinear HPAs with memory.

The long-term goal of the proposed project is to develop a system-level methodology for power optimization of SoCs. In the immediate term, the project will investigate techniques for efficient power modeling of SoC bus architectures and of system-level IP blocks, and their use in the architectural exploration of IP-based SoC designs. The project aims at power and cost reduction in system-level embedded system design through data storage and transfer optimization, especially for high-performance stream-based multimedia applications. Such applications map naturally onto system-level pipeline or data-flow architectures. In such architectures, tight pipelining has great potential to improve the utilization of embedded memories and the efficiency of data communication. However, it is widely observed that existing system designs instead adopt loosely coupled functional modules, using large amounts of poorly utilized on-chip or off-chip memory as buffers.

The reason is that existing local optimization tools, such as compilers and hardware synthesizers, facilitate a modular design approach, while system-level optimization tools and methodologies are still absent or at a very primitive stage of development.

To solve this problem, we propose a two-stage exploration flow that couples the modeling of IPs with power-performance exploration at the transaction level of modeling. In the HAIM (Hierarchically Abstracted IP Modeling) stage, the data access pattern at the interface of each module is extracted, removing the computation detail inside the modules while keeping the flexibility of, and constraints on, data accesses. In the DOSE (Data Organization Space Exploration) stage, scheduling schemes are explored, and performance, power consumption, and area cost are evaluated so as to select the most efficient design. The proposed exploration flow will be based on the COMMEX transaction-level communication architectural framework, on which we will study the H.264 application (the latest video coding standard) and JPEG2000 (the latest still image coding standard).
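The selection step at the end of the DOSE stage can be pictured as a filter over evaluated design points. The text does not say how the most efficient design is chosen, so the Pareto-dominance filter below is a stand-in; the `Design` record and its fields are hypothetical, and lower values are assumed to be better for all three metrics.

```python
from typing import NamedTuple


class Design(NamedTuple):
    name: str
    latency: float  # performance metric, lower is better
    power: float    # power consumption, lower is better
    area: float     # area cost, lower is better


def dominates(a, b):
    """True if a is no worse than b on every metric and better on at least one."""
    no_worse = a.latency <= b.latency and a.power <= b.power and a.area <= b.area
    better = a.latency < b.latency or a.power < b.power or a.area < b.area
    return no_worse and better


def pareto_front(designs):
    """Keep only the designs that no other candidate dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]
```

A designer would then pick from the surviving candidates according to the constraints of the target system.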

Progress

Although Sudeep Pasricha was working as an intern at Conexant during FQ04 (and was not supported by a CPCC fellowship last quarter), we met frequently to plan the execution of the project and made significant progress. We are building models of the JPEG2000 encoder and H.264 decoder systems in SystemC with bus-cycle-accurate data interfaces. The finished models are intended to be configurable with different parallelism levels, data scheduling schemes, and bus architecture configurations. During this quarter, we built the base of such a model, including manual trials of design scheme exploration, C code for the two standards, and the bus model in SystemC. Our future work will combine these efforts, add multi-configuration features, and reach cycle accuracy on all data transfer interfaces. Going forward, our goals are to:

- Manually extract data access patterns from the reference code of the JPEG2000 encoder and H.264 decoder, keeping the information on allowed tiling and loop transformations. Based on these patterns, outline system design schemes to guide and verify the subsequent research work.
- Build transaction-level models of AMBA in SystemC with bus-cycle accuracy. Study their interface with the JPEG2000 encoder and H.264 decoder. Analyze the impact of multi-level, multi-layer bus configurations on overall communication efficiency.
- Study different levels of parallelism in the JPEG2000 encoder and H.264 decoder algorithms. Analyze the capability and configuration of the bus architectures needed to support such design schemes. Study their impact on memory sharing.
- Continue the development and refinement of the design exploration workflow.
Advances in CMOS technologies have tremendously improved integration capability and intrinsic speed of operation, both of which are essential to realizing the future generation of ultra-high-speed ICs on a single chip. The ever-increasing demand for portability and low cost is driving circuit and system designers to place sensitive analog/RF circuits alongside large, complex digital signal processing components on the same die (cf. Fig. 1). Scaled CMOS devices with a device $f_T$ of 100GHz or higher continue to take over territory that once belonged to expensive III-V compound semiconductors. Designing millimeter-wave CMOS ICs operating at 60GHz will become a reality in the near future. Design issues such as high-frequency device noise and on-chip electromagnetic interference through the substrate and common supply/ground lines will have a significant impact on the reliability and performance of current and next-generation CMOS ICs.

Fig. 1. A block diagram of a state-of-the-art SOC along with a list of new design challenges

Fig. 2 shows the general classification of noise in high-frequency ICs, which includes both random disturbances caused by the device and environmental noise sources. The device noise sources, such as thermal noise, shot noise, and high-frequency gate-induced noise, are particularly important in the design of low-noise analog/RF circuits, while being negligible in digital ICs. The environmental noise sources, such as power/ground bounce, substrate noise, and electromagnetic crosstalk, are mainly caused by large-signal variations in neighboring circuits on the chip. These noise sources propagate through the substrate or the power/ground rails and seriously degrade both digital and analog circuits fabricated on the same die.

Design of noise-aware mixed analog-digital CMOS circuits entails two major challenges:

a. High-frequency device noise sources, such as gate-induced noise in MOS devices, which have been negligible at low frequencies, tend to become a dominant limiting component of the IC sensitivity at very high frequencies. Moreover, gate-induced noise is correlated with channel thermal noise, adding more complexity to the noise analysis of millimeter-wave analog circuits. As a consequence, it is important to study the high-frequency device noise sources, and characterize the effects of these noise sources in ultra high-frequency analog/RF circuits including the LNAs/mixers and oscillators.

b. Environmental noise propagated through power/ground lines and the silicon substrate contains wideband spectral components, thereby seriously degrading the performance of noise-sensitive analog/RF circuit blocks in a high-frequency CMOS front end. Substrate and power/ground noise cause delay uncertainty and operation failure in analog circuits. They induce timing jitter in phase-locked and delay-locked frequency synthesizers, and degrade the linearity and dynamic range of data converters.

The integration capability of CMOS technology makes it possible to realize a more power-efficient system that includes the digital back end and the high-speed analog front end on a single die. This trend entails an important challenge. Given the high level of interaction between the noisy digital or large-signal analog blocks and other noise-sensitive analog/RF circuits through various propagation mechanisms (e.g., the substrate), it is highly likely that the signal transients of the large-signal switching circuits will corrupt the performance of the analog sub-blocks. Substrate coupling degrades signal integrity in mixed-signal ICs, where digital gates or large-signal analog circuits may inject noise into the substrate, especially during clock/signal transitions, introducing hundreds of millivolts of disturbance in the substrate potential. Substrate and power/ground (P/G) noise are thus the most dominant environmental noise sources in mixed analog/digital ICs.

A major component of the P/G noise is inductive noise. Faster clock speeds and larger numbers of devices and I/O drivers, as dictated by Moore’s Law, have increased the amount of this noise in the power and ground planes, where it is also known as simultaneous switching noise (SSN). Controlling the amount of inductive noise injected into the power planes is a critical and challenging design task. Package pins, bondwires, and on-chip interconnects all exhibit parasitic inductance. When the current through an inductor varies in time, a voltage fluctuation is generated across the inductor. This voltage is proportional to the parasitic inductance and to the rate of change of the current flowing through the inductor. As a result, when large-signal switching cells in a circuit experience signal transitions, the voltage levels on the power distribution lines of the chip fluctuate. The impact of the P/G bounce becomes most pronounced when a large number of I/O drivers switch simultaneously.
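The proportionality described above is the inductor relation V = L di/dt. As a rough first-order sketch, assuming that N drivers share one parasitic inductance and switch identically at the same instant, the bounce scales with N. The function name and the numbers in the usage note are hypothetical and serve only to illustrate the arithmetic.

```python
def inductive_noise(l_henry, di_amps, dt_seconds, n_drivers=1):
    """First-order P/G bounce in volts: V = N * L * (di/dt)."""
    return n_drivers * l_henry * di_amps / dt_seconds
```

For instance, a 1 nH parasitic inductance with each driver changing its current by 10 mA in 100 ps gives about 0.1 V for one driver and about 1.6 V for 16 drivers switching together under these assumptions.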