IEE Proceedings - Computers and Digital Techniques
Volume 153, Issue 4, July 2006
- Author(s): A.B.T. Hopkins and K.D. McDonald-Maier
- Source: IEE Proceedings - Computers and Digital Techniques, Volume 153, Issue 4, pp. 197–207
- DOI: 10.1049/ip-cdt:20050194
- Type: Article
The introduction of complex system-on-chip (SoC) devices with multiple processor cores presents new challenges for embedded systems developers. Novel development tools specifically targeting complex SoCs will help overcome these challenges, but they are typically limited by inadequate debug support facilities within the SoC. High-quality debug support with advanced features is essential to take full advantage of complex SoC devices in challenging applications while simultaneously reducing development time. Here, existing strategies for providing comprehensive SoC debug support are reviewed, with a focus on hard real-time applications, such as automotive control, where the development challenges are overwhelming. This overview includes an evaluation of the available solutions and their suitability for use with the next generation of complex SoCs based on multiple processor cores. It is shown that many existing solutions do not readily allow developers to take advantage of the complex features integrated into the next generation of SoCs. The essential features of debug support for SoCs with multiple processor cores are summarised and discussed. Recommendations are made for SoC designers and for the future direction of research in this area, with the aim of providing a more suitable foundation for new development tools. Such tools are badly needed for all hard real-time embedded systems and are essential for managing the development complexity introduced by SoC devices with multiple highly interactive processor cores and active peripherals.
- Author(s): G. Jervan ; Z. Peng ; T. Shchenova ; R. Ubar
- Source: IEE Proceedings - Computers and Digital Techniques, Volume 153, Issue 4, pp. 208–216
- DOI: 10.1049/ip-cdt:20050064
- Type: Article
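For the hybrid BIST paper by Jervan et al., the trade-off in the abstract below can be illustrated with a minimal Python sketch. It assumes the expensive part has already been done: for each candidate pseudorandom test length, the number of deterministic patterns still required is known (in practice this needs fault simulation, which is what the paper's estimation methodology is meant to speed up). The per-pattern energy model, the `HybridPoint` type, the candidate data and the function names are all hypothetical, and exhaustive search over a few candidates stands in for the paper's heuristic algorithms.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class HybridPoint:
    prpg_len: int   # pseudorandom patterns applied before switching to stored patterns
    det_count: int  # deterministic patterns still needed to reach full fault coverage


def hybrid_energy(p: HybridPoint, e_prpg: float, e_det: float) -> float:
    """Total test energy under a simple per-pattern energy model (an assumption)."""
    return p.prpg_len * e_prpg + p.det_count * e_det


def best_hybrid_split(points: Sequence[HybridPoint], e_prpg: float, e_det: float,
                      pattern_bits: int, memory_bits: int) -> Optional[HybridPoint]:
    """Exhaustively pick the lowest-energy split whose stored test set fits in memory."""
    feasible = [p for p in points if p.det_count * pattern_bits <= memory_bits]
    if not feasible:
        return None  # no split satisfies the memory constraint
    return min(feasible, key=lambda p: hybrid_energy(p, e_prpg, e_det))


# Hypothetical candidate splits: longer pseudorandom phases leave fewer faults
# for the deterministic patterns to cover.
CANDIDATES = [HybridPoint(0, 100), HybridPoint(200, 38), HybridPoint(500, 25),
              HybridPoint(1000, 18), HybridPoint(2000, 15)]

best = best_hybrid_split(CANDIDATES, e_prpg=1.0, e_det=20.0,
                         pattern_bits=32, memory_bits=1024)
print(best, hybrid_energy(best, 1.0, 20.0))
```

With these made-up numbers the 500-pattern split wins at an energy of 1000. The 200-pattern split would use less energy (960), but its 38 stored patterns exceed the 1,024-bit memory budget.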
The energy minimisation problem for system-on-chip testing is addressed. A hybrid built-in self-test (BIST) architecture is assumed, in which a combination of deterministic and pseudorandom test sequences is used. The objective of the proposed technique is to find the best ratio of these sequences so that the total energy is minimised and the memory requirements for the deterministic test set are met without sacrificing test quality. Unfortunately, exact algorithms for finding the best solutions to this problem are computationally very expensive. Therefore, an estimation methodology for fast calculation of the hybrid test set and two different heuristic algorithms for energy minimisation are proposed. Experimental results show that the proposed approach efficiently finds reduced-energy solutions with low computational overhead.
- Author(s): T. Bjerregaard and J. Sparsø
- Source: IEE Proceedings - Computers and Digital Techniques, Volume 153, Issue 4, pp. 217–229
- DOI: 10.1049/ip-cdt:20050067
- Type: Article
Shared, segmented, on-chip interconnection networks, known as networks-on-chip (NoC), may become the preferred way of interconnecting intellectual property (IP) cores in future giga-scale system-on-chip (SoC) designs. A NoC can provide the required communication bandwidth while accommodating the effects of scaling microchip technologies. Equally important, a NoC facilitates a truly modular and scalable design flow. The MANGO (message-passing asynchronous network-on-chip providing guaranteed services over open core protocol (OCP) interfaces) NoC is presented, and it is explained how its key characteristics (clockless implementation, standard socket access points, and guaranteed communication services) make MANGO suitable for a modular SoC design flow. The advantages of clockless circuit techniques include inherent global timing closure, low forward latency in pipelines, and zero dynamic idle power consumption. However, time division multiplexing, which is generally used to provide bandwidth guarantees in clocked NoCs, is not possible in a clockless environment. MANGO provides an alternative, high-performance way of delivering hard, connection-oriented service guarantees using clockless circuit techniques. In-depth circuit details are presented, and the 0.13 µm standard-cell implementation of a 5×5 routing node for use in a mesh-type NoC is described.
- Author(s): R. Ebendt and R. Drechsler
- Source: IEE Proceedings - Computers and Digital Techniques, Volume 153, Issue 4, pp. 231–242
- DOI: 10.1049/ip-cdt:20050181
- Type: Article
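The path-related cost studied by Ebendt and Drechsler in the abstract below can be made concrete with a small Python sketch. It assumes that expected path length means the expected number of decision nodes visited from the root to a terminal when every input is independently 0 or 1 with probability 1/2. The sketch builds the reduced ordered BDD for each variable ordering by brute force and keeps the cheapest; this enumeration of all n! orders is not the paper's dynamic-programming or branch-and-bound method, and all names are illustrative.

```python
from itertools import permutations
from typing import Callable, Dict, Sequence, Tuple

BoolFn = Callable[[Dict[int, bool]], bool]
# Node ids 0 and 1 are the terminals; other ids map to (var, low_id, high_id).
Nodes = Dict[int, Tuple[int, int, int]]


def build_robdd(f: BoolFn, n: int, order: Sequence[int]) -> Tuple[int, Nodes]:
    """Build a reduced ordered BDD by Shannon expansion (exponential; tiny n only)."""
    unique: Dict[Tuple[int, int, int], int] = {}
    nodes: Nodes = {}

    def mk(var: int, lo: int, hi: int) -> int:
        if lo == hi:                    # redundant test: skip the node
            return lo
        key = (var, lo, hi)
        if key not in unique:           # share isomorphic subgraphs
            unique[key] = len(nodes) + 2
            nodes[unique[key]] = key
        return unique[key]

    def build(level: int, env: Dict[int, bool]) -> int:
        if level == n:
            return 1 if f(env) else 0
        var = order[level]
        return mk(var, build(level + 1, {**env, var: False}),
                  build(level + 1, {**env, var: True}))

    return build(0, {}), nodes


def expected_path_length(root: int, nodes: Nodes) -> float:
    """Expected number of decision nodes visited for uniformly random inputs."""
    memo: Dict[int, float] = {0: 0.0, 1: 0.0}

    def epl(nid: int) -> float:
        if nid not in memo:
            _, lo, hi = nodes[nid]
            memo[nid] = 1.0 + 0.5 * (epl(lo) + epl(hi))
        return memo[nid]

    return epl(root)


def min_epl_order(f: BoolFn, n: int) -> Tuple[float, Tuple[int, ...]]:
    """Brute-force the variable order with minimal expected path length."""
    best: Tuple[float, Tuple[int, ...]] = (float("inf"), ())
    for order in permutations(range(n)):
        cost = expected_path_length(*build_robdd(f, n, order))
        if cost < best[0]:
            best = (cost, order)
    return best


def example(env: Dict[int, bool]) -> bool:
    return (env[0] and env[1]) or env[2]   # f = x0·x1 + x2


print(min_epl_order(example, 3))
```

For f = x0·x1 + x2 this prints `(1.75, (2, 0, 1))`: testing x2 first settles half of all inputs after a single node, whereas the order (0, 1, 2) gives 2.25.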
Reduced ordered binary decision diagrams (BDDs) are a data structure for the efficient representation and manipulation of Boolean functions. They are frequently used in logic synthesis and formal verification. In recent practical applications, BDDs are optimised with respect to new objective functions. Here, the exact optimisation of BDDs with respect to path-related objective functions is investigated. First, the path-related criteria are studied in terms of their sensitivity to variable ordering. Second, the aim is to gain a deeper understanding of the computational effort required by exact methods targeting the new objective functions. This is achieved by a dynamic-programming approach that generalises the framework of Friedman and Supowit. Using this framework, a prime reason for the computational complexity can be identified. For the first time, experimental results give the minimal expected path length of BDDs for benchmark functions. These results were obtained by an exact branch-and-bound method that can be derived from the general framework. The exact solutions are used to evaluate a heuristic approach. With a few exceptions, the results confirm the high quality of the heuristic solutions.
- Author(s): C. Cao ; M. O'Nils ; B. Oelmann
- Source: IEE Proceedings - Computers and Digital Techniques, Volume 153, Issue 4, pp. 243–248
- DOI: 10.1049/ip-cdt:20050048
- Type: Article
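The partitioning idea in the abstract below (Cao et al.) can be explored with a toy Python model. It assumes the FSM's state transitions are described by a Markov chain, that only the sub-FSM holding the current state is active, that an active sub-FSM costs power in proportion to its number of states, and that each transition between sub-FSMs costs a fixed `wake_cost`. This proxy is hypothetical and much cruder than the paper's register-transfer-level power estimation functions; it does not model the mixed synchronous/asynchronous state memory, and the example machine is made up.

```python
from typing import Dict, Sequence, Set

Chain = Dict[str, Dict[str, float]]  # state -> {next_state: probability}


def stationary(chain: Chain, iters: int = 500) -> Dict[str, float]:
    """Long-run state probabilities by power iteration (assumes the chain converges)."""
    pi = {s: 1.0 / len(chain) for s in chain}
    for _ in range(iters):
        nxt = {s: 0.0 for s in chain}
        for s, row in chain.items():
            for t, p in row.items():
                nxt[t] += pi[s] * p
        pi = nxt
    return pi


def partitioned_cost(chain: Chain, parts: Sequence[Set[str]], wake_cost: float) -> float:
    """Toy power proxy: active sub-FSM size weighted by activity, plus hand-over cost."""
    pi = stationary(chain)
    owner = {s: i for i, part in enumerate(parts) for s in part}
    # Only the sub-FSM holding the current state is clocked; cost ~ its state count.
    active = sum(sum(pi[s] for s in part) * len(part) for part in parts)
    # Every transition that crosses sub-FSMs wakes one machine and shuts another down.
    crossings = sum(pi[s] * p for s, row in chain.items()
                    for t, p in row.items() if owner[s] != owner[t])
    return active + wake_cost * crossings


FSM: Chain = {
    "A": {"A": 0.5, "B": 0.4, "C": 0.1},
    "B": {"A": 0.5, "B": 0.5},
    "C": {"C": 0.5, "D": 0.5},
    "D": {"C": 0.4, "D": 0.5, "A": 0.1},
}
mono = partitioned_cost(FSM, [{"A", "B", "C", "D"}], wake_cost=1.0)
split = partitioned_cost(FSM, [{"A", "B"}, {"C", "D"}], wake_cost=1.0)
print(round(mono, 3), round(split, 3))
```

It prints `4.0 2.053`: under this proxy, splitting the loosely coupled state pairs {A, B} and {C, D} roughly halves the cost, while the rare crossings add only a small hand-over term.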
An efficient way to obtain finite-state machines (FSMs) with low power consumption is to partition the machine into two or more sub-FSMs and then apply dynamic power management, in which all inactive sub-FSMs are shut down, thereby reducing dynamic power dissipation. FSM partitioning algorithms and register-transfer-level power estimation functions are therefore the main focus of the paper, as these are key issues in the design of a computer-aided design tool for the synthesis of low-power partitioned FSMs. The targeted implementation architecture is based on both synchronous and asynchronous state memory elements, which enable larger power reductions than fully synchronous architectures. Power reductions of up to 77% have been achieved at the cost of an 18% increase in area.
- Author(s): X. Wang and S.G. Ziavras
- Source: IEE Proceedings - Computers and Digital Techniques, Volume 153, Issue 4, pp. 249–260
- DOI: 10.1049/ip-cdt:20045136
- Type: Article
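The abstract below (Wang and Ziavras) does not describe how HERA distributes work, so the following Python sketch shows only a generic SIMD-style decomposition of matrix–matrix multiplication: rows of the result are split into contiguous blocks, and every processing element runs the same kernel on its own block. The PEs are simulated one after another, and the function names are hypothetical; the MIMD and M-SIMD modes and the DBBD LU schedule are not modelled.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def partition_rows(n_rows: int, n_pes: int) -> List[range]:
    """Split rows into contiguous, near-equal blocks, one per processing element."""
    base, extra = divmod(n_rows, n_pes)
    blocks, start = [], 0
    for pe in range(n_pes):
        size = base + (1 if pe < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def pe_kernel(a: Matrix, b: Matrix, rows: Sequence[int]) -> Matrix:
    """The same program every PE runs on its own rows (SIMD-style)."""
    inner, cols = len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in rows]


def parallel_mmm(a: Matrix, b: Matrix, n_pes: int) -> Matrix:
    """Simulate the PEs sequentially and gather their row blocks."""
    c: Matrix = []
    for rows in partition_rows(len(a), n_pes):
        c.extend(pe_kernel(a, b, rows))  # blocks are contiguous and in order
    return c


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
print(parallel_mmm(A, B, n_pes=2))
```

Multiplying by the identity returns `A` unchanged, a quick check that the gathered row blocks come back in the right order.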
Recent advances in multi-million-gate platform field-programmable gate arrays (FPGAs) have made it possible to design and implement complex parallel systems on a programmable chip that also incorporate hardware floating-point units (FPUs). Such designs can take advantage of resource reconfiguration. In contrast to the majority of the FPGA community, which still employs reconfigurable logic to develop algorithm-specific circuitry, our FPGA-based mixed-mode reconfigurable computing machine can simultaneously implement a variety of parallel execution modes and is also user-programmable. Our heterogeneous reconfigurable architecture (HERA) machine can implement the single-instruction, multiple-data (SIMD), multiple-instruction, multiple-data (MIMD) and multiple-SIMD (M-SIMD) execution modes. Each processing element (PE) is centred on a single-precision IEEE 754 FPU with tightly coupled local memory, and supports dynamic switching between SIMD and MIMD modes at runtime. Mixed-mode parallelism has the potential to best match the characteristics of all subtasks in an application, thus resulting in sustained high performance. HERA's performance is evaluated with two common computation-intensive test benches: matrix–matrix multiplication (MMM) and LU factorisation of sparse doubly-bordered-block-diagonal (DBBD) matrices. Experimental results with electrical power network matrices show that mixed-mode scheduling for LU factorisation can yield speedups of about 19% and 15.5% compared with the SIMD and MIMD implementations, respectively.
- Author(s): P.-Y. Hsiao ; C.-H. Chen ; H. Wen ; S.-J. Chen
- Source: IEE Proceedings - Computers and Digital Techniques, Volume 153, Issue 4, pp. 261–269
- DOI: 10.1049/ip-cdt:20050199
- Type: Article
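The two processing stages of the edge detector by Hsiao et al. in the abstract below (smoothing, then an absolute-difference gradient) can be sketched in Python. The abstract does not give the exact Gaussian approximation or mask, so this sketch assumes a 3×3 binomial kernel whose weights sum to 16 (so normalisation is a right shift, as is common in hardware) and, as a stand-in for the paper's absolute difference mask, marks a pixel when the larger of its horizontal and vertical central absolute differences reaches a threshold. The stand-in gives a two-pixel-wide response on a step edge, so it does not reproduce the one-pixel localisation claimed for the paper's mask.

```python
from typing import List

Image = List[List[int]]

# 3x3 binomial approximation of a Gaussian; weights sum to 16, so the
# normalising division becomes a 4-bit right shift.
GAUSS_3X3 = ((1, 2, 1), (2, 4, 2), (1, 2, 1))


def smooth(img: Image) -> Image:
    """Integer Gaussian smoothing; the one-pixel border is dropped."""
    h, w = len(img), len(img[0])
    return [[sum(GAUSS_3X3[dy][dx] * img[y + dy - 1][x + dx - 1]
                 for dy in range(3) for dx in range(3)) >> 4
             for x in range(1, w - 1)]
            for y in range(1, h - 1)]


def abs_diff_edges(img: Image, threshold: int) -> Image:
    """Mark pixels whose horizontal or vertical absolute difference is large."""
    h, w = len(img), len(img[0])
    return [[1 if max(abs(img[y][x + 1] - img[y][x - 1]),
                      abs(img[y + 1][x] - img[y - 1][x])) >= threshold else 0
             for x in range(1, w - 1)]
            for y in range(1, h - 1)]


# Vertical step: columns 0-3 are dark (0), columns 4-7 are bright (100).
step = [[0] * 4 + [100] * 4 for _ in range(7)]
for row in abs_diff_edges(smooth(step), threshold=50):
    print(row)
```

On the vertical step each smoothed row is `[0, 0, 25, 75, 100, 100]` and each output row is `[0, 1, 1, 0]`.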
A computational field-programmable gate array (FPGA) realisation of an edge detector is described; the detector is made particularly immune to noise by a digitally approximated Gaussian smoothing filter. The proposed systolic array architecture was examined for the convolution operation in order to bring simplicity and regularity to the design. Moreover, most of the presented processing structures are highly pipelined, so that the goal of real-time computing is met, with processing frame rates of up to 280 frames per second. For efficient hardware mapping, the absolute difference mask algorithm was adopted because of its regularity and independent operations, as well as its important property of performing one-pixel edge localisation. A scalable first-in, first-out (FIFO) design was also proposed to make the edge detector applicable to five different image sizes. The FPGA realisation on the presented versatile development platform shows that the proposed design improves both speed and hardware usage. This is attributed to the proposed parallel and pipelined structure, which achieves a fast operating speed of 73.6 MHz, about 265 times faster than the digital signal processing environment.
- Author(s): E. Özer ; R. Sendag ; D. Gregg
- Source: IEE Proceedings - Computers and Digital Techniques, Volume 153, Issue 4, pp. 270–282
- DOI: 10.1049/ip-cdt:20050160
- Type: Article
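The transition-count argument in the abstract below (Özer et al.) can be checked on a small example with the following Python sketch. It encodes each bus word as unsigned base-2, base-3 or base-4 digits, one digit per line, and counts how many lines change level between consecutive words. This is a deliberately simple proxy: it ignores the size of each voltage swing, balanced ternary and the circuit-level energy model, and the sequential address stream is illustrative rather than one of the paper's workloads.

```python
from typing import List, Sequence


def lines_needed(bits: int, base: int) -> int:
    """Smallest number of base-`base` lines that can carry any `bits`-bit word."""
    width, capacity = 0, 1
    while capacity < 2 ** bits:
        capacity *= base
        width += 1
    return width


def to_digits(value: int, base: int, width: int) -> List[int]:
    """Unsigned base-`base` digits of value, least significant line first."""
    digits = []
    for _ in range(width):
        value, d = divmod(value, base)
        digits.append(d)
    if value:
        raise ValueError("value does not fit on the bus")
    return digits


def bus_transitions(values: Sequence[int], base: int, width: int) -> int:
    """Count line-level changes between consecutive bus words."""
    total = 0
    prev = to_digits(values[0], base, width)
    for v in values[1:]:
        cur = to_digits(v, base, width)
        total += sum(1 for a, b in zip(prev, cur) if a != b)
        prev = cur
    return total


# A sequential 4-bit address stream, e.g. straight-line instruction fetch.
addresses = list(range(16))
for base in (2, 3, 4):
    w = lines_needed(4, base)
    print(base, w, bus_transitions(addresses, base, w))
```

For this particular stream it prints `2 4 26`, `3 3 21` and `4 2 18`: fewer lines and fewer line changes as the radix grows, in line with the trend the abstract reports for quaternary buses. For a 32-bit word, `lines_needed` gives 16 quaternary or 21 ternary lines.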
The viability of bus interconnection models based on the multiple-valued logic (MVL) paradigm is explored as a way to reduce the cost and energy consumption of off-chip and on-chip address, data and instruction buses within system-on-a-chip platforms. Data can be transferred over the buses using ternary, balanced ternary or quaternary number systems rather than binary. This allows a more compact bus design with fewer bus lines, which can lower the input/output pin cost for off-chip buses. Reducing the number of bus lines also allows the spacing between adjacent bus lines to be increased within the same silicon area. This further reduces interwire capacitance and may lead to significant on-chip bus energy reduction for low-power embedded systems. First, a combinatorial probabilistic view of digit transition patterns in binary and MVL number systems is provided. This is followed by an empirical study in which various applications are run to measure the bus switching activity and total bus energy consumption of real-world applications. It is observed that the number of bus transitions in a multiple-valued bus, particularly a quaternary bus, is significantly lower than in a binary bus. Our experimental results show that MVL bus models can be viable replacements for their binary equivalents, providing up to 29, 29 and 30% reductions in energy consumption for off-chip address, data and instruction buses, respectively. For on-chip quaternary address, data and instruction buses in 0.25 µm technology, the corresponding savings are 55, 53 and 62%, respectively.
- Author(s): F.R. Boyer ; H.G. Epassa ; Y. Savaria
- Source: IEE Proceedings - Computers and Digital Techniques, Volume 153, Issue 4, pp. 283–290
- DOI: 10.1049/ip-cdt:20050170
- Type: Article
A variable speed processor (VSP) that can adjust its clock period at each cycle according to the instruction flow in a pipelined program is presented. This allows performance enhancement and energy consumption reduction, both important considerations for the next generation of embedded processor designs. With little change to the standard synchronous design, speed can be enhanced without increasing energy, or speed can be maintained while saving energy. The VSP concept is validated by coupling a Nios® processor with a variable period clock synthesiser (VPCS). No modifications to the core are needed to control the VPCS, other than extracting internal signals from the pipeline. The VPCS cleanly switches between period lengths at each cycle, over a wide range of possible lengths and with any resolution, depending on the available clock phases. One VPCS design, in 0.18 µm CMOS, consumes less than 10 µW/MHz and can switch instantly within the 4–250 MHz range. The VSP design is implemented on the Altera® Embedded System platform, in its Stratix® FPGA. With the proposed method, the dynamic energy consumed per program loop is reduced by 14%, while the processing time is reduced by 3.6% compared with the original standard Nios® processor running the same program at its maximum frequency (133 MHz).
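The per-cycle clock selection described above can be sketched in Python. The sketch assumes each cycle can be tagged with the class of operation that sets its critical path, that the synthesiser offers periods in fixed steps set by the clock-phase resolution, and that the available range is the 4–250 MHz quoted in the abstract. The delay figures, step size and function names are hypothetical rather than Nios® or VPCS data, and energy is not modelled.

```python
from typing import Dict, Sequence

# Clock range from the abstract: 4-250 MHz, i.e. periods of 4,000-250,000 ps.
MIN_PERIOD_PS, MAX_PERIOD_PS = 4_000, 250_000


def pick_period_ps(required_ps: int, step_ps: int) -> int:
    """Shortest synthesisable period that still covers this cycle's critical path."""
    steps = -(-required_ps // step_ps)  # ceiling division in integers
    period = max(MIN_PERIOD_PS, steps * step_ps)
    if period > MAX_PERIOD_PS:
        raise ValueError("required delay is outside the synthesiser's range")
    return period


def variable_time_ps(trace: Sequence[str], delay_ps: Dict[str, int], step_ps: int) -> int:
    """Execution time when every cycle gets its own period."""
    return sum(pick_period_ps(delay_ps[c], step_ps) for c in trace)


def fixed_time_ps(trace: Sequence[str], delay_ps: Dict[str, int], step_ps: int) -> int:
    """Execution time with one period sized for the worst-case cycle."""
    return len(trace) * pick_period_ps(max(delay_ps.values()), step_ps)


# Hypothetical per-cycle critical-path delays and a 500 ps phase resolution.
DELAYS = {"alu": 5_000, "branch": 6_000, "mem": 7_500}
TRACE = ["alu", "alu", "mem", "branch", "alu"]
print(variable_time_ps(TRACE, DELAYS, 500), fixed_time_ps(TRACE, DELAYS, 500))
```

With these made-up numbers the trace takes 28,500 ps with per-cycle periods versus 37,500 ps with a fixed worst-case period. The real gain depends on the instruction mix and the actual path delays, so these figures say nothing about the 3.6% reported in the paper.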
Debug support for complex systems on-chip: a review
Hybrid BIST energy minimisation technique for system-on-chip testing
Implementation of guaranteed services in the MANGO clockless network-on-chip
Exact minimisation of path-related objective functions for binary decision diagrams
Synthesis tool for low-power finite-state machines with mixed synchronous/asynchronous state memory
Exploiting mixed-mode parallelism for matrix operations on the HERA architecture through reconfiguration
Real-time realisation of noise-immune gradient-based edge detector
Multiple-valued logic buses for reducing bus energy in low-power systems
Embedded power-aware cycle by cycle variable speed processor