IEE Proceedings - Computers and Digital Techniques
Volume 153, Issue 5, September 2006
- Author(s): W.M. Lim and M. Benaissa
- Source: IEE Proceedings - Computers and Digital Techniques, Volume 153, Issue 5, p. 291–301
- DOI: 10.1049/ip-cdt:20050165
- Type: Article
Subword parallel (SWP) architectures for Galois field multiplication and division over GF(2^m) are presented. They are intended to meet the flexibility-versus-performance requirements of an application-specific instruction set processor for applications within the domain of GF(2^m). Suitable choices of basis, algorithm and architecture are addressed. Techniques for mapping an underlying Galois field arithmetic operation onto these SWP architectures are described in the context of suitable well-known GF(2^m) division and multiplication algorithms. The results of a detailed complexity analysis, undertaken to quantify the configuration overheads, and of employing these architectures in an SWP processor for cryptography are presented.
- Author(s): J. Bhadra ; M.S. Abadir ; D. Burgess ; E. Trofimova
- Source: IEE Proceedings - Computers and Digital Techniques, Volume 153, Issue 5, p. 302–312
- DOI: 10.1049/ip-cdt:20050204
- Type: Article
In modern high-performance microprocessors, embedded memories account for approximately half the area and more than 50% of the transistors. Because of their ubiquitous nature, modelling memories remains an immensely important part of the design methodology. Adding to the challenge is the requirement that memories be modelled separately for each debug methodology: testing, formal verification, validation, emulation and so on. A tool (MemGen) that automates generation of all memory models required by testing, verification and emulation methodologies is described. MemGen is a robust pivot of our overall design methodology, and is currently in routine use in all live design projects in Freescale Semiconductor's high-performance design centre. Results obtained from using MemGen-generated embedded memories in real-life design projects of Freescale G2 and G4 microprocessors are presented.
- Author(s): E. Atoofian and A. Baniasadi
- Source: IEE Proceedings - Computers and Digital Techniques, Volume 153, Issue 5, p. 313–322
- DOI: 10.1049/ip-cdt:20050084
- Type: Article
Energy-efficiency benefits of bypassing trivial computations in high-performance processors are studied. Trivial computations are those computations whose output can be determined without performing the computation. Bypassing trivial instructions reduces energy consumption while improving performance. The present study shows that, for the subset of SPEC'2K and MiBench benchmarks studied here, bypassing trivial instructions can improve energy and energy-delay by up to 15.6% and 30.6%, respectively, in an optimistic scenario and by 10.8% and 21.7% in a pessimistic scenario, relative to a conventional processor.
- Author(s): E. Khan ; M. Watheq El-Kharashi ; F. Gebali ; M. Abd-El-Barr
- Source: IEE Proceedings - Computers and Digital Techniques, Volume 153, Issue 5, p. 323–334
- DOI: 10.1049/ip-cdt:20050192
- Type: Article
An emerging system design methodology is used to design a reconfigurable HMAC-hash unit. This methodology directly maps a design described in a high-level language, Handel-C, to field programmable gate array platforms. The Handel-C approach narrows the gap between performance and flexibility and thus reduces the risk of translating a high-level prototype into hardware description languages. It allows for a high degree of flexibility from two viewpoints: the language level of abstraction and the hardware reconfiguration. A detailed case study is considered: a reconfigurable HMAC-hash unit that implements six standard algorithms: MD5, SHA-1, RIPEMD-160, HMAC-MD5, HMAC-SHA-1 and HMAC-RIPEMD-160. The performance of the designed unit has been enhanced by applying pipelining, parallelism and reconfigurability through the Handel-C methodology. The use of Handel-C resulted in an HMAC-hash unit architecture that is faster than most previously designed units. At the same time, the area cost of putting the six standard algorithms on the same hardware core is kept to a minimum. The time required to design, implement and test the unit using this methodology is found to be reasonably low compared with the time required using other design approaches.
- Author(s): J.-T. Yan
- Source: IEE Proceedings - Computers and Digital Techniques, Volume 153, Issue 5, p. 335–347
- DOI: 10.1049/ip-cdt:20060077
- Type: Article
As VLSI circuits are scaled into advanced deep-submicron (DSM) dimensions, interconnection delay plays an important role in any performance-driven design. In general, wire sizing and buffer insertion can be used to reduce the timing delay of an interconnection net. Uniform wire sizing, however, cannot achieve timing optimisation of an interconnection net. On the basis of an analysis of the optimal wire width on one wire segment, wire planning with optimal wire widths is proposed, which uses less routing area to reduce the timing delay of an interconnection net. Furthermore, given a compact floorplan with a set of interconnection nets, an area-driven buffer block planning with optimal wire sizing (ABBP_OWS) algorithm is proposed, based on an analysis of buffer locations on one wire segment and the construction of a recursive buffer-location graph. The algorithm inserts feasible buffers into the given floorplan for each net without violating the timing constraint of any routing net. Its time complexity is proved to be O(mn^2), where m is the number of interconnection nets and n is the number of circuit blocks in the floorplan. Finally, the experimental results show that, on all the tested benchmark circuits for interconnect-driven floorplanning, the proposed ABBP_OWS algorithm uses less routing area and floorplan area while meeting the timing constraints of more interconnection nets.
- Author(s): P.Y. Hsiao ; X.Z. Chen ; C.C. Lin ; C.H. Hua ; C.C. Chang
- Source: IEE Proceedings - Computers and Digital Techniques, Volume 153, Issue 5, p. 348–354
- DOI: 10.1049/ip-cdt:20050200
- Type: Article
Thinning is a very important operation in the pre-processing stage of fingerprint recognition. With the availability of fast thinning hardware, real-time image processing applications can be achieved. The authors introduce a detailed hardware architecture design of a thinning processor used in an embedded fingerprint recognition system. The proposed thinning algorithm has a parallel-pipelining structure suited to hardware realisation, and is implemented and verified on an FPGA. Equipped with a modification unit array, a designated operating schedule and an address generator based on a systolic counter, this thinning processor is able to perform a thinning operation within 0.07 s at 40 MHz for a 512×512 picture, which is at least 40 times faster than software execution. The proposed thinning processor was successfully integrated into a real-time fingerprint recognition system.
- Author(s): G. Reinman and G. Pitigoi-Aron
- Source: IEE Proceedings - Computers and Digital Techniques, Volume 153, Issue 5, p. 355–361
- DOI: 10.1049/ip-cdt:20050161
- Type: Article
The trace cache is a technique that provides accurate, high-bandwidth instruction fetch. However, when a desired instruction trace is not found in the cache, conventional instruction fetch and decode must be used to satisfy the trace request. Such auxiliary fetch hardware can be expensive in terms of energy, area and complexity. An approach that combines a trace cache and conventional instruction fetch hardware using a decoupled design is explored. The design enables the processor to switch dynamically between trace ID and PC-based prediction methods and helps to hide the latency associated with the instruction memory path. The decoupled design, with accelerated slow-path instruction delivery and no instruction cache, provides a benefit comparable to that of a front-end with an 8 kB instruction cache (within 2% of the instructions per cycle achieved with the cache). High tolerance is demonstrated for both trace table misses and increased memory latency when scaling down the size of the trace table and scaling up the L2 access latency.
- Author(s): M. Krstić ; E. Grass ; C. Stahl ; M. Piz
- Source: IEE Proceedings - Computers and Digital Techniques, Volume 153, Issue 5, p. 362–372
- DOI: 10.1049/ip-cdt:20050210
- Type: Article
A novel request-driven globally asynchronous locally synchronous (GALS) technique for the system integration of complex digital blocks is proposed. For this new GALS technique, a compliant asynchronous wrapper is developed and evaluated. The proposed GALS technique is applied to a baseband processor compatible with the wireless LAN standard IEEE 802.11a. The developed GALS baseband processor chip is fabricated and measured. Besides improvements to the system integration process, a 5 dB reduction in electromagnetic interference, a 30% reduction in instantaneous supply current variation, and dynamic power consumption similar to that of the synchronous baseband processor are achieved.
Flexible GF(2^m) arithmetic architectures for subword parallel processing ASIPs (p. 291–301)
Bottom-up approach in automated embedded memory model generation for high-performance microprocessors (p. 302–312)
Improving energy-efficiency in high-performance processors by bypassing trivial instructions (p. 313–322)
Applying the Handel-C design flow in designing an HMAC-hash unit on FPGAs (p. 323–334)
Simultaneous wiring and buffer block planning with optimal wire-sizing for interconnect-driven floorplanning (p. 335–347)
Employing pipelined thinning architecture for real-time fingerprint verifier (p. 348–354)
Trace cache miss tolerance for deeply pipelined superscalar processors (p. 355–361)
System integration by request-driven GALS design (p. 362–372)
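
For readers unfamiliar with the arithmetic behind the first article in this issue (p. 291–301), the following is a minimal software sketch of multiplication and division over GF(2^m) in polynomial basis. It is not the subword parallel architecture from that article, nor necessarily the algorithms its authors chose: it assumes a plain bit-serial shift-and-add multiplier, and it swaps in Fermat-style inversion (a^(2^m - 2)) for division. The function names `gf2m_mul`, `gf2m_inv` and `gf2m_div` are hypothetical, and the example field (m = 8 with reduction polynomial 0x11B, the one used by AES) is chosen only for illustration.

```python
def gf2m_mul(a: int, b: int, m: int, poly: int) -> int:
    """Multiply a and b in GF(2^m), polynomial basis.

    poly is the reduction polynomial as an integer, including the x^m term.
    Both operands are assumed to be already reduced (less than 2^m).
    """
    result = 0
    for _ in range(m):
        if b & 1:            # current bit of b set: add (XOR) the shifted a
            result ^= a
        b >>= 1
        a <<= 1              # multiply a by x
        if a & (1 << m):     # degree reached m: reduce modulo poly
            a ^= poly
    return result


def gf2m_inv(a: int, m: int, poly: int) -> int:
    """Invert a nonzero element via a^(2^m - 2) (square-and-multiply)."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse in GF(2^m)")
    result, base, exp = 1, a, (1 << m) - 2
    while exp:
        if exp & 1:
            result = gf2m_mul(result, base, m, poly)
        base = gf2m_mul(base, base, m, poly)
        exp >>= 1
    return result


def gf2m_div(a: int, b: int, m: int, poly: int) -> int:
    """Divide a by b in GF(2^m) as a * b^(-1)."""
    return gf2m_mul(a, gf2m_inv(b, m, poly), m, poly)


if __name__ == "__main__":
    M, POLY = 8, 0x11B  # assumed example field, not taken from the article
    print(hex(gf2m_mul(0x57, 0x83, M, POLY)))
    print(hex(gf2m_inv(0x53, M, POLY)))
```

The per-bit loop is the software analogue of a bit-serial hardware multiplier; a subword parallel design would process several such narrow operands side by side in one wide datapath.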