Unlike conventional synchronous circuits, asynchronous circuits are not coordinated by a clocking signal, but instead use handshaking protocols to control circuit behaviour. Asynchronous circuits have been found to offer several advantages, including high energy efficiency, flexible timing requirements, high modularity, low noise/EMI, and robustness to PVT variations. At the same time, growing pressures on the electronics industry for ever smaller, more efficient ICs are pushing the limits of conventional circuit technologies. These factors are spurring growing interest in asynchronous circuits amongst both the academic research and commercial R&D communities. This book introduces a wide range of existing and potential applications for asynchronous circuits, each accompanied by the corresponding circuit design theory, sample circuit implementations, results, and analysis. It serves as an essential guide for academic researchers and students looking to broaden their thinking in advancing asynchronous applications and design methodologies, and provides practical advice to industrial engineers when considering the incorporation of asynchronous circuits in their own applications.
Inspec keywords: radiation hardening (electronics); field programmable gate arrays; timing circuits; formal verification; network-on-chip; asynchronous circuits; analogue circuits; high-speed integrated circuits
Other keywords: asynchronous sensing; field-programmable gate arrays; network-on-chips; formal verification; ultra-low supply voltages; NCL design tools; asynchronous circuits; extreme temperatures; power-performance balancing; analog electronics; high-speed circuits; clock distribution networks; radiation hardness; logic design
Subjects: Network-on-chip; General electrical engineering topics; Logic circuits; Analogue circuit design, modelling and testing; Analogue processing circuits
The chapter introduces the concept of asynchronous circuits by considering the advantages of asynchronous circuits and presents an overview of their circuit applications.
We have presented the appropriateness of the QDI (and pseudo-QDI) asynchronous-logic design approach for realizing circuits and systems suitable for full-range DVS (from the nominal voltage to the near-Vt and sub-Vt voltage regions). Both block-level and gate-level pipeline structures have been presented. Using the block-level pipeline structure, we have presented an SSAVS system embodying block-level QDI asynchronous pipelines for a WSN, with the objective of the lowest possible power operation for the prevailing throughput and circuit conditions (VDD adjusted to within 50 mV of the minimum voltage), yet with high operational robustness and minimal overheads. High robustness has been achieved by adopting the asynchronous QDI protocols and embodying our proposed PCSL. A reduced-overhead design has further been shown by adopting the asynchronous pseudo-QDI protocols together with PCSL. Using the gate-level pipeline structure, we have presented our proposed SABB cell design approach and evaluated an asynchronous QDI KS pipeline adder embodying SABB for full-range DVS operation. In summary, we show that QDI (and pseudo-QDI) asynchronous logic, coupled with either the PCSL or the SABB cell design approach, provides a low-cost, high-reliability solution for circuits and systems designed exclusively for error-free DVS.
Asynchronous circuits designed with delay-insensitive NCL and multithreshold CMOS techniques inherit the benefit of power reduction but suffer degraded speed. Circuit pipelining and parallel architectures are applied to mitigate this performance drawback. In the first part of the chapter, the throughput and latency of the NCL micropipeline are derived for digital signal processing circuit optimization, including an example of a generic FIR design with the same performance as its synchronous counterpart. A scalable parallel computing architecture that incorporates homogeneous units is designed in Section 3.2 for performance escalation. In addition, DVS achieves balanced control of performance and power consumption. An effective fullness variance predicting algorithm is implemented to employ DVS more aggressively over a wider range of system workloads. The platform, fabricated using the MITLL 90 nm process, consumes 49.364 pJ per data with the best performance when the DATA-to-DATA cycle time is 6.02 μs. Schemes for fine-grain core state control and heterogeneous architecture are presented as research topics on power-performance balancing. The core enabling and disabling sequence and fine-grain state control earn the maximum benefit from DVS. Common data I/O ports with NULL cycle reduction and an asynchronous arbitration network are incorporated in the heterogeneous platform to provide a highly modular interface for both horizontal and vertical scaling. These methodologies demonstrate the advantage of asynchronous circuits in large-scale, multithreaded, and scalable computing applications.
Modern digital systems based on complementary metal-oxide-semiconductor (CMOS) integrated circuits (ICs) are increasingly sensitive to power consumption and heat generation, which have a direct impact on the system's performance and reliability. The power consumption of a system can be effectively reduced by techniques such as supply voltage scaling, downsizing transistors, or limiting switching activity. Supply voltage scaling is among the most effective ways to reduce power dissipation. The continued reduction of supply voltage will require transistors to operate in the subthreshold region. Process technology developed with transistors optimized for subthreshold operation offers the essential building blocks to construct digital systems that are capable of operating at ultra-low supply voltage and consuming significantly less power.
Very few integrated circuits, whether they are microprocessors or ASICs, are designed with purely digital or analog components. Mixed-signal circuits and systems straddle this divide, and this chapter examines how analog components may be interfaced with asynchronous logic. This chapter presents two example systems, which cover three methods of closing the feedback loop to maintain asynchronous operation. In the first example, an asynchronous serializer/deserializer (SerDes), the analog components have known end-states and are physically included in the loop of the asynchronous logic stage. Since the completion detection occurs directly, this system maintains its quasi-delay-insensitive (QDI) operation. For the second example, a successive approximation analog-to-digital converter (SAR ADC), the circuit cannot be fully included in the loop, and two options for maintaining asynchronous operation are described. Due to the nature of the authors' interests and respective circuit applications, the example circuits also work in spatially distributed implementations. In these cases, different parts of the circuit may have significantly different operating environments, and may even be designed in different IC fabrication processes. Some readers of this chapter will be analog or mixed-signal designers who are curious about how delay-insensitive logic can be used in their work. For that audience, a brief overview of the ring oscillator metaphor is presented here and will clarify the behaviors described later in the examples.
In this chapter, we have presented a few examples of designs from the domain of sensors that illustrate the promise of asynchronous techniques in the design of sensing systems. Such systems often impose strict performance requirements. For example, in wide-area distributed networked sensors, there are constraints of tight energy budgets and low area costs. In the case of image sensors, a large array of tiny sensory pixels is packed into a single chip, which poses performance challenges if rigid global synchronization is used. Since transduction is usually followed by analog-to-digital conversion (ADC) and digital signal processing (DSP), these processing techniques must also be specialized to be highly energy efficient. Furthermore, sensing often involves long idle periods, so all sensing systems must have very low idle power consumption. In this chapter, we saw examples of frameless image sensors, asynchronous sensor processors, and continuous-time ADC and DSP, all of which exemplify the power and promise of asynchronous design in the field of sensing.
This chapter explores the design and test of high-speed complementary metal oxide semiconductor (CMOS) self-timed circuits. Section 7.1 describes how the properties of CMOS technology itself limit how fast a self-timed circuit can run. Section 7.2 presents our Link and Joint model, a unified point of view of self-timed circuits that allows reasoning about them independently of circuit families and handshake protocols. The model separates communication and storage, done in Links, from computation and flow control, done in Joints. The model also separates actions from states. Special go signals enable or disable Joint actions on an individual basis. The individual go signals make it possible to initialize, start, and stop self-timed operations reliably, which is crucial for design as well as for at-speed test, debug, and characterization. Section 7.3 examines design and test aspects of the Weaver, a self-timed nonblocking 8 × 8 crossbar switch designed using the Link and Joint model. We report measured test results from a working Weaver chip in 40 nm CMOS with speeds up to 6 Giga data items per second. With 72-bit-wide data items, this amounts to 3.5 Tera bits per second for the full crossbar.
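The bandwidth figure quoted above can be reproduced with a short back-of-the-envelope calculation. The sketch below is not taken from the chapter; it assumes that the 6 Giga data items per second figure is a per-channel rate and that all eight channels of the 8 × 8 crossbar stream concurrently, which is the reading under which the quoted 3.5 Tera bits per second follows. The function and constant names are illustrative only.

```python
# Rough check of the aggregate crossbar bandwidth quoted in the abstract.
# Assumption (not stated explicitly in the source): the item rate is per
# channel and all 8 channels of the 8 x 8 crossbar are active at once.

ITEMS_PER_SECOND = 6e9  # "up to 6 Giga data items per second"
BITS_PER_ITEM = 72      # "72-bit-wide data items"
CHANNELS = 8            # one stream per crossbar output (assumed)


def aggregate_bandwidth_bps(items_per_second, bits_per_item, channels):
    """Return the total bit rate summed over all concurrently active channels."""
    return items_per_second * bits_per_item * channels


total_bps = aggregate_bandwidth_bps(ITEMS_PER_SECOND, BITS_PER_ITEM, CHANNELS)
print(f"{total_bps / 1e12:.3f} Tb/s")  # about 3.5 Tb/s after rounding
```

Under these assumptions, each channel carries 432 Gb/s and the eight channels together round to the 3.5 Tera bits per second reported for the full crossbar.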
In this chapter, different GALS approaches for the implementation of embedded NoC architectures were presented. The GALS approach reduces the resource requirements and increases the scalability of the NoC without sacrificing performance. The three approaches of synchronous, mesochronous, and asynchronous NoCs were compared. For the mesochronous NoC, special synchronizers between the links were implemented. For the asynchronous NoC, the routers were realized completely as asynchronous circuits. The results have shown that modern design methods (CCOpt design flow) allow good scaling of MPSoCs even for synchronous NoCs. Nevertheless, the asynchronous NoC showed lower area and energy requirements than the mesochronous and synchronous implementations, while still providing comparable performance. When comparing the place and route of an MPSoC, the asynchronous NoC leads to 3.1% lower area requirements. The power consumption of an asynchronous router is only 22.4% (0.94 mW in idle state) or 53% (3.94 mW during communication) of the power consumption of a clock-based router. In the last section of the chapter, the global clock tree for an MPSoC with 256 CPUs was examined. The synchronous and mesochronous NoCs show almost the same power consumption of about 7.7 mW. Using the asynchronous NoC reduces the power consumption by about 25% (to 5.78 mW). In addition, the mesochronous and asynchronous variants achieve a 2.6% higher clock frequency.
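The power figures above can be cross-checked with simple arithmetic. The following sketch is only an illustration: the baseline power values for the clock-based router are not reported in the abstract and are derived here from the quoted percentages, so they should be read as implied estimates rather than measured results. All names are hypothetical.

```python
# Cross-check of the power figures quoted in the abstract (values in mW).
# The clock-based router baselines are NOT given in the source; they are
# inferred here from the quoted shares and are therefore only estimates.

ASYNC_ROUTER_IDLE_MW = 0.94     # quoted: 22.4% of the clock-based router
ASYNC_ROUTER_IDLE_SHARE = 0.224
ASYNC_ROUTER_COMM_MW = 3.94     # quoted: 53% of the clock-based router
ASYNC_ROUTER_COMM_SHARE = 0.53
SYNC_CLOCK_TREE_MW = 7.7        # quoted for synchronous/mesochronous NoC
ASYNC_CLOCK_TREE_MW = 5.78      # quoted for asynchronous NoC


def implied_baseline_mw(async_mw, share):
    """Baseline power implied by an asynchronous value and its share of it."""
    return async_mw / share


def reduction_percent(baseline_mw, reduced_mw):
    """Relative saving of reduced_mw with respect to baseline_mw, in percent."""
    return 100.0 * (baseline_mw - reduced_mw) / baseline_mw


idle_baseline = implied_baseline_mw(ASYNC_ROUTER_IDLE_MW, ASYNC_ROUTER_IDLE_SHARE)
comm_baseline = implied_baseline_mw(ASYNC_ROUTER_COMM_MW, ASYNC_ROUTER_COMM_SHARE)
clock_tree_saving = reduction_percent(SYNC_CLOCK_TREE_MW, ASYNC_CLOCK_TREE_MW)

print(f"implied clock-based router, idle: {idle_baseline:.2f} mW")
print(f"implied clock-based router, communicating: {comm_baseline:.2f} mW")
print(f"clock tree saving: {clock_tree_saving:.1f} %")
```

The clock tree saving computed this way is consistent with the "about 25%" stated in the abstract.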
Field-programmable gate arrays (FPGAs) are chips that can be electronically programmed to function as an arbitrary digital circuit or system. They were originally used to replace discrete gates in interface electronics, and over the past three decades have evolved to be used in place of application-specific integrated circuits (ASICs) in low-volume and cost-constrained situations. Modern commercially available FPGAs are sophisticated integrated circuits capable of implementing digital chips with millions of gates. In addition, some of them also have special-purpose I/O macros to support memory interfaces, as well as serial links to support high-throughput communication. FPGAs are widely used to prototype digital logic. This chapter discusses some of the challenges with using standard FPGAs to prototype asynchronous logic and summarizes research efforts that have created alternate FPGA architectures for asynchronous logic.
This chapter presented successful physical testing results of multiple NCL circuit designs of varying size and complexity across a very large temperature range. For high-temperature applications, a SiC process developed by Raytheon was leveraged and exhibited circuits functioning at temperatures exceeding 500 °C. For low-temperature applications, the industry-standard IBM 0.5 μm SiGe process was leveraged and exhibited circuits functioning as temperatures approached absolute zero. Through all these tests, the NCL circuits required no special considerations (due to environmental effects on the device level) to maintain correct operation across these wide temperature swings. Under the same conditions, synchronous systems would require significant effort (either through complex logical design changes or physical setup considerations) to meet their timing constraints, which always incurs a large overhead. These results have proven the flexibility and robustness advantage that asynchronous systems have over synchronous designs.
Asynchronous circuits are inherently suitable for radiation-exposed environments due to their quasi-delay insensitivity (QDI) and multirail logic systems. If an ionizing-radiation event is detected, the QDI property provides the ability to delay the current operation within the circuit until the effect has subsided. The dual-rail design provides additional support in this area because in many cases both rails must be affected in order for an SEU to occur. In addition to mitigating SEEs through asynchronous circuit-level architectures, radiation hardening techniques can be applied to transistor-level layout designs and circuit components, such as the DFF, for increased reliability.
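The remark that both rails must usually be affected can be illustrated with a small model of dual-rail signalling. The sketch below is not from the chapter. It assumes the common dual-rail convention in which a signal is carried on two rails, with (0, 0) meaning NULL, exactly one asserted rail meaning DATA0 or DATA1, and (1, 1) being an illegal state; the names `DualRail` and `single_rail_upsets` are hypothetical.

```python
from enum import Enum


class DualRail(Enum):
    """Dual-rail states written as (rail1, rail0); the convention is assumed."""
    NULL = (0, 0)     # spacer: no data present
    DATA0 = (0, 1)    # logic 0: only rail0 asserted
    DATA1 = (1, 0)    # logic 1: only rail1 asserted
    ILLEGAL = (1, 1)  # both rails asserted: never valid data


def single_rail_upsets(state):
    """Return the states reachable by flipping exactly one rail of `state`."""
    rail1, rail0 = state.value
    return [DualRail((rail1 ^ 1, rail0)), DualRail((rail1, rail0 ^ 1))]


for state in (DualRail.DATA0, DualRail.DATA1):
    reachable = [s.name for s in single_rail_upsets(state)]
    print(f"{state.name} -> {reachable}")
```

In this model a single-rail upset on a valid data value yields only NULL or the illegal state, never the opposite data value, so turning DATA0 into DATA1 requires both rails to flip. A single upset during the NULL phase can still produce a spurious data value, which is consistent with the "in many cases" qualification in the text.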
Side channel attacks (SCAs) remain a great threat to hardware security. In most CMOS circuits, electrical behaviors are correlated with the processed data, which makes them vulnerable to SCAs. Dual-rail circuits present an advantage in mitigating SCAs due to the inherent balance in data representation. NCL circuits present more stable power traces than industry-standard synchronous counterparts; however, NCL circuits are still vulnerable to some SCAs due to the lack of balance in data propagation. In this chapter, the vulnerability of NCL circuits to SCAs is explained, and more secure dual-rail design methodologies are presented. Derived from NCL, the dual-spacer dual-rail delay-insensitive logic (D3L) methodology produces crypto hardware with great resilience against SCAs. The resilience of D3L, its associated overheads, and improved methodologies for overhead reduction are explained in this chapter.
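The "balance in data representation" can be made concrete with a small sketch. It is an illustration only, assuming the usual dual-rail encoding in which each bit b is carried as the rail pair (b, 1 - b); it does not model data propagation, spacers, or the D3L methodology, and the helper names are hypothetical. The point is that the number of asserted rails in a dual-rail word does not depend on the data value, whereas the number of asserted wires in a single-rail word does.

```python
from itertools import product


def dual_rail_encode(bits):
    """Encode each bit b as the rail pair (rail1, rail0) = (b, 1 - b)."""
    return [(b, 1 - b) for b in bits]


def dual_rail_weight(encoded):
    """Count the asserted rails in a dual-rail encoded word."""
    return sum(rail1 + rail0 for rail1, rail0 in encoded)


WIDTH = 4  # illustrative word width
words = list(product((0, 1), repeat=WIDTH))

single_rail_weights = {sum(word) for word in words}
dual_rail_weights = {dual_rail_weight(dual_rail_encode(word)) for word in words}

print("single-rail weights:", sorted(single_rail_weights))
print("dual-rail weights:", sorted(dual_rail_weights))
```

For every 4-bit word the dual-rail weight is the same, while the single-rail weight varies with the data. As the abstract notes, this balance in representation alone does not remove the imbalance in data propagation.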
In this chapter, we discuss the use of asynchronous clock distribution networks (ACDNs) to provide the timing for SFQ circuits. In particular, we review the hierarchical chains of homogeneous clover-leaves clocking, or (HC)²LC [1], a self-adaptive clocking technique designed to be resilient in such uncertain environments. (HC)²LC inherits its robustness from its asynchronous nature, which adapts to the spatially correlated cell delays, trading off reasonable area and power overheads for higher reliability and improved scalability.
Uncle (Unified NULL Convention Logic Environment) is a tool for creating NULL Convention Logic (NCL) designs, which can be downloaded for free. This chapter discusses Uncle internals and provides a detailed walk-through of an example design.
In this chapter, the authors describe an equivalence checking methodology for NULL Convention Logic (NCL) circuits. Note that there are currently no commercial equivalence checkers for quasi-delay insensitive (QDI) circuits. For commercial applications, NCL circuits, and QDI circuits in general, are often synthesized from synchronous intellectual property designs. The resulting NCL design may then be further optimized and modified. Therefore, the authors have designed an equivalence checker that can be used in two ways: (1) to verify the functional equivalence of two NCL designs and (2) to verify the equivalence between an NCL design and a synchronous design.
The chapter highlights a set of applications where asynchronous circuits outperform their synchronous counterparts due to one or more of their advantages, such as no clock tree, flexible timing requirement, robust operation, improved performance, high energy efficiency, high modularity and scalability, and low noise and emission.