Digital storage
A vertical cell transistor is necessary to drastically reduce the chip size of dynamic random access memory. This structure has a great advantage in terms of shrinkage, but it also has the disadvantage of increasing the OFF-state current by causing the floating body effect (FBE). For the first time, it is demonstrated that a stretched tunnelling diode, which consists of a p+ layer next to the n+ active layer in the buried body, leads to a drastically suppressed FBE. The OFF-state current is sharply reduced by about seven orders of magnitude compared with a conventional structure. Furthermore, the OFF-state current is at its minimum when the length of the stretched p+ region is approximately half the channel length (Lp/L = 1/2).
Large-scale machine-learning (ML) algorithms require extensive memory interactions. Managing or reducing data movement can significantly increase the speed and efficiency of many ML tasks. Towards this end, the authors devise an energy-efficient in-memory computing (IMC) kernel for linear classification and design an initial prototype. The authors achieve a power saving of more than 6.4 times over a conventional discrete system while improving reliability by 54.67%. The authors employ a split-data-aware technique to manage process, voltage, and temperature variations and to achieve fair trade-offs between energy efficiency, area requirements, and accuracy. The authors utilise a trimodal architecture with a hierarchical tree structure to further decrease power consumption. The authors also explore alternatives to the hierarchical tree structure with a significantly reduced number of linear regression blocks, while maintaining a competitive classification accuracy. Overall, the scheme provides a fast, energy-efficient, and competitively accurate binary classification kernel.
As the number of processing cores has grown, performance has increased, but energy consumption and memory access latency have become crucial factors in determining system performance. In a tiled chip multiprocessor, tiles are interconnected by a network and different applications run on different tiles. Non-uniform load distribution across applications results in varying L1 cache usage patterns. An application with a larger memory footprint uses most of its L1 cache. Prefetching on top of such an application may cause cache pollution by evicting useful demand blocks from the cache. This generates further cache misses, which increase the network traffic. Therefore, an inefficient prefetch block placement strategy may generate more traffic, which may increase congestion and power consumption in the network. It also lowers the packet movement rate, which increases the miss penalty at the cores and thereby affects the Average Memory Access Time (AMAT). The authors propose an energy-efficient caching strategy for prefetch blocks, ECAP. It uses the less-used cache sets of nearby tiles running light applications as virtual cache memories in which the tiles running heavy applications place their prefetch blocks. ECAP reduces AMAT, router power and link power in the NoC by 23.54%, 14.42%, and 27%, respectively, compared with the conventional prefetch placement technique.
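The link between miss penalty and AMAT mentioned in this abstract follows the standard textbook relation AMAT = hit time + miss rate × miss penalty. The sketch below is only an illustration of that relation. The cycle counts and miss rate are hypothetical values chosen for the example, not figures from the ECAP study.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: hit time plus the expected cost of a miss."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must lie in [0, 1]")
    return hit_time + miss_rate * miss_penalty


# Hypothetical numbers (cycles): a congested network raises the miss penalty,
# which raises AMAT even though hit time and miss rate are unchanged.
uncongested = amat(hit_time=2, miss_rate=0.10, miss_penalty=100)
congested = amat(hit_time=2, miss_rate=0.10, miss_penalty=130)
print(uncongested, congested)
```

Under this relation, any placement policy that lowers either the miss rate (less cache pollution) or the miss penalty (less network congestion) lowers AMAT.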
This study presents a new energy-efficient design for static random access memory (SRAM) using low-power input data encoding and output data decoding stages. A data bit reordering algorithm is applied to the input data to increase the number of 0s that are written into the SRAM array. SRAM cells that are more energy-efficient when writing a ‘0’ than a ‘1’ benefit from this, resulting in a reduction in the total power and energy consumption of the whole memory. The input data encoding is performed using a simple circuit built from multiplexers and inverters. After the read operation, the data are returned to their initial form using a low-power data decoding circuit. Simulation results in an industrial and a predictive CMOS technology show that the proposed SRAM design considerably reduces the energy consumption of read and write operations for some standard test images used as input data to the memory. For instance, in writing the pixels of the Lenna test image into this SRAM and reading them back, savings of 15 and 20% are observed for the energy consumption of write and read operations, respectively, compared with the normal write and read operations in standard SRAMs.
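The abstract does not spell out its bit reordering algorithm, so the sketch below swaps in a different, well-known idea with the same goal: an invert-on-majority encoding (in the style of bus-invert coding) that stores the complement of a word, plus one flag bit, whenever the word contains more 1s than 0s. The function names and the 8-bit width are assumptions made for illustration, and this is not the authors' circuit.

```python
def encode_word(word, width=8):
    """Return (stored_word, flag). Inverts the word when it has more 1s than 0s."""
    mask = (1 << width) - 1
    word &= mask
    ones = bin(word).count("1")
    if ones > width // 2:
        return (~word) & mask, 1   # store the complement and mark it with flag = 1
    return word, 0


def decode_word(stored, flag, width=8):
    """Undo encode_word: re-invert the stored word when the flag is set."""
    mask = (1 << width) - 1
    return (~stored) & mask if flag else stored & mask


pixel = 0b11101101                  # six 1s out of eight bits
stored, flag = encode_word(pixel)
print(bin(stored), flag)            # the stored word now holds mostly 0s
assert decode_word(stored, flag) == pixel
```

With this encoding, no stored 8-bit word ever holds more than four 1s, at the cost of one extra flag bit per word.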
Using conventional memory technologies, for example, static random access memory (SRAM), dynamic RAM (DRAM) and flash memory, it is difficult to fulfill the market requirements for higher density and lower power dissipation [1]. Semiconductor organizations therefore consider it difficult to meet the expanding market demand for higher-density, lower-power nonvolatile memories [2]. The recent invention of the memristor device has given hope to semiconductor organizations by offering a less demanding way to increase density using current fabrication technology [3]. This is possible because memristor devices require only two terminals, which uses less wafer space, reduces the complexity of circuit interconnections and supports high-density integration when used in crossbar structures [4-7]. Besides these features, the memristor also offers low power and non-volatility [8]. The main limitation of the memristor-based memory cell, however, is its slow write access time [9]. A transmission gate is capable of providing rail-to-rail swing and can easily pass both logic “0” and logic “1” [10]. These advantages help to overcome the slow write access time of the memristor. The objective of this chapter is to understand what a memristor is and how a memristor can be modeled for its current-voltage (I-V) characteristics. The chapter then deals with the concepts of transmission gates, and a DRAM cell is designed using the modeled memristor and transmission gates. The designed memory cell was simulated using the HSPICE tool. The results show that the memristor-based DRAM cell can replace the conventional memory cell in future to achieve higher density and lower power dissipation.
This chapter explores the design space of the proposed M7T, MPT8T, M8T, M9T and MI-12T SRAM cells implemented at the 45 nm technology node, which are suitable for subthreshold operation. For quick comparison, Figure 3.45 shows the comparative design space exploration (DSE) chart of the SRAM cells at 45 nm technology. The analyses of read stability, write ability, average write delay, average read delay and leakage power consumption in hold mode are summarized in Table 3.14. The proposed memory cells exhibit improved performance over C6T.
Phase change memory (PCM) functions by thermally induced phase change of chalcogenide material, typically from a disordered, highly resistive amorphous phase with short-range atomic order and low free electron density, to a low-resistance crystalline phase with long-range atomic order and high free electron density, or vice versa [1,2]. PCM is one of the potential emerging nonvolatile memory (NVM) technologies to replace flash memory and be the technology for storage class memory due to its desirable properties such as short access time, long data retention, high endurance, scalability, CMOS compatibility and multibit storage [3-8]. Hence it is time to have an accurate electrical model of the PCM in order to realise a straightforward and timely implementation of PCM in an integrated circuit. This chapter presents the electrical circuit model of a multibit PCM cell that accurately simulates the temperature profile, the crystalline fraction and the resistance of the cell as a function of the programming pulse. Also, the precise modelling of the drift phenomenon of resistance and threshold voltage in the amorphous phase is presented. The presented model's I-V characteristics are correlated with experimental data to demonstrate the validity of the developed PCM model. Next, this chapter presents the analysis of PCM cells on a nanocrossbar as a memory system. The effect of the connecting wires' resistance on the performance of the PCM array structure, the amount of energy lost across each PCM cell and the programmed state of the PCM cell are also discussed. It has been shown that the energy consumed in the connecting wires decreases the power supplied to the PCM cells, thus resulting in a higher programmed low resistive state (Rcrystalline). Additionally, methods to mitigate the programmed Rcrystalline reliability issue are discussed in detail. Finally, the chapter concludes with a discussion of PCM-based memory applications in implementing a logic function using the look-up table (LUT), that is, PCM-based LUTs.
As the transistor size scales down exponentially to nanometric dimensions, the susceptibility of electronic circuits to radiation increases drastically. Protection against radiation is important in the biomedical, aerospace, communication and computing fields. Flip-flops (FFs) and static random access memories (SRAMs) are used to store data in many critical applications where their performance must be resilient to radiation exposure to guarantee reliability. Therefore, the development of resilient FFs and SRAMs is a challenging and demanding problem. In this chapter, different approaches to designing these radiation-hard circuits are analysed.
In modern SoC design, SRAM has become an integral part owing to its ability to bridge the speed mismatch between the high-speed processor and the low-speed data storage devices. Because the read and write operations of an SRAM cell have conflicting transistor sizing requirements, it is very difficult to size the transistors to satisfy both needs. The destructive nature of the read operation demands serious consideration of SRAM cell data stability. Transistor sizing and cell stability, already critical problems, become even more critical when process and temperature variations are considered. Thus the SRAM should be designed with process and temperature variations in mind. Random fluctuations in device parameters such as transistor width, length, oxide thickness, oxide capacitance and doping concentration lead to variation in the threshold voltage and other transistor characteristics. The change in these transistor characteristics alters the SRAM cell performance. Hence this should also be taken into account to ensure that the cell performance stays in the desired range even in the presence of random fluctuations during the fabrication process. The various performance measures, such as SNM, write SNM, speed and power consumption, must be tested under the worst process corner as well as over a wide temperature range to ensure that they lie in the acceptable range under the worst operating conditions. These parameters must also be tested using Monte Carlo simulation to ensure robust operation in the presence of random fluctuations during the fabrication process.
VLSI, or Very-Large-Scale-Integration, is the practice of combining billions of transistors to create an integrated circuit. At present, VLSI circuits are realised using CMOS technology. However, the demand for ever smaller, more efficient circuits is now pushing the limits of CMOS. Post-CMOS refers to the possible future digital logic technologies beyond the CMOS scaling limits. This 2-volume set addresses the current state of the art in VLSI technologies and presents potential options for post-CMOS processes. VLSI and Post-CMOS Electronics is a useful reference guide for researchers, engineers and advanced students working in the area of design and modelling of VLSI and post-CMOS devices and their circuits. Volume 1 focuses on design, modelling and simulation, including applications in low voltage and low power VLSI, and post-CMOS devices and circuits. Volume 2 addresses a wide range of devices, circuits and interconnects.
Multicore processors are widely used in today's real-time embedded systems to satisfy the performance and predictability requirements as well as reduce cost. A vast majority of multicore embedded systems run several tasks with mixed criticality, in which the non-functional requirements of the tasks are different or even conflicting. A major challenge in mixed-criticality systems is to maximise the efficiency of shared resources while satisfying the criticality requirements. Shared memory is a key component that should be well managed, and the memory controller plays the main role in this case. Several memory controllers have been introduced in the literature for multicore processors. In this article, the authors performed an in-depth investigation of three state-of-the-art memory controllers using the gem5 full-system simulator and the Xilinx ISE Design Suite, and compared them in terms of predictability and performance. Then, the authors proposed a memory controller that provides the same predictability as the most predictable existing controller while improving the performance by 12.3%.
Non-filament 3D vertical RRAM (VRRAM) is a promising technology for emerging high-density memory applications. In this Letter, the layer-dependent resistance variability at both the low resistance state (LRS) and the high resistance state (HRS) is assessed on state-of-the-art 8-layer 3D VRRAMs. 2048 devices are measured with the aid of an FPGA-controlled relay matrix, which enables automated switching of devices without changing the cabling. The results show that LRS exhibits little layer dependence, while both the average and the standard deviation of HRS decrease from the top layer to the bottom layer. A qualitative explanation of the observation is proposed based on a high-resolution transmission electron microscope image. This work is beneficial for future 3D VRRAM circuit design and performance improvement.
This chapter presents a frame memory compression method used in video coding. Frame memory compression compresses the data to be stored in the frame memory in order to reduce the external bandwidth and the related power consumption, as shown in Figure 7.1. When the pixels of a motion-compensated frame have to be written into external DRAM, the frame memory compression engine compresses those pixels. During motion estimation (ME), the frame memory compression engine decompresses the compressed pixels of previous frames and passes them to the video codec. As shown in Figure 7.2, most of the frame memory compression algorithms are composed of three stages: prediction, entropy coding and memory organization [1], which are respectively introduced in Sections 7.1, 7.2 and 7.3.
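As a rough illustration of the first two stages named in this abstract (prediction and entropy coding), the sketch below predicts each pixel from its left neighbour and codes the residual with an order-0 Exp-Golomb code. These particular choices, the initial predictor value of 0 and all function names are assumptions made for the example. The chapter's actual algorithms and its memory organization stage are not reproduced here.

```python
def zigzag(residual):
    """Map a signed residual to a non-negative integer (0, -1, 1, -2, ... -> 0, 1, 2, 3, ...)."""
    return 2 * residual if residual >= 0 else -2 * residual - 1


def unzigzag(value):
    """Inverse of zigzag."""
    return value >> 1 if value % 2 == 0 else -((value + 1) >> 1)


def exp_golomb(value):
    """Order-0 Exp-Golomb codeword for a non-negative integer, as a bit string."""
    binary = bin(value + 1)[2:]
    return "0" * (len(binary) - 1) + binary


def compress_row(pixels):
    """Prediction (left neighbour) followed by entropy coding of the residuals."""
    bits, prev = [], 0
    for pixel in pixels:
        bits.append(exp_golomb(zigzag(pixel - prev)))
        prev = pixel
    return "".join(bits)


def decompress_row(bits, count):
    """Decode `count` pixels from the bit string produced by compress_row."""
    pixels, prev, pos = [], 0, 0
    for _ in range(count):
        zeros = 0
        while bits[pos] == "0":          # the run of leading zeros gives the codeword length
            zeros += 1
            pos += 1
        value = int(bits[pos:pos + zeros + 1], 2) - 1
        pos += zeros + 1
        prev += unzigzag(value)          # add the residual back onto the prediction
        pixels.append(prev)
    return pixels


row = [100, 101, 101, 99, 98, 98, 100, 103]
coded = compress_row(row)
assert decompress_row(coded, len(row)) == row
print(len(coded), "bits instead of", 8 * len(row))
```

Smooth image rows produce small residuals and therefore short codewords, which is where the bandwidth saving comes from. A real codec would also need fixed-size addressable blocks, which is the job of the memory organization stage.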
This article presents a review of physical, analytical, and compact models for oxide-based RRAM devices. An analysis is performed of how the electrical, physical, and thermal parameters affect resistive switching and of the different current conduction mechanisms that exist in the models. Two physical mechanisms that drive resistive switching and are widely adopted in models, drift diffusion and redox, are studied. As for the current conduction mechanisms adopted in the models, Schottky and generalised hopping mechanisms are investigated. It is shown that resistive switching is strongly influenced by the electric field and temperature, while the current conduction is weakly dependent on the temperature. The resistive switching and current conduction mechanisms in RRAMs are highly dependent on the geometry of the conductive filament (CF). 2D and 3D models which incorporate the rupture/formation of the CF together with the variation of the filament radius present accurate resistive switching behaviour.
Increasing demands to process large amounts of data in real time lead to greater use of many-core microprocessors, which poses a grand challenge for the effective management of available resources. As communication power accounts for a significant portion of power consumption when processing such big data, there is an emerging need to devise a methodology that reduces the communication power without sacrificing performance. To address this issue, we introduce a cognitive I/O designed for 3D-integrated many-core microprocessors that performs adaptive tuning of the voltage-swing levels depending on the achieved performance and power consumption. We embed this cognitive I/O in a many-core microprocessor with DRAM memory partitioning to save energy in applications such as fingerprint matching and face recognition.
A novel offset-cancelling current-mode sense amplifier (OC-CSA) is proposed to improve the sensing speed of embedded flash (eFlash) memory. To reduce the precharge time related to random address transitions, this OC-CSA adopts a new precharge acceleration scheme. Moreover, by employing offset cancellation, double sensing margin and strong positive feedback techniques, the new OC-CSA can achieve a sensing speed 1.6× faster than other OC-CSAs. A 130 nm 1 Mb eFlash macro is fabricated to confirm the high-speed capability of the proposed OC-CSA. The experimental results show that the new OC-CSA can achieve a read access time of 11 ns at VDD = 1.3 V.
This study proposes a single bit-line and disturbance-free static random-access memory (SRAM) cell for ultra-low voltage applications. An SRAM cell with a read-decoupled and cross-point structure addresses both the read-disturb and half-select stability issues; nevertheless, the write-ability is degraded due to the stacked pass transistors. In this study, the authors propose a single-ended 8T bit-cell and dual word-line control technique that can simultaneously improve the read stability, half-select stability, and write-ability without additional peripheral circuits, which is advantageous for bit-interleaved ultra-low voltage operations. A 4 kb test chip was implemented in a 90 nm complementary metal–oxide–semiconductor process to verify the proposed design. Silicon measurements indicate that the proposed design can operate at a voltage as low as 360 mV with 2.68 μW power consumption.
Due to the von Neumann bottleneck, in-memory computing, as a new architecture, has drawn considerable attention and is becoming a candidate for next-generation electronic systems. This paper presents an in-memory computing approach to multiplier design based on multilevel cells (MLCs) of resistive random access memories (RRAMs). The paper proposes look-up-table (LUT) operations to optimize the speed, area and power of the multiplier circuits. The proposed MLC function of the RRAM shows that the RRAM can hold multiple stable resistance levels by adjusting the operating voltage. The simulation results show that, taking a 16-bit multiplier as an example, the circuits of this paper increase the calculation speed by 35.7 percent and decrease the area by 14 percent under similar power consumption conditions when compared with other traditional 16-bit multipliers.
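One generic way to build a wide multiplier from look-up tables is to split each operand into 4-bit digits, look up every 4×4-bit partial product in a small table, and then shift and add the results. The sketch below shows that decomposition in software. The table size, the 4-bit split and the function name are illustrative assumptions, and the paper's RRAM-based LUT organisation is not described in the abstract.

```python
# 16 x 16 table of all 4-bit by 4-bit products (256 entries, each at most 225).
LUT = [[a * b for b in range(16)] for a in range(16)]


def lut_multiply(x, y, bits=16):
    """Multiply two unsigned `bits`-wide integers using only LUT reads, shifts and adds."""
    if bits % 4 != 0:
        raise ValueError("bits must be a multiple of 4")
    limit = 1 << bits
    if not (0 <= x < limit and 0 <= y < limit):
        raise ValueError("operands must fit in the given width")
    digits = bits // 4
    result = 0
    for i in range(digits):
        xi = (x >> (4 * i)) & 0xF                      # i-th 4-bit digit of x
        for j in range(digits):
            yj = (y >> (4 * j)) & 0xF                  # j-th 4-bit digit of y
            result += LUT[xi][yj] << (4 * (i + j))     # weighted partial product
    return result


assert lut_multiply(12345, 54321) == 12345 * 54321
```

For 16-bit operands this takes 16 table reads. A multilevel cell that stores more than one bit per device would shrink the number of cells needed to hold such a table.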
In high-density three-dimensional (3D) memory technology, a stacking method is used to create memory devices and access devices at the intersections of bit lines and word lines. For this application, access devices should have a high on/off ratio, high current density for writing cycles, and high endurance. Consequently, an arsenic–tellurium–germanium–silicon nitride compound (AsTeGeSiN) threshold switching device with a high current density of 10^4 A/cm^2 above the threshold voltage (Vth) is reported as a good candidate for use in access devices. In addition, scaling down of access devices as well as memory devices is essential for high-density 3D memories. However, in AsTeGeSiN threshold switching devices, fast degradation by pulse cycling in smaller devices is observed. To find the main cause of fast degradation by pulse cycling in smaller devices, the low-frequency noise properties are examined. The rapid increase in the trap density (NT) in small devices is the main cause of fast degradation by pulse cycling in AsTeGeSiN devices. On the basis of this evaluation, the author examines the effect of annealing temperature and annealing time on the pulse endurance in smaller devices. Using an annealing temperature of ∼600°C improves the cycling endurance of smaller devices.
The high performance of GPGPUs comes from their massive multithreading, which makes them more and more widely used, especially in throughput-oriented computing. Data locality is one of the important factors affecting GPGPU performance. Although a GPGPU can partly exploit intra-/inter-warp locality by itself, there is still considerable room for improvement. In our work, we analyze the characteristics of different applications and propose memory-request-based warp scheduling to better exploit inter-warp spatial locality. This method lets warps with good inter-warp locality run faster, which improves overall performance. Our experimental results show that the proposed method achieves 24.7% and 11.9% average performance improvement over LRR and MRPB, respectively.
NAND flash memories, due to their several advantageous characteristics, have recently dominated the data storage industry and its global market. Currently, multi-level cell memories, in which each cell can store more than one bit of data resulting in higher data storage capacities, have gained a considerable amount of research interest. However, this comes at the cost of several limitations and increased performance degradation. Various studies have shown that among several error sources in multi-level cell memories, inter-cell interference is the most significant one. Therefore, to mitigate the devastating effect of the interference, simple, feasible, and yet efficient equalisation techniques become essential for achieving desired data reliability. In this study, first, a thorough analysis on deriving the distribution of the interference-free and interference-affected data is carried out. Then, novel low-complexity equalisation methods are proposed, and their beneficial complexity-performance trade-offs compared with the existing techniques are illustrated. Finally, simulation results are presented to show that the proposed algorithms considerably improve the error performance, while maintaining the low-complexity constraints.
This study compares the performance and reliability of classical complementary metal-oxide-semiconductor (CMOS) gates with Schmitt trigger (ST) ones. The ST hysteresis, caused by the added positive feedback transistors, improves the design static noise margin (SNM) and offers noise immune operation. Hence, ST-based circuits are expected to operate more reliably than the ones implemented using classical CMOS. Although many research papers have been focused lately on using ST design concepts for implementing more reliable static random access memory (SRAM) cells, significantly less work was devoted to the application of ST concepts in the combinatorial logic domain. Moreover, available research on ST-based logic gates had only focused on the low-voltage/power applications range. The authors are going to look at the whole voltage range and performance spectrum to compare and understand not only the SNMs and the power consumption (at different frequencies and voltage levels) but also the delay and the power-delay-product of ST-based logic gates. These will be compared with classical CMOS as well as with optimally sized CMOS and ST-based logic gates. This study should give a clear picture of the potential advantages ST could offer for combinatorial logic in advanced CMOS technology nodes and of their application range.
This work investigates the delay performance of a junctionless silicon nanotube (JLSiNT) field-effect transistor (FET) based 6T SRAM cell. The study demonstrates that the delay performance of a 6T SRAM based on the symmetric drain/source DS-JLSiNT FET (inner gate covers drain, channel, and source regions) improves when the inner gate of the nanotube covers only the drain and channel regions (D-JLSiNT FET) or only the source and channel regions (S-JLSiNT FET), because of an improved I_on/C_gg ratio. The improvement in read (write) access time is ∼22% (17%) and ∼9% (20%) when the DS-JLSiNT FET is replaced by the D-JLSiNT FET and the S-JLSiNT FET, respectively. Furthermore, because the inner gate only partially covers the device, the gate electrostatic integrity is reduced, which decreases the ratio of on-current to off-current (I_on/I_off) and degrades the static noise margin (SNM). However, the deterioration in write SNM, hold SNM, and read SNM is minimal (∼0.3, 0.9, and 2%, respectively) for the S-JLSiNT FET based SRAM compared with the DS-JLSiNT FET based SRAM, whereas it is aggravated for the D-JLSiNT FET based SRAM. Thus, the S-JLSiNT FET is the best configuration for designing a JLSiNT FET based 6T SRAM cell.
In static random access memory (SRAM), some cells are not selected for writing, but because of the way word line signals are distributed in the SRAM array, their word line is nevertheless activated. Therefore, they may be mistakenly written. Such cells are called half-selected cells. This study presents two schemes, one for single-ended and the other for differential sensing SRAMs, to eliminate the half-selection disturbance. In the first proposed scheme, the content of the desired row of the SRAM array is read before the write operation and is written back on the corresponding write bitlines. This operation eliminates the possibility of noise being written onto the half-selected cells. In the second scheme, a simple read operation is performed before the write operation. The authors applied their half-selection resilient schemes to 8T and 6T SRAMs. Simulation results show that in the presence of radioactive particles, applying the write-back scheme to the 8T SRAM and the read-before-write scheme to the conventional 6T SRAM reduces the failure rate from an average of 56 and 20%, respectively, to 0. The proposed schemes do not degrade the write-ability of the SRAM cells and are bit-addressable. Moreover, the proposed schemes consume less power than competing schemes.
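The write-back idea described in this abstract can be illustrated with a purely behavioural sketch. It is not the authors' circuit: the function name `write_row`, the `noise` callback, and the list-of-lists array model are all hypothetical, introduced only to show why re-driving the previously read value protects half-selected cells.

```python
def write_row(array, row, col_values, write_back=True, noise=None):
    """Behavioural model of a row write (hypothetical, not circuit-accurate).

    array      -- list of rows, each a list of bits
    col_values -- dict {column: bit} for the cells selected for writing
    write_back -- if True, half-selected columns are re-driven with the
                  value read from the row before the write
    noise      -- optional callable (column, old_bit) -> bit modelling what
                  an undriven bitline would impose on a half-selected cell
    """
    old = list(array[row])  # read the whole row before the write
    for col in range(len(old)):
        if col in col_values:
            array[row][col] = col_values[col]        # selected cell
        elif write_back:
            array[row][col] = old[col]               # stored bit is restored
        elif noise is not None:
            array[row][col] = noise(col, old[col])   # half-select disturbance
    return array


# Worst-case disturbance for illustration: every undriven cell flips.
flip = lambda col, bit: 1 - bit

protected = write_row([[1, 0, 1, 1]], 0, {1: 1}, write_back=True, noise=flip)
disturbed = write_row([[1, 0, 1, 1]], 0, {1: 1}, write_back=False, noise=flip)
```

With write-back enabled only the selected column changes; without it, the assumed noise model corrupts the other three cells in the row.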
Generalised concatenated (GC) codes are well suited for error correction in flash memories for high-reliability data storage. The GC codes are constructed from inner extended binary Bose–Chaudhuri–Hocquenghem (BCH) codes and outer Reed–Solomon codes. The extended BCH codes enable high-rate GC codes and low-complexity soft input decoding. This work proposes a decoder architecture for high-rate GC codes. For such codes, outer error and erasure decoding are mandatory. A pipelined decoder architecture is proposed that achieves a high data throughput with hard input decoding. In addition, a low-complexity soft input decoder is proposed. This soft decoding approach combines a bit-flipping strategy with algebraic decoding. The decoder components for the hard input decoding can be utilised which reduces the overhead for the soft input decoding. Nevertheless, the soft input decoding achieves a significant coding gain compared with hard input decoding.
Graphics processing units (GPUs) are playing an increasingly important role in parallel computing. Using their multi-threaded execution model, GPUs can accelerate many parallel programs and save energy. In contrast to their strong computing power, GPUs have limited on-chip memory space, which easily becomes inadequate. The throughput-oriented execution model of GPUs introduces thousands of hardware threads, which may access the small cache simultaneously. This causes cache thrashing and contention problems and limits GPU performance. Motivated by these issues, the authors put forward a locality-protected method based on the instruction program counter (LPC) to make use of data locality in the L1 data cache with very low hardware overhead. First, they use a simple program counter (PC)-based locality detector to collect reuse information for each cache line. Then, a hardware-efficient prioritised cache allocation unit is proposed that combines data reuse information with time-stamp information to predict the reuse possibility of each cache line and to evict the line with the least reuse possibility. Their experiments on a simulator show that LPC provides up to 17.8% speedup and an average improvement of 5.0% over the baseline method with very low overhead.
Fast search block matching algorithm (BMA)-based video coding provides reasonably good video quality at a very small computational cost. In fast BMA, considerably more clock cycles are required to read pixel data than to perform the matching operation, owing to the erratic location of candidate macroblocks (CMBs). To reduce the number of clock cycles, a parallel memory system is used in this study, which accelerates the reading of CMBs and speeds up motion vector (MV) computation. A novel register array concept is introduced to organise CMBs, which expedites the computation-hungry search process. Owing to the shape of the register array, less space is needed to store CMBs and the architecture addresses a wide range of search patterns. The proposed sum of absolute difference processor with the parallel memory system computes the MV of one macroblock in 28 clock cycles in the average case. Compared with a single memory system, it saves 68% and 80% of the clock cycles in CMB access for the initial search and the intermediate search process, respectively. The hardware architecture is tested on a Xilinx Virtex5 field programmable gate array. The proposed fixed 8×8 macroblock size architecture processes 354 high definition (HD) (1080p) frames per second (fps) and the configurable architecture processes 201 HD fps, which is more than adequate for real-time encoding.
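For readers unfamiliar with the matching step, the sum of absolute differences (SAD) criterion used in block matching can be sketched in software as follows. This is a plain reference computation, not the proposed hardware; the function names and the candidate-offset list are illustrative assumptions.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def best_motion_vector(current, reference, top, left, size, candidates):
    """Return (cost, (dy, dx)) for the candidate offset with minimum SAD.

    current, reference -- frames as lists of rows of pixel values
    top, left, size    -- position and side length of the current macroblock
    candidates         -- iterable of (dy, dx) offsets to test (search pattern)
    """
    cur = [row[left:left + size] for row in current[top:top + size]]
    best = None
    for dy, dx in candidates:
        y, x = top + dy, left + dx
        # Skip candidate macroblocks that fall outside the reference frame.
        if y < 0 or x < 0 or y + size > len(reference) or x + size > len(reference[0]):
            continue
        cand = [row[x:x + size] for row in reference[y:y + size]]
        cost = sad(cur, cand)
        if best is None or cost < best[0]:
            best = (cost, (dy, dx))
    return best


reference = [[0, 0, 0, 0],
             [0, 9, 8, 0],
             [0, 7, 6, 0],
             [0, 0, 0, 0]]
current = [[9, 8, 0, 0],
           [7, 6, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0]]
result = best_motion_vector(current, reference, 0, 0, 2,
                            [(0, 0), (0, 1), (1, 0), (1, 1)])
```

Each candidate requires reading a full macroblock from the reference frame, which is the memory access cost the abstract's parallel memory system targets.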
An open-loop per-pin skew compensation circuit with lock fault detection is presented. The proposed circuit employs an open-loop reference selector, a two-stage open-loop delay lock method that is separated into coarse and fine lock stages for fast lock-in time, and a fault lock detection scheme to prevent lock faults caused by the dead zone of the samplers. A unidirectional scan method is also applied ahead of the fine lock stage to minimise pin-to-pin skew errors after calibration. The circuit was fabricated in 55 nm CMOS technology with a 1 V supply voltage and an area of 0.0036 mm² for one de-skewing module. The measured results show that the skew error at 1 GHz operation was reduced to <6 ps after skew calibration when the skew between input/output (IO) pins was 230 ps, and the lock-in time was 11 clock cycles.
We measured the impact of the thermoelectric effect, especially the Peltier effect, upon the operation of phase change memory (PCM) in which the contact resistance between the phase change material and the electrode dominates the total resistance. A PCM device having a pillar structure of diameter 500 nm was fabricated using GeCu2Te3 (GCT) material. During read operation, the set state (with crystalline PCM) showed ohmic contact between the PCM and the electrode, whereas the reset state (with amorphous PCM) showed Schottky contact. The Schottky contact between the amorphous GCT and the electrode showed a bias polarity dependence of set operation owing to the Peltier effect, which is one of the thermoelectric effects. As PCM devices scale down to the nanometre scale, research on contact resistance and various related effects will become more important.
Solid state drives (SSDs) achieve significantly better performance than hard disks by internally implementing channel level, chip level, and die level parallelism. However, plane level parallelism has not been sufficiently exploited, because of the constraint that the target pages of the planes must be in the same location. To overcome this constraint, a policy has been proposed that enforces the multi-plane operation by matching the position of the target pages at the cost of wasting clean pages. However, this policy excessively increases the number of block erasures, which reduces the stability and lifetime of the SSD. To solve this problem, this Letter proposes a policy that decides whether to perform the multi-plane operation by considering the number of wasted clean pages. The performance evaluation using representative server workloads shows that the proposed policy improves average performance by up to 28.82% over the policy that does not perform the multi-plane operation, without significantly increasing the number of block erasures.
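The decision described here, whether aligning the planes is worth the clean pages it wastes, might be sketched as below. The abstract does not give the actual decision rule, so the fixed threshold `max_wasted`, the function name `plan_write`, and the per-plane page-pointer model are assumptions made only for illustration.

```python
def plan_write(next_free_page, max_wasted):
    """Decide between a multi-plane and a single-plane write (hypothetical rule).

    next_free_page -- per-plane index of the next clean page in the active block
    max_wasted     -- assumed threshold on clean pages that may be skipped

    A multi-plane write needs every plane to target the same page offset, so
    planes that lag behind must skip (waste) clean pages up to that offset.
    """
    target = max(next_free_page)
    wasted = sum(target - page for page in next_free_page)
    if wasted <= max_wasted:
        return ("multi-plane", target, wasted)
    return ("single-plane", None, 0)


aligned = plan_write([5, 5], max_wasted=0)
too_costly = plan_write([3, 7], max_wasted=2)
acceptable = plan_write([3, 7], max_wasted=4)
```

Setting the threshold to zero reproduces a policy that never wastes pages, while a very large threshold reproduces the always-align policy that the abstract criticises for causing extra block erasures.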
This Letter proposes an on-chip data strobe transmission circuit for dynamic random access memory (DRAM). On-chip differential repeaters with cross-coupled latches are adopted to prevent sampling margin reduction. A node monitoring circuit is proposed to prevent short-circuit currents in the on-chip differential repeaters and cross-coupled latches caused by high impedance inputs. Compared with conventional differential signalling, the proposed circuit saves a short-circuit current of 6.2 mA per write operation. The chip has been fabricated in 350 nm CMOS technology and the active chip area is 0.189 mm².
A novel write bitline (BL) charge sharing write driver (CSWD) and a half-V_DD read BL (RBL) pre-charge scheme are presented for a single-ended 8T static random access memory (SRAM). Before write enable (WE) signal assertion, the CSWD equalises the write BLs by allowing them to share charge. Both write BLs are equalised at the middle value of the supply voltage using a leakage current compensation block. Afterwards, as the WE signal is asserted, the CSWD produces rail-to-rail levels on the write BL pair. Charging a BL from half-V_DD to V_DD reduces the write dynamic power dissipation by essentially 50%. Half-V_DD pre-charging is used for the RBL to achieve low-power read operation. The read port is powered by a virtual ground rail to improve the RBL leakage. The authors compare the proposed 8T design (P8T) with conventional 6T (C6T) and 8T (C8T) designs in a 45 nm technology node. Write power dissipation is reduced by 42% and dynamic read power is reduced by more than 39%. Overall leakages are reduced by more than 18% compared with C6T and ratio of the RBL is improved by more than two orders of magnitude compared with C8T.
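The 50% figure follows from a standard first-order argument, sketched here under the assumption of an ideal bitline capacitance. The symbols C_BL (bitline capacitance) and V_DD (supply voltage) are introduced for illustration and are not taken from the abstract. The energy drawn from the supply while charging a capacitor from an initial voltage V_i up to V_DD is the supply voltage times the charge delivered:

```latex
\begin{align}
E_{\text{supply}}(V_i) &= V_{DD}\,\Delta Q = C_{BL}\,V_{DD}\,\bigl(V_{DD} - V_i\bigr),\\
E_{\text{supply}}(0) &= C_{BL}\,V_{DD}^{2},\\
E_{\text{supply}}\!\left(\tfrac{V_{DD}}{2}\right) &= \tfrac{1}{2}\,C_{BL}\,V_{DD}^{2}
  = \tfrac{1}{2}\,E_{\text{supply}}(0).
\end{align}
```

Starting each write from the equalised mid-supply level therefore halves the energy taken from the supply per bitline transition in this idealised model; the measured write power saving quoted in the abstract (42%) is somewhat lower.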
Among resistive random access memory (RRAM) architectures, the one transistor one memristor (1T1R) crossbar is the most mature. For the 1T1R crossbar, a logic operation-based design-for-testability scheme and a parallel test algorithm, which is an improvement of the March C*-1T1R test algorithm, are proposed. The pass-fail fault dictionary of the proposed test algorithm is analysed. Analytical results show that the proposed test algorithm can detect all the modelled faults caused by the parametric variation of memristors as well as traditional RAM faults. Compared with March MOM, March C* and March C*-1T1R, the proposed test algorithm reduces the test time for a large crossbar at the cost of a small area overhead.
Shifting market trends towards mobile, Internet of things, and data-centric applications create opportunities for emerging low-power non-volatile memories. The attractive features of spin-torque-transfer magnetic-RAM (STT-MRAM) make it a promising candidate for future on-chip cache memory. Two-bit multiple-level cell (MLC) STT-MRAMs suffer from higher write energy, performance overhead, and lower cell endurance when compared with single-level counterpart. These unwanted effects are mainly due to write operations known as two-step (TT) and hard transitions (HT). Here, the authors offer a solution to tackle write energy problem in MLC STT-MRAM by minimising the number of TT and HT transitions. By analysing real applications, it was observed that specific locations within a cache block undergo much more TT and HT transitions resulting in hot locations when compared with other ones (cold locations). These hot locations are more detrimental to the lifetime and reliability of MRAM device. In this work, the authors propose a simple and intuitive dynamic encoding scheme that eliminates all TT and HT at hot locations, hence reducing energy consumption and improving MLC STT-MRAM lifetime. Results on PARSEC benchmarks demonstrate the effectiveness and scalability of the proposed approach to potentially prolong MLC STT-MRAM lifetime.
A building block for computing in memory systems is introduced. Based on the previously introduced racetrack memory proposed by IBM, a racetrack memory is used not only to store data but also to perform bitwise majority-based computations by coupling the memory with inputs provided by controllable magnets. This solution is defined as racetrack logic. Micromagnetic simulations are used to confirm that the proposed solution is technically viable.
This Letter proposes a new scheme to eliminate the bit-line leakage current of static random access memory. The proposed scheme utilises a four-input sense amplifier to amplify the voltages of self-compared bit-line pairs. The bit-lines of the proposed structure have no series capacitances and are directly connected to the sense amplifier input. In this way, the read delay and errors caused by the bit-line leakage current are eliminated. Simulation results using the SMIC 28 nm CMOS process design kit show that the proposed scheme has better stability and decreases the delay time by 41.1% at a 0.9 V supply voltage compared with the X-Calibration technology.
A high-density and high-speed charge-trapping AND flash memory array is fabricated for the first time. A reliability of 10⁴ endurance cycles and uniform program/erase characteristics along with a threshold voltage window >3 V are obtained. The AND array has several advantages, such as high read current drivability regardless of the number of word-lines, immunity to back-pattern dependency, and fast bit-sensing speed based on a parallel connected cell array structure, which are highly appropriate for three-dimensional (3D) stacking. Finally, a novel 3D stacked vertical-AND array is proposed to surpass the limitations of the conventional 3D NAND flash memories.
Fin field-effect transistors (FinFETs) are replacing the traditional planar metal–oxide–semiconductor FETs (MOSFETs) because of superior capability in controlling short channel effects, leakage current, propagation delay, and power dissipation. Planar MOSFETs face the problem of process variability but the FinFETs mitigate the device-performance variability due to number of dopant ions. This work includes the design of static-random access memory (SRAM) cell using FinFETs. The performance analysis of the ST11T, proposed ST13T SRAM cell, and with power gating sleep transistors is given in this study using the Cadence Virtuoso Tool (V.6.1). Owing to its improved gate controllability and scalability, the FinFET transistor structure is better than the conventional planar complementary MOS technology. The proposed design aims at the power reduction and speed improvement for the SRAM cell. From the result it is clear that optimised proposed FinFET-based ST13T SRAM cell is 92% more power efficient with the use of power gating technique, i.e. sleep transistors approach and having 12.84% less delay due to the use of transmission gates in the access path.
Spin-torque transfer RAM (STT-RAM) is a promising candidate to replace SRAM for larger last-level caches (LLCs). However, it has long write latency and high write energy, which diminish the benefit of adopting STT-RAM caches. A common observation for LLCs is that a large number of cache blocks are never referenced again before they are evicted. The write operations for these blocks, which we call dead writes, can be eliminated without incurring subsequent cache misses. To address this issue, a quantitative scheme called feedback learning based dead write termination (FLDWT) is proposed to improve the energy efficiency and performance of STT-RAM based LLCs. FLDWT dynamically learns the block access behaviour by using data reuse distance and data access frequency, and then classifies the blocks into dead blocks and live blocks. FLDWT terminates dead write block requests and improves the estimation accuracy via feedback information. Compared with an STT-RAM baseline in the last-level cache, experimental results show that our scheme achieves an energy reduction of 44.6% and a performance improvement of 12% on average with negligible overhead.
Spintronic memory [spin-transfer torque-magnetic random access memory (STT-MRAM)] is an attractive alternative technology to CMOS since it offers higher density and virtually no leakage current. Spintronic memory continues to require higher write energy, however, presenting a challenge to memory hierarchy design when energy consumption is a concern. This study motivates the use of STT-MRAM for the first-level caches of a multicore processor to reduce energy consumption without significantly degrading the performance. The large STT-MRAM first-level cache implementation saves leakage power. Moreover, the use of small level-0 cache regains the performance drop due to STT-MRAM long write latencies. The combination of both reduces the energy-delay product by 65% on average compared with CMOS baseline. The proposed STT hierarchy also shows good scalability over the CMOS with a few benchmarks which scale significantly better. The PARSEC and Splash2 benchmark suites are analysed running on a modern multicore platform, comparing performance, energy consumption and scalability of the spintronic cache system to a CMOS design.
Three kinds of parameters are considered to speed up the read operation of phase change memory: bit line parasitic parameters, read transmission gate parasitic parameters and current mirror parasitic parameters. A set reference cell and a reset reference cell are used in the reference circuit. Simulated in 130 nm process, the read access time of 1-Mb phase change memory (PCM) is 6.7 ns. In Monte Carlo simulations, the worst read access time is 13.8 ns compared to conventional 85 ns.
A new pathway to design floating gate quantum dot (QD) non-volatile RAM (QDNVRAM) cells that possess high-speed, low-voltage Erase capabilities not possible with conventional floating gate NV memories is presented. This is achieved by directly accessing the QD floating gate layer with an additional drain (D2) during the Erase operation. Experimental data on a fabricated long-channel (10 μm/14 μm) QDNVRAM cell show an ‘Erase’ pulse duration of ∼4 μs at a voltage of about 10 V using drain D2, which is over two orders of magnitude smaller than the ‘Write’ pulse value. Quantum mechanical simulations are also presented. The QDNVRAM fabrication process is compatible with CMOS processing.
Recently, multi-level cell (MLC) spin-transfer torque random access memories (STT-RAMs) have attracted great attention as an alternative to static or dynamic random access memories. They offer the benefit of capacity, but incur penalties in performance and power consumption caused by a complicated two- or three-phase access. An MLC STT-RAM controller that eliminates the MLC STT-RAM penalties for multimedia applications is proposed. The key ideas are frame-level data-to-memory mapping and frame-type aware frame assignment techniques that make a two- or three-phase access no longer required. Experimental results show that the proposed MLC STT-RAM controller achieves 56.1% higher memory performance and 4.2% lower memory power consumption than the conventional controller for industrial multimedia applications.
This work investigates the effect of channel engineering on the short channel performance of a sub-20-nm 3D NAND flash memory. Here, the threshold voltage roll-off (ΔV_th), subthreshold swing and drain induced barrier lowering metrics are studied to evaluate the short channel effects (SCEs) of the examined device. The effect of variation in doping density on the SCEs of the proposed channel engineered NAND flash memory is also studied. Based on these observations, a thin layer of high doping concentration in the centre of the channel, covering 25% of the channel area, has been found to improve the SCEs of the NAND flash memory compared with a device with uniform channel doping while maintaining sufficient drive current.
Soft errors in semiconductor memories occur due to charged particle strikes on sensitive nodes. Technology and voltage scaling have dramatically increased the susceptibility of static random access memories (SRAMs) to soft errors. In this study, the authors present AS8-SRAM, a new asymmetric memory cell that enhances the soft error resilience of SRAMs by increasing the cell's critical charge. They run Simulation Program with Integrated Circuit Emphasis (SPICE) simulations and system level experiments to validate the AS8-SRAM cell characteristics at circuit level and to evaluate the energy and reliability effectiveness of an AS8-SRAM-based cache memory. The authors’ results show that AS8-SRAM presents up to 58 times fewer failures in time than a six-transistor SRAM. Moreover, based on experiments with embedded benchmarks, AS8-SRAM achieves up to a 22% reduction in energy-delay product without any considerable loss in performance.
In most embedded microprocessor based systems on chip (SoCs), the cache has become a major source of power consumption due to its increasing size and high access rate. Power optimisation of the cache based on compare-based adaptive clock gating (CACG) is proposed to reduce the power wasted while the cache is idle. By detecting the cache's working state, the CACG can automatically turn off its clock when it is idle, saving a large percentage of dynamic power. Measurements of a real SoC chip fabricated in a TSMC 65 nm CMOS process show that an average power reduction of 30.3% is obtained in the Dhrystone benchmark at the cost of negligible area overhead and virtually no performance loss.
A fuzzy static RAM (SRAM) is proposed, which is applicable in fuzzy logic and many multiple-valued logic (MVL) applications. The new structure is basically an extension to the binary SRAM cell. Two cross-coupled voltage mirror circuits are used to be able to hold an arbitrary voltage value. The proposed design forms a robust and reliable structure, which is capable of operating with more than 95% accuracy in spite of imperfect fabrication of carbon nanotube FETs. Another exceptional advantage is its ultra-low-power consumption in MVL environments. It consumes 38.7 and 99% less static power compared with the SRAMs with regular ternary and quaternary components, respectively.
Software Programmable Memories, or SPMs, are raw on-chip memories that are not implicitly managed by the processor hardware, but explicitly by software. For example, while caches fetch data from memories automatically and maintain coherence with other caches, SPMs explicitly manage data movement between memories and other SPMs through software instructions. SPMs make the design of on-chip memories simpler, more scalable, and power efficient, but also place additional burden for programming of SPM-based processors. Traditionally, SPMs have been utilised in embedded systems, especially multimedia and gaming systems, but recently research on SPM-based systems has seen increased interest as a means to solve the memory scaling challenges of many-core architectures. This study presents an overview of the state-of-the-art in SPM management techniques in many-core processors, summarises some recent research on SPM-based systems, and outlines future research directions in this field.
This study introduces an inexact, but ultra-low power, computing architecture devoted to the embedded analysis of bio-signals. The platform operates at extremely low supply voltage levels to minimise energy consumption. In this scenario, the reliability of static RAM (SRAM) memories cannot be guaranteed when using conventional 6-transistor implementations. While error correction codes and dedicated SRAM implementations can ensure correct operation in this near-threshold regime, they incur significant area and energy overheads, and should therefore be employed judiciously. Herein, the authors propose a novel scheme to design inexact computing architectures that selectively protects memory regions based on their significance, i.e. their impact on the end-to-end quality of service, as dictated by the bio-signal application characteristics. The authors illustrate their scheme on an industrial benchmark application performing the power spectrum analysis of electrocardiograms. Experimental evidence shows that a significance-based memory protection approach leads to a small degradation in the output quality with respect to an exact implementation, while resulting in substantial energy gains, both in the memory and the processing subsystem.
A novel thin film-based configuration of redox resistive switching memory (ReRAM) based on cheap and abundant copper sulphide (CuS) is reported. The devices' working mechanism is based on the junction of two layers of stacked CuS nanocrystals with different stoichiometry (CuS and Cu2−xS). The CuS thin films were deposited using a fast, easy and low-temperature drop-casting technique. The devices showed memristive characteristics, with well-defined ON and OFF resistance states that can be induced by voltage pulses. A polynomial model is proposed to characterise the devices, considering both space-charge-limited current and ionic diffusion.
Harmonic resistive multipliers that exploit the unique current–voltage (I–V) characteristics of Ag/Ge–(Sb)–Te/Pt resistive-switching devices are demonstrated. For a Ge17Sb29Te54-based device, an antisymmetric non-linear I–V curve with a hump-like structure at ±0.4 V is obtained, whereas for a Ge51Te49-based device, an asymmetric non-linear I–V curve with SET switching at +0.4 V and RESET switching at –0.1 V is observed. The Ge17Sb29Te54-based device performs third harmonic multiplication for a 160 MHz input at 0 dBm, and sixth harmonic multiplication at 5 dBm under unbiased conditions without any matching circuit. For the latter device, biasing at a voltage of ±0.4 V leads to fifth harmonic multiplication, which is absent for a 0 dBm input under unbiased conditions. No harmonic multiplication is observed for an unbiased Ge51Te49-based device due to its high resistance, but biasing at the switching voltage of 0.4 V leads to fourth harmonic multiplication for a 0 dBm input. The unique non-linear characteristics of these devices suggest their potential for radio frequency applications.
At the nanoscale, grain boundary and surface scattering effects increase the electrical resistivity of connecting wires as the wire dimensions decrease. This increase in resistivity leads to significant power loss across connecting wires in nanocrossbars. In this study, the resistance of a connecting wire is calculated as a function of material properties and feature size. Then, the effect of the connecting wire resistance on phase change memory (PCM) performance in a PCM-based passive nanocrossbar is evaluated. The performance metrics tested are programmed resistance levels, programming duration, and energy consumption. The simulation results show that the power consumed in the connecting wires decreases the power supplied to the PCM cells, which results in a higher programmed low-resistance state (R_ON). The effect of connecting wire resistance on PCM performance is studied as a function of the wire size, cell position on the nanocrossbar, and nanocrossbar size. Simulation results show that the programmed R_ON is inversely proportional to feature size, increasing by up to almost 40% as the feature size decreases to 40 nm. The programmed R_ON also increases proportionally with increasing nanocrossbar size, and the R_OFF/R_ON ratio drops almost 90% of the targeted ratio at 1 kbit nanocrossbars. Furthermore, cells closer to the supply sources are the least affected by wire resistance, while cells furthest from the supply are the most affected. Finally, two methods are suggested to resolve the programmed R_ON reliability issue caused by the energy drop across connecting wires.
This Letter presents a self-controlled physical unclonable function (PUF) circuit and its application in encrypting the on-chip memories of IC bank cards. The PUF circuit is based on cross-coupled NAND gates. Voting and Hamming codes are used to stabilise its outputs. Monte Carlo simulation and a field-programmable gate array board are used for verification. The voting method improves the error rate by 79%. The Hamming codes and convertor can correct all erroneous bits of the PUF outputs. The PUF outputs, data address and time-stamp are encrypted by SM4 (the Chinese block cipher algorithm standard) to generate the KEY. The data are then XORed with the KEY.
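Two of the steps named in this abstract, stabilising PUF bits by voting and masking data by XOR with a key, can be illustrated with a minimal sketch. It makes several assumptions: voting is taken to be a per-bit majority over an odd number of repeated readings, the key is treated as an opaque byte string (SM4 key generation and Hamming correction are not implemented), and a key shorter than the data is simply repeated. The function names are hypothetical.

```python
def majority_vote(readings):
    """Per-bit majority over repeated PUF readings.

    readings -- list of equal-length bit lists; an odd count avoids ties.
    """
    return [1 if 2 * sum(bits) > len(bits) else 0 for bits in zip(*readings)]


def xor_bytes(data, key):
    """XOR data with a key, repeating the key if it is shorter (assumption)."""
    return bytes(d ^ key[i % len(key)] for i, d in enumerate(data))


# Three noisy readings of a 3-bit PUF response; one bit differs in two of them.
stable = majority_vote([[1, 0, 1],
                        [1, 1, 1],
                        [0, 0, 1]])

key = b"\x5a\xa5"            # placeholder for the SM4-derived KEY
cipher = xor_bytes(b"data", key)
plain = xor_bytes(cipher, key)
```

Because XOR is its own inverse, applying the same key a second time recovers the original data, which is why the same operation can serve for both writing to and reading from the protected memory.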
The gradual erasing operation from the reset state to the set state in a phase change device using Ge1Cu2Te3 is investigated by adjusting the pulse amplitude, duration, and falling time in turn. For this procedure, a relatively high voltage and an increased falling time were chosen, which were able to produce both long-term potentiation and long-term depression in the time interval between the pre-spike and post-spike. The results suggest that the synaptic behaviour is due to the controlled falling time rather than the pulse amplitude.
To simplify power-gating requirements in ultra-low-power architectures, design strategies for low-power non-volatile flip-flops (F/Fs) are sought, for which the utilisation of spintronic devices offers a promising option. A D F/F that utilises a five-terminal spintronic device for non-volatile state holding in an intrinsically self-complementing fashion is introduced. This self-complementing device can be exploited to reduce overhead interfacing circuitry in order to realise a compact ten transistors with one spintronic device D F/F with instant store and restore functionality, while consuming <9 µW of power.
Caches are used to improve memory access time and energy consumption. The cache configuration which enables the best performance often differs between applications due to diverse memory access patterns. The authors present a new concept, called switchable cache, where multiple cache configurations exist on chip, leveraging the abundant transistors available due to what is known as the dark silicon phenomenon. Only one cache configuration is active at any given time based on the application under execution, while all other configurations remain inactive (dark). They describe an architecture to enable seamless integration of multiple cache configurations, and a novel design space exploration methodology to rapidly pre-determine the optimal set of configurations at design-time, for a given group of applications. For design spaces containing trillions of design points, the authors’ exploration methodology always found the optimal solution in less than 2 s. The switchable cache improved memory access time by up to 26.2% when compared to a fixed cache.
A ferroelectric-gated graphene field-effect transistor was fabricated by consecutively stacking two distinct graphene–ferroelectric hybrid ribbons at right angles. The two graphene layers play different roles: one acts as a gate electrode and the other acts as a channel between the source and drain electrodes. Electric gating at the gate graphene modulates the resistance of the channel graphene. By means of ferroelectric polarisation, bistable resistance states of the channel graphene could be recorded, and the retention time of the bistability was estimated to be 460 days by extrapolating the two resistance values in the time–resistance relationships. Furthermore, the underlying concept used to fabricate the bistable memory device was extended to realise a logic-gate device by stacking three distinct graphene–ferroelectric hybrid ribbons.
A multilayer (ML)-based resistive RAM (RRAM) is proposed to achieve the performance of logic and memory application. Multilevel resistive switching characteristic of ML-based RRAM is demonstrated in aspect of data storage application. An innovative method is proposed to simultaneously achieve quaternary addition and data storage application, resistance serves as physical state variable instead of voltage or charge in the computing system. An instance of quaternary non-volatile addition is demonstrated in an experiment, which shows the ML-based RRAM is promising for non-volatile computational application as well as higher integration capability.
Novel two-transistor embedded memory – floating body gate cell – is implemented on planar SOI CMOS technology without adding extra masks. Since channel current is designed for memory cell write operations, this cell demonstrates ultra-fast write speed which is comparable with static RAM cell. The decoupled write and read structure ensures small operation power consumption and avoid false read. The low operation voltages of this cell lead to the excellent endurance performance. In addition, retention time is greatly enhanced due to the gate-to-drain underlap design.
As the scale of graphene-based non-volatile memory is reduced, the ratio of access resistance R A to total channel resistance R TOT is increased. To investigate the effect of the R A on I–V characteristics, we fabricated devices with various access lengths L A and self-aligned structure. Proposed structure using self-aligned gate minimises L A, and thereby improves the drain current, ‘on/off’ current ratio I ON/I OFF and transfer characteristics. In proposed structure, ‘off’ current is increased from 0.16 to 0.28 mA because R TOT was reduced; ‘on’ current increased from 0.35 to 0.72 mA, but I ON/I OFF increased from 2.18 to 2.57. Proposed structure also had larger memory window (8.5 V) than did conventional devices (6.7 V).
An inverted bit-line sense amplifier (BLSA) equipped with offset compensation capability for low-power DRAM applications is proposed. The sequential operation of the inverted BLSA allows us to eliminate the edge dummy array in an open bit-line structure resulting in 1.7% less total chip area despite of 10% area penalty of the proposed BLSA occupied by extra switches. For 8-Gb DRAM in 20-nm class technology, the read failure induced by Vth variability is completely removed due to the offset cancellation. The proposed BLSA maintains the gradual increase of the sensing delay when decreasing the power supply down to 0.6 V, while intrinsic read fail prevails below 0.9 V with the conventional one.
The demand for ever smaller and portable electronic devices has driven complementary metal oxide semiconductor (CMOS)-based technology to its physical limit with the smallest possible feature sizes. This presents various size-related problems such as high power leakage, low reliability, and thermal effects, and is a limit on further miniaturization. To enable even smaller electronics, various nanodevices including carbon nanotube transistors, graphene transistors, tunnel transistors and memristors (collectively called post-CMOS devices) are emerging that could replace the traditional and ubiquitous silicon transistor. This book explores these nanoelectronics at the circuit and systems levels including modelling and design approaches and issues. Topics covered include self-healing analog and radio frequency circuits; on-chip gate delay variability measurement in scaled technology node; nanoscale finFET devices for PVT aware SRAM; data stability and write ability enhancement techniques for finFET SRAM circuits; low-leakage techniques for nanoscale CMOS circuits; thermal effects in carbon nanotube VLSI interconnects; lumped electro-thermal modeling and analysis of carbon nanotube interconnects; high-level synthesis of digital integrated circuits in the nanoscale mobile electronics era; SPICEless RTL design optimization of nanoelectronic digital integrated circuits; green on-chip inductors for three-dimensional integrated circuits; 3D network-on-chips; and DNA computing. This book is essential reading for researchers, research-focused industry designers/developers, and advanced students working on next-generation electronic devices and circuits.
Six-transistor static random-access memory (6T SRAM) cell is the fundamental building block of memory cache in modern microprocessors. Each bit of data is stored in an individual 6T SRAM cell in the memory subsystem. Read data stability and write ability of 6T SRAM cells are degraded with the scaling of CMOS technology. Conventional circuit techniques for achieving wider voltage margins during read and write operations cause significantly larger silicon area and increased power consumption. Several alternative FinFET memory design techniques are presented in this chapter for achieving stronger data stability during read operations and wider voltage margin during write operations without causing area and power consumption overheads in the memory subsystems of microprocessors.
This chapter describes nanoscale FinFET devices and their application in SRAM design. It also discusses the variability of nanoscale integrated circuits (ICs) and introduces variability-aware memory design. In the previous two chapters, process variations were discussed for analog and digital ICs; this chapter focuses instead on futuristic memory design. Comprehensive variability, including process, voltage and temperature (PVT) variations, is discussed for future SRAM design. Analysis of the results of PVT-aware designs shows that sensitivity-driven IG-FinFET-based SRAM is the most suitable technique for reliable and high-density memories. The design of SRAM using a post-CMOS device, namely the FinFET, which is widely adopted in the semiconductor industry, is specifically elaborated.
Operation of the 1-transistor, 1-capacitor dynamic random access memory cell that allows for two-bit operation, double the typical storage capacity, is explored. By using a metal-ferroelectric-semiconductor field-effect transistor, a second bit is captured in the ferroelectric layer polarisation resulting from negative and positive polarisation states. As a result, new modes of operation are created giving non-volatile, long-term storage as well as decreased power consumption and radiation hardening. A typical write and read operating cycle is outlined in-depth and used to verify operation indicating four distinct states representing the two bits. The resulting empirical data gives a comprehensive presentation of the read cycle of the memory cell. Methods for determining the polarisation state of the transistor are also explored and used to determine the average value for measured channel resistance using three types of transistors, each having different channel width and length.
Data retention characteristics are investigated in charge trapping flash memory. The physical root cause of the non-Arrhenius behaviour, which is the general retention characteristic of charge trap flash memories, is modelled numerically by associating the charge loss mechanism with the trap energy level in the charge storage area. To express the charge loss from the relatively shallow traps, a multiphonon emission model is adopted. Finally, the ratio of the relatively shallow traps to middle and deep level traps is extracted from sample data.
Single-error correction, double-error detection (SEC–DED) codes are a type of error-correction codes widely used in electronics to protect memory devices from data corruption. Odd-weight-column SEC–DED codes are a type of these codes where the parity-check matrix is built with every column including an odd number of ones. With this approach, double errors have an even-weight syndrome and can be differentiated from single errors and, consequently, easily detected. There are applications, such as avionics or space, where a multiple error usually affects adjacent cells. Adapting SEC–DED codes to protect against triple-adjacent errors is interesting in these applications. A modification to existing odd-weight-column SEC–DED codes to add triple-adjacent error detection (TAED), creating SEC–DED–TAED codes, is presented. The implementation of the additional triple-adjacent detection logic for these codes can be performed with limited performance and area overhead.
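The odd-weight-column property described above can be illustrated with a small sketch. The (8,4) parity-check matrix below is a hypothetical toy example chosen for this illustration, not the code from the paper, and the sketch does not implement the paper's triple-adjacent detection logic. It only checks that single errors give odd-weight syndromes, that double errors give non-zero even-weight syndromes, and that a triple-adjacent error can look like a single error to a plain SEC–DED decoder.

```python
from itertools import combinations

# Hypothetical (8,4) code: columns of the parity-check matrix H as 4-bit
# integers, every column containing an odd number of ones (weight 3 or 1).
H_COLUMNS = [0b0111, 0b1011, 0b1101, 0b1110,   # data-bit columns
             0b0001, 0b0010, 0b0100, 0b1000]   # check-bit columns

def syndrome(error_positions):
    """XOR of the H columns at the flipped bit positions."""
    s = 0
    for pos in error_positions:
        s ^= H_COLUMNS[pos]
    return s

def weight(value):
    """Number of ones in the binary representation."""
    return bin(value).count("1")

def classify(s):
    """Decision a plain odd-weight-column SEC-DED decoder would take."""
    if s == 0:
        return "no error"
    if weight(s) % 2 == 1:
        return "single error (corrected)"
    return "double error (detected)"

# Every single error has an odd-weight syndrome.
single_ok = all(weight(syndrome([i])) % 2 == 1 for i in range(8))
# Every double error has a non-zero, even-weight syndrome.
double_ok = all(syndrome(pair) != 0 and weight(syndrome(pair)) % 2 == 0
                for pair in combinations(range(8), 2))
# Flipping the three adjacent bits 0, 1, 2 yields an odd-weight syndrome.
triple_adjacent = syndrome([0, 1, 2])
```

In this toy matrix the triple-adjacent error on bits 0 to 2 produces the syndrome `0b0001`, which equals the column of another bit, so an unmodified decoder would treat it as a single error and miscorrect it. That is the kind of case the abstract's TAED extension is meant to catch.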
A reliable majority voter circuit using a nanometre spin transfer torque magnetic tunnel junction (STT-MTJ) is presented. The circuit tolerates single transient faults and manages process variations due to technology downscaling. The use of this magnetic device brings non-volatile memory to logic circuits and promises to overcome the rising standby power issue. By using the STMicroelectronics fully depleted silicon on insulator 28 nm design kit and a precise STT-MTJ compact model, electrical simulations have been carried out to show its low power consumption and high reliability.
To increase storage capacity and I/O bandwidth, modern solid-state drives embed multiple NAND packages that consist of one or multiple dies in a parallel architecture. Each die can process NAND read/write/erase operations independently. A dynamic die binding method for write requests that is intended to exploit this parallel processing capability is proposed. This scheme stripes data to idle dies first, and unlike existing dynamic binding schemes, when idle dies are lacking it selects dies with the lowest accumulated write loads, thereby achieving wear levelling by ensuring long-term write load balancing. Thus, it can prevent situations in which some dies are worn out more quickly than others. A performance evaluation demonstrates that our approach offers slightly better performance compared with an existing dynamic binding scheme and completely resolves the problem of imbalanced write loads.
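The selection rule described above (idle dies first, then the dies with the lowest accumulated write load) might be sketched as follows. The `Die` record, the `bind_write` function and the one-page-per-die assumption are hypothetical names and simplifications introduced for illustration; the abstract does not give the actual data structures.

```python
from dataclasses import dataclass

@dataclass
class Die:
    die_id: int
    busy: bool = False        # True while the die is processing an operation
    written_pages: int = 0    # accumulated write load (assumed metric)

def bind_write(dies, n_pages):
    """Pick up to n_pages dies for a write request, one page per die.

    Idle dies are used first; if there are not enough of them, busy dies
    with the lowest accumulated write load are chosen next.
    """
    idle = [d for d in dies if not d.busy]
    busy = sorted((d for d in dies if d.busy), key=lambda d: d.written_pages)
    chosen = (idle + busy)[:n_pages]
    for d in chosen:
        d.written_pages += 1
        d.busy = True
    return [d.die_id for d in chosen]

dies = [Die(0, True, 5), Die(1), Die(2, True, 2), Die(3, True, 9)]
selected = bind_write(dies, 3)   # die 1 is idle; dies 2 and 0 carry the least load
```

Sorting the busy dies by accumulated load is what gives the long-term balancing effect: the most heavily written die (die 3 here) is chosen last.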
A gain cell embedded dynamic random access memory (eDRAM) with a novel charge injection technique is presented. The gain memory cell is composed of two dual-threshold logic N-type MOS transistors implemented in a generic triple-well CMOS process. A negative-voltage toggle on the parasitic junction diode formed between the pocket p-well and the cell data node couples up the cell storage voltages. This results in a much enhanced retention time in a compact bit area. Moreover, the technique exhibits much stronger immunity to write disturbance. Measured results at 85°C from a 110 nm 64 kbit prototype eDRAM incorporating the proposed technique demonstrate 69% enhanced retention time and 86% smaller write disturbance loss compared with the conventional one.
With the advent of multiple cores on a single chip, it is common for systems to have multi-level caches. Multiple levels of cache reduce the pressure on the memory bandwidth by allowing applications to store their frequently accessed data in them. The levels of cache nearer to the core filter the locality in the application accesses, which can result in high miss rates at farther levels. This study revolves around one question: are all levels of cache needed by all applications during all phases of their execution? The study observes the effect of 2-level and 3-level cache hierarchies on the performance of different applications. On the basis of these observations, it proposes an application-aware cache management policy called ‘SkipCache’, which allows an application to choose a 2-level or 3-level cache hierarchy at run-time. SkipCache dynamically tracks the applications at the shared last-level cache (LLC) to identify those that do not benefit from using the LLC. Such applications can completely skip the LLC so that other co-scheduled cache-friendly applications can use it efficiently. Evaluation of SkipCache in a 4-core chip multi-processor with multi-programmed workloads shows significant performance improvement. SkipCache is orthogonal to other cache management techniques and can be used along with other optimisation techniques to improve system performance.
Flash memory is currently widely used for data storage in mobile communication and automobile electronics. In this study, a multiple-execution control program has been designed to suppress the components and level of electromagnetic interference (EMI) noise generated during data writing to and reading from flash memory. The method reduces the EMI noise levels of writing and reading operations by 4.39 dB and 2.91 dB, respectively, while reducing the EMI noise components by 58.3% and 54.5%, respectively, so that noise interference from flash memory can be effectively reduced when it is used in various electronic devices. This is especially relevant for automobile electronic systems, where reducing EMI noise interference helps assure driving safety and where reception sensitivity can be enhanced by suppressing the platform-noise interference with the RF module.
The quantum dot gate field-effect transistor (QDGFET) generates three states in its transfer characteristics. A successful model can explain the generation of the third state in the transfer characteristics of the QDGFET. Innovative circuit design using the QDGFET can be used to implement different ternary logic circuits. This Letter discusses the design of a ternary logic static random access memory using the QDGFET.
The presence of voltage-controlled negative differential resistance was observed in conduction characteristics recorded at room temperature for 300 nm thick spin-coated films of graphene oxide (GO) sandwiched between indium tin oxide (ITO) substrates and top electrodes of sputtered gold (Au) film. The GO crystallites were found from X-ray diffraction studies to have an average size of the order of 7.24 nm and to be preferentially oriented along the (001) plane. Raman spectroscopy suggested that the material consisted of multilayer stacks, with the defects located at the edges an average of 1.04 nm apart. UV–visible spectroscopy studies suggested that the band gap of the material was 4.3 eV, corresponding to direct transitions. The two-terminal ITO/GO/Au devices exhibited memristor characteristics with scan-rate dependent hysteresis, threshold voltage and On/Off ratios. A value of >10^4 was obtained for the On/Off ratio at a scan rate of 400 mV s−1 and 4.2 V.
In this paper, a new voltage mirror circuit using carbon nanotube (CNT) technology is presented. This circuit is specifically proposed for the application of duplicating multiple-valued and fuzzy dynamic random access memories. The given structure prevents any voltage drop for the capacitor inside the memory cell. As a result, any fanout circuit can be driven. The new structure can be utilised for different multiple-valued logic systems without a change. The unique characteristics of carbon nanotube field effect transistor (CNFET) technology are exploited in this paper to meet the desired design goals. It demonstrates the potential of CNFET technology in a realistic very large-scale integration application. The proposed design is highly tolerant to D_CNT variation and it is also immune to misaligned CNTs. Simulation results demonstrate that it provides sufficient driving capability with reasonable accuracy.
Programmable logic devices permit a new way to practice yield improvement: redundancy at configuration time. By doing so, the authors avoid the overheads of traditional redundancy: explicit spares, replacement logic and on-chip non-volatile memory. This presentation describes a method for avoiding defects that also does not require a unique place-and-route for each fielded chip. Formal analysis and experimental results show the feasibility of the method for standard, unmodified field-programmable gate arrays.
An efficient replica bitline (RBL) technique for reducing the variation of sense amplifier enable (SAE) timing is proposed. Both RBLs and four-fold replica cells compared with the conventional RBL technique are utilised to favour the desired operations. Simulation results show that the standard deviation of SAE can be suppressed by 44.25% and the cycle time is also reduced by ∼30% at a 0.8 V supply voltage in TSMC 65 nm technology. Additionally, the area of the proposed scheme is nearly the same as that of the conventional RBL scheme.
The probabilistic switching of resistive random access memory (RRAM) can be utilised to implement physical unclonable functions (PUFs). By setting the operation condition at a switching probability of 50%, devices in a RRAM array are randomly settled into state ‘0’ or ‘1’ after programming. The RRAM switching probability provides a natural source of randomness that could be exploited in the PUF to generate security primitives. The feasibility and characteristics of the proposed PUF are analysed by simulation based on measured RRAM switching probability. With good scalability and stochastic mechanisms, RRAM may prove to be a promising candidate for security applications.
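The idea of programming every cell at a 50% switching probability can be mimicked with a simple software model. This is only a behavioural sketch under the assumption that each cell switches independently; the function names, array size and seeds are hypothetical, and a pseudo-random generator stands in for the physical randomness of the devices.

```python
import random

def program_array(n_cells, p_switch=0.5, seed=None):
    """Model one RRAM array: each cell ends in state 1 with probability p_switch."""
    rng = random.Random(seed)
    return [1 if rng.random() < p_switch else 0 for _ in range(n_cells)]

def hamming_fraction(a, b):
    """Fraction of positions where two response bit strings differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Two simulated chips; different seeds stand in for device-level randomness.
chip_a = program_array(4096, seed=1)
chip_b = program_array(4096, seed=2)

uniformity = sum(chip_a) / len(chip_a)          # ideally close to 0.5
uniqueness = hamming_fraction(chip_a, chip_b)   # ideally close to 0.5
```

Uniformity (share of ones in a chip) and uniqueness (inter-chip Hamming distance) near 0.5 are the usual first checks for a PUF. If `p_switch` drifts away from 0.5 in this model, both metrics degrade.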
A low-cost memory data scheduling method based on two N/2-depth single-port memories is proposed for reconfigurable fast Fourier transform (FFT) bit-reversed data reordering tasks. To make single-port memories have the equivalent ability to read and write data simultaneously, two types of read and write address generation methods are proposed. Based on the proposed data scheduling method, the bit-reversal circuits are designed for continuous data reordering tasks. The proposed bit-reversal design is implemented for a maximum 8 k flexible length FFT processor. Compared with the other two conventional methods, the proposed bit-reversal method can reduce memory area cost by 53.8 and 46.1%, respectively.
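For readers unfamiliar with the reordering task itself, the following sketch shows plain bit-reversed reordering in software. It does not model the two single-port memories or the address generators proposed above; the function names are chosen here for illustration.

```python
def bit_reverse(index, n_bits):
    """Reverse the lowest n_bits bits of index."""
    result = 0
    for _ in range(n_bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

def bit_reversed_order(samples):
    """Return the samples permuted into bit-reversed index order.

    The length must be a power of two, as in a radix-2 FFT.
    """
    n = len(samples)
    n_bits = n.bit_length() - 1
    if n == 0 or (1 << n_bits) != n:
        raise ValueError("length must be a power of two")
    return [samples[bit_reverse(i, n_bits)] for i in range(n)]

reordered = bit_reversed_order(list(range(8)))   # [0, 4, 2, 6, 1, 5, 3, 7]
```

Because bit reversal is its own inverse, applying `bit_reversed_order` twice restores the natural order.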
Multiple way tables in which items can be placed on several buckets are used in many computing applications. Some examples are cache memories and multiple hash tables structures. In most cases, the items are stored in electronic memories that are prone to soft errors that can corrupt the stored items. To avoid data corruption, memories can be protected with a parity bit or with an error correction code. It is shown that most single bit errors can be detected in multiple way tables without adding a parity bit. This can be done by placing the items in a predetermined order in the multiple ways of the table.
Cache has been introduced into many graphics processing units (GPUs) to decrease the frequency of data transfer between high-performance computing units and low-speed, long-latency external memory. The traditional index mapping scheme, designed originally for CPU caches, exploits only the spatial locality in address space. Access to graphics data, however, has region locality on the frame buffer: there is high spatial locality in both the X and Y directions. The traditional scheme may therefore generate more conflict misses on a limited number of cache lines, which eventually results in a high cache miss ratio and a performance drop, so a traditional CPU cache cannot be used directly in a GPU. We propose a new conflict-avoiding GPU cache, called the XY-type cache, with a new index mapping scheme whose cache line indices are computed from both the X and Y coordinates of pixels, so that the cache index distribution is consistent with the region locality on the frame buffer. Our evaluation results show that the proposed XY-type GPU cache can reduce the cache miss ratio by up to 88% by scattering cache accesses evenly over all lines, and can completely avoid the adverse effect of frame resolution. Since the cache miss ratio in a direct-mapped or 2-way set-associative structure is close to or even lower than that in a fully-associative structure, which is the best case in terms of lowering cache line conflicts, the XY-type GPU cache can be designed with lower complexity and lower power consumption.
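The difference between address-based and coordinate-based indexing can be seen in a toy model. The frame width, line size, number of lines and the particular bit split in `xy_index` are assumptions made for this sketch; the abstract does not specify the actual mapping function.

```python
FRAME_WIDTH = 1024   # pixels per frame-buffer row (assumed)
LINE_PIXELS = 16     # pixels held by one cache line (assumed)
NUM_LINES = 64       # lines in a direct-mapped cache (assumed)

def linear_index(x, y):
    """Conventional mapping: index taken from the linear pixel address."""
    block = (y * FRAME_WIDTH + x) // LINE_PIXELS
    return block % NUM_LINES

def xy_index(x, y):
    """Hypothetical XY mapping: 3 index bits from X, 3 index bits from Y."""
    x_bits = (x // LINE_PIXELS) % 8
    y_bits = y % 8
    return (y_bits << 3) | x_bits

# A screen region 128 pixels wide and 8 rows tall, i.e. 64 line-sized blocks.
region = [(x, y) for y in range(8) for x in range(0, 128, LINE_PIXELS)]
linear_lines = {linear_index(x, y) for x, y in region}
xy_lines = {xy_index(x, y) for x, y in region}
```

With these numbers each frame row spans exactly 64 blocks, so vertically adjacent blocks collide under the linear mapping and the region uses only 8 distinct lines. The coordinate-based mapping spreads the same region over all 64 lines. The collision pattern of the linear mapping changes with `FRAME_WIDTH`, which is one way to read the abstract's remark about frame resolution.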
In most algorithms that use quaternion numbers, the key operation is a quaternion multiplication, of which the efficiency and accuracy obviously determine the same properties of the whole computational scheme of a filter or transform. A digit (L-bit)-serial quaternion multiplier based on the distributed arithmetic (DA) using the splitting of the multiplication matrix is presented. The circuit provides the facility to compute several products of quaternion components concurrently as well as to reduce the memory capacity by half in comparison with the known DA-based multiplier, and it is well suited for field programmable gate array (FPGA)-based fixed-point implementations of the algorithms. Apart from a theoretical development, the experimental design results which are obtained using a Xilinx Virtex 6 FPGA are reported.
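As a reference point for the multiplication matrix mentioned above, the quaternion (Hamilton) product can be written as a 4 × 4 matrix built from one operand, applied to the components of the other. This is the standard floating-point or integer definition, not the digit-serial distributed-arithmetic circuit; the function names are chosen for illustration.

```python
def multiplication_matrix(q):
    """Left-multiplication matrix of quaternion q = (a, b, c, d) = a + bi + cj + dk."""
    a, b, c, d = q
    return [
        [a, -b, -c, -d],
        [b,  a, -d,  c],
        [c,  d,  a, -b],
        [d, -c,  b,  a],
    ]

def quaternion_multiply(q, p):
    """Hamilton product q * p computed as a matrix-vector product."""
    matrix = multiplication_matrix(q)
    return tuple(sum(m * x for m, x in zip(row, p)) for row in matrix)

i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
k = quaternion_multiply(i, j)                              # (0, 0, 0, 1)
product = quaternion_multiply((1, 2, 3, 4), (5, 6, 7, 8))  # (-60, 12, 30, 24)
```

Each output component is an inner product of a fixed-pattern row with the second operand, which is the form that distributed arithmetic typically replaces with table look-ups.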
This study documents the speeds of various SRAM buffer memories that are possible in a contemporary fast SiGe heterojunction bipolar transistor (HBT) BiCMOS process. An SRAM in a 0.13 µm HBT BiCMOS technology using current mode logic (CML)-style circuits serves as a basis for the discussion. This basic SRAM design features a CML decoder, a CML word line driver, a bipolar sense amplifier for achieving high speed and CMOS 6T memory cells for high density. The BiCMOS technology is especially useful for realising ultra-high-speed SRAMs for low level cache memory in high-clock rate computer systems, but when reorganised can also be utilised in analogue-to-digital converter (ADC) systems to store digitalised data. Speed and power tradeoffs can be made using different bias strategies, CML logic levels and different generations of SiGe HBTs. A demonstrated 128 kb SRAM macro consumes 2.7 W at 4 GHz using a −3.4 and −1.5 V supply voltage for the bipolar and CMOS circuits, respectively, and has dimensions of 3.5 mm × 3.6 mm by using IBM 8HP SiGe technology, which provides an HBT with an f_T of 210 GHz. This macro can be integrated into large scale, ultra-wide bus SRAMs using heterogeneous silicon and 3D technology. Simulation indicates that with the next generation of SiGe HBTs, this SRAM macro can operate at 5 GHz while consuming the same amount of power, or alternatively consume 0.73 W, which is 73% less power than with 8HP, while operating at the same frequency of 4 GHz. When the memory is reorganised for a 4 way-interleaved ADC, it can accept data written at 9.5 GS/s for 8HP designs, and 11.9 GS/s for 8XP designs.
The bit cell is a key component that determines the VDDmin and power consumption of a sub-threshold static random access memory (SRAM). A new bit cell with a pnn-type latch structure is proposed. The analysis and measurement results indicate that the pnn bit cell outperforms the conventional bit cells in terms of VDDmin and power reduction.
Magnetic random access memory based on magnetic tunnel junctions (MTJs) is among the most attractive technologies of emerging non-volatile memories. However, the integration of spin-based devices in integrated circuits is still hindered by a lack of established standard electrical simulator models. Many such models have been proposed during the past decade, and they can be classified into two categories: the first is based on the physical Landau-Lifshitz-Gilbert (LLG) equation describing real-time MTJ magnetic switching dynamics; the second uses analytical expressions for switching thresholds derived from the LLG equation. The aim of this reported work was to investigate for the first time the capability of each strategy to fulfil the need of industrial standard electrical simulation tools and pave the path towards a standard industrial model. Multi-simulator compatibility, efficient runtime, accuracy and reliability are the three main assets of a device model. It is shown that, using the Cadence® tools suite with the Spectre® simulator, the LLG modelling strategy outperforms the analytical approach in terms of accuracy and speed, with a 7× faster runtime. Both models require nearly the same hardware memory resources.
An error-correcting code (ECC) that provides immunity to bit errors can severely degrade memory performance, since incomplete-word ECC write requests lead to inefficient operations on a dual in-line memory module (DIMM). A DIMM controller that handles such ECC operations efficiently is proposed. The key idea is that read-to-write and write-to-read operations caused by incomplete-word ECC write requests are split into independent read and write operations, and then the read and write operations are individually scheduled under data coherence constraints. Experimental results show that the proposed DIMM controller achieves 11% shorter memory latency, and 9.3% higher memory utilisation, on average, than the latest conventional DIMM controller in industrial multimedia applications. Moreover, it achieves up to 2.1 times higher memory performance on synthetic benchmarks.
In this paper, the authors propose a novel static random access memory (SRAM) that employs the adiabatic logic principle. To reduce energy dissipation, the proposed adiabatic SRAM is driven by two trapezoidal-wave pulses. The cell structure of the proposed SRAM has two high-value resistors based on a p-type metal-oxide semiconductor transistor, a cross-coupled n-type metal-oxide semiconductor (NMOS) pair and an NMOS switch to reduce the short-circuit current. The inclusion of a transmission-gate controlled by a write word line signal allows the proposed circuit to operate as an adiabatic SRAM during data writing. Simulation results show that the energy dissipation of the proposed SRAM is lower than that of a conventional adiabatic SRAM.
Coding schemes for storage channels, such as optical recording and non-volatile memory (Flash), with unknown gain and offset are presented. In its simplest case, the coding schemes guarantee that a symbol with a minimum value (floor) and a symbol with a maximum (ceiling) value are always present in a codeword so that the detection system can estimate the momentary gain and the offset. The results of the computer simulations show the performance of the new coding and detection methods in the presence of additive noise.
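The simplest case described above, where every codeword contains both the floor and the ceiling symbol, lends itself to a short sketch of the detector. The channel model `r = gain * x + offset` with a positive gain, the alphabet size `q` and the function name are assumptions made for this illustration.

```python
def detect(received, q):
    """Recover symbols in 0..q-1 from readings r = gain * x + offset (+ noise).

    Assumes gain > 0 and that the transmitted codeword contained at least
    one floor symbol (0) and one ceiling symbol (q - 1).
    """
    lo, hi = min(received), max(received)
    gain_estimate = (hi - lo) / (q - 1)   # the ceiling-to-floor span covers q - 1 steps
    return [round((r - lo) / gain_estimate) for r in received]

codeword = [0, 3, 1, 2, 3, 0]             # contains floor 0 and ceiling 3
gain, offset = 1.7, -0.4                  # unknown to the detector
noise = [0.02, -0.03, 0.01, -0.02, 0.03, -0.01]

clean = [gain * x + offset for x in codeword]
noisy = [r + n for r, n in zip(clean, noise)]

decoded_clean = detect(clean, q=4)
decoded_noisy = detect(noisy, q=4)
```

The detector never needs the true gain or offset: the smallest reading anchors the offset and the spread between the smallest and largest readings fixes the gain. With heavier noise the minimum and maximum become poor estimates, which is where the more elaborate detection methods studied in the paper would come in.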
This study presents an efficient method for converting a normalised binary number x (1 ≤ x < 2) into a binary logarithm. To achieve 23 bits of fractional precision, the algorithm requires less memory and fewer arithmetic components than other algorithms using uniform and non-uniform piecewise linear or piecewise polynomial techniques: less than 20 kbits of ROM and a maximum of three multipliers. It is easily extensible to higher numeric precision and has been implemented on Xilinx Spartan3 and Spartan6 field programmable gate arrays (FPGAs) to show the effect of recent architectural enhancements to the reconfigurable fabric on implementation efficiency. Synthesis results confirm that the algorithm operates at a frequency of 42.3 MHz on a Spartan3 device and 127.8 MHz on a Spartan6 with a latency of two clocks. This increases to 71.4 and 160 MHz, respectively, when the latency is increased to eight clocks. On a Spartan6 XC6SLX16 device, the converter uses just 55 logic slices, three multipliers and 11.3 kbits of Block RAM configured as ROM.
To address the high CPU usage and low efficiency of disk array technology, we propose a GPU-accelerated RAID6 checksum algorithm, focusing on the RAID6 PQ generation algorithm. Starting from the CPU implementation of the RAID6 checksum algorithm, we studied its parallel characteristics and developed GPU versions of the RAID6 algorithms. Through system testing, we compared the efficiency of the GPU parallel algorithms with that of the CPU implementation for different numbers of disks and different stripe sizes. The results show that implementing the RAID6 checksum algorithm on the GPU improves computational efficiency and reduces CPU utilization.
This paper presents a highly integrated sinusoid reference generating system based on Microblaze, suitable for precise testing of protective relays. Microblaze, a soft CPU core, was embedded in an FPGA that integrates an ADC interface, a DAC interface, an EEPROM and other peripherals. The prototype and transfer functions of various types of digital filters were designed with reference to the characteristics of the required output signal. Their functional behaviour was implemented in the system on chip using a recursion algorithm. As closed-loop control is a critical factor in the digital reference design, the theory of three types of closed-loop control, i.e. amplitude, frequency and phase control, is discussed in detail. A digital PI control algorithm was implemented in the system to satisfy the control target. The experimental results indicate that the relay evaluation system, using this sinusoid reference, operates correctly. The paper demonstrates how the performance of the output sine signal improves, as compared with the normal sine reference, especially when outputting low amplitude signals. The research methodology of this reference system and this highly integrated circuit are significant for the optimization of a relay testing system, and provide a theoretical justification and a feasible implementation for a precise relay testing system.
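For orientation, the P and Q parities of RAID6 are commonly defined as the XOR of the data blocks and a weighted sum over GF(2^8), respectively. The sketch below is a plain CPU reference under the widely used convention of field polynomial 0x11D and generator 2; the abstract does not state which convention the authors used, and the GPU kernels are not reproduced here.

```python
def gf_mul2(x):
    """Multiply a byte by the generator 2 in GF(2^8), polynomial 0x11D (assumed)."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
    return x & 0xFF

def pq_parity(data_blocks):
    """Return (P, Q) for equally sized data blocks D_0 .. D_{n-1}.

    P = D_0 ^ D_1 ^ ... ^ D_{n-1}
    Q = D_0 ^ 2*D_1 ^ 4*D_2 ^ ...   (powers of the generator in GF(2^8))
    Q is accumulated with Horner's rule, from the last disk to the first.
    """
    size = len(data_blocks[0])
    p = bytearray(size)
    q = bytearray(size)
    for block in reversed(data_blocks):
        for i, byte in enumerate(block):
            p[i] ^= byte
            q[i] = gf_mul2(q[i]) ^ byte
    return bytes(p), bytes(q)

p, q = pq_parity([b"\x01", b"\x02", b"\x04"])   # P = 0x07, Q = 0x15
```

Every byte position is computed independently of the others, which is the kind of parallelism a GPU version could exploit by assigning byte lanes to threads.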
A new last-level cache replacement policy for systems with a phase-change memory (PCM) main memory is presented. The proposed policy aims at reducing the write traffic to PCM by considering the fine-grained dirtiness of cache blocks when making a replacement decision. Experimental results show that the proposed policy reduces the write traffic to the PCM by 26 and 17% on average and up to 52 and 33% compared to not recently used and re-reference interval prediction, respectively.
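One way to picture a replacement decision that takes fine-grained dirtiness into account is to track a dirty bit per sub-block and prefer the candidate whose eviction would write the fewest sub-blocks back to PCM. The sketch below is a hypothetical illustration of that idea only; the abstract does not describe the actual policy, its metadata or how it interacts with recency information.

```python
def dirty_count(dirty_mask):
    """Number of dirty sub-blocks encoded in a per-block bit mask."""
    return bin(dirty_mask).count("1")

def choose_victim(candidates):
    """Pick the eviction victim among (tag, dirty_mask) candidates.

    The candidate with the fewest dirty sub-blocks wins; ties go to the
    earliest candidate in the list (assumed to be ordered by the base policy).
    """
    return min(candidates, key=lambda entry: dirty_count(entry[1]))[0]

cache_set = [
    ("A", 0b1111),   # fully dirty: four sub-block writes on eviction
    ("B", 0b0001),   # one dirty sub-block
    ("C", 0b0000),   # clean: no write to PCM needed
    ("D", 0b0000),
]
victim = choose_victim(cache_set)   # "C"
```

A block-level dirty bit would treat "A" and "B" the same, while the per-sub-block mask shows that evicting "B" costs a quarter of the PCM writes of evicting "A".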
A swap time-aware garbage collection (STGC) policy for the NAND flash-based swap system is proposed, which focuses on reducing the cleaning cost and improving the degree of wear-levelling. STGC calculates the cleaning index value of each block to select a victim block and the normalised value of the elapsed swap time of each valid page within the victim blocks to identify the hot valid page and cold valid page. Trace-driven simulations with a synthetic trace show that the STGC outperforms the existing garbage collection policies.
A low-complexity twiddle factor generation structure for fast Fourier transform (FFT) is proposed. In FFT, twiddle factor generation and multiplication occupy more area than the other mathematical operations. The proposed structure reduces the twiddle factor generation part by removing the redundancies in the conventional structure and compressing the twiddle factor ROM contents. With the proposed structure, the twiddle factor generation part is reduced by 32–45% compared with that of the conventional structure.
Design and implementation of a fully table look-up digital pulse-width modulation (DPWM) controller for high-frequency DC–DC buck conversion is presented. The controller comprises a 1 bit analogue comparator, a digital error process unit and a fully table look-up multi-phase DPWM. The interface of analogue-to-digital conversion is performed with the analogue comparator. Moreover, the proposed programmable memory is based on the table look-up multi-phase approach for the functions of the proportional-integral-derivative (PID) compensation, which alleviates the penalty of using large chip-area multipliers. As a result, the approach is very suitable for system-on-a-chip (SOC) implementation. A prototype test chip is realised to validate the mechanism of the proposed architecture.
The impact of dynamic variability due to low-frequency fluctuations on the operation of CMOS inverters, which constitute the basic component of an SRAM cell, is investigated. The experimental methodology to characterise the effect of dynamic variability in a CMOS inverter is first established based on fast I–V measurements of the load current following the application of a ramp input voltage V_in(t). It is shown that, for small ramp rise times, the load current characteristics I_DD(V_in) exhibit a huge sweep-to-sweep dispersion due to low-frequency noise. The impact of such dynamic variability sources on the inverter's output characteristics V_out(V_in) is finally demonstrated, revealing a 20% noise margin reduction for the smallest inverter cell.
A design for an integer motion estimator of high-efficiency video coding (HEVC) is presented. To achieve a high compression ratio, HEVC supports the 64 × 64 coding tree unit, the recursive quad-tree coding unit structure and the asymmetric motion-partitioning mode. These features require a structure of integer motion estimation that is more complex than that of H.264/AVC. New structures for a memory read controller and a sum of absolute difference (SAD) summation block are proposed. The new memory read controller reduces the internal memory read time, and the new SAD summation block structure supports the recursive quad-tree coding unit structure and the asymmetric motion-partitioning mode. The proposed design is implemented in Verilog HDL and synthesised using 65 nm CMOS technology. The gate count is 3.56 M, and the internal static random access memory is about 20 kbyte. The operation frequency is 250 MHz when a 4 K-Ultra high definition (UHD) (3840 × 2160P at 30 Hz) sized video is encoded.
A novel management scheme for the write buffer in solid-state drives (SSDs) is presented. The proposed scheme exploits the future buffer reference pattern by using the I/O command information in the native command queuing (NCQ) of SATA SSDs. Through trace-driven simulations, it is shown that the proposed scheme improves the performance of the write buffer significantly in terms of several metrics including the hit ratio.
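The link between SAD summation and the recursive quad-tree structure can be sketched in software: SADs of the smallest blocks are computed once and the SAD of each larger square block is obtained by adding its four children. The function below is a generic illustration with hypothetical names; it covers square blocks only and does not model the asymmetric partitions or the hardware datapath of the proposed design.

```python
def sad_quadtree(cur, ref, leaf):
    """SADs of all square blocks of size leaf, 2*leaf, ... up to the full block.

    cur and ref are square 2-D lists of equal size N, where N / leaf is a
    power of two. Returns {block_size: grid of SAD values}.
    """
    n = len(cur)
    cells = n // leaf
    # SAD of every leaf block, computed directly from the pixels.
    grid = [[sum(abs(cur[y][x] - ref[y][x])
                 for y in range(by * leaf, (by + 1) * leaf)
                 for x in range(bx * leaf, (bx + 1) * leaf))
             for bx in range(cells)]
            for by in range(cells)]
    levels = {leaf: grid}
    size = leaf
    # Each larger block's SAD is the sum of its four child SADs.
    while len(grid) > 1:
        half = len(grid) // 2
        grid = [[grid[2 * y][2 * x] + grid[2 * y][2 * x + 1]
                 + grid[2 * y + 1][2 * x] + grid[2 * y + 1][2 * x + 1]
                 for x in range(half)]
                for y in range(half)]
        size *= 2
        levels[size] = grid
    return levels

current = [[3] * 4 for _ in range(4)]
reference = [[1] * 4 for _ in range(4)]
levels = sad_quadtree(current, reference, leaf=1)   # levels[4] == [[32]]
```

Reusing the child sums means the pixel differences are computed once per candidate position, regardless of how many coding unit sizes are evaluated.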