New Publications are available for Multiprocessing systems
http://dl-live.theiet.org
The reconfigurable self-routing property of mixed banyan-type network
http://dl-live.theiet.org/content/conferences/10.1049/cp.2011.1469
A 2^n × 2^n, n-stage, routable, bit-permuting network is called a banyan-type network; it has the self-routing property via the guide transform and the trace transform. Previous work analysed the self-routing property of a single banyan-type network and of tandem connections of banyan-type networks, but not for an arbitrary number of stages. When a network is built from many mixed banyan-type nodes, the self-routing property applies only at the node level, not at the network level. This paper first analyses the self-routing property of cascaded banyan-type networks based on bit-permuting exchanges. It then proposes a method to resolve the self-routing problem of an n-stage (n ≥ log₂N) N×N banyan-type network, and defines two reconfigurable manipulations on bit permutations that divide the switching sources into different slices. Finally, it uses this method and these definitions to realise self-routing at the network level for mixed banyan-type nodes.
Future of computer hardware
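As background to the banyan-network abstract above: the classic destination-tag self-routing that such networks build on can be sketched in a few lines. This is a minimal Omega-network model (perfect shuffle plus 2×2 exchange per stage), not the paper's mixed-network algorithm; the function name and the 8-port size are illustrative.

```python
# Hedged sketch: bit-controlled self-routing through an Omega (banyan-type)
# network with N = 2**n inputs. Each stage shuffles the address and then
# lets one destination bit (MSB first) pick the switch output port.

def omega_route(src: int, dst: int, n: int):
    """Return the node address reached after each of the n stages."""
    N = 1 << n
    assert 0 <= src < N and 0 <= dst < N
    path = []
    cur = src
    for stage in range(n):
        # perfect shuffle: rotate the n-bit address left by one position
        cur = ((cur << 1) | (cur >> (n - 1))) & (N - 1)
        # destination tag: the stage-th bit of dst selects the output port
        bit = (dst >> (n - 1 - stage)) & 1
        cur = (cur & ~1) | bit
        path.append(cur)
    return path

# After n stages every source arrives at its destination tag:
assert omega_route(3, 5, 3)[-1] == 5
assert all(omega_route(s, 6, 3)[-1] == 6 for s in range(8))
```

Each shuffle rotates one source bit out and each exchange substitutes one destination bit in, so after n stages the address equals the destination — the self-routing property the abstract generalises.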
http://dl-live.theiet.org/content/conferences/10.1049/cp.2011.0007
Summary form only given. "Experienced prophets are careful with predictions concerning the future." With this statement in mind, this paper considers recent developments in computer hardware along with their technical possibilities and limitations. Beyond available standard multicore CPU hardware, this involves the recent impact of specialized accelerator hardware: currently, general-purpose graphics processing units (GPGPUs) and programmable specialized hardware such as FPGA processors or, e.g., IBM's Cell processor are successfully used in the field of high-performance computing. This specialized coprocessor technology gives rise to hybrid-type processors integrating CPUs and GPUs. Upcoming candidates for such hybrid architectures are Intel's "Sandy Bridge" processor architecture arising from the "Larrabee" project, the upcoming systems from AMD integrating the graphics accelerator technology of ATI Technologies into their CPUs, and the future fusion processors recently announced by ARM and NVIDIA under the project codename "Denver". The data-level parallelism required for these architectures to achieve suitably high performance levels has an impact on the development and use of computational electromagnetics (CEM) algorithms and simulation tools: the required use of Single Instruction Multiple Thread (SIMT) operations on massively parallel compute-kernel structures results in a severe performance sensitivity with respect to a controlled flow of the data streams. As a result, these specialized computer architectures will favour numerical schemes which implicitly support these features. High data locality and intrinsic parallelism usually result in a good ratio between the computational workload for the floating-point units (FPUs) and the need for data movement. This can be found, e.g., in higher-order Discontinuous Galerkin FEM time-domain formulations.
More generally, discretized field formulations using explicit time-integration schemes are commonly easier to parallelize than those based on implicit schemes, where complicated solution procedures are required for the algebraic systems of equations involved. Beyond the currently valid paradigm of massive parallelism, which follows several decades of steady increase in the average wall-clock speed of CPU architectures, current research on computational architectures also focuses on reconfigurable systems. First systems using hybrid-core computing, combining CPUs with programmable FPGAs, are becoming available and are already in use for data-intensive applications. Future systems featuring fully reconfigurable special-purpose cores that are capable of adapting to the computational task at hand are currently under development. The possible full impact of such reconfigurable-core systems on the field of computational electromagnetics is still open to speculation, although some research results on FDTD implementations hard-coded on FPGAs have been published in recent years. Another important topic to be addressed with future computer hardware is the need to optimize the ratio of computational performance to electric power consumption (Flops per Watt): on a technical system level, a high electric power density is a cause for concern because of the resulting thermal stresses, and often results in the need for expensive cooling measures. On a macroscopic economic level, the costs of the electric energy consumption of the computer system itself and of its external cooling systems result in an increasing total cost of ownership (TCO). With an increasing need for very-large-scale CEM computations, e.g. in computational electromagnetic compatibility testing, these TCO costs are no longer negligible even for medium-scale compute clusters in an industrial or research environment.
In addition, reducing the electric energy consumption of the compute cores is an essential technical necessity in the design of future high-performance computer systems at the exa-flop scale. With these future supercomputers expected to become available around 2018 (following Moore's law), contemporary peta-flop supercomputers are to be exceeded in computational performance by three orders of magnitude, but must not be exceeded to the same degree in electric energy consumption.
Dynamic hardware-software task switching and relocation mechanisms for reconfigurable systems
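The hardware-survey abstract above argues that explicit time-stepping schemes map well onto SIMT hardware because each cell's update depends only on the previous time level. A minimal illustration of that data independence, using a 1-D heat equation with forward Euler (grid size, coefficient and boundary choice are arbitrary, not from the paper):

```python
# Hedged illustration: in an explicit scheme every cell at step k+1 reads
# only step-k neighbours, so all cells can be updated independently --
# a GPU would assign one thread per cell. An implicit scheme would instead
# require solving a coupled linear system each step.

def explicit_step(u, r):
    """One forward-Euler step of u_t = u_xx with fixed (Dirichlet) ends."""
    return [u[0]] + [
        u[i] + r * (u[i - 1] + u[i + 1] - 2 * u[i])  # depends on old state only
        for i in range(1, len(u) - 1)
    ] + [u[-1]]

u = [0.0] * 10 + [1.0] + [0.0] * 10   # heat spike in the middle of 21 cells
for _ in range(100):
    u = explicit_step(u, 0.25)        # r <= 0.5 keeps forward Euler stable here

assert u[0] == 0.0                    # boundaries stay clamped
assert 0.0 < u[10] < 1.0              # the spike has diffused
assert u[9] == u[11]                  # symmetric update preserves symmetry
```

Because every list element is computed from the old state `u`, the comprehension body could be dispatched as independent SIMT threads — the locality/parallelism property the abstract says favours such schemes.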
http://dl-live.theiet.org/content/conferences/10.1049/cp.2010.0554
With technology development, hardware tasks can be configured at run time. Idle hardware tasks in reconfigurable logic should be replaced by other hardware tasks through a task switching mechanism. For the efficient use of system resources, task switching and relocation mechanisms are proposed in this work. During task switching and relocation, three issues need to be solved: the choice of switchable points, the maintenance of correct transparent communication, and the transfer of context data. Methods for designing both hardware tasks and software tasks are proposed so as to allow users to integrate their designs into a system with task switching and relocation. Template functions are proposed and implemented to allow tasks to save/restore their context data. The control flows of task switching and relocation also guarantee the correctness and consistency of task communication relations after switching or relocation. Finally, several implementation examples are provided to demonstrate the correctness of the proposed mechanism. For the greatest common divisor (GCD) example without an operating system, software task switching takes 2% of the total software execution time, while hardware task switching takes 23% of the total hardware execution time. As for task relocation, software-to-hardware relocation takes 3% of the total execution time, while hardware-to-software relocation takes 5%. The large difference between software and hardware task switching times is due to the latency incurred by the context data transfer and the execution of the hardware driver process.
Hierarchical decoder for filter-based low-power BTB
http://dl-live.theiet.org/content/conferences/10.1049/cp.2010.0552
In this paper we propose a hierarchical decoder to augment the power saving of the sentry-table-filter-based low-power branch target buffer (BTB) of the branch predictor in modern processors. The sentry-table scheme filters out unnecessary accesses to the branch target buffer to reduce dynamic power consumption, yet the power saving was found to be limited by the power dissipation of the decoder, especially as the BTB size grows. The proposed hierarchical decoder (H-DEC) can significantly offset the effect of decoder power dissipation. We use the CACTI tool, the SimpleScalar and Wattch simulators, and the SPEC2000 benchmarks to conduct our experiments. From our empirical studies, power savings for the BTB can be further improved from 19-38% to 68-91%, and those for the branch predictor from 17-21% to 37-81%.
Parallel reconfigurable computing and its application to hidden Markov model
http://dl-live.theiet.org/content/conferences/10.1049/cp.2010.0542
Parallel processing techniques are increasingly found in reconfigurable computing, especially in digital signal processing (DSP) applications. In this paper, we design a parallel reconfigurable computing (PRC) architecture which consists of multiple dynamically reconfigurable computing (DRC) units. The hidden Markov model (HMM) algorithm is mapped onto the PRC architecture. First, we construct a directed acyclic graph (DAG) to represent the HMM algorithm. A novel parallel partitioning approach is then proposed to map the HMM DAG onto the multiple DRC units in a PRC system. This partitioning algorithm is capable of design optimization of parallel-processing reconfigurable systems for a given number of processing elements in different HMM states.
An improved implementation of Montgomery algorithm using efficient pipelining and structured parallelism techniques
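For the HMM-mapping abstract above: the forward algorithm is the kind of computation whose DAG levels parallelise naturally, since every state value at time t depends only on the previous time column. A hedged sketch (the 2-state model is an arbitrary illustration, not from the paper):

```python
# Hedged sketch of the HMM forward algorithm: each alpha[j] at time t is
# computed from the full alpha column at time t-1, so all states at one
# time step form an independent (parallelizable) DAG level.

def forward(obs, pi, A, B):
    """P(observation sequence) for an HMM with transitions A, emissions B, prior pi."""
    alpha = [pi[j] * B[j][obs[0]] for j in range(len(pi))]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(len(alpha))) * B[j][o]
            for j in range(len(alpha))   # states at one step: independent tasks
        ]
    return sum(alpha)

pi = [0.6, 0.4]                          # invented 2-state example
A = [[0.7, 0.3], [0.4, 0.6]]             # transition probabilities
B = [[0.5, 0.5], [0.1, 0.9]]             # emission probabilities (2 symbols)
p = forward([0, 1, 0], pi, A, B)
assert 0.0 < p < 1.0
```

Summing `forward` over all 2³ possible length-3 observation sequences gives 1, a useful sanity check on any implementation.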
http://dl-live.theiet.org/content/conferences/10.1049/cp.2010.0479
The Montgomery modular multiplication plays an important role in public-key encryption algorithms. We present an efficient hardware implementation of the Montgomery algorithm. Regarding its application in RSA, we also present two techniques for more efficient implementation of modular exponentiation. The experimental results show that by reducing the number of operations in implementing Montgomery modular multiplication and modular exponentiation, the design works faster and is more power-efficient. The algorithms are implemented in Verilog HDL and simulated in ModelSim to verify the results. The synthesis process is performed on an Altera Stratix III FPGA in Quartus II. With the help of the Stratix PowerPlay Early Power Estimator, we are able to evaluate the power consumption.
Congestion- and energy-aware run-time mapping for tile-based network-on-chip architecture
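For the Montgomery abstract above, a hedged software sketch of the textbook REDC operation the hardware implements (this is the standard word-free version, not the paper's pipelined datapath; the modulus and operands are arbitrary). REDC computes a·b·R⁻¹ mod N for R = 2^k > N using only shifts, adds and one conditional subtract — the property hardware exploits to avoid trial division:

```python
# Hedged sketch of Montgomery reduction (requires Python 3.8+ for pow(x, -1, m)).

def montgomery_setup(N, k):
    R = 1 << k
    Nprime = (-pow(N, -1, R)) % R     # N * Nprime == -1 (mod R)
    return R, Nprime

def redc(T, N, R, Nprime):
    """Return T * R^-1 mod N, valid for 0 <= T < R*N."""
    m = (T * Nprime) % R              # in hardware: low k bits only
    t = (T + m * N) // R              # exact division: low bits cancel
    return t - N if t >= N else t     # single conditional subtract

N, k = 97, 8                          # R = 256 > N, N odd so gcd(N, R) = 1
R, Np = montgomery_setup(N, k)
a, b = 42, 77
aR, bR = (a * R) % N, (b * R) % N     # enter the Montgomery domain
abR = redc(aR * bR, N, R, Np)         # product, still in the domain
assert redc(abR, N, R, Np) == (a * b) % N
```

In RSA-style modular exponentiation the operands stay in the Montgomery domain across the whole square-and-multiply loop, so the domain conversions are paid only once — one of the standard optimisations the abstract's techniques relate to.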
http://dl-live.theiet.org/content/conferences/10.1049/cp.2010.0578
The mapping of application tasks to processing elements (PEs) connected by a network-on-chip (NoC) has a significant impact on the overall performance and power consumption of the applications. In this work, a novel dynamic task mapping algorithm is proposed to reduce the overall latency and power consumption of a given set of applications. Applications are modeled by task graphs; each task graph represents an application and is composed of several tasks. A task is mapped as close to its parent task as possible, based on a candidate spiral search (CSS) method for candidate PEs. The CSS starts by searching for an empty PE for the requested task where the Manhattan distance between the requested task and its parent task equals one. If no candidate is available, the Manhattan distance is increased by one until an empty PE is found. Furthermore, the aggregate communication load (ACL) of each candidate PE is monitored, and a task is mapped to the candidate PE, among those found by CSS, with the minimal ACL. The proposed Rotating Mapping Algorithm (RMA) thus employs CSS to reduce communication latency and ACL to achieve load balancing, which implies lower power consumption. Experiments demonstrate the feasibility and benefits of the proposed method compared with some state-of-the-art task mapping techniques. The proposed algorithm reduces total energy consumption by up to 42.73% and global average delay by up to 29.32%.
Modeling technique for simulation time speed-up of performance computation in transaction level models
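The candidate-spiral-search idea in the NoC-mapping abstract above can be reconstructed as a short sketch: examine free PEs ring by ring of growing Manhattan distance from the parent, then break ties by the smallest aggregate communication load. All names, the grid size and the loads below are invented for illustration.

```python
# Hedged reconstruction of CSS + min-ACL selection from the abstract.

def css_map(parent, free, acl, rows, cols):
    """Pick a free PE nearest (Manhattan) to `parent`, min-ACL tie-break."""
    px, py = parent
    for d in range(1, rows + cols):                  # rings of growing distance
        ring = [
            (x, y)
            for x in range(rows) for y in range(cols)
            if abs(x - px) + abs(y - py) == d and (x, y) in free
        ]
        if ring:                                     # nearest ring with candidates
            return min(ring, key=lambda pe: acl[pe]) # load-balancing tie-break
    return None                                      # no free PE anywhere

free = {(0, 1), (2, 2), (1, 0)}                      # invented free-PE set
acl = {(0, 1): 5, (2, 2): 1, (1, 0): 2}              # invented loads
# distance-1 ring around (1, 1) holds (0,1) and (1,0); (1,0) has lower ACL
assert css_map((1, 1), free, acl, 3, 3) == (1, 0)
```

A real run-time mapper would enumerate the ring incrementally instead of scanning the whole grid per distance, but the selection rule — nearest ring first, minimal ACL within it — is the same.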
http://dl-live.theiet.org/content/conferences/10.1049/ic.2010.0142
Modeling embedded systems at transaction level facilitates the architecting of hardware and software resources according to non-functional requirements. By raising the level of abstraction, Transaction Level Modeling (TLM) represents a good compromise between modeling accuracy and simulation speed. However, in complex pipelined architectures, the efficiency of exploration and performance evaluation is limited by the number of transactions involved and by the various non-functional properties to assess. In this paper we propose a technique to improve the creation of transaction-level models and the description of properties related to resources of the system architecture. This technique is based on the separation of concerns between the evolution of a system model at transaction level and the computation of non-functional properties. The considered case study is a wireless communication receiver based on the Long Term Evolution (LTE) protocol. The proposed technique is used to evaluate the related computing complexity under various system configurations.
Towards high-level executable specifications of heterogeneous systems with SystemC-AMS: application to a Manycore PCR-CE lab on chip for DNA sequencing
http://dl-live.theiet.org/content/conferences/10.1049/ic.2010.0154
The paper presents the SystemC/SystemC-AMS implementation of a lab-on-chip for DNA analysis that encompasses several disciplines such as analog and digital electronics, chemical kinetic reactions, optics and embedded software. The corresponding system virtual prototype takes as input an initial DNA concentration as well as the expression of the gene as a DNA string, and is able to compare the input string to a huge database of reference samples, thus providing a way to efficiently detect mutations and pathologies. The presented model is composed of three parts: (1) DNA amplification by Polymerase Chain Reaction (PCR); (2) molecular separation by Capillary Electrophoresis (CE) and optical detection of fluorescently labeled molecules; (3) automated DNA sequencing on a digital manycore architecture running highly multi-threaded software. The platform also digitally assists the two former AMS parts. The model can serve as a simulatable specification at a very high level of abstraction and can be seen as the first refinement step towards the design of complex heterogeneous bio-compatible designs.
Complete verification of weakly programmable IPs against their operational ISA model
http://dl-live.theiet.org/content/conferences/10.1049/ic.2010.0125
This paper suggests an operational instruction set architecture (OISA) model for specifying weakly programmable IPs (WPIPs). WPIPs are application-specific programmable System-on-Chip (SoC) modules such as application-specific instruction set processors (ASIPs). The individual instructions of WPIPs often implement large segments of an application algorithm corresponding to hundreds of conventional RISC instructions. The pipeline structure of a WPIP design is commonly determined by basic operations of the application algorithm. For this reason, the pipeline is designed in a bottom-up manner where the components for the individual operations are developed first. Our OISA model reflects this design style by specifying the instruction semantics in terms of predefined operations that are associated with specific pipeline stages. After creation of the OISA model, a property set can be generated automatically that uniquely specifies the entire design. Moreover, the verification process used to design the OISA model explicitly reveals hardware restrictions imposing constraints on the software to be considered by the programmer.
Electromagnetic engineering simulation techniques using parallel EM simulation tools
http://dl-live.theiet.org/content/conferences/10.1049/cp.2009.1334
In this paper, we present some basic ideas on using EM software packages to accurately simulate EM problems. The basic procedure in EM simulation includes excitation setting, mesh design, boundary setting and, for a parallel code, parallel processing design. We introduce some simulation techniques using different software packages and use these packages to simulate some typical examples.
Design of DDR2 SDRAM controller for video post processing pipeline
http://dl-live.theiet.org/content/conferences/10.1049/cp.2009.1904
To satisfy the real-time high-definition video processing requirements of a video post-processing pipeline, this paper proposes a novel DDR2 controller design which efficiently and selectively integrates the DDR2 SDRAM controller created by the Xilinx MIG (Memory Interface Generator) and the control module of the MPMC (Multi-Port Memory Controller). The DDR2 controller is implemented as part of the whole pipeline of a video post-processing processor, which has been verified on the Xilinx XUPV5-LX110T FPGA. Experimental results show that this DDR2 controller demonstrates low latency, high throughput and high bus utilization compared to the individual MIG and MPMC controllers, and meets the real-time HD processing requirements of this video post-processing processor.
Implementing complex and multiple DSP systems on chip: developing a "tops-down" approach to multicore processor architectures
http://dl-live.theiet.org/content/conferences/10.1049/ic_20080611
TI has over a decade of successful history in multi-core processor and system-on-chip (SoC) design. A generic bottom-up SoC architecture approach will be compared to various application-domain-focused multi-core architectures. Heterogeneous designs will be compared to homogeneous solutions. Various performance indicators will be discussed, such as application scope, design challenges, power consumption and development tool chains. The talk will close by addressing some current-day virtualization challenges in multi-core processor design. (16 pages)
Challenges of programming multi-core microprocessors
http://dl-live.theiet.org/content/conferences/10.1049/ic_20080614
In this presentation we describe some of the programming difficulties posed by multi-core microprocessors. The presentation begins with a review of existing techniques for implicitly deriving parallel programs from sequential code and for writing explicitly parallel programs. We claim that many of the programming abstractions for parallel programming have been honed for the development of closed-world software like operating system kernels and are not suitable for application development in a modular manner. We then review some new technology from various research groups around the world that shows promise for multi-core development. Examples of the mechanisms described include join patterns, transactional memory and nested data parallelism. We also describe some of the considerable verification challenges confronting parallel program developers and then review some advances in formal analysis which may help to mitigate this issue. Finally, we consider the evolution of current multi-processor architectures and discuss whether there are alternative ways of organizing processors, memories and other compute elements to support a high-level parallel processing programming paradigm. (30 pages)
Design of a hardware accelerator for multi-layer maze routing in VLSI and its implementation on Virtex-II pro
http://dl-live.theiet.org/content/conferences/10.1049/ic_20070724
This paper proposes a new approach for the implementation of Lee's grid-based Manhattan routing algorithm on a Virtex-II Pro FPGA. The grid-based algorithm was also implemented using C data structures, and a comparison is drawn between the two approaches. A hardware accelerator for an 8 x 8 x 4 grid has been successfully implemented and tested on the Xilinx Virtex-II Pro FPGA. The hardware accelerator shows a speed-up of 40-60% over the software implementation for an 8 x 8 x 4 grid. The success demonstrated here favours further research to develop larger arrays with multi-layer routing on high-end FPGAs and to evaluate the performance of these designs.
Recent advances in COTS architectures applied to avionics
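For the maze-routing abstract above, a hedged software sketch of Lee's algorithm itself: breadth-first wavefront expansion from the source, then a backtrace along decreasing distance labels. The paper's accelerator works on an 8×8×4 multi-layer grid in hardware; this single-layer toy grid just shows the expand/backtrace structure being accelerated.

```python
from collections import deque

def lee_route(grid, src, dst):
    """Shortest path on a 0/1 grid (1 = blocked) via Lee's wavefront."""
    rows, cols = len(grid), len(grid[0])
    dist = {src: 0}
    q = deque([src])
    while q:                                   # wavefront expansion (BFS)
        x, y = q.popleft()
        if (x, y) == dst:
            break
        for nx, ny in ((x+1, y), (x-1, y), (x, y+1), (x, y-1)):
            if 0 <= nx < rows and 0 <= ny < cols \
                    and grid[nx][ny] == 0 and (nx, ny) not in dist:
                dist[(nx, ny)] = dist[(x, y)] + 1
                q.append((nx, ny))
    if dst not in dist:
        return None                            # target unreachable
    path, cur = [dst], dst
    while cur != src:                          # backtrace: follow labels downhill
        x, y = cur
        cur = next(p for p in ((x+1, y), (x-1, y), (x, y+1), (x, y-1))
                   if dist.get(p) == dist[cur] - 1)
        path.append(cur)
    return path[::-1]

grid = [[0, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]                             # 1 = blocked cell
path = lee_route(grid, (0, 0), (0, 2))
assert path[0] == (0, 0) and path[-1] == (0, 2) and len(path) == 7
```

The wavefront step is what a hardware grid accelerates: every frontier cell can expand in parallel, one cell per processing element.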
http://dl-live.theiet.org/content/conferences/10.1049/ic.2007.1668
Presents a collection of slides covering the following topics: COTS architecture; military electronics; military system; open system; plastic encapsulation; weapon system; logistics improvement; integrated circuit; microelectronics chip; CMOS development; neutron collision; multicore processor; and graphical editor.
Parallelisation of the MLFMM for distributed memory systems
http://dl-live.theiet.org/content/conferences/10.1049/ic.2007.1286
This paper describes the efficient parallelisation of the multilevel fast multipole method (MLFMM) for distributed memory systems. Data is distributed between the processes to obtain a good load balance and to minimise the number of communications. The total runtime efficiency is highly dependent on the efficiency of the matrix-vector product, as well as on the computation of the preconditioner. (4 pages)
Maximising system-performance with symmetric multiprocessing
http://dl-live.theiet.org/content/conferences/10.1049/ic_20050683
This article discusses the following topics: multicore systems (hardware trends, operational modes, symmetric multiprocessing, asymmetric multiprocessing, bound multiprocessing) and design considerations for multicore development (partitioning applications, architecting for multicore, optimizing for multicore). (8 pages)
Real-time architecture for a large distributed surveillance system
http://dl-live.theiet.org/content/conferences/10.1049/ic_20040096
The latest generation of surveillance systems can be categorised as concurrent, distributed and large real-time systems. The most common and well-known approach to designing these systems is based on object-oriented technology. We present another approach to designing an intelligent distributed surveillance system, known as the "real-time network approach", or MASCOT: a method for designing and implementing large real-time concurrent systems. The basic notion is that the flow of data through the system is controlled solely by a set of concurrent processes. We introduce the fundamental concepts of MASCOT and its possible contributions to the creation of good designs for such intelligent distributed surveillance systems. We finish by illustrating a distributed real-time surveillance system using this approach.
Modelling the IXP1200 network processor using OO techniques
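The MASCOT abstract above hinges on one idea: the flow of data is controlled solely by a set of concurrent processes linked through channels. A hedged toy of that style using plain threads and queues (the `camera`/`analyser` activity names and the squaring stand-in are invented, not from the paper):

```python
# Hedged toy of a real-time-network pipeline: independent activities
# communicate only through channels, never through shared state.
import queue
import threading

def camera(out_ch):                       # producer activity
    for frame in range(5):
        out_ch.put(frame)
    out_ch.put(None)                      # end-of-stream marker

def analyser(in_ch, out_ch):              # filter activity
    while (frame := in_ch.get()) is not None:
        out_ch.put(frame * frame)         # stand-in for image analysis
    out_ch.put(None)

frames, results = queue.Queue(), queue.Queue()
threads = [threading.Thread(target=camera, args=(frames,)),
           threading.Thread(target=analyser, args=(frames, results))]
for t in threads:
    t.start()
collected = []
while (r := results.get()) is not None:
    collected.append(r)
for t in threads:
    t.join()
assert collected == [0, 1, 4, 9, 16]
```

Because each activity touches only its channels, the pipeline can be re-partitioned across processors without changing activity code — the portability argument MASCOT-style designs make.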
http://dl-live.theiet.org/content/conferences/10.1049/cp_20040554
We examine how the IXP1200 network processor can be modelled using object-oriented techniques. The parallel object-oriented specification language (POOSL) is an expressive modelling language for analysing parallel real-time distributed systems. We examine the Intel IXP1200 network processor and discuss how POOSL can be used to evaluate a system before implementing it with hardware and software components. With the IXP1200 case study, we illustrate the suitability of POOSL for system-level modelling and design exploration.
Towards a custom vector-parallel machine for TLM
http://dl-live.theiet.org/content/conferences/10.1049/cp_20040433
This paper describes an application-specific vector processor for TLM. The core is based on an open-source microprocessor with a custom vector coprocessor. In architectural simulation the vector unit achieves a speedup of 10 times over an ordinary scalar processor when tested on a mesh of one million nodes with a vector length of 16. The vision for this design is to apply multiple vector cores in the form of a shared-memory vector multiprocessor, and thus improve the performance even further.
A functional methodology for parallel image processing development
http://dl-live.theiet.org/content/conferences/10.1049/cp_20030538
Parallel image processing has been a topic of interest for many years, but has not yet delivered a design methodology which can maintain portability across rapidly changing parallel implementation technologies. The two hurdles to be overcome are the difficulty of rearranging application code into a parallel form and the lack of a general and universally supported parallel machine model. This paper addresses these problems using a functional language transformation, illustrated in the context of a particular application. We show that the overhead introduced by the transformation is relatively small, but the benefit derived is substantial, since the functional programming discipline enforces an implementation-independent definition of core parallel requirements, which can then be mapped onto a broad set of parallel architectures, ranging from shared- and distributed-memory conventional multiprocessors to direct hardware implementations constructed using silicon compilers and FPGA technology.
A multiprocessor architecture for PDE-based applications
http://dl-live.theiet.org/content/conferences/10.1049/cp_20030508
The paper discusses a parallel, multiprocessor implementation of methods based on partial differential equations (PDEs). It focuses on the implementation requirements of segmentation by active contours. Here, the key task is the accurate computation of an implicit description of the contours, i.e. the distance map. The paper demonstrates the use of the proposed architecture by giving a hardware implementation of the distance transform based on the parallel Massive Marching algorithm. An extension of the implementation to non-linear filtering is also proposed.
Multi-level fast multipole for antenna modelling
http://dl-live.theiet.org/content/conferences/10.1049/cp_20030075
The method of moments (MoM) is considered the benchmark against which other full-wave electromagnetic modelling techniques are compared. Unfortunately, this technique carries a heavy computational burden, which limits its applicability to relatively small problems. The fast multipole method (FMM) (see Coifman, R. et al., IEEE Antennas and Propag. Magazine, vol.35, no.3, p.7-12, 1993) reduces the computational burden considerably and can be applied on multiple levels (see Song, J.M. et al., IEEE Trans. Antennas and Propag., vol.45, no.10, p.1488-93, 1997). The overhead of the multi-level fast multipole algorithm (MLFMA) is greater than the direct application of MoM for small problems, but the benefits become apparent for problems with more than approximately 2000 unknowns. An implementation of MLFMA has been developed and ported to parallel computing platforms using the MPI (message passing interface) library. The greatest benefits are gained by using this technique on the most powerful computers: the scattering from a whole fighter aircraft can be calculated up to GHz frequencies using a parallel computer. The software is being developed to add the capability to model installed antenna performance. Some of the technical challenges are described.
A parallel OPF approach for large-scale power systems
http://dl-live.theiet.org/content/conferences/10.1049/cp_20020028
In this paper, the Newton-Krylov algorithm for power flow calculation is reviewed. A rectangular-coordinate voltage formulation of nonlinear optimal power flow (OPF) is introduced and adopted into the unlimited point algorithm, and a Newton-Krylov algorithm similar to that for power flow calculation is proposed to solve the unlimited-point-transformed optimality conditions of the original OPF problem. After verifying the proposed algorithm on several power systems, a shared-memory parallel implementation is carried out on a Sun Sparc 4-PE workstation. Observations on the parallel characteristics of the Newton-Krylov algorithm and of its core part, Bi-CGSTAB, are made based upon test results on both shared-memory and distributed-memory (including PC cluster) parallel computers.
Novel methods for enabling public key schemes in future mobile systems
http://dl-live.theiet.org/content/conferences/10.1049/cp_20020438
It is essential to enable public key schemes in future mobile systems to solve current problems in authentication and key management for end-to-end security. We propose new procedures for enabling public key schemes in future mobile terminals. The proposed procedures are based on complex public key computations that can be performed either in the SIM card or in the terminal itself. Multiple cryptoprocessors are also used to decrease the processing time required to perform the complex public key computations.
Real-time simulation for power systems based on parallel computing - an empirical study
http://dl-live.theiet.org/content/conferences/10.1049/cp_20000454
Computer simulation is a versatile and commonly used tool for the design and evaluation of systems with different degrees of complexity. Power distribution systems and electric railway networks are areas in which computer simulations are heavily applied. A dominant factor in evaluating the performance of a software simulator is its processing time, especially in the case of real-time simulation. Parallel processing provides a viable means of reducing the computing time and is therefore suitable for building real-time simulators. In this paper, the authors present different issues related to solving a power distribution system with parallel computing based on a multiple-CPU server, concentrating in particular on the speed-up performance of such an approach.
A low-cost parallel computing platform for power engineering applications
http://dl-live.theiet.org/content/conferences/10.1049/cp_20000422
This paper develops and evaluates a low-cost parallel computing platform for the implementation of parallel algorithms in power engineering applications. The proposed approach utilises an existing local area network without incurring any additional hardware costs. Application of computational intelligence techniques based on the developed computing platform to the economic dispatch problem is outlined. The performance of genetic algorithms in parallel and cluster structures and their ability to cope with time-constrained applications are also demonstrated. It is found that when the workload is large, a parallel computing structure should be exploited for cost-effectiveness.
The impact of parallel computers
http://dl-live.theiet.org/content/conferences/10.1049/ic_19990052
This paper presents a snapshot of the continuing development of numerical modelling as applied to electromagnetic analysis and synthesis. From the experience of work done in Oxford Parallel on electromagnetic systems, the author draws some personal conclusions as to the continuing and future impact of parallel computers in this area. (4 pages)
Post production workstations on NT platforms
http://dl-live.theiet.org/content/conferences/10.1049/ic_19980953
Intergraph Computer Systems have led the development of a new tool for video post-production professionals: an affordable desktop workstation based on an open, extensible architecture. This paper describes the evolution of Intergraph's systems and discusses the design decisions that shaped them. (7 pages)
Technology transfer and certification issues in safety critical real time systems
http://dl-live.theiet.org/content/conferences/10.1049/ic_19980523
The advances in recent years in the academic field of real-time systems scheduling have not been matched to the same degree within industry, especially within safety-critical systems development. This is to some extent prudent, as the maturity of methods and techniques must be assured before they are utilised within systems upon which lives depend. Once the applicability of a technique has been assured in theory, however, there are still major hurdles to be overcome in convincing customers, project managers, engineers and certifying authorities of the safety and/or applicability of the method. This paper briefly discusses the approach taken and the lessons learned when introducing fixed-priority-based scheduling into a safety-critical system. It concentrates on the migration of the scheduling method from a largely academic idea into an "industry strength" technique, and the steps that had to be taken to achieve this migration. These include the close co-operation between university staff and company engineers; the championing of the technique within the target engineering department; and the liaison between development engineers, customer quality and safety representatives, and the certification authorities. The introduction of the new technology highlighted some key lessons for managing such projects in the future. These need to be understood by engineers and academics alike if they wish to see the successful uptake of novel ideas within an industrial application. (4 pages)
Scheduling real-time communications with P-NET
http://dl-live.theiet.org/content/conferences/10.1049/ic_19980530
In this paper we address the P-NET Medium Access Control (MAC) ability to schedule traffic according to its real-time requirements, in order to support real-time distributed applications. We provide a schedulability analysis based on the P-NET standard, and propose mechanisms to overcome priority inversion problems resulting from the use of FIFO outgoing buffers. (5 pages)
Integration of a real-time parallel transputer-based helicopter simulation with real flight control hardware
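The priority-inversion problem with FIFO outgoing buffers noted in the P-NET abstract above can be sketched in a few lines (illustrative Python; the message names and priority values are invented, not taken from the standard):

```python
import heapq

def dispatch_order_fifo(msgs):
    # FIFO outgoing buffer: messages leave in arrival order regardless of
    # priority, so an urgent message queued behind low-priority traffic
    # suffers priority inversion.
    return [name for name, _prio in msgs]

def dispatch_order_priority(msgs):
    # Priority-ordered buffer: lowest numeric value = highest priority;
    # the arrival index breaks ties to keep FIFO order within a priority.
    heap = [(prio, i, name) for i, (name, prio) in enumerate(msgs)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

# Arrival order: two low-priority messages, then one urgent message.
queued = [("low-1", 9), ("low-2", 9), ("urgent", 0)]
```

Replacing the FIFO buffer with a priority-ordered one lets the urgent message overtake the backlog, which is the essence of the mechanisms the paper proposes.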
http://dl-live.theiet.org/content/conferences/10.1049/cp_19980615
A transputer-based parallel implementation of a helicopter model allows real-time operation with a high level of fidelity. Such an implementation, using a highly configurable transputer network, has now been developed. Our model, of the EH101 helicopter, has a five-bladed main rotor, with twelve elements modelled for each blade. The subject of this paper is the integration of this model with production flight control computers as used on the EH101, a hardware-in-the-loop simulation.
Code generator for parallel implementation of intensive algorithms on multiple DSP chips
http://dl-live.theiet.org/content/conferences/10.1049/ic_19970999
Many intensive and real-time signal processing applications, for example in control, speech synthesis/recognition and image processing, have computational requirements which are too high for implementation on single DSP processors. Parallel implementation on multiple DSP processors provides an attractive solution to this problem. This paper presents a code generator which is capable of automatically producing parallel code for a multiprocessor hardware platform. The generator makes use of special code skeletons to abstract away from the hardware platform being used, thereby providing a high degree of flexibility in the choice of platforms. By using a consistent interface to these skeletons one can easily retarget the same signal processing application to different hardware systems. The design philosophy and architecture of the code generator as well as the types of multiprocessor hardware platforms used are presented and discussed. (10 pages)
A pulse compression radar signal processor
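The skeleton idea in the code-generator abstract above — a consistent interface that lets the same application be retargeted across platforms — can be sketched with string templates. The platform names and skeleton contents below are hypothetical; a real generator would emit compilable source, but the substitution principle is the same:

```python
from string import Template

# Hypothetical code skeletons, one per target platform. The application
# supplies the same (task, body) pair whichever platform is selected.
SKELETONS = {
    "dsp_board": Template("/* DSP worker */ void ${task}(void) { ${body} }"),
    "transputer_net": Template("-- transputer worker\nPROC ${task}()\n  ${body}\n:"),
}

def generate(task, body, platform):
    # Retargeting is a lookup plus substitution: the skeleton hides every
    # platform-specific detail behind a uniform interface.
    return SKELETONS[platform].substitute(task=task, body=body)
```

Swapping `platform` changes the emitted code without touching the application description, which is the flexibility the paper claims for its skeleton interface.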
http://dl-live.theiet.org/content/conferences/10.1049/ic_19970995
This paper describes the continuing work on hardware development of a pulse compression radar signal processing system, with application to the evaluation of low cost radar signal processing. The aim of this work is to design a flexible hardware platform which can be used to develop radar detection algorithms incorporating pulse compression. The flexibility permits the use of multiple types and lengths of codes available for phase-coded pulse compression. To facilitate testing of the instrumentation system, a hardware simulator has been developed. The hardware implementation uses INMOS A100 digital signal processors, a transputer and Altera EPLDs. The A100 was selected due to its speed and ease of cascading. To facilitate algorithm development and analyses of results, National Instruments Labview is used as an operating environment. (5 pages)
An introduction to modular map systems
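Phase-coded pulse compression, as in the abstract above, amounts to correlating the received signal with the transmitted code. A minimal pure-Python sketch using the length-13 Barker code (not tied to the A100 hardware the paper describes):

```python
def xcorr(a, b):
    """Aperiodic cross-correlation: r[lag] = sum_i a[i] * b[i + lag]."""
    n, m = len(a), len(b)
    return [sum(a[i] * b[i + lag] for i in range(max(0, -lag), min(n, m - lag)))
            for lag in range(-(n - 1), m)]

# Length-13 Barker code: compression gain 13, peak sidelobe magnitude 1.
barker13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]

# Matched filtering of a noise-free echo: correlate the code with itself.
compressed = xcorr(barker13, barker13)
```

The output shows why Barker codes are attractive for low-cost processors: a sharp mainlobe of height 13 at zero lag with every sidelobe at most 1 in magnitude.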
http://dl-live.theiet.org/content/conferences/10.1049/ic_19970732
We present an overview of our design for a fully digital hardware implementation of the Self Organising Map (SOM) (T. Kohonen, 1982). Our approach has resulted in a modular system (Modular Maps) which utilises fine grain parallelism with each neuron being a separate entity implemented as a small RISC processor. The essence of the SOM has been maintained by this design, although minor modifications have been made to the original algorithm to facilitate implementation. Modules can be used as either stand alone systems or combined to enable large networks to be created and large input vectors to be catered for. A simulator system was developed to facilitate investigation into the high level behaviour of Modular Map systems and, as Modular Maps are computationally intensive and parallel in nature, it was implemented on a parallel computer system. A series of simulations was carried out using encoded images of human faces where it was found that the classification accuracy of a Modular Map system offered an improvement over that of the traditional SOM. (4 pages)
A new approach to fast distance protection with adaptive features
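The SOM algorithm underlying the Modular Map design above can be sketched in a few lines. This is a minimal software version with a 1-D neuron chain and invented parameters, not the per-neuron RISC hardware implementation the paper describes:

```python
import math
import random

def train_som(data, n_neurons=4, epochs=300, seed=1):
    """Train a tiny 1-D SOM on 2-D points: find the best-matching unit
    (BMU) for each input, then pull the BMU and its chain neighbours
    toward the input with a decaying learning rate and neighbourhood."""
    rng = random.Random(seed)
    w = [[rng.random(), rng.random()] for _ in range(n_neurons)]
    for t in range(epochs):
        lr = 0.05 + 0.45 * (1 - t / epochs)          # decaying learning rate
        radius = max(1.0 * (1 - t / epochs), 0.01)   # shrinking neighbourhood
        for x in data:
            bmu = min(range(n_neurons),
                      key=lambda i: (w[i][0] - x[0]) ** 2 + (w[i][1] - x[1]) ** 2)
            for i in range(n_neurons):
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                w[i][0] += lr * h * (x[0] - w[i][0])
                w[i][1] += lr * h * (x[1] - w[i][1])
    return w

# Two tight clusters near (0,0) and (1,1); the trained map should place
# at least one neuron close to each.
weights = train_som([(0.0, 0.0), (0.05, 0.05), (1.0, 1.0), (0.95, 0.95)] * 5)
```

Because each neuron's update depends only on the input and its own weights, the inner loop is naturally data-parallel, which is exactly what motivates a one-processor-per-neuron hardware design.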
http://dl-live.theiet.org/content/conferences/10.1049/cp_19970022
Distance relays are one of the most important components of protection available to power system protection engineers. Distance relays can benefit from ideas in the newly developed field of adaptive protection, and can offer an even more selective and sensitive form of protection under a variety of power system configurations. A new fast adaptive distance protection scheme for single line to ground faults and multi-phase faults on EHV transmission lines is described in this paper. This scheme also performs well on series compensated lines. The new complementary fast tripping algorithms, together with the earlier, field-experienced algorithm and a multiprocessor-based distance algorithm, have led to the development of an adaptive hybrid scheme. This scheme is implemented on a well-proven hardware platform with moderate requirements on the communication. The protection scheme provides high speed tripping (less than one cycle) and high speed signalling.
Optimisation of high-speed motors using a genetic algorithm
http://dl-live.theiet.org/content/conferences/10.1049/cp_19971031
An optimisation software package for high-speed electric motors has been compiled. High-speed motors are modelled with finite element and thermal network analyses. A genetic optimisation algorithm is chosen for the optimisation, and parallel computing is used to speed up the optimisation process. High-speed motors of 53,000 rpm, 50 kW and 100,000 rpm, 100 kW have been optimised. The objective of the optimisation was to minimise the motors' losses. The results justify the use of genetic algorithms and emphasise the importance of accurate thermal modelling of high-speed motors.
Object-oriented optical interconnection distributed parallel processor array network for integrated substation automation system
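A genetic optimisation loop of the kind used above can be sketched as follows. The `losses` surrogate stands in for the finite-element and thermal-network evaluation, and all variable names, bounds and operator choices are invented for illustration:

```python
import random

def losses(x):
    # Hypothetical surrogate for motor losses with a minimum of 2.0 at
    # (3.0, 1.0); in the paper this value comes from FE/thermal analysis.
    slot_depth, airgap = x
    return (slot_depth - 3.0) ** 2 + (airgap - 1.0) ** 2 + 2.0

def genetic_minimise(fitness, bounds, pop=30, gens=60, seed=7):
    """Minimise `fitness` (lower is better) with truncation selection,
    arithmetic crossover and Gaussian mutation."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness)
        parents = population[:pop // 2]           # keep the better half
        children = []
        while len(children) < pop - len(parents):
            a, b = rng.sample(parents, 2)
            child = [(g1 + g2) / 2 for g1, g2 in zip(a, b)]  # crossover
            child = [min(max(g + rng.gauss(0, 0.1), lo), hi)  # mutate, clamp
                     for g, (lo, hi) in zip(child, bounds)]
            children.append(child)
        population = parents + children
    return min(population, key=fitness)

best = genetic_minimise(losses, [(0.0, 10.0), (0.1, 5.0)])
```

Since each loss evaluation is independent, the whole generation can be evaluated in parallel, which is why the paper pairs the GA with parallel computing.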
http://dl-live.theiet.org/content/conferences/10.1049/cp_19971830
In order to advance the level of integrated substation automation systems (ISAS), a novel object-oriented optical interconnection distributed parallel processor array network applied to the ISAS is established. The function unit (FU) and the optical interconnection network (OIN) are important parts of the ISAS. The FUs include a protection unit, control unit, monitoring unit, man-machine unit and so on. The massively optical interconnection distributed parallel processor array network consists of various kinds of protection units, control units and other units connected by means of the OIN. In this paper, the architecture and implementation of the FU and OIN are described in detail. Results show that the ISAS not only has high parallel processing ability to coordinate the relationships between the FUs, but also has the advantages of high speed, large bandwidth and immunity to electromagnetic interference, as well as being simple, reliable, economical and compact.
A fully digital real-time simulator for protective relay testing
http://dl-live.theiet.org/content/conferences/10.1049/cp_19970050
This paper outlines the hardware and software design aspects of a real-time digital simulator (RTDS). The hardware architecture is based on parallel processing techniques and employs many high-speed digital signal processors. Sophisticated graphical interface software facilitates circuit assembly and simulator operation. Detailed power system component models together with flexible analogue and digital I/O enable detailed testing of power systems and external devices connected to such systems. The RTDS can be applied in various areas of power system study. Specific applications which have been identified include closed-loop protective relay testing, control equipment testing, analogue simulator expansion and operation as a power system training tool.
Scalable hardware and software architecture for radar signal processing system
http://dl-live.theiet.org/content/conferences/10.1049/cp_19971770
The conventional approach to digital signal processing in radar systems involves hardware realization with the use of specialized integrated circuits. Such an approach is lacking in versatility and scalability. Advances in digital signal processor (DSP) technology make it possible to realize nearly all algorithms in software, using general-purpose DSP chips (cf. Edwards and Wilkinson 1996). Such an approach has many advantages, as it can be easily adapted to changing (growing) user demands. In the current paper the latter method is considered in detail and a scalable architecture of hardware and software adequate for radar signal processing is derived. Typical radar systems require a total workload of the order of 1-10 Gflops. Comparing this figure with the 50 Mflops average performance of a modern floating-point DSP, evidently about 100 processors must be used in a single system. Therefore, both the topology of their connections and the methods of parallelising the processing algorithms are very important.
Neural network optimal-power-flow
http://dl-live.theiet.org/content/conferences/10.1049/cp_19971842
The paper develops a massively-parallel computing structure based on arrays of neural networks to solve the optimal power flow (OPF) problem. The context of its application is in EMS (energy management systems), relating to finding optimal operating states for an interconnected power network system where the total load demand which the system is required to supply is specified. A principal feature of the neural network OPF is that it offers ultra-high-speed computation. It provides parallel computation and, at the same time, takes full advantage of the sparsity of the matrices encountered in OPF.
Real-time O.S. based radar controller for multi-mode phased array radar system
http://dl-live.theiet.org/content/conferences/10.1049/cp_19971738
Modern multi-mode radar systems require efficient control structures and algorithms for dynamically changing radar missions and environments as well as effective utilization of limited radar resources. In this paper, a real-time operating system (OS) based radar system control technique is presented for a multi-mode search phased array radar system, which is operated in composite target and clutter environments. The proposed radar controller has been designed and implemented with multiple high-performance processors in the real-time operating system environment. The implemented techniques have been evaluated by the real-time operation of the phased array radar system integrated in a van-type test-bed. Depending on the allocated missions and functions, the radar controller operates efficiently by distributing the required loads to the multiple processors interconnected through the dedicated high speed data network. The main hardware structure design is based on the PME68-42, a single board computer of Radstone, and the transputer processor of Inmos. The software code is developed using the C language with VRTXsa of Microtec and VIRTUOSO of ISI. The standard VME bus and transputer link are used for the interfaces between the control processors.
Parallel transient stability analysis on distributed memory message passing multiprocessors
http://dl-live.theiet.org/content/conferences/10.1049/cp_19971849
This paper presents a new parallel-in-space algorithm for power system transient stability simulations. The nonlinear differential equations are discretized by applying the trapezoidal rule and solved together with the nonlinear algebraic equations for each time step. A network partitioning scheme, which is based on the subdivision of the factorization path tree of the network matrix, is proposed to exploit the parallelism-in-space of the transient stability problem. The parallel version of the very dishonest Newton (VDHN) method, in which the parallel algorithm for solving large sparse network matrix equations is incorporated, is developed and tested on a distributed memory message passing multicomputer. Test results on a sample power system are presented to show the performance of the proposed algorithm.
Fast implementation of discrete wavelet transform based on pipeline processor farming
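The per-step computation described above — trapezoidal discretisation solved by a Newton iteration — can be sketched for a scalar test equation. This is a toy sketch of the numerical scheme only, not the parallel VDHN network solver:

```python
def trapezoidal_newton(f, dfdx, x0, h, steps, tol=1e-12):
    """Integrate dx/dt = f(x) with the trapezoidal rule.

    Each implicit step requires solving
        g(x) = x - x_k - (h/2) * (f(x_k) + f(x)) = 0,
    which is done here by Newton's method.
    """
    xs = [x0]
    x = x0
    for _ in range(steps):
        xk, fk = x, f(x)
        x_new = xk                       # initial Newton guess
        for _ in range(50):
            g = x_new - xk - 0.5 * h * (fk + f(x_new))
            dg = 1.0 - 0.5 * h * dfdx(x_new)
            step = g / dg
            x_new -= step
            if abs(step) < tol:
                break
        x = x_new
        xs.append(x)
    return xs

# Test equation dx/dt = -x with x(0) = 1: exact solution exp(-t),
# so x(1) should be close to 0.36788 after 100 steps of h = 0.01.
trajectory = trapezoidal_newton(lambda x: -x, lambda x: -1.0, 1.0, 0.01, 100)
```

In the paper's "very dishonest" variant the Jacobian (`dg` here) is factorised once and reused across iterations and time steps, which is what makes the network solve cheap enough to parallelise.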
http://dl-live.theiet.org/content/conferences/10.1049/cp_19970877
Efficient implementations of wavelet transforms have been derived, based on the FFT and short-length `fast-running FIR algorithms'. However, for long one-dimensional arrays or two-dimensional data, such as encountered in image processing, the time required to calculate wavelet transforms, even in the case of `fast' FFT-based implementations, is still large. In order to reduce the time consumption of the wavelet transform and bring it closer to real-time implementation, this paper suggests the use of parallel processing based on the pipeline processor farm (PPF) methodology. The paper is mainly focussed on parallel implementation of the discrete wavelet transform (DWT), which is extensively used in image processing applications. The parallel environment in which the algorithms were implemented comprised two TMS320C40 boards with a total of six processors.
Parallel implementation of the hybrid MoM/Green's function technique on a cluster of workstations
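The DWT that the PPF pipeline above parallelises can be illustrated by a single Haar analysis/synthesis level (a pure-Python sketch; in a processor farm each worker would transform independent rows of the image):

```python
import math

def haar_dwt(x):
    """One level of the orthonormal Haar DWT: pairwise sums give the
    approximation (low-pass) band, pairwise differences the detail
    (high-pass) band, both scaled by 1/sqrt(2)."""
    s = math.sqrt(2.0)
    approx = [(x[2 * i] + x[2 * i + 1]) / s for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / s for i in range(len(x) // 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: perfect reconstruction of the input."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x += [(a + d) / s, (a - d) / s]
    return x

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, detail = haar_dwt(signal)
```

Each output pair depends only on two input samples, so rows (or row blocks) of an image can be farmed out to pipeline workers with no inter-worker communication.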
http://dl-live.theiet.org/content/conferences/10.1049/cp_19970234
In this paper a parallel version of the hybrid method of moments/Green's function technique is presented for the analysis of portable hand-held transceivers radiating close to the human head. As compared to other numerical techniques, this formulation leads to a drastic reduction in memory requirements and, because of the parallel implementation, also gives acceptable execution times. After a brief description of the theory and the parallel processing, an example is presented which demonstrates the very efficient parallelisation using the message passing interface standard even on an inhomogeneous cluster of workstations. (4 pages)
An investigation of the heterogeneous mapping problem using genetic algorithms
http://dl-live.theiet.org/content/conferences/10.1049/cp_19960594
Mapping is the off-line allocation of the tasks that represent a parallelised algorithm across a multiprocessor architecture. In this paper the target architecture is heterogeneous, where a number of computationally disparate processors are integrated within a single network. This paper describes the development of several exploratory mapping algorithms that attempt to minimise the cycle-time of the application algorithms. A simple heuristic is appraised first, followed by an examination of a genetic algorithm (GA) approach. Subsequently, the GA is augmented with several specialised operators in an attempt to improve performance. Finally, a mechanism to adapt the operator probabilities based on their recent performance is introduced. Initially, the GA utilises a simple parallel architecture model; its limitations, however, led to the embedding of the target hardware within the objective function to improve performance. Finally, the effectiveness of these approaches is examined and contrasted, with due consideration of what has been learnt about the nature of the heterogeneous mapping problem.
A process-algebraic approach to the design of asynchronous (counterflow) pipelines
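A "simple heuristic" of the kind appraised first in the abstract above can be sketched as greedy earliest-finish-time mapping over a heterogeneous cost table (the cost values below are invented for illustration):

```python
def greedy_map(costs):
    """Greedy heterogeneous mapping.

    costs[t][p] is the execution time of task t on processor p (the
    processors are computationally disparate, so rows differ by column).
    Each independent task is assigned to the processor on which it would
    finish earliest; returns the assignment and the resulting makespan.
    """
    n_proc = len(costs[0])
    finish = [0.0] * n_proc          # accumulated load per processor
    assignment = []
    for row in costs:
        p = min(range(n_proc), key=lambda p: finish[p] + row[p])
        finish[p] += row[p]
        assignment.append(p)
    return assignment, max(finish)

# Three tasks, two disparate processors: task 0 is fast on p0,
# task 1 is fast on p1, task 2 runs equally well on either.
mapping, makespan = greedy_map([[2.0, 4.0], [4.0, 2.0], [3.0, 3.0]])
```

A GA searches the same assignment space globally, which is why it can beat this greedy baseline once task precedence and communication costs make local choices misleading.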
http://dl-live.theiet.org/content/conferences/10.1049/ic_19960252
In this paper we discuss the use of an approach in the design and verification of an asynchronous processor, the Sproull Counterflow Pipeline Processor Architecture (SCPP-A). As is often the case, the formal specification of the problem turns out to be the more difficult part. Once in the formal framework, we are in the realm of algebraic manipulations, where verification is a matter of proving theorems. We show how we break down such an asynchronous specification into a pipeline of stages interleaved with arbitration elements, the so-called Cops. Due to space limitations we confine ourselves to the topmost specifications, and refrain from decomposing the Stage and Cop any further. (6 pages)
A transputer based, flexible, real-time control system for robotic manipulators
http://dl-live.theiet.org/content/conferences/10.1049/cp_19960531
This paper presents a new control system structure, which makes the realisation of adaptive, flexible real-time systems possible, and outlines its implementation in the case of a robot control system using the latest T9000 transputer technology. The modules of the example system developed for robot control communicate with each other by message passing, which made the system development and its further modification much easier and faster.
An approach in the parallelization of a boundary element algorithm on shared memory workstations
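Message passing between modules, as used in the control system above, can be mimicked in a few lines with queues and threads. The module names and setpoint values are invented; the original system used T9000 transputer channels rather than Python queues:

```python
import queue
import threading

def trajectory(out_q, setpoints):
    # Hypothetical trajectory-generation module: sends setpoints as
    # messages, then an end-of-stream marker.
    for sp in setpoints:
        out_q.put(sp)
    out_q.put(None)

def servo(in_q, log):
    # Hypothetical servo module: consumes messages until the marker.
    # It never touches the trajectory module's state directly.
    while True:
        msg = in_q.get()
        if msg is None:
            break
        log.append(("applied", msg))

channel = queue.Queue()
applied = []
t1 = threading.Thread(target=trajectory, args=(channel, [0.1, 0.2, 0.3]))
t2 = threading.Thread(target=servo, args=(channel, applied))
t1.start(); t2.start()
t1.join(); t2.join()
```

Because the modules share no state beyond the channel, either one can be replaced or moved to another processor without modifying the other, which is the maintainability benefit the paper reports.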
http://dl-live.theiet.org/content/conferences/10.1049/cp_19960151
With increasing hardware capabilities, parallel computation of electromagnetic fields is no longer limited to supercomputers and computation centres. By using desktop multiprocessor workstations, exploiting modern computer architectures and compilers that accelerate numerical software, the computation time for electromagnetic field problems can be significantly reduced. The paper reports the parallelization of existing 3D-BEM software on a shared memory architecture (a multiprocessor workstation) using adapted algorithms.