New Publications are available for Video signal processing
http://dl-live.theiet.org
Epidural needle length measurement by video processing
http://dl-live.theiet.org/content/conferences/10.1049/cp.2012.0446
This paper presents a novel image processing algorithm to measure the length and depth of an epidural needle during insertion. A wireless camera transmits video during insertion to a host computer, which runs an image processing algorithm to detect the visible part of the needle in the image and measure its length. The measurement is done by HSV background removal, colour comparison and RGB histograms used to locate the 10 mm markings on the Tuohy needle shaft. The visible length is then subtracted from the known length of the needle to calculate the depth of the needle tip. The camera can be placed in the operating theatre up to one metre away from the needle insertion site. The purpose of measuring needle depth in real time is to place the needle precisely in the epidural space. (6 pages)
Inter-description motion vector redundancy control for scalable multiple description video coding
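The depth calculation in the epidural-needle abstract above reduces to subtracting the measured visible length from the known needle length. A minimal sketch of that final step (the 80 mm needle length and the function name are illustrative assumptions, not from the paper):

```python
def needle_depth_mm(visible_markings, marking_spacing_mm=10,
                    needle_length_mm=80):
    """Depth of the needle tip: known needle length minus the visible
    length, estimated from the number of 10 mm shaft markings seen."""
    visible_length_mm = visible_markings * marking_spacing_mm
    return needle_length_mm - visible_length_mm
```

The image processing pipeline (HSV background removal, RGB histogram marking detection) exists only to produce a reliable count of visible markings; the depth itself is this one subtraction.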
http://dl-live.theiet.org/content/conferences/10.1049/cp.2012.0429
The problem of inter-description motion vector redundancy control in scalable multiple description video coding is addressed in this paper. In the approach where motion compensated temporal filtering (MCTF) is followed by multiple description scalar quantization (MDSQ) of the texture data, the same set of motion vectors is repeated in both descriptions. This repetition of motion vector data adds extra overhead, yet, unlike the MDSQed texture data, it does not contribute to error correction during joint decoding. The paper proposes interleaving the motion vector fields between the descriptions and applying MDSQ to the motion vectors, to obtain two correlated motion vector fields rather than repeating them. The results show superior performance of MDSQ of motion vectors compared to other approaches. (5 pages)
Multi-frame super resolution using edge directed interpolation and complex wavelet transform
http://dl-live.theiet.org/content/conferences/10.1049/cp.2012.0447
In this paper, a multi-frame super resolution technique is proposed which uses edge directed interpolation (EDI) and the dual-tree complex wavelet transform (DT-CWT). In the proposed technique, a super resolution process is applied to each frame to generate the low frequency component, while the high frequency components are generated by DT-CWT decomposition followed by EDI. Finally, composition of the generated subbands using the inverse DT-CWT reconstructs the super resolved output frame. Experimental results on a number of benchmark video sequences, measured by PSNR, confirm the superiority of the suggested method over state-of-the-art video resolution enhancement methods. (5 pages)
Surrey University Library for Forensic Analysis (SULFA) of video content
http://dl-live.theiet.org/content/conferences/10.1049/cp.2012.0422
In this paper we propose SULFA (Surrey University Library for Forensic Analysis) for the benchmarking of video forensic techniques. This new video library has been designed and built specifically for video forensics related to camera identification and integrity verification. As far as we know, no such library currently exists in the community. SULFA contains original as well as forged video files, which will be freely available through the University of Surrey's website. There are approximately 150 videos collected from three camera sources: a Canon SX220 (H.264 codec) [1], a Nikon S3000 (MJPEG codec) [2] and a Fujifilm S2800HD (MJPEG codec) [3]. Each video is approximately 10 seconds long, with a resolution of 320×240 at 30 frames per second. All videos have been shot after carefully considering both temporal and spatial video characteristics. To present life-like scenarios, various complex and simple scenes have been shot with and without camera support (a tripod). Furthermore, nine original videos from each source in SULFA have been tested with Photo Response Non-Uniformity (PRNU) based camera identification methods. Currently, SULFA also includes videos with cloning or copy-paste forgery; each forged video includes full information about the doctored region. (5 pages)
Person tracking via audio and video fusion
http://dl-live.theiet.org/content/conferences/10.1049/cp.2012.0410
In this paper we present a joint audio-video (AV) tracker which can track the active source between two freely moving persons speaking in turn, simulating a less constrained meeting scenario. Our tracker differs from existing work in that it requires only a small number of sensors, works when the speaker is not close to the sensors, and relies on simple yet efficient inference techniques in AV processing. The system uses audio and video measurements of the target position on the ground plane to strengthen the single-modality predictions, which would be weak on their own given the occlusions, clutter, reverberation and speech pauses in the test environment. In particular, the inter-microphone signal delays and the target image locations are input to single-modality Bayesian filters, whose likelihoods are multiplied in a Kalman filter to give the joint AV estimate. Despite the low complexity of the system, results show that the multi-modal tracker does not fail, tolerating video occlusion and intermittent speech (within 50 cm accuracy) in a non-meeting scenario. The system is evaluated on both single-modality and multi-modality tracking, and the performance improvement given by the AV fusion is discussed and quantified, i.e. a 24% improvement in audio tracker accuracy. (6 pages)
Adaptive GOP-length multiple representation coding for error-resilient video delivery
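The multiplication of single-modality likelihoods in the audio-video tracker above can be illustrated for the Gaussian case, where the product of an audio and a video position likelihood is again Gaussian with a precision-weighted mean. A simplified 1-D sketch, not the authors' exact filter:

```python
def fuse_gaussians(mu_audio, var_audio, mu_video, var_video):
    """Product of two Gaussian likelihoods over the target position:
    the fused mean weights each estimate by its precision (1/variance),
    so the more certain modality dominates."""
    precision = 1.0 / var_audio + 1.0 / var_video
    mu = (mu_audio / var_audio + mu_video / var_video) / precision
    return mu, 1.0 / precision
```

A noisy audio estimate (large variance) is thus pulled towards a confident video estimate, and vice versa, which is what lets such a tracker ride out occlusions and speech pauses.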
http://dl-live.theiet.org/content/conferences/10.1049/cp.2012.0430
Multiple Representation Coding (MRC) is a novel scheme that can enable error-resilient video delivery over channels prone to burst or signal losses. In the MRC scheme, the source video is decomposed into multiple independently decodable representations. These multiple representations are then transmitted as a single video stream using a `GOP interleaving' (GOP: Group of Pictures) mechanism. The GOP interleaver disperses the multiple representations within the transmitted stream so that spatio-temporally co-located segments belonging to different representations are not simultaneously impaired by the same burst loss. When the transmitted bitstream is impaired by a burst loss spanning multiple frames, the MRC scheme can give a PSNR gain on the order of 2-4 dB over conventional full-size encoding and transmission of the video. Further, the error-robustness of the MRC scheme can be improved by increasing the length of the representation segments interleaved in the transmitted stream. In this paper, we propose adapting the GOP-length of individual representations in response to the expected length of the burst loss over the network. Simulation results demonstrate that the adaptive MRC scheme can give a PSNR gain of around 2 dB over the non-adaptive MRC scheme in the presence of long burst errors or signal loss intervals. (6 pages)
Depth estimation from a video sequence with moving and deformable objects
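The GOP-interleaving mechanism described in the MRC abstract above can be sketched as follows (lists of frame identifiers stand in for encoded segments; the function is an illustrative assumption, not the authors' implementation):

```python
def interleave_gops(representations, gop_len):
    """Disperse GOP-sized segments of each independently decodable
    representation into a single stream, so that co-located segments
    from different representations are separated in transmission time."""
    gops = [[rep[i:i + gop_len] for i in range(0, len(rep), gop_len)]
            for rep in representations]
    stream = []
    for round_of_gops in zip(*gops):  # one GOP per representation per round
        for gop in round_of_gops:
            stream.extend(gop)
    return stream
```

A burst loss shorter than one interleaved segment then damages co-located content in at most one representation, and the surviving representations can conceal it; adapting `gop_len` to the expected burst length is the adaptation the paper proposes.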
http://dl-live.theiet.org/content/conferences/10.1049/cp.2012.0425
In this paper we present an algorithm for depth estimation from a monocular video sequence containing moving and deformable objects. The method is based on a coded aperture system (i.e., a conventional camera with a mask placed on the main lens) and takes a coded video as input to produce a sequence of dense depth maps as output. To deal with non-rigid deformations, our work builds on a state-of-the-art single-image depth estimation algorithm. Since single-image depth estimation is severely ill-posed, we cast the reconstruction task as a regularized algorithm based on nonlocal-means filtering applied to both the spatial and temporal domains. Our assumption is that regions with similar texture in the same frame and in neighbouring frames are likely to belong to the same surface. Moreover, we show how to increase the computational efficiency of the method. The proposed algorithm has been successfully tested on challenging real scenarios. (6 pages)
High precision and low power DCT architectures for image compression applications
http://dl-live.theiet.org/content/conferences/10.1049/cp.2012.0460
The computation of the two-dimensional Discrete Cosine Transform (2-D DCT) in image and video compression standards requires a specific level of precision and involves a high degree of complexity. This paper introduces two architectures designed with accuracy, power consumption and speed in mind. The proposed architectures are implemented using the Xilinx System Generator on the Virtex5 5vlx50tff1136-3 Xilinx platform and tested on six standard images. They partition the input image into blocks of 8×8 pixels and compute the 2-D DCT of each block sequentially. The results revealed that the proposed architectures produce very good image quality, with 53 to 79 dB PSNR for the first standard image (Lena) at word lengths of two and three bytes, respectively. The architectures are capable of operating at up to 171 MHz with a word length of two bytes, and the total memory used was 36 KB. In addition, the dynamic power consumption of the first and second architectures is 60 and 38 mW, respectively, at 10 ns. (6 pages)
Video analytics: past, present, and future
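The separable 8×8 2-D DCT that both architectures above compute can be sketched in software as a row transform followed by a column transform. This is a naive floating-point reference for the standard DCT-II, not the paper's fixed-point hardware design:

```python
import math

def dct_1d(x):
    """Orthonormal 1-D DCT-II of a length-N vector."""
    N = len(x)
    out = []
    for k in range(N):
        s = sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for n in range(N))
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(scale * s)
    return out

def dct_2d_block(block):
    """2-D DCT of an 8x8 block: 1-D DCT on every row, then on every
    column. This separability is what hardware architectures exploit."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```

For a constant 8×8 block, all the energy lands in the DC coefficient, which is the property compression schemes rely on when quantizing away high-frequency terms.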
http://dl-live.theiet.org/content/conferences/10.1049/cp.2012.0403
Over the last quarter of a century or so, a great deal of money and effort has been devoted to developing video analytic solutions, yet until relatively recently this has led to little deployment of these technologies. It could be argued that this has been rather disappointing, but with the availability of much greater computing power, realistic data sets, and potential customers having more confidence in deploying these technologies, we will see the developers' efforts gaining more widespread use in the years ahead. (5 pages)
A visual voice activity detection method with adaboosting
http://dl-live.theiet.org/content/conferences/10.1049/ic.2011.0145
Spontaneous speech in videos capturing the speaker's mouth provides bimodal information. Exploiting the relationship between the audio and visual streams, we propose a new visual voice activity detection (VAD) algorithm to overcome the vulnerability of conventional audio VAD techniques in the presence of background interference. First, a novel lip extraction algorithm combining rotational templates and prior shape constraints with active contours is introduced; the visual features are then obtained from the extracted lip region. Second, with the audio voice activity vector used in training, adaboosting is applied to the visual features to generate a strong final voice activity classifier from a set of boosted weak classifiers. We have tested our lip extraction algorithm on the XM2VTS database (higher resolution) and some video clips from YouTube (lower resolution). The visual VAD was shown to offer low error rates. (5 pages)
Scalable fusion using a 3D dual tree wavelet transform
http://dl-live.theiet.org/content/conferences/10.1049/ic.2011.0172
This paper introduces a novel system that is able to fuse two or more sets of multimodal videos in the compressed domain. This is achieved without drift and produces an embedded bitstream that offers fine grain scalability. Previous attempts to fuse in the compressed video domain have not been possible due to the complications of predictive loops within standard video encoding techniques. The compression system is based on an optimised spatio-temporal codec using the 3D Discrete Dual-tree Wavelet Transform (DDWT) together with the bit-plane encoding method SPIHT and a coefficient sparsification process (noise shaping). Together, these methods are able to encode a video sequence efficiently without the need for motion compensation, thanks to the directional (in space and time) selectivity of the transform. This enables scalable compressed-domain fusion without drift, resulting in extremely flexible fusion scenarios in dynamic bandwidth environments with variable client receiving capabilities. (5 pages)
Improving real time video surveillance performance using inter-frame retransmission
http://dl-live.theiet.org/content/conferences/10.1049/ic.2011.0102
Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) are common transport layer protocols used in IP-based networks. However, both protocols have disadvantages for wireless video transmission: TCP yields high delay, while UDP experiences high packet loss. Retransmission of lost packets is one solution for enhancing video quality in video transmission. This paper proposes an inter-frame retransmission method to enhance the performance of video surveillance over WiMAX. The method outperforms the existing retransmission protocols. (5 pages)
An object tracking in particle filtering and data association framework, using SIFT features
http://dl-live.theiet.org/content/conferences/10.1049/ic.2011.0104
In this paper, we propose a novel approach to multi-object tracking for video surveillance with a single static camera, using particle filtering and data association. The proposed method allows for real-time tracking and deals with the most important challenges: (1) selecting and tracking real objects of interest in noisy environments and (2) managing occlusion. We consider tracker inputs from classic motion detection (based on background subtraction and clustering). Particle filtering has proven very successful for non-linear and non-Gaussian estimation problems. This article presents SIFT feature tracking in a particle filtering and data association framework. The performance of the proposed algorithm is evaluated on sequences from the ETISEO, CAVIAR, PETS2001 and VS-PETS2003 datasets in order to show the improvements relative to the current state-of-the-art. (6 pages)
GPGPU-accelerated visual search in large surveillance archives
http://dl-live.theiet.org/content/conferences/10.1049/ic.2011.0103
Surveillance archives encompass vast amounts of data, from which the need for search and data exploration arises naturally. Various authorities such as infrastructure operators and law enforcement agencies are confronted with search needs based on a visual description (size, colour, clothing, number plates, facial biometry, etc.) and/or behavioural patterns (limping, loitering, etc.) in order to find a "needle in a haystack" of digital data. In this paper we present a framework which allows for efficient forensic search and interactive data exploration in video archives while exploiting hardware-accelerated video analytics. Furthermore, we present a query concept to facilitate and improve the search for a specific person in large video surveillance archives, using a synthetic human model in a query-by-example manner. The overall framework combines know-how on user interfaces, computer vision algorithms and video archive management. The system is designed with an open archive interface in mind, enabling it to operate with CCTV (closed-circuit television) video archives from a wide variety of manufacturers. (6 pages)
Pervasive monitoring: appreciating citizen's surveillance as digital evidence in legal proceedings
http://dl-live.theiet.org/content/conferences/10.1049/ic.2011.0130
Images or video streams, extracted from data acquired through surveillance systems and intended to be used as evidence in court, should have all the attributes of conventional digital evidence: they should be admissible, authentic, reliable, complete and believable. This paper discusses the first three attributes that surveillance systems should comply with to be submitted as evidence in legal proceedings, and identifies some of the obstacles on the way towards harmonization. The focus is on data gathered from a range of ad hoc sources present at the scene of an incident, including smartphones and wireless sensor networks (used for safety, security or traffic management/environmental monitoring). New scenarios for crowd-sourced surveillance mediated by law enforcement supervision are further considered. Specific attention is given to compliance with privacy requirements, which often condition the admissibility of the evidence. (6 pages)
Efficient 3D face reconstruction from low quality video
http://dl-live.theiet.org/content/conferences/10.1049/ic.2011.0117
3D shape information is crucial in many video analytic applications, such as face recognition and expression analysis. However, most commercial 3D modelling systems rely on dedicated equipment, which lacks operational flexibility. We present an efficient approach to reconstructing a 3D face from low quality video, concentrating on recovering the depth information lost in the imaging process. There are two novelties in the proposed method. First, the depth error is explicitly estimated, which ensures a fully linear shape recovery process. Second, the shape is adjusted locally using a local feature analysis (LFA) model, which effectively alleviates the model dominance problem. A prototype system based on the proposed approach is evaluated on a publicly available database. Experimental results show that, compared with state-of-the-art approaches, our method increases the accuracy of the estimated shape in an efficient way. (6 pages)
Detecting object in the dynamic background from the noisy image in visual surveillance
http://dl-live.theiet.org/content/conferences/10.1049/ic.2011.0062
Detecting an object against a dynamic background is a challenging problem in computer vision and pattern matching research. The proposed algorithm identifies moving objects from a sequence of video frames containing dynamically changing backgrounds in a noisy environment. In connection with our previous work, we propose a methodology to perform background subtraction and updating for moving vehicles in traffic video sequences, combining statistical assumptions about moving objects drawn from the previous frames in the dynamically varying noisy situation. To this end, a binary moving-object hypothesis mask is constructed. A Kalman filter is then utilized for the amalgamation of the current background. Shadow and noise removal algorithms are proposed that operate on the lattice and identify object-level elements. The results of post-processing can be used to detect objects more efficiently. Experimental results and comparisons using real data demonstrate the pre-eminence of the proposed approach.
Real-time active visual tracking with level sets
http://dl-live.theiet.org/content/conferences/10.1049/ic.2011.0122
This paper presents a new real-time active visual tracker which improves standard mean shift tracking by using level sets to extract contours of the target. We use colour and the disparity map computed from a stereo camera pair, which prove to be powerful features for tracking in an indoor surveillance scenario. To combine the features in the level sets process, we enhance the appearance model of Chen et al. [5] with a probabilistic model determined via Expectation-Maximization (EM) clustering. The level set result is used as the weighting kernel, which improves the accuracy of the similarity measurement in the mean shift method. Finally, a Kalman filter deals with complete occlusions. (6 pages)
Hooligan detection: the effects of saliency and expert knowledge
http://dl-live.theiet.org/content/conferences/10.1049/ic.2011.0131
We investigated differences in the visual search for dangerous events between security experts and naive observers during the observation of large scenes, typically encountered on the grandstands of stadiums during soccer matches. Our main technical objective was the reduction of the computational effort required for the detection and recognition of such events. To overcome the scarcity of, and legal issues associated with, real footage, we designed a new algorithm for the synthesis of crowd scenes with well-controlled statistical properties. We characterize the relative importance of saliency and expert knowledge for the generation of correct detections, and the visual search strategies of both security experts and naive observers. We found that during the first few seconds of this search task, experts and naive observers look at the scenes in a similar fashion, but experts see more. We compare the results with theoretical models for saliency and event classification. We show that the recognition model can deliver reasonable classification/detection performance even when operating under real-time constraints. When real-time operation is not a concern, performance can be improved further by allowing the model to grow. (6 pages)
Hostile intent and behaviour detection in elevators
http://dl-live.theiet.org/content/conferences/10.1049/ic.2011.0115
We propose a visual surveillance based method for detecting person-to-person hostile intent and behaviour in elevators. The view of an elevator from a surveillance camera is typically of a small confined space with abrupt changes in illumination due to the opening and closing of the elevator door. We extract three levels of features in a sequential process for violent event detection. First, as low-level features, foreground blobs are segmented from the background and their motion velocity vectors are extracted by an optical flow method. Second, as a mid-level feature, the number of people inside the elevator is estimated from the number and sizes of the segmented blobs. As further mid-level features, the velocity magnitudes and directions are computed by image-based motion analyses. Person-to-person violence can only occur when there is more than one person in the elevator. As the key classifying feature, we consider the average velocity magnitude and direction of each blob. A sequence of image frames is determined to contain a violent event if the average velocity magnitude of any segmented blob exceeds a threshold while its associated motion is not dominated by a single direction. The experimental results demonstrate that the proposed method functions effectively, with computational efficiency sufficient for real-time processing. (6 pages)
Neural network based approach for MPEG video traffic prediction
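The classification rule in the elevator abstract above (high blob speed combined with no single dominant motion direction) can be sketched as follows; the threshold values and the direction-histogram representation are illustrative assumptions:

```python
def is_violent(blob_avg_speeds, direction_histogram,
               speed_threshold=5.0, dominance_ratio=0.6):
    """Flag a frame sequence as violent if any blob's average velocity
    magnitude exceeds the threshold while the motion directions are
    spread out (no single histogram bin holds a dominant share)."""
    fast_blob = any(s > speed_threshold for s in blob_avg_speeds)
    total = sum(direction_histogram)
    dominated = total > 0 and max(direction_histogram) / total >= dominance_ratio
    return fast_blob and not dominated
```

The direction test is what separates violence from, say, a person simply running out of the elevator, where the motion is fast but consistently in one direction.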
http://dl-live.theiet.org/content/conferences/10.1049/ic.2011.0041
In the near future, video is going to be the major component of Internet traffic, and the most popular standard used to transport and view video is MPEG. MPEG traffic is VBR (variable bit rate) traffic and takes the form of a time series of frame/VOP (video object plane) sizes. Video traffic prediction and modelling are important for enhancing reliable operation over these networks. In this paper, MPEG-4 VBR video traffic is predicted by an ANN (Artificial Neural Network). The aim is to predict the next frame of the video stream. In the single-frame prediction problem, information about previous frame sizes is used to predict the next frame size of the sequence. As the prediction tool, we use a feed-forward multilayer perceptron neural network (FMLP). The prediction results of the neural network are compared with the traditional averaging method, and show that the neural approach performs better than the averaging approach.
Object classification based on behaviour patterns
http://dl-live.theiet.org/content/conferences/10.1049/ic.2011.0112
With the recent explosion of surveillance video, media management has gained increasing popularity. Addressing this challenge, in this paper we propose a Surveillance Media Management framework for object detection and classification based on behaviour patterns. The objectives of the paper are: (i) demonstrating the discriminative power of behaviour features for object recognition and classification, (ii) proposing a behavioural fuzzy classifier which progressively discriminates objects by including different degrees of uncertainty in the classification process, and (iii) presenting a Surveillance Media Management system to extract semantic media information and provide unsupervised object classification from raw surveillance videos. The performance of the proposed system has been thoroughly evaluated on the AVSS 2007 surveillance dataset and, as the results indicate, the proposed technique enhances object classification performance. (6 pages)
Iterative active querying for surveillance data retrieval in crime detection and forensics
http://dl-live.theiet.org/content/conferences/10.1049/ic.2011.0133
Large sets of visual data are now available, both in real time and offline, at the time of investigation in multimedia forensics; however, passive querying systems often encounter difficulties in retrieving significant results. In this paper we propose an iterative active querying system for video surveillance and forensic applications, based on continuous interaction between the user and the system. The positive and negative user feedback is exploited as the input to a graph-based transductive procedure for iteratively refining the initial query results. Experiments are shown using people trajectories and people appearance as distance metrics. (6 pages)
Estimation of 3D head region using gait motion for surveillance video
http://dl-live.theiet.org/content/conferences/10.1049/ic.2011.0105
Detecting and recognizing people is important in surveillance. Many detection approaches use local information, such as pattern and colour, which can lead to constraints in application, such as sensitivity to changes in illumination, low resolution and camera viewpoint. In this paper we propose a novel method for estimating the 3D head region based on analysing the gait motion derived from video provided by a single camera. Generally, when a person walks there is a known head movement in the vertical direction, regardless of the walking direction. Using this characteristic, the gait period is detected using wavelet decomposition and the heel strike position is calculated in 3D space. A 3D gait trajectory model is then constructed by non-linear optimization. We evaluate our new approach using the CAVIAR database and show that we can indeed determine the head region to good effect. The contributions of this research include the first use of human gait to detect the head region, an approach with fewer application constraints than many previous ones. (6 pages)
Consistent quality control for wireless video surveillance using distributed video coding
http://dl-live.theiet.org/content/conferences/10.1049/ic.2011.0116
Distributed Video Coding (DVC) is well known for low complexity encoding, which provides coding solutions for a wide range of applications, in particular wireless video surveillance. In this paper, we address the problem of distortion variation introduced by typical rate control algorithms, especially in a variable bit rate environment. A distortion quantization model derived from an MPEG-2 distortion estimation model is proposed to achieve smooth picture quality across video frames. Simulation results show that the proposed quality control algorithm is capable of meeting a user-defined target distortion and maintaining a rather small variation for sequences with slow motion, and performs similarly to fixed quantization for fast-motion sequences at the cost of some RD performance. (6 pages)
An improved method using kinematic features for action recognition
http://dl-live.theiet.org/content/conferences/10.1049/cp.2011.0766
Human action recognition is a challenging problem in computer vision. In this paper, we propose an improved approach using kinematic features for action recognition. In this approach, we find the area related to the action by a simple method, and select eight discriminative features derived from the optical flow field to describe the dynamics of the field. The covariance matrix of the feature vectors is used to fuse the features and serves as the feature descriptor. Multi-class SVM classifiers are then employed for action classification. Experiments are carried out on public datasets. We obtain recognition rates of 97.66% (SEG-ACA) and 98.2% (SEQ-ACA) on the KTH dataset, and 98.89% (SEQ-ACA) and 93.83% (SEG-ACA) on the WEIZMANN dataset with leave-one-out testing.
Moderate prefetching strategy based on video slicing mechanism for P2P VoD streaming system
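The covariance-descriptor step in the action-recognition abstract above can be sketched as follows: a region yields a set of d-dimensional kinematic feature vectors, and their d×d sample covariance matrix serves as the fused descriptor. A pure-Python sketch, not the authors' code:

```python
def covariance_descriptor(feature_vectors):
    """Fuse a set of per-pixel kinematic feature vectors into a single
    region descriptor: the sample covariance matrix of the features."""
    n, d = len(feature_vectors), len(feature_vectors[0])
    mean = [sum(f[j] for f in feature_vectors) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in feature_vectors:
        centered = [f[j] - mean[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += centered[i] * centered[j]
    return [[c / (n - 1) for c in row] for row in cov]
```

The descriptor has fixed size (d×d) regardless of region size, which is what makes it convenient as input to an SVM classifier.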
http://dl-live.theiet.org/content/conferences/10.1049/cp.2011.1006
In peer-to-peer video-on-demand (VoD) streaming systems, each peer contributes a fixed amount of hard disk storage (usually 2 GB) to store viewed videos and then uploads them to requesting peers. However, the daily hits (i.e., popularity) of different segments of the same video are highly diverse, which means that taking the whole video as the basic storage unit may lead to redundancy of unpopular segment replicas and scarcity of popular segment replicas in the P2P network. To address this issue, we propose a video slicing mechanism (VSM) in which the whole video is sliced into small blocks (20 MB, for instance). Under VSM, peers can moderately remove unpopular blocks from, and add popular blocks to, their contributed hard disk storage, which increases the usage of peers' storage space. To reasonably assign bandwidth among peers with different download capacities, we propose a moderate prefetching strategy (MPS) based on VSM. Under MPS, when the amount of prefetched content reaches a predefined threshold, peers immediately stop prefetching and release bandwidth for other peers. We apply the MPS to the PPLive VoD system, and measurement results demonstrate that low server load and perfect user satisfaction can be achieved.
The research of foggy blurring image restoration
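The moderate-prefetching decision in the P2P VoD abstract above is a simple threshold rule on how far ahead of the playback position a peer has buffered. A minimal sketch (block-count units and the function name are illustrative assumptions):

```python
def should_prefetch(playback_pos, buffered_blocks, threshold_blocks):
    """MPS-style decision: keep prefetching only while the amount of
    content buffered ahead of the playback position is below the
    threshold; otherwise stop and release bandwidth to other peers."""
    blocks_ahead = buffered_blocks - playback_pos
    return blocks_ahead < threshold_blocks
```

Stopping at the threshold is the "moderate" part: a high-bandwidth peer does not hoard upstream capacity for content it will not play for a long time, so slower peers can fill their near-term buffers first.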
http://dl-live.theiet.org/content/conferences/10.1049/cp.2011.0775
In this paper, restoration techniques for foggy blurred grey images and foggy blurred colour images are studied. Global and local histogram equalization methods are used to restore foggy blurred grey images, while hue-saturation adjustment and the Retinex method are used to restore foggy blurred colour images. The experimental results show that the optimal restoration algorithm differs from image to image because of differences in lighting, fog severity, the ratio between bright and dark areas, and noise. Therefore, the right algorithm needs to be selected to restore each foggy blurred image according to its cause of degradation.
A new algorithm for fast block-matching motion estimation based on tree-structured block partition
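Global histogram equalization, the first of the restoration methods mentioned above, can be sketched on a flat list of grey values. A minimal stdlib-only version, for illustration rather than the paper's implementation:

```python
def equalize(gray, levels=256):
    """Global histogram equalization: remap each grey level through the
    normalized cumulative histogram, spreading out the compressed
    intensity distribution of a low-contrast (e.g. foggy) image."""
    n = len(gray)
    hist = [0] * levels
    for v in gray:
        hist[v] += 1
    cdf, running = [0] * levels, 0
    for level, count in enumerate(hist):
        running += count
        cdf[level] = running
    return [round((levels - 1) * cdf[v] / n) for v in gray]
```

The local variant applies the same remapping per tile rather than over the whole image, which is why it can outperform the global method when bright and dark areas are unevenly distributed, as the abstract notes.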
http://dl-live.theiet.org/content/conferences/10.1049/cp.2011.0996
Fast block-matching motion estimation has been widely used in low bit-rate video compression applications, owing to its simplicity and effectiveness. However, it suffers from several problems, mainly caused by the use of a fixed block size. In this paper, a new algorithm for fast block-matching motion estimation based on tree-structured block partition is proposed, which applies variable block size (VBS) techniques in the block search. Simulation results show that the algorithm improves block-searching efficiency as well as the precision of the block search.
Video search algorithm using the motion vectors map
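Fixed-block-size full-search matching, the baseline that the tree-structured paper above improves on, can be sketched as a naive SAD (sum of absolute differences) search, shown here for illustration:

```python
def block_match(ref, cur, bx, by, bsize, search_range):
    """Full-search block matching: return the (dx, dy) motion vector
    minimizing the SAD between a block of the current frame and
    displaced candidate blocks of the reference frame."""
    h, w = len(ref), len(ref[0])

    def sad(dx, dy):
        return sum(abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
                   for y in range(bsize) for x in range(bsize))

    best, best_sad = (0, 0), sad(0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            if (0 <= by + dy and by + bsize + dy <= h
                    and 0 <= bx + dx and bx + bsize + dx <= w):
                s = sad(dx, dy)
                if s < best_sad:
                    best_sad, best = s, (dx, dy)
    return best
```

A tree-structured variable-block-size scheme in the spirit of the paper would recursively split blocks whose best SAD remains high, so that a single motion vector is not forced onto regions containing multiple motions.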
http://dl-live.theiet.org/content/conferences/10.1049/cp.2011.1011
A use of 3D motion vector maps for video story search is provided herein. In the proposed video story search method, parsing is performed on the data streams of an input search video file and of a plurality of video files to be searched, and a plurality of motion vectors is obtained accordingly. The corresponding motion vector maps are generated in the time domain. A correlation result is computed from the motion vector map of the input search video file and the motion vector maps of the plurality of video files, and a search result is then obtained based on the correlation result.
A novel 8×8 transform method applied in video coding
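The correlation step in the motion-vector-map search above can be sketched over flattened maps of motion magnitudes, using normalized cross-correlation. The flattened representation and function names are illustrative assumptions, not the paper's formulation:

```python
def rank_by_correlation(query_map, candidate_maps):
    """Return the index of the candidate whose (flattened) motion
    vector map correlates best with the query's map."""
    def ncc(a, b):
        n = min(len(a), len(b))
        mean_a, mean_b = sum(a[:n]) / n, sum(b[:n]) / n
        num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in range(n))
        da = sum((a[i] - mean_a) ** 2 for i in range(n)) ** 0.5
        db = sum((b[i] - mean_b) ** 2 for i in range(n)) ** 0.5
        return num / (da * db) if da > 0 and db > 0 else 0.0

    return max(range(len(candidate_maps)),
               key=lambda i: ncc(query_map, candidate_maps[i]))
```

Because the comparison uses only motion vectors already present in the compressed streams, no full decode of the candidate videos is needed.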
http://dl-live.theiet.org/content/conferences/10.1049/cp.2011.0846
Transform coding has long played an important role in video coding and has increasingly become a research focus, especially in current popular standards such as H.264/AVC, AVS and HEVC. Selecting a good transform method is important because the transform module has a direct impact on the efficiency of the video codec. This paper proposes a new 8×8 transform method, together with its integer approximation, for use in video coding. Experiments show that it achieves higher performance.
Research on underground wireless video encoding algorithm based on region of interest for interprediction
http://dl-live.theiet.org/content/conferences/10.1049/cp.2011.0992
The low illumination and harsh environment in underground coal mines greatly degrade the quality of video images. Exploiting the fact that the background at an underground wireless video monitoring point is essentially unchanged, with only foreground objects moving, this article proposes an inter-prediction algorithm based on region of interest. Simulations show that the algorithm greatly reduces the complexity of video encoding and the encoding rate, while video quality remains essentially unchanged or is even enhanced.
An interpolation motion compensation method for video sequence
http://dl-live.theiet.org/content/conferences/10.1049/cp.2011.0274
Video processing technology is becoming an indispensable bridge between people and information. A typical video processing pipeline consists of video signal demodulation, video decoding and video post-processing. To adapt to the variety of modern video formats, additional frames must be interpolated from the original video sequence. The motion must be estimated accurately so that proper motion compensation of the interpolated frames can be performed, and the trade-off between computation speed and algorithm accuracy must be considered carefully. This paper introduces a bilinear interpolation algorithm that selects and normalizes the appropriate pixels along the motion direction. Using a common video library, extensive experiments are carried out and compared against a common interpolation algorithm, verifying that the proposed algorithm improves the performance of motion compensation. (6 pages)
The effects of replication on the QoS in P2P VoD systems
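A minimal sketch of motion-compensated frame interpolation with bilinear sampling along the motion direction, as described in the interpolation abstract above, under the simplifying assumptions of one motion vector per pixel and constant velocity (illustrative only, not the paper's algorithm; the scene data are synthetic):

```python
def bilinear_sample(frame, x, y):
    """Bilinearly interpolate a frame (list of rows) at a fractional (x, y)."""
    h, w = len(frame), len(frame[0])
    x = min(max(x, 0), w - 1)            # clamp to the frame borders
    y = min(max(y, 0), h - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = frame[y0][x0] * (1 - fx) + frame[y0][x1] * fx
    bot = frame[y1][x0] * (1 - fx) + frame[y1][x1] * fx
    return top * (1 - fy) + bot * fy

def interpolate_midframe(prev, nxt, motion):
    """Build the frame halfway in time: sample half-way along each pixel's
    motion vector in both neighbouring frames and average the two values."""
    h, w = len(prev), len(prev[0])
    return [[(bilinear_sample(prev, x - motion[y][x][0] / 2, y - motion[y][x][1] / 2)
              + bilinear_sample(nxt, x + motion[y][x][0] / 2, y + motion[y][x][1] / 2)) / 2
             for x in range(w)] for y in range(h)]

# Scene translating 2 px/frame to the right: prev[y][x] = 10*x, nxt shifted by 2.
prev = [[10 * x for x in range(6)] for _ in range(3)]
nxt = [[10 * max(x - 2, 0) for x in range(6)] for _ in range(3)]
motion = [[(2, 0)] * 6 for _ in range(3)]
mid = interpolate_midframe(prev, nxt, motion)   # interior pixels shift by 1 px
```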
http://dl-live.theiet.org/content/conferences/10.1049/cp.2011.1004
Replication strategy is one of the central design issues in P2P VoD systems. Given the distributed cache spaces of all peers, it aims to address how to utilize them efficiently. In this paper, we concentrate on the essential design question: how replication affects the QoS of requests. Unlike previous works, and in order to stay close to practice, three types of replicas are differentiated: complete replicas, incomplete replicas and the replicas held by viewers in the same channel. The effects of these types are quantified via modeling and simulation. A model of bandwidth competition is presented, and it is observed that the QoS of a video is determined not only by its own Availability to Demand ratio (ATD) but also by the ATD of other videos saved on the same peer. Both theoretical analysis and experiments show that when the viewer population is large enough, the upload capacity of incomplete replicas equals that of complete replicas. We argue that proportional replication is not optimal and that more replicas should be assigned to unpopular videos.
Complexity analysis of algorithm/architecture co-design for H.264 deblocking filter in MPSoC design
http://dl-live.theiet.org/content/conferences/10.1049/cp.2011.0843
Due to the high computational and power demands of modern embedded visual media processing, MPSoC architectures often contain multiple heterogeneous processing elements, which introduces numerous problems involving the mapping of algorithm functionality and possible refinements of each processing element. To cope with these challenges, algorithm/architecture co-design (AAC) is significant for characterizing the algorithmic complexity used to optimize the targeted architecture. This paper proposes a systematic complexity analysis for AAC based on a timed model of computation (MoC). Through such a seamless approach, complexity measures intrinsic to the algorithm, such as degree of parallelism and pipeline depth, can be fully exploited. Furthermore, the resulting explicit algorithm/architecture friendliness greatly helps the mapping of the algorithm to different processing elements. As an example, an ideal architecture prototype of the deblocking filter for H.264 is proposed, achieving a throughput of 144 cycles/macroblock; several deblocking filter designs from the literature are also compared to demonstrate the benefits of analysing and exploiting complexity measures with our approach in early system-level design.
GPU-based background generation method
http://dl-live.theiet.org/content/conferences/10.1049/cp.2011.0859
Video processing is widely used in traffic detection, and it demands a great deal of CPU time. GPU computing is deeply parallelized, its floating-point performance is highly efficient compared with the traditional CPU, and GPUs are inexpensive, so more and more video processing algorithms are adding GPU support. This paper describes an effective image background generation algorithm and its GPU implementation. The performance of the GPU-based background extraction method is presented, and some issues concerning its application are discussed as well.
A new motion estimation algorithm based on space relation for frame rate up conversion
http://dl-live.theiet.org/content/conferences/10.1049/cp.2011.0841
In this paper, a novel motion estimation and compensation algorithm for frame rate up-conversion is presented. In contrast to conventional motion estimation methods, the proposed algorithm processes motion vectors differently for the static region, the internal object region, and the moving boundary of the video stream. In the static or internal object region, motion vectors are decided according to the interaction principle between adjacent blocks and are easily obtained from the adjacent motion vectors. For the moving boundary, motion vectors are first calculated from the adjacent blocks; the algorithm then detects block movement in the region using SAD values; finally, the motion vectors of moving blocks are refined and corrected.
ROI-based MB-level adaptive frequency weighting
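The SAD matching used in the boundary-refinement step of the frame-rate up-conversion abstract above can be illustrated with a plain full-search block matcher, the generic baseline that such algorithms refine (not the proposed space-relation method; the reference frame is synthetic):

```python
def sad(a, b):
    """Sum of absolute differences between two equal-size blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def full_search(block, ref, top, left, radius):
    """Exhaustive search in ref around (top, left) for the best-matching
    block; returns (best SAD cost, (dx, dy) motion vector)."""
    n = len(block)
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= len(ref) - n and 0 <= x <= len(ref[0]) - n:
                cand = [row[x:x + n] for row in ref[y:y + n]]
                cost = sad(block, cand)
                if best is None or cost < best[0]:
                    best = (cost, (dx, dy))
    return best

# 8x8 reference frame with distinct pixel values, so the match is unique.
ref = [[r * 8 + c for c in range(8)] for r in range(8)]
block = [row[3:5] for row in ref[2:4]]        # 2x2 patch taken at (2, 3)
cost, mv = full_search(block, ref, 1, 1, 3)   # finds it at offset (2, 1)
```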
http://dl-live.theiet.org/content/conferences/10.1049/cp.2011.0991
To improve the subjective quality of video coding, match the properties of the human visual system (HVS), and reduce the coding bitrate, an ROI-based MB-level adaptive frequency weighting scheme is proposed in this paper. In perceptual video coding, visual attention and frequency sensitivity are the most important HVS properties and are commonly used to improve subjective quality. In this paper, the spatial context, the side information and the properties of the human visual system are all taken into consideration. According to the effect of frequency weighting, three different strategies are defined, and different areas in a picture can choose different frequency weighting strategies. The experimental results show that the proposed region-of-interest-based MB-level adaptive frequency weighting algorithm (ROI-based MBAFW) significantly improves subjective quality. Additionally, compared with coding without frequency weighting, the ROI-based MBAFW algorithm achieves about 10% bitrate reduction with almost the same subjective quality.
Optimal quantization for DC coefficient of Wyner-Ziv frame in unidirectional distributed video coding
http://dl-live.theiet.org/content/conferences/10.1049/cp.2011.0839
This paper presents a rate-distortion-based optimal quantization scheme for the DC coefficient in a unidirectional Wyner-Ziv (WZ) video coding system. In the proposed scheme, a new rate-distortion (RD) model is developed to find an optimal quantization step (OQS) for the DC coefficient. More specifically, the effects of the quantization step on both distortion and rate are considered in the presented RD model, and the OQS, optimal in terms of RD cost, can be solved for the DC coefficient at the expense of a slight increase in coding complexity. The RD performance of the Wyner-Ziv video coding system using the proposed scheme is compared with that of the baseline system without the OQS. The results show that quantization with the OQS for the DC coefficient improves the average PSNR of WZ frames by 0.3~1.0 dB.
Standard Codecs: Image compression to advanced video coding
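The idea of choosing a quantization step by RD cost, as in the Wyner-Ziv abstract above, can be sketched with a toy cost J = D + λR; the rate proxy, λ value and coefficient data below are invented, and the paper's actual RD model is not reproduced:

```python
import math

def rd_cost(coeff_values, q, lam):
    """Toy rate-distortion cost: squared quantization error plus a
    level-magnitude rate proxy, J = D + lambda * R."""
    dist = rate = 0.0
    for c in coeff_values:
        level = round(c / q)
        dist += (c - level * q) ** 2          # distortion D
        rate += math.log2(1 + abs(level))     # crude bit-cost proxy R
    return dist + lam * rate

def optimal_quantization_step(coeff_values, candidates, lam):
    """Pick the candidate step with the lowest RD cost."""
    return min(candidates, key=lambda q: rd_cost(coeff_values, q, lam))

dc = [120.0, 118.0, 121.0, 119.5]             # hypothetical DC coefficients
oqs = optimal_quantization_step(dc, [4, 8, 16, 32, 64], lam=2.0)
```

A small step wastes rate on near-identical levels, a large one inflates distortion; the search lands in between.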
http://dl-live.theiet.org/content/books/te/pbte054e
<p xmlns="http://pub2web.metastore.ingenta.com/ns/">A substantially updated edition of <i>Video Coding: An introduction to standard codecs</i> (IEE 1999, winner of IEE Rayleigh Award as the best book of 2000), this book discusses the growth of digital television technology, from image compression to advanced video coding. This third edition also includes the latest developments on H.264/MPEG-4 video coding and the scalability defined for this codec, which were not available at the time of the previous edition (IEE 2003). The book highlights the need for standardisation in processing static and moving images and extensively exploits the ITU and ISO/IEC standards defined in this field. The book gives an authoritative explanation of pictures and video coding algorithms, working from basic principles through to the advanced video compression systems now being developed. It discusses the reasons behind the introduction of a standard codec for a specific application and its chosen parameters. Each chapter is devoted to a standard video codec, and chapters are introduced in an evolutionary manner complementing the earlier chapters. This book will enable readers to appreciate the fundamentals needed to design a video codec for any given application and should prove a valuable resource for managers, engineers and researchers working in this field.</p>
A robust technique for person-background segmentation in video sequences based on the codebook method of background subtraction and head tracking
http://dl-live.theiet.org/content/conferences/10.1049/ic.2010.0234
In this paper, we introduce a new method to reliably extract humans from a video sequence even when they are static for long periods of time. The proposed method addresses a common problem in background subtraction techniques, whereby humans who remain static are mistaken for new additions to the background scene and are consequently absorbed into the background model. In the proposed method, codebook background subtraction is used to identify foreground regions in the video frame. A motion-based particle filter is then used to track one or more human heads in the frame and determine which of these foreground regions represent people. The background model is then selectively updated given this knowledge, ensuring that people are never absorbed into the background model once detected, even when indefinitely static. Simulation results confirm that a human body is robustly extracted using this method in a non-static environment. (5 pages)
Unsupervised feature based abnormality detection
http://dl-live.theiet.org/content/conferences/10.1049/ic.2010.0235
In recent years, there has been an increasing focus on detecting anomalous events in surveillance applications. In this paper, we present an unsupervised feature-based abnormality detection algorithm suited to online video surveillance applications. The features used in our method include trajectories, object sizes and velocities. Unlike traditional trajectory-based abnormality detection, we consider both trajectory-based and region-based information. In our algorithm, the trajectories are clustered using Principal Component Analysis (PCA), which provides the ability to choose the optimal number of clusters. Different trajectory clusters are modelled as chains of Gaussians, and new tracks are matched against the cluster models to detect abnormalities. In addition, a novel region-based method is proposed that can be combined with trajectory-based detection. The proposed method has the advantage of detecting abnormal events that cannot be detected by trajectory-based algorithms alone. The results show improved detection compared with traditional trajectory-based methods. (5 pages)
Reliable moving object extraction and counting
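The PCA step of the abnormality-detection abstract above can be sketched as projecting fixed-length trajectories onto the leading principal components before clustering; the tracks below are synthetic, and the paper's cluster-count selection and Gaussian-chain matching are not shown:

```python
import numpy as np

# Each trajectory: a fixed-length sequence of (x, y) points, flattened to a row.
rng = np.random.default_rng(1)
straight = np.array([[t, t] for t in range(10)], float).ravel()
diagonal = np.array([[t, 9 - t] for t in range(10)], float).ravel()
tracks = np.vstack([straight + rng.normal(0, .1, 20) for _ in range(5)] +
                   [diagonal + rng.normal(0, .1, 20) for _ in range(5)])

# PCA: centre the data and project onto the leading right singular vectors
# (eigenvectors of the covariance); the low-dimensional scores get clustered.
centred = tracks - tracks.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
scores = centred @ vt[:2].T          # keep the top-2 components

# The two motion patterns separate cleanly along the first component.
```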
http://dl-live.theiet.org/content/conferences/10.1049/cp.2010.0601
This work combines background subtraction, a modified edge-following scheme and connected-component analysis to automatically extract moving objects in a video sequence; in addition, the number of moving objects within an interval can be effectively counted. In the proposed method, the pixels at each position across successive image frames are first processed by a modified iterative threshold selection technique to determine the background gray-level value, and applying this technique at every position establishes the background. Second, an original image is subtracted from this background to obtain a difference image, which is added to the differential image between the original image and its preceding neighbour to yield an image containing many edge points of moving objects. Third, a robust edge-following scheme processes this edge-point image to produce closed-form objects, which are then refined by morphological operations to yield complete objects. A practical implementation reveals that the proposed method can precisely and reliably estimate the traffic volume.
An improved Db2-based MCTF for scalable video coding
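The iterative threshold selection used to pick the background gray level at one pixel position, as in the moving-object extraction abstract above, can be sketched as follows (the pixel history is made up; the edge-following and morphology stages are not shown):

```python
def iterative_threshold(values, eps=0.5):
    """Iterative threshold selection: split the values at t, recompute t as
    the average of the two class means, and repeat until t stabilizes."""
    t = sum(values) / len(values)
    while True:
        low = [v for v in values if v <= t]
        high = [v for v in values if v > t]
        if not low or not high:          # degenerate split: keep current t
            return t
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if abs(new_t - t) < eps:
            return new_t
        t = new_t

# Gray levels observed at one pixel position over successive frames:
# mostly background (~50), occasionally occluded by a bright object (~200).
history = [50, 52, 48, 51, 49, 200, 205, 50, 52, 198]
t = iterative_threshold(history)
background = sum(v for v in history if v <= t) / sum(1 for v in history if v <= t)
```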
http://dl-live.theiet.org/content/conferences/10.1049/cp.2010.0680
MCTF (motion compensated temporal filtering) was first introduced in 3-D transform coding and has been used in several scalable coding proposals; its quality plays an essential role in motion compensated 3-D subband/wavelet coding. The simplest implementation of MCTF in a lifting filter structure uses Haar filters, and longer filters such as the 5/3 are also used extensively. In this paper, an improved db2-based MCTF is proposed. The lowpass frames obtained by this db2-based MCTF have higher PSNR than those of Haar-based MCTF and retain more of the approximate information of the original frames.
Automatic road feature detection and correlation for the correction of consumer satellite navigation system mapping
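For reference, the simplest Haar lifting step mentioned in the MCTF abstract above can be sketched on a pair of (already motion-aligned) frames; the proposed db2 lifting uses longer predict/update filters, which are not reproduced here, and the frame values are invented:

```python
def haar_lifting_pair(frame_a, frame_b):
    """One temporal Haar lifting step on two motion-aligned frames:
    predict: H = B - A; update: L = A + H/2 (so L is the pair average)."""
    high = [[b - a for a, b in zip(ra, rb)] for ra, rb in zip(frame_a, frame_b)]
    low = [[a + h / 2 for a, h in zip(ra, rh)] for ra, rh in zip(frame_a, high)]
    return low, high

def haar_lifting_inverse(low, high):
    """Invert the lifting steps in reverse order for perfect reconstruction."""
    a = [[l - h / 2 for l, h in zip(rl, rh)] for rl, rh in zip(low, high)]
    b = [[x + h for x, h in zip(ra, rh)] for ra, rh in zip(a, high)]
    return a, b

f0 = [[10, 20], [30, 40]]
f1 = [[12, 18], [33, 41]]
low, high = haar_lifting_pair(f0, f1)
r0, r1 = haar_lifting_inverse(low, high)   # reconstructs f0 and f1 exactly
```

The lifting structure is what guarantees invertibility regardless of the filter choice, which is why swapping Haar for db2 only changes the predict/update steps.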
http://dl-live.theiet.org/content/conferences/10.1049/cp.2010.0397
This paper presents a novel approach for the use of on-vehicle video analysis aimed at the verification and correction of consumer satellite navigation system mapping information. The proposed system automatically detects road and environment features (e.g. flyover bridges, road junctions, traffic lights and road signs) for real-time comparison to information available from corresponding navigation mapping. This can be used both for secondary feature-based localization of vehicle position and the verification of roadway mapping information against the true environment. (9 pages)
Adaptive color image style transformation for video
http://dl-live.theiet.org/content/conferences/10.1049/cp.2010.0539
This paper presents a simple and fast color style transfer algorithm for video. With the proposed algorithm, users simply select two reference images during video conversion to complete the color transfer process. Since a video is a compilation of related images, we first split the video into an image sequence and then perform color transfer frame by frame. To speed up the processing of the color characteristic parameters of the reference image and target video, the algorithm performs the transfer in the lαβ color space, which has low correlation between color channels. The resulting video produces a digital special-effects scene driven by the two color-style reference images. Lastly, the produced results are analyzed: the changes between frames are minute, and visual inspection of the video structure shows that good visual effects are achieved, suggesting this technique may be applied to special-effects generation in videos.
A real-time system for abnormal path detection
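The per-channel statistic matching at the heart of such color transfer (Reinhard-style mean/standard-deviation matching) can be sketched on a single channel; the RGB↔lαβ conversion is omitted and the channel values are made up:

```python
def transfer_channel(target, reference):
    """Statistic matching for one colour channel: scale the target's
    deviations to the reference std, then shift to the reference mean."""
    n = len(target)
    mt = sum(target) / n
    mr = sum(reference) / len(reference)
    st = (sum((v - mt) ** 2 for v in target) / n) ** 0.5 or 1.0
    sr = (sum((v - mr) ** 2 for v in reference) / len(reference)) ** 0.5
    return [(v - mt) * sr / st + mr for v in target]

target_l = [40.0, 50.0, 60.0]       # one frame's lightness channel
reference_l = [70.0, 80.0, 90.0]    # the reference image's lightness channel
out = transfer_channel(target_l, reference_l)   # shifted onto the reference stats
```

Applying this independently to each of the l, α and β channels is what makes the decorrelated lαβ space attractive for the transfer.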
http://dl-live.theiet.org/content/conferences/10.1049/ic.2009.0251
This paper proposes a real-time system capable of extracting and modeling object trajectories from a multi-camera setup with the aim of identifying abnormal paths. The trajectories are modeled as sequences of positional distributions (2D Gaussians) and clustered in the training phase by exploiting an innovative distance measure based on a global alignment technique and the Bhattacharyya distance between Gaussians. An online classification procedure is proposed to classify new trajectories on the fly as either "normal" or "abnormal" (in the sense of rarely seen before, and thus unusual and potentially interesting). Experiments on a real scenario are presented. (6 pages)
Online evaluation of tracking algorithm performance
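The Bhattacharyya distance between the positional Gaussians mentioned in the abnormal-path abstract above can be sketched for the diagonal-covariance case (a standard formula; the paper's global alignment step is not shown, and the example Gaussians are invented):

```python
import math

def bhattacharyya_2d(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two 2-D Gaussians with diagonal
    covariances (per-axis variances), summing the per-axis 1-D terms."""
    d = 0.0
    log_term = 0.0
    for (m1, v1), (m2, v2) in zip(zip(mu1, var1), zip(mu2, var2)):
        v = (v1 + v2) / 2                      # averaged variance per axis
        d += (m1 - m2) ** 2 / v                # mean-separation term
        log_term += math.log(v / math.sqrt(v1 * v2))   # spread-mismatch term
    return d / 8 + log_term / 2

same = bhattacharyya_2d((0, 0), (1, 1), (0, 0), (1, 1))   # identical -> 0
far = bhattacharyya_2d((0, 0), (1, 1), (5, 5), (1, 1))    # well separated
```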
http://dl-live.theiet.org/content/conferences/10.1049/ic.2009.0266
This paper presents a method for the online evaluation of the performance of tracking algorithms in surveillance videos. We use a set of features to compute the confidence of trajectories as well as the precision of tracking results. A global score based on these features is computed online and used to estimate the performance of the tracking algorithms. The method has been tested with two real video sequences and two tracking algorithms. The similar variations between the results obtained by the proposed method and the output of a supervised evaluation tool using ground-truth data demonstrate the usefulness of our global score. The advantages of our approach over existing state-of-the-art approaches are: (i) little a priori knowledge is required, (ii) the method can be applied in complex scenes containing several mobile objects, and (iii) the performance of different tracking algorithms can be compared simultaneously. (6 pages)
A real time solution for face logging
http://dl-live.theiet.org/content/conferences/10.1049/ic.2009.0238
An intrusion logger is a video surveillance application designed to detect intrusion events and document them by storing time-stamped images in a log. Face loggers, in particular, focus on capturing imagery of intruders' faces, and are typically used to provide inputs to a face recognition system. Intrusion loggers are commonly expected to signal an alarm condition as the intrusion occurs, and are therefore required to run in real time and to work continuously for long periods. In this paper we present a face logging solution capable of detecting and tracking several targets in real time, grabbing face images and evaluating their quality in order to store only the best image for each detected target. An evaluation of the performance on a specifically designed test set is provided. (6 pages)
Event recognition with PTZ cameras
http://dl-live.theiet.org/content/conferences/10.1049/ic.2009.0262
In this paper, a framework for explicit complex event recognition is proposed. The system extracts relevant features from video streams coming from PTZ cameras in order to detect, classify, and track moving objects in the scene. This information is then processed in order to recognize both simple events (instantaneous events involving a single object) and complex events. In particular, we propose a method for explicitly describing and matching complex events in terms of simpler elements (e.g. simple events, information on object's state, etc.). Experimental results prove the validity of the proposed event recognition technique. (6 pages)
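A minimal sketch of matching a complex event as an ordered sequence of simple events, in the spirit of the PTZ event-recognition abstract above (a generic pattern-matching idiom; the event names are invented, and the paper's object-state information is omitted):

```python
def matches_complex_event(pattern, stream):
    """True if the simple-event stream contains the pattern as an ordered
    (not necessarily contiguous) subsequence."""
    it = iter(stream)
    return all(ev in it for ev in pattern)   # each `in` consumes the iterator

# Hypothetical simple events produced by the detection/tracking front end.
stream = ["enter", "stop", "drop_bag", "start", "leave"]
abandoned_bag = ["enter", "drop_bag", "leave"]
print(matches_complex_event(abandoned_bag, stream))       # True
print(matches_complex_event(["leave", "enter"], stream))  # False
```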