Advances in federated learning and edge computing advocate for running deep learning models on edge devices for video analysis. However, the captured video frame rate is too high to be processed at the edge in real time with a typical model such as a CNN. Simply feeding frames to the model consecutively compromises both the quality of the analysis (important frames are missed) and its efficiency (highly similar frames are processed redundantly).
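One common way to balance this trade-off is to run the model only on frames that differ sufficiently from the last processed one. Below is a minimal sketch of such difference-based filtering; it is not this paper's method, and DIFF_THRESHOLD, the input file name, and the run_model placeholder are illustrative assumptions.

import cv2
import numpy as np

DIFF_THRESHOLD = 12.0  # hypothetical tuning knob: higher -> fewer frames processed

def should_process(frame, last_processed, threshold=DIFF_THRESHOLD):
    """Process a frame only if it differs enough from the last processed one."""
    if last_processed is None:
        return True
    mean_abs_diff = np.mean(cv2.absdiff(frame, last_processed))
    return mean_abs_diff > threshold

cap = cv2.VideoCapture("input.mp4")
last = None
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if should_process(frame, last):
        last = frame
        # run_model(frame)  # placeholder for the on-device CNN inference
cap.release()

Raising the threshold skips more near-duplicate frames (better efficiency) at the risk of missing subtle but important changes (worse quality), which is exactly the tension described above.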

Traditionally, the vision community has devised algorithms to estimate the distance between an original image and versions of it subjected to perturbations. Inspiration was usually drawn from the human visual system and how it processes different perturbations, in an attempt to replicate the extent to which they affect our ability to judge image quality. While recent works have presented deep neural networks trained to predict human perceptual quality, very few borrow any intuitions from the human visual system.
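A widely used family of learned metrics measures perceptual distance in the feature space of a pretrained network. The sketch below illustrates that generic idea (in the spirit of LPIPS-style metrics, not any specific model from these works); the VGG16 layer cut-off and normalization constants are standard ImageNet values, and torch/torchvision are assumed to be available.

import torch
import torchvision.models as models
import torchvision.transforms.functional as TF

# Early VGG16 convolutional features serve as a crude perceptual embedding.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features[:16].eval()

@torch.no_grad()
def perceptual_distance(img_a, img_b):
    """L2 distance between VGG feature maps of two [3, H, W] tensors in [0, 1]."""
    batch = torch.stack([img_a, img_b])
    batch = TF.normalize(batch, mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
    feats = vgg(batch)
    return torch.norm(feats[0] - feats[1]).item()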

The lack of interpretability in current deep learning models raises serious concerns, as these models are extensively used in various life-critical applications. It is therefore of paramount importance to develop interpretable deep learning models. In this paper, we consider the problem of blind deconvolution and propose a novel model-aware deep architecture that recovers both the blur kernel and the sharp image from the blurred image.
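For context, blind deconvolution assumes the observation model y = k * x + n, where y is the blurred image, k the unknown blur kernel, x the unknown sharp image, and n noise; both k and x must be recovered from y alone. A minimal sketch of this forward model follows (the generic setting, not the proposed architecture; the image, kernel, and noise level are synthetic placeholders).

import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
x = rng.random((64, 64))                  # unknown sharp image (synthetic stand-in)
k = np.ones((5, 5)) / 25.0                # unknown blur kernel (here: a box blur)
n = 0.01 * rng.standard_normal((64, 64))  # measurement noise

y = convolve2d(x, k, mode="same", boundary="symm") + n
# Blind deconvolution recovers both x and k from y, typically by alternating
# updates of the image and kernel estimates under suitable priors.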

In this paper, we estimate depth using two defocused images from a dual-aperture camera. Recent advances in deep learning techniques have increased the accuracy of depth estimation. In addition, methods that exploit defocused images, in which objects are blurred in proportion to their distance from the camera, have been widely studied. We further improve depth-estimation accuracy by training the network on two images with different degrees of depth of field.
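One straightforward way to feed two such images to a network is to concatenate them channel-wise and regress a per-pixel depth map. The sketch below shows only that generic pattern; the layer sizes and the DualDefocusDepthNet name are illustrative, not the architecture proposed here.

import torch
import torch.nn as nn

class DualDefocusDepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),   # two RGB images -> 6 channels
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),              # per-pixel depth map
        )

    def forward(self, img_wide_dof, img_narrow_dof):
        return self.net(torch.cat([img_wide_dof, img_narrow_dof], dim=1))

net = DualDefocusDepthNet()
depth = net(torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128))  # [1, 1, 128, 128]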

The recent success of the Transformer-based sequence-to-sequence framework for various Natural Language Processing tasks has motivated its application to Automatic Speech Recognition. In this work, we explore the application of Transformers to low-resource Indian languages in a multilingual framework. We explore various methods of incorporating language information into a multilingual Transformer: (i) at the decoder and (ii) at the encoder. These methods include using language identity tokens or providing language information to the acoustic vectors.
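The decoder-side variant can be as simple as prepending a language identity token to the target sequence so that generation is conditioned on the language. A minimal sketch follows; the token names and language set are hypothetical, not the paper's exact scheme.

# Hypothetical language identity tokens added to the output vocabulary.
LANG_TOKENS = {"hindi": "<hi>", "tamil": "<ta>", "telugu": "<te>"}

def add_language_token(target_tokens, lang):
    """Decoder variant: condition generation on a prepended language token."""
    return [LANG_TOKENS[lang]] + target_tokens

# Encoder variant (alternative): add a learned language embedding to each
# acoustic feature vector before the first Transformer layer.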

Performing driving behaviors based on causal reasoning is essential to ensuring driving safety. In this work, we investigated how state-of-the-art 3D Convolutional Neural Networks (CNNs) perform at classifying driving behaviors that require causal reasoning. We proposed a perturbation-based visual explanation method to inspect the models' performance visually. By examining video attention saliency, we found that existing models could not precisely capture the causes (e.g., a traffic light) of a specific action (e.g., stopping).
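Perturbation-based explanation generally works by occluding spatio-temporal regions of the input clip and measuring how much the predicted class score drops. A minimal sketch of that generic procedure follows (not the exact method proposed here); model is assumed to be a 3D CNN returning class scores for a [1, C, T, H, W] clip, and the patch size is arbitrary.

import torch

@torch.no_grad()
def occlusion_saliency(model, clip, target_class, patch=16):
    """Score drop per occluded spatio-temporal patch; larger = more salient."""
    _, _, T, H, W = clip.shape
    base = model(clip)[0, target_class].item()
    saliency = torch.zeros(T, (H + patch - 1) // patch, (W + patch - 1) // patch)
    for t in range(T):
        for i in range(0, H, patch):
            for j in range(0, W, patch):
                perturbed = clip.clone()
                perturbed[:, :, t, i:i+patch, j:j+patch] = 0  # occlude one region
                score = model(perturbed)[0, target_class].item()
                saliency[t, i // patch, j // patch] = base - score
    return saliency

If occluding the traffic-light region barely changes the "stopping" score, the model is likely not attending to the true cause, which is the failure mode reported above.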
