Image, Video, and Multidimensional Signal Processing

TokenMotion: Motion-Guided Vision Transformer For Video Camouflaged Object Detection Via Learnable Token Selection

Video Camouflaged Object Detection (VCOD) is a challenging computer vision task for two reasons: target objects share similar textures with their surroundings, and both object and camera movement produce irregular motion patterns. In this paper, we introduce TokenMotion (TMNet), a transformer-based model that enhances VCOD by extracting motion-guided features through learnable token selection. Evaluated on the challenging MoCA-Mask dataset, TMNet achieves state-of-the-art VCOD performance.
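As a rough illustration of what motion-guided, learnable token selection can look like, the sketch below scores each token with a linear function of its frame-to-frame change and keeps the top k. The abstract does not describe TMNet's actual selection mechanism, so the function name `select_motion_tokens`, the difference-based motion cue, and the score scaling are all hypothetical choices made for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def select_motion_tokens(
    curr: Sequence[Sequence[float]],
    prev: Sequence[Sequence[float]],
    w: Sequence[float],
    b: float,
    k: int,
) -> Tuple[List[int], List[List[float]]]:
    """Keep the k tokens whose frame-to-frame change scores highest.

    Hypothetical illustration (not TMNet): a linear scorer (w, b) on the
    motion cue curr[i] - prev[i], squashed with a sigmoid. In a trained
    model w and b would be learned; here they are plain numbers.
    """
    if len(curr) != len(prev):
        raise ValueError("curr and prev must contain the same number of tokens")
    scores = []
    for c, p in zip(curr, prev):
        motion = [ci - pi for ci, pi in zip(c, p)]          # per-token motion cue
        z = sum(wi * mi for wi, mi in zip(w, motion)) + b   # linear score
        scores.append(1.0 / (1.0 + math.exp(-z)))           # sigmoid -> (0, 1)
    # Indices of the k highest scores, restored to their original token order.
    ranked = sorted(range(len(curr)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])
    # Scale kept tokens by their score; in an autodiff framework this gives the
    # scorer a gradient path even though top-k selection itself is not differentiable.
    kept = [[scores[i] * x for x in curr[i]] for i in keep]
    return keep, kept


# Four 2-D tokens; tokens 1 and 3 changed since the previous frame.
curr = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [2.0, 2.0]]
prev = [[0.0, 0.0]] * 4
idx, kept = select_motion_tokens(curr, prev, w=[1.0, 1.0], b=0.0, k=2)
print(idx)  # [1, 3]
```

Hard top-k is used here for readability; a trainable model might instead use a soft or stochastic relaxation so the selection itself receives gradients.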

ICASSP2024_ARL-ASU_Updated_April_9.pptx

Adaptive Multi-Exposure Fusion for Enhanced Neural Radiance Fields

Neural Radiance Fields (NeRF) have revolutionized 3D scene modeling and rendering. However, their performance degrades on images with diverse exposure levels, mainly because of complex luminance variations. To address this, we present a method that models and renders images across a range of exposure conditions. Our approach uses an unsupervised classifier-generator structure for HDR fusion, which significantly improves NeRF's ability to interpret and adapt to lighting variations and yields images with appropriate brightness.

poster_icassp2024.pdf

Gravitated Latent Space Loss Generated by Metric Tensor for High-Dynamic Range Imaging

High Dynamic Range (HDR) imaging seeks to enhance image quality by combining multiple Low Dynamic Range (LDR) images captured at different exposure levels. Conventional deep learning approaches typically rely on a reconstruction loss, which can cause ambiguities in the feature space during training. To address this issue, we present a new loss function, termed Gravitated Latent Space (GLS) loss, that uses a metric tensor to introduce a form of virtual gravity in the latent space. This helps the model overcome saddle points more effectively.
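As background on the combination step described above, the sketch below shows the classic weighted-average merge of LDR exposures, in the spirit of Debevec and Malik. It is not the GLS loss or the authors' network. It assumes pre-aligned images, a linear camera response, and pixel values in [0, 1]; the names `merge_ldr` and `hat_weight` are illustrative.

```python
from typing import List, Sequence


def hat_weight(z: float) -> float:
    """Triangle weight on [0, 1]: highest for mid-tones, lowest near 0 and 1."""
    return 1.0 - abs(2.0 * z - 1.0)


def merge_ldr(
    images: Sequence[Sequence[float]],
    exposure_times: Sequence[float],
    eps: float = 1e-6,
) -> List[float]:
    """Merge aligned LDR exposures into a relative radiance map.

    Assumes a linear camera response, pixel values in [0, 1], and images
    flattened to equal-length lists. Each exposure's estimate z / t is
    averaged with weights that down-weight clipped (under/over-exposed) pixels.
    """
    if len(images) != len(exposure_times):
        raise ValueError("need exactly one exposure time per image")
    radiance = []
    for p in range(len(images[0])):
        num = den = 0.0
        for img, t in zip(images, exposure_times):
            z = img[p]
            w = hat_weight(z) + eps  # eps keeps den > 0 even if every exposure is clipped
            num += w * z / t         # radiance estimate from this exposure
            den += w
        radiance.append(num / den)
    return radiance


# Two exposures of two pixels; pixel 1 is saturated (1.0) in the long exposure.
long_exp, short_exp = [0.5, 1.0], [0.25, 0.75]
print(merge_ldr([long_exp, short_exp], exposure_times=[1.0, 0.5]))  # roughly [0.5, 1.5]
```

The saturated long-exposure reading receives almost zero weight, so the merged value for pixel 1 comes from the short exposure (0.75 / 0.5 = 1.5).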

P240401HL_ICASSP2024.pdf

DOMAIN-WISE INVARIANT LEARNING FOR PANOPTIC SCENE GRAPH GENERATION

Panoptic Scene Graph Generation (PSG) involves the detection of objects and the prediction of their corresponding relationships (predicates). However, the presence of biased predicate annotations poses a significant challenge for PSG models, as it hinders their ability to establish a clear decision boundary among different predicates. This issue substantially impedes the practical utility and real-world applicability of PSG models.

DML_poster.pptx

AEAM3D: ADVERSE ENVIRONMENT-ADAPTIVE MONOCULAR 3D OBJECT DETECTION VIA FEATURE EXTRACTION REGULARIZATION

3D object detection plays a crucial role in intelligent vision systems. Detection in the open world inevitably encounters adverse scenes, yet most existing methods fail in them. To address this issue, this paper proposes a monocular 3D detection model, termed AEAM3D, that effectively mitigates the degradation of detection performance in harsh environments. Additionally, we assemble a new adverse-condition 3D object detection dataset covering challenging scenes, including rainy, foggy, and low-light conditions.

poster_icassp2024.pdf (poster)

AEAM3D- ADVERSE ENVIRONMENT-ADAPTIVE MONOCULAR 3D OBJECT DETECTION VIA FEATURE EXTRACTION REGULARIZATION.pdf (paper)

M3SUM: A Novel Unsupervised Language-guided Video Summarization

Language-guided video summarization lets users employ natural language queries to condense lengthy videos into concise, relevant summaries tailored to their information needs, making the content easier to access and digest. However, most previous works rely on large amounts of expensive annotated video and on complex designs to align different modalities at the feature level.

icassp2024_m3sum.pdf

Style-Driven Multi-Resolution Human Motion Synthesis from Limited Data

We present a generative model that learns to synthesize human motion from limited training sequences. In contrast to existing methods, our framework provides stylistic control across multiple temporal resolutions. The model captures human motion patterns by integrating skeletal convolution layers and a multi-scale architecture. Our framework contains a set of generative and adversarial networks, along with style embedding modules, each tailored to generating motion at a specific frame rate while controlling its style.

styles.zip

Example videos of style control.

BMT-BENCH: A Benchmark Sports Dataset for Video Generation

This is the supplementary material for the BMT-BENCH dataset for video generation. The submission includes links to the dataset and the baseline system.

Supplementary Material BMT-BENCH .pdf

Supplementary Material for A REAL-WORLD SATELLITE VIDEO SUBJECTIVE QOE DATABASE

The LIVE-Viasat Real-World Satellite QoE Database is an innovative and comprehensive resource designed to address the critical challenges faced by Internet Service Providers (ISPs), particularly in the domain of satellite streaming services.

supplementary_material.pdf

Supplementary Materials

To evaluate the generalization of referring image segmentation (RIS) in the context of human-robot interaction, we generate referring expressions for a subset of images from GraspNet using Shikra.

Supplementary_Materials.pdf
