IEEE ICASSP 2024

The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The IEEE ICASSP 2024 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide an opportunity to network with like-minded professionals from around the world.

Gravitated Latent Space Loss Generated by Metric Tensor for High-Dynamic Range Imaging

High Dynamic Range (HDR) imaging seeks to enhance image quality by combining multiple Low Dynamic Range (LDR) images captured at varying exposure levels. Traditional deep learning approaches often employ a reconstruction loss, but this loss can lead to ambiguities in feature space during training. To address this issue, we present a new loss function, termed Gravitated Latent Space (GLS) loss, that leverages a metric tensor to introduce a form of virtual gravity within the latent space. This virtual gravity helps the model overcome saddle points more effectively.
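The idea of combining several LDR exposures into one HDR image can be illustrated with a classical weighted merge. This is only a minimal sketch of the general setting, not the GLS loss or the network from the paper: it assumes a linear camera response and pixel values normalized to [0, 1], and the function name `merge_ldr_to_hdr` and the hat-shaped weighting are illustrative choices.

```python
import numpy as np

def merge_ldr_to_hdr(images, exposure_times):
    """Weighted average of per-exposure radiance estimates (linear response assumed)."""
    images = np.asarray(images, dtype=np.float64)  # shape (n, H, W), values in [0, 1]
    times = np.asarray(exposure_times, dtype=np.float64).reshape(-1, 1, 1)
    # Hat weights: trust mid-range pixels, distrust under- and over-exposed ones.
    weights = 1.0 - np.abs(2.0 * images - 1.0)
    weights = np.maximum(weights, 1e-6)  # avoid division by zero
    # Each exposure estimates radiance as pixel_value / exposure_time.
    return (weights * images / times).sum(axis=0) / weights.sum(axis=0)

# Synthetic scene: the longest exposure saturates the two brightest pixels.
radiance = np.array([[0.2, 1.0], [3.0, 8.0]])
times = [0.05, 0.1, 0.5]
ldr = [np.clip(radiance * t, 0.0, 1.0) for t in times]
hdr = merge_ldr_to_hdr(ldr, times)
```

Saturated pixels receive a near-zero weight, so the shorter exposures dominate the estimate in bright regions.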

P240401HL_ICASSP2024.pdf


Categories: Image, Video, and Multidimensional Signal Processing

39 Views

FAST PERSONALIZED TEXT TO IMAGE SYNTHESIS WITH ATTENTION INJECTION


Currently, most personalized image generation methods require considerable time to fine-tune and often overfit the concept, resulting in generated images that resemble the custom concept but are difficult to edit with prompts. We propose an effective and fast approach that balances text-image consistency with identity consistency between the generated image and the reference image. Our method can generate personalized images without any fine-tuning while maintaining the inherent text-to-image generation ability of diffusion models.

Fast_Personalized.pptx


Categories: Other

32 Views

Physics-Guided Deep Scatter Estimation by Weakly Supervision for Quantitative SPECT


Accurate scatter estimation is important in quantitative SPECT for improving image contrast and accuracy. With a large number of photon histories, Monte-Carlo (MC) simulation can yield accurate scatter estimation, but is computationally expensive. Recent deep learning-based approaches can yield accurate scatter estimates quickly, yet full MC simulation is still required to generate scatter estimates as ground truth labels for all training data.

icassp2024_hkim_poster_v3_0327.pdf


Categories: Medical imaging

31 Views

DOMAIN-WISE INVARIANT LEARNING FOR PANOPTIC SCENE GRAPH GENERATION


Panoptic Scene Graph Generation (PSG) involves the detection of objects and the prediction of their corresponding relationships (predicates). However, the presence of biased predicate annotations poses a significant challenge for PSG models, as it hinders their ability to establish a clear decision boundary among different predicates. This issue substantially impedes the practical utility and real-world applicability of PSG models.

DML_poster.pptx


Categories: Image, Video, and Multidimensional Signal Processing

37 Views

A Novel Iterative Thresholding Algorithm for Arctangent Regularization Problem


In this work, we derive the proximity operator of an arctangent penalty, which is expressed in terms of the hyperbolic sine and cosine functions. This penalty is then applied to sparse signal recovery, and an efficient arctangent regularization iterative thresholding (ARIT) algorithm is proposed, offering closed-form solutions for the subproblems associated with the arctangent penalty.
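The general shape of an iterative thresholding algorithm for sparse recovery can be sketched as follows. The abstract does not give the closed form of the arctangent proximity operator, so this sketch substitutes the well-known soft-thresholding operator (the proximity operator of the L1 norm) as a stand-in; it is therefore plain ISTA, not ARIT. The names `soft_threshold` and `iterative_thresholding`, and all parameter values, are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximity operator of tau * ||.||_1 (stand-in for the arctangent prox)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def iterative_thresholding(A, y, lam, prox=soft_threshold, n_iter=500):
    """Minimize 0.5 * ||A x - y||^2 + lam * penalty(x) by proximal gradient steps."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)              # gradient of the data-fit term
        x = prox(x - step * grad, step * lam)  # thresholding (prox) step
    return x

# Synthetic 3-sparse signal observed through a random 40 x 100 matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100)) / np.sqrt(40)
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [1.5, -2.0, 1.0]
y = A @ x_true
x_hat = iterative_thresholding(A, y, lam=0.01)
```

Because the proximity operator is passed in as the `prox` argument, a different penalty can be tried by supplying its operator with the same `(v, tau)` signature.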

poster_ARIT.pdf

Poster (29)

Categories: Signal and System Modeling, Representation and Estimation

17 Views

Video-Language Graph Convolutional Network for Human Action Recognition


Transferring visual language models (VLMs) from the image domain to the video domain has recently yielded great success on human action recognition tasks. However, standard recognition paradigms overlook fine-grained action parsing knowledge that could enhance the recognition accuracy. In this paper, we propose a novel method that leverages both coarse-grained and fine-grained knowledge to recognize human actions in videos. Our method consists of a video-language graph convolutional network that integrates and fuses multi-modal knowledge in a progressive manner.

Video-Language Graph Convolutional Network for Human Action Recognition.pptx

presentation slides used in oral talks (25)

Categories: Multimedia computing systems and applications

22 Views

Highlight removal network based on an improved dichromatic reflection model


State-of-the-art highlight removal methods still suffer from color inconsistencies between the highlight region and the background, and from unrealistic content in highlight areas.
To solve these two problems, we propose a novel adaptive highlight-aware network for specular highlight removal based on an improved dichromatic reflection model.
For color inconsistencies, we propose an adaptive highlight-aware (AHA) module to perceive the complete highlight information including the location and the scale of the specular highlight.

Poster-Highlight removal network based on an improved dichromatic reflection model.pdf


Categories: Image/Video Processing

30 Views

TRLS: A TIME SERIES REPRESENTATION LEARNING FRAMEWORK VIA SPECTROGRAM FOR MEDICAL SIGNAL PROCESSING

Representation learning frameworks for unlabeled time series have been proposed for medical signal processing. Although previous works have made considerable progress, we observe that the representations extracted from the time series still do not generalize well. In this paper, we present a Time series (medical signal) Representation Learning framework via Spectrogram (TRLS) to obtain more informative representations. We transform the input time-

icassp2024_xly_TRLS.pdf


Categories: Biomedical signal processing

82 Views

SMALL OBJECT DETECTION ON THE WATER SURFACE BASED ON RADAR AND CAMERA FUSION


With the growing number of applications involving water operations, water surface object detection faces new challenges. In this paper, we focus on improving the detection of small objects on water surfaces. Because a single sensor has limited capability in water environments, we propose RCFNet, a novel small object detection method based on radar-vision fusion. RCFNet fuses features captured by radar and camera in multiple stages to generate more effective target feature representations for small object detection on water surfaces.

Poster_ICASSP2024_9464_MMSP-P3.8.pdf


Categories: Multimodal signal processing

45 Views

TAROT: A Hierarchical Framework with Multitask Co-Pretraining on Semi-Structured Data Towards Effective Person-Job Fit

Person-job fit is an essential part of online recruitment platforms in serving various downstream applications like Job Search and Candidate Recommendation. Recently, pretrained large language models have further enhanced the effectiveness by leveraging richer textual information in user profiles and job descriptions apart from user behavior features and job metadata. However, the general domain-oriented design struggles to capture the unique structural information within user profiles and job descriptions, leading to a loss of latent semantic correlations.

TAROT-poster.pdf

Poster for TAROT paper. (61)

Categories: Other

50 Views
