Speech enhancement aims to improve the quality and intelligibility of speech signals, and speech editing refers to the process of editing speech according to specific user needs. In this paper, we propose a Unified Speech Enhancement and Editing (uSee) model with conditional diffusion models to handle various tasks at the same time in a generative manner.
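
The abstract does not detail the uSee architecture, but the core training recipe of a conditional diffusion model can be sketched as follows: add noise to the clean signal at a random timestep and train a denoiser, conditioned on a task embedding, to predict that noise. The names and shapes below are illustrative assumptions, not the authors' implementation.

    # Minimal sketch of a conditional DDPM training step (illustrative only,
    # not the uSee architecture). `denoiser` is any network taking the noisy
    # signal, the diffusion timestep, and a conditioning vector.
    import torch
    import torch.nn.functional as F

    T = 1000
    betas = torch.linspace(1e-4, 0.02, T)
    alpha_bars = torch.cumprod(1.0 - betas, dim=0)           # cumulative \bar{alpha}_t

    def diffusion_loss(denoiser, clean, cond):
        """clean: (B, L) waveform/spectrogram; cond: (B, D) task/condition embedding."""
        B = clean.shape[0]
        t = torch.randint(0, T, (B,))                         # random timestep per sample
        a = alpha_bars[t].unsqueeze(-1)                       # (B, 1)
        noise = torch.randn_like(clean)
        noisy = a.sqrt() * clean + (1.0 - a).sqrt() * noise   # forward diffusion q(x_t | x_0)
        pred = denoiser(noisy, t, cond)                       # predict the added noise
        return F.mse_loss(pred, noise)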

In this paper, we investigate cross-lingual learning (CLL) for multilingual scene text recognition (STR). CLL transfers knowledge from one language to another. We aim to find the conditions under which knowledge from high-resource languages improves performance on low-resource languages. To do so, we first examine whether two general insights about CLL discussed in previous works apply to multilingual STR: (1) Joint learning with high- and low-resource languages may reduce performance on low-resource languages, and (2) CLL works best between typologically similar languages.

The adversarial attack literature contains numerous algorithms for crafting perturbations that manipulate neural network predictions. Many of these adversarial attacks optimize inputs with the same constraints and have similar downstream impact on the models they attack. In this work, we first show how to reconstruct an adversarial perturbation, namely the difference between an adversarial example and the original natural image, from an adversarial example. Then, we classify reconstructed adversarial perturbations based on the algorithm that generated them.
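
The attribution idea can be illustrated with a minimal sketch, assuming aligned arrays of clean and adversarial images and labels naming the generating attack: reconstruct each perturbation as the difference of the two images and fit an off-the-shelf classifier on the flattened perturbations. This is a generic illustration, not the authors' pipeline.

    # Perturbation reconstruction and attack attribution (illustrative sketch).
    # x_clean/x_adv: aligned arrays of natural and adversarial images;
    # attack_labels names the algorithm (e.g. FGSM, PGD) behind each example.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    def reconstruct_perturbations(x_clean, x_adv):
        """Adversarial perturbation = adversarial example - original image."""
        return (x_adv - x_clean).reshape(len(x_adv), -1)

    def fit_attack_classifier(x_clean, x_adv, attack_labels):
        deltas = reconstruct_perturbations(x_clean, x_adv)
        d_train, d_test, y_train, y_test = train_test_split(
            deltas, attack_labels, test_size=0.2, random_state=0)
        clf = LogisticRegression(max_iter=1000).fit(d_train, y_train)
        return clf, clf.score(d_test, y_test)   # held-out attribution accuracy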

Distributional Reinforcement Learning (RL) estimates the return distribution mainly by learning quantile values via minimizing the quantile Huber loss function, which entails a threshold parameter that is often selected heuristically or via hyperparameter search, may not generalize well, and can be suboptimal. This paper introduces a generalized quantile Huber loss function derived from the Wasserstein distance (WD) between Gaussian distributions, capturing noise in the predicted (current) and target (Bellman-updated) quantile values.
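
For reference, the conventional quantile Huber loss with a fixed threshold kappa, the parameter the proposed loss generalizes, looks as follows; the generalized, Wasserstein-derived form from the paper is not reproduced here.

    # Standard quantile Huber loss with a fixed threshold kappa, as used in
    # quantile-based distributional RL; the paper derives a generalized form
    # whose threshold need not be hand-tuned (not shown here).
    import torch

    def quantile_huber_loss(pred_quantiles, target, taus, kappa=1.0):
        """pred_quantiles: (B, N) current quantile values; target: (B, N) Bellman
        targets; taus: (N,) quantile fractions in (0, 1)."""
        u = target.unsqueeze(1) - pred_quantiles.unsqueeze(2)     # pairwise TD errors (B, N, N)
        huber = torch.where(u.abs() <= kappa,
                            0.5 * u.pow(2),
                            kappa * (u.abs() - 0.5 * kappa))
        weight = (taus.view(1, -1, 1) - (u.detach() < 0).float()).abs()
        return (weight * huber / kappa).mean(dim=2).sum(dim=1).mean()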

The latest video coding standard, Versatile Video Coding (VVC), aims to improve compression efficiency by saving around 50% of bitrate at the same quality compared to its predecessor, High Efficiency Video Coding (HEVC). However, this comes with higher encoding complexity, mainly due to a much larger number of block splits to be tested on the encoder side.

Many image-based rendering (IBR) methods rely on depth estimates obtained from structured light or time-of-flight depth sensors to synthesize novel views from sparse camera networks. However, these estimates often contain missing or noisy regions, resulting in an incorrect mapping between source and target views. This situation makes the fusion process more challenging, as the visual information is misaligned, inconsistent, or missing.
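
As background on the mapping such methods depend on, the sketch below reprojects source-view pixels into a target view with a pinhole camera model; per-pixel depth enters the mapping directly, so missing or noisy depth corrupts the correspondence. The camera parameters and names are illustrative, not tied to a specific IBR system.

    # Illustrative pinhole reprojection: map source pixels into a target view
    # using per-pixel depth. K is the 3x3 intrinsic matrix; R, t give the
    # source-to-target rigid transform. Invalid or noisy depth directly
    # produces a wrong target location, the failure mode discussed above.
    import numpy as np

    def reproject(depth, K, R, t):
        """depth: (H, W) source depth map. Returns target-view pixel coords (H, W, 2)."""
        H, W = depth.shape
        u, v = np.meshgrid(np.arange(W), np.arange(H))
        pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # (3, H*W)
        rays = np.linalg.inv(K) @ pix                                      # back-project to rays
        pts = rays * depth.reshape(1, -1)                                  # 3D points in source frame
        proj = K @ (R @ pts + t.reshape(3, 1))                             # transform and project
        uv = (proj[:2] / np.clip(proj[2:], 1e-6, None)).T.reshape(H, W, 2)
        return uv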

Federated graph learning (FGL) enables the collaborative training of graph neural networks (GNNs) in a distributed manner. A critical challenge in FGL is label deficiency, which becomes more intricate due to non-IID decentralized data. Existing methods have focused on extracting knowledge from abundant unlabeled data, leaving few-shot labeled data unexplored. To address this, we propose ConFGL, a novel FGL framework to enhance label efficiency in federated learning with non-IID subgraphs.
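
The abstract does not spell out ConFGL's training procedure; as generic background, a federated round for GNNs typically follows the FedAvg pattern of local training on each client's subgraph followed by size-weighted parameter averaging, sketched below (an illustration, not the ConFGL algorithm).

    # Generic FedAvg-style parameter averaging over client GNNs (background
    # illustration only; ConFGL's label-efficiency mechanisms are not shown).
    import copy

    def federated_average(client_models, client_sizes):
        """Average client state_dicts, weighted by local data size.
        Only floating-point parameters are averaged."""
        total = float(sum(client_sizes))
        avg_state = copy.deepcopy(client_models[0].state_dict())
        for key in avg_state:
            if avg_state[key].is_floating_point():
                avg_state[key] = sum(
                    m.state_dict()[key] * (n / total)
                    for m, n in zip(client_models, client_sizes))
        return avg_state   # load into the global model with model.load_state_dict(avg_state)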

The black-box nature of deep reinforcement learning restricts the safe and scalable deployment of decision models in practice. Existing interpretability methods for deep reinforcement learning models are often inadequate in providing comprehensive insights and generating logical sequential decisions.
In this study, we propose an innovative framework called XRLBT, which introduces the behavior tree structure to explainable reinforcement learning.
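
As a pointer to the structure XRLBT builds on, a behavior tree composes simple nodes such as Sequence (all children must succeed) and Selector (the first success wins); a minimal sketch is given below. How XRLBT derives such a tree from an RL policy is specific to the paper and not shown.

    # Minimal behavior tree primitives (Action, Sequence, Selector) to show
    # the structure XRLBT builds on; the policy-to-tree distillation itself
    # is specific to the paper and not reproduced here.
    SUCCESS, FAILURE = "success", "failure"

    class Action:
        def __init__(self, fn):
            self.fn = fn                      # fn(state) -> SUCCESS or FAILURE
        def tick(self, state):
            return self.fn(state)

    class Sequence:
        """Succeeds only if all children succeed, evaluated left to right."""
        def __init__(self, children): self.children = children
        def tick(self, state):
            for child in self.children:
                if child.tick(state) == FAILURE:
                    return FAILURE
            return SUCCESS

    class Selector:
        """Succeeds as soon as any child succeeds."""
        def __init__(self, children): self.children = children
        def tick(self, state):
            for child in self.children:
                if child.tick(state) == SUCCESS:
                    return SUCCESS
            return FAILURE

    # Example: attempt an obstacle-handling behaviour first, else a default action.
    tree = Selector([
        Sequence([Action(lambda s: SUCCESS if s["obstacle"] else FAILURE),
                  Action(lambda s: SUCCESS)]),    # e.g. brake
        Action(lambda s: SUCCESS),                # e.g. keep driving
    ])
    tree.tick({"obstacle": True})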

Several signal processing tools have been integrated into machine learning models to improve performance and reduce computational cost. The Fourier transform (FT) and its variants, which are powerful tools for spectral analysis, are employed in the prediction of univariate time series by converting them to sequences in the spectral domain that are processed further by recurrent neural networks (RNNs). This approach increases prediction performance and reduces training time compared to conventional methods.
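
A minimal version of this pipeline can be sketched as follows: slide a window over the univariate series, move each window to the spectral domain with the real FFT, and feed the resulting feature sequence to an RNN. The window length, magnitude features, and model sizes are placeholder assumptions rather than the paper's configuration.

    # Sketch of the described pipeline: window a univariate series, transform
    # each window with the real FFT, and process the spectral sequence with an
    # RNN for one-step-ahead forecasting. All sizes are illustrative.
    import numpy as np
    import torch
    import torch.nn as nn

    def spectral_windows(series, win=64, hop=16):
        """Return (num_windows, win//2 + 1) FFT magnitude features."""
        windows = [series[i:i + win] for i in range(0, len(series) - win + 1, hop)]
        return np.abs(np.fft.rfft(np.stack(windows), axis=1))

    class SpectralRNN(nn.Module):
        def __init__(self, n_freq, hidden=64):
            super().__init__()
            self.rnn = nn.GRU(n_freq, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)        # predict the next sample
        def forward(self, x):                       # x: (batch, time, n_freq)
            out, _ = self.rnn(x)
            return self.head(out[:, -1])

    series = np.sin(np.linspace(0, 20 * np.pi, 2000)) + 0.1 * np.random.randn(2000)
    feats = torch.tensor(spectral_windows(series), dtype=torch.float32).unsqueeze(0)
    pred = SpectralRNN(n_freq=feats.shape[-1])(feats)   # one-step-ahead forecast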

Unsupervised continual learning (UCL) of image representations has garnered attention due to its practical relevance. However, recent UCL methods focus on mitigating catastrophic forgetting with a replay buffer (i.e., a rehearsal-based strategy), which requires considerable extra storage. To overcome this drawback, we propose a novel rememory-based SimSiam (RM-SimSiam) method that reduces the dependency on the replay buffer. The core idea of RM-SimSiam is to store and remember old knowledge with a data-free historical module instead of a replay buffer.
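
The abstract does not specify how the historical module is realized; one common data-free way to retain an older encoder is to keep an exponential-moving-average (EMA) copy of its weights and align current features to it with a SimSiam-style loss. The sketch below uses that assumption purely for illustration and is not the RM-SimSiam design.

    # Illustration of one data-free way to "remember" an old encoder without a
    # replay buffer: keep an EMA copy of its weights and align current features
    # to it with a SimSiam-style negative-cosine loss. This is an assumption
    # for illustration, not the actual RM-SimSiam historical module.
    import copy
    import torch
    import torch.nn.functional as F

    def make_historical(encoder):
        hist = copy.deepcopy(encoder)
        for p in hist.parameters():
            p.requires_grad_(False)               # frozen, updated only by EMA
        return hist

    @torch.no_grad()
    def ema_update(hist, encoder, m=0.996):
        for ph, pe in zip(hist.parameters(), encoder.parameters()):
            ph.mul_(m).add_(pe, alpha=1.0 - m)

    def rememory_loss(encoder, predictor, hist, x):
        z_old = hist(x).detach()                  # features of the remembered encoder
        p_new = predictor(encoder(x))             # current features through the predictor
        return -F.cosine_similarity(p_new, z_old, dim=-1).mean()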
