IEEE ICASSP 2024

IEEE ICASSP 2024 - The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The IEEE ICASSP 2024 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide an opportunity to network with like-minded professionals from around the world.

DISTRIBUTED STOCHASTIC CONTEXTUAL BANDITS FOR PROTEIN DRUG INTERACTION

In recent work [1], we developed a distributed stochastic multi-armed contextual bandit algorithm that learns optimal actions when the exact contexts are unknown: M agents work collaboratively under the coordination of a central server to minimize the total regret. In our model, the agents observe only the context distribution, not the exact context. Such a situation arises, for instance, when the context itself is a noisy measurement or the output of a prediction mechanism.
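As a rough illustration of the setting above, the sketch below shows one common way to handle unobserved contexts in a linear bandit: replace the unknown context's features with their expectation under the observed context distribution, and let a server pool the statistics reported by the agents. The linear reward model, the feature map `phi`, the ridge parameter `lam`, and every function name here are assumptions made for illustration; this is not the algorithm from [1], and the exploration bonus that a regret-minimizing method would need is omitted.

```python
def expected_features(context_probs, phi, action):
    """Average phi(context, action) over the observed context distribution."""
    dim = len(phi(0, action))
    mean = [0.0] * dim
    for context, prob in enumerate(context_probs):
        feats = phi(context, action)
        for i in range(dim):
            mean[i] += prob * feats[i]
    return mean


def local_update(stats, feats, reward):
    """Agent-side update of the statistics V += x x^T and b += r x (in place)."""
    V, b = stats
    for i, xi in enumerate(feats):
        b[i] += reward * xi
        for j, xj in enumerate(feats):
            V[i][j] += xi * xj


def server_aggregate(all_stats, dim):
    """Server-side step: sum the (V, b) statistics reported by all agents."""
    V = [[0.0] * dim for _ in range(dim)]
    b = [0.0] * dim
    for Vm, bm in all_stats:
        for i in range(dim):
            b[i] += bm[i]
            for j in range(dim):
                V[i][j] += Vm[i][j]
    return V, b


def ridge_estimate(V, b, lam=1.0):
    """Solve (V + lam * I) theta = b by Gauss-Jordan elimination."""
    n = len(b)
    A = [[V[i][j] + (lam if i == j else 0.0) for j in range(n)] + [b[i]]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(n):
            if row != col:
                factor = A[row][col] / A[col][col]
                A[row] = [a - factor * p for a, p in zip(A[row], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def greedy_action(theta, context_probs, phi, num_actions):
    """Pick the action whose expected features score highest under theta."""
    def score(action):
        x = expected_features(context_probs, phi, action)
        return sum(t * xi for t, xi in zip(theta, x))
    return max(range(num_actions), key=score)
```

Each agent would call `local_update` with the expected feature vector of the action it played, and the server would periodically call `server_aggregate` and broadcast the resulting estimate.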

ICASSP-Poster -Final.pdf

Categories: Sequential learning; sequential decision methods (MLR-SLER)

Object Trajectory Estimation with Multi-Band Wi-Fi Neural Dynamic Fusion

In contrast to existing multi-band Wi-Fi fusion on a frame-to-frame basis for simple classification, this paper considers asynchronous sequence-to-sequence fusion between sub-7 GHz channel state information (CSI) and 60 GHz beam SNR for more challenging downstream tasks such as continuous regression.

icassp_skato_release.pptx

Categories: Other

DISTRIBUTED STOCHASTIC CONTEXTUAL BANDITS FOR PROTEIN DRUG INTERACTION

Bandits_ICASSP_2024.pdf

Categories: Distributed and Cooperative Learning (MLR-DIST)

MaskMark: Robust Neural Watermarking for Real and Synthetic Speech (Slides)

High-quality speech synthesis models may be used to spread misinformation or impersonate voices. Audio watermarking can help combat such misuses by embedding a traceable signature in generated audio. However, existing audio watermarks are not designed for synthetic speech and typically demonstrate robustness to only a small set of transformations of the watermarked audio. To address this, we propose MaskMark, a neural network-based digital audio watermarking technique optimized for speech.

maskmark_flat.pdf

Categories: Watermarking and Steganography

[Poster] Selective Acoustic Feature Enhancement for Speech Emotion Recognition with Noisy Speech

A speech emotion recognition (SER) system deployed on a real-world application can encounter speech contaminated with unconstrained background noise. To deal with this issue,

poster-selective-feature-enhancement.pdf

Categories: Speech Analysis (SPE-ANLS)

A Lightweight Hybrid Multi-Channel Speech Extraction System with Directional Voice Activity Detection

Although deep learning (DL) based end-to-end models have shown outstanding performance in multi-channel speech extraction, their practical application on edge devices is restricted by their high computational complexity. In this paper, we propose a hybrid system that more effectively integrates the generalized sidelobe canceller (GSC) and a lightweight post-filtering model, with the help of spatial speaker activity information provided by a directional voice activity detection (DVAD) module.
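To make the GSC structure concrete, here is a minimal two-microphone sketch with a single adaptive tap. It assumes the channels are already time-aligned toward the target, uses a one-tap NLMS canceller, and freezes adaptation whenever an activity flag says the target is present, which is one common way to use voice activity information. The function name, the step size `mu`, and the gating rule are illustrative assumptions and not the system proposed in the paper; the lightweight post-filter is not modeled.

```python
def gsc_two_channel(x1, x2, target_active, mu=0.5, eps=1e-8):
    """Two-channel generalized sidelobe canceller with a one-tap NLMS filter.

    x1, x2: time-aligned microphone signals (lists of floats).
    target_active: per-sample booleans, e.g. from a voice activity detector.
    Returns the enhanced signal and the final adaptive weight.
    """
    w = 0.0
    enhanced = []
    for a, b, active in zip(x1, x2, target_active):
        fixed = 0.5 * (a + b)   # fixed beamformer: average of aligned channels
        blocked = a - b         # blocking matrix: cancels the aligned target
        e = fixed - w * blocked  # subtract the noise estimate
        if not active:
            # Adapt only while the target is inactive, so the canceller
            # learns the noise path without cancelling the target.
            w += mu * e * blocked / (blocked * blocked + eps)
        enhanced.append(e)
    return enhanced, w
```

With noise that reaches the two microphones at different gains, the weight converges during target-free samples and the frozen filter then removes the residual noise from the fixed beamformer output while the target is active.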

tianchi.sun_.pptx

Categories: Source Separation and Signal Enhancement

Poster for ICASSP 2024 paper "Hot-Fixing Wake Word Recognition for End-to-End ASR via Neural Model Reprogramming"

This paper proposes two novel variants of neural reprogramming to enhance wake word recognition in streaming end-to-end ASR models without updating model weights. The first, "trigger-frame reprogramming", prepends the learned trigger frames of the target wake word to the input speech feature sequence to adjust the ASR model’s hidden states for improved wake word recognition. The second, "predictor-state initialization", trains only the initial state vectors (cell and hidden states) of the LSTMs in the prediction network.
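The first variant can be pictured with a toy sketch: learned frames are placed in front of the utterance, so a frozen recurrent model reaches the first real frame in a different internal state. Both functions below are hypothetical stand-ins written for illustration. `frozen_encoder_state` is a one-unit recurrence with a fixed weight, not an ASR encoder, and the training of the trigger frames themselves is not shown.

```python
import math


def prepend_trigger_frames(features, trigger_frames):
    """Return the trigger frames followed by the utterance's feature frames."""
    if features and trigger_frames and len(features[0]) != len(trigger_frames[0]):
        raise ValueError("trigger frames and features must share a feature dimension")
    return [list(f) for f in trigger_frames] + [list(f) for f in features]


def frozen_encoder_state(frames, weight=0.5, state=0.0):
    """Toy frozen model: a one-unit recurrence whose weight is never updated."""
    for frame in frames:
        state = math.tanh(weight * state + sum(frame))
    return state
```

Running `frozen_encoder_state` on an utterance with and without prepended frames gives different final states even though `weight` is unchanged, which is the effect the reprogramming relies on.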

WW_HF_w_NP_ICASSP2024 Poster.pdf

Categories: Other

ColorFlow_ICASSP2024

Image colorization is an ill-posed task, as objects within grayscale images can correspond to multiple colors, motivating researchers to establish a one-to-many relationship between objects and colors. Most previous work could only establish a deterministic relationship, which is insufficient. Normalizing flows can fully capture the color diversity of the natural image manifold. However, classical flows often overlook the color correlations between different objects, resulting in unrealistic colors.
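For readers unfamiliar with normalizing flows, the sketch below shows the core idea on the smallest possible case: a one-dimensional affine flow with a standard normal base density. Sampling different latent values `z` yields different outputs for the same conditioning, which is how a flow expresses a one-to-many mapping, and the change-of-variables term gives an exact log-density. The function names and the scalar setting are assumptions for illustration; the model described in the abstract is not reproduced here.

```python
import math


def affine_flow_sample(z, scale, shift):
    """Map a latent sample z ~ N(0, 1) to an output value x = scale * z + shift."""
    return scale * z + shift


def affine_flow_log_prob(x, scale, shift):
    """Exact log-density of x under the flow x = scale * z + shift, z ~ N(0, 1)."""
    z = (x - shift) / scale                              # inverse transform
    log_base = -0.5 * (z * z + math.log(2.0 * math.pi))  # log N(z; 0, 1)
    return log_base - math.log(abs(scale))               # change-of-variables term
```

In a conditional colorization flow, `scale` and `shift` would be produced by a network that looks at the grayscale input; here they are plain numbers.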

ColorFlow_ICASSP2024.pptx

Categories: Multimedia Signal Processing

On-Device Constrained Self-Supervised Learning for Keyword Spotting via Quantization Aware Pre-Training and Fine-tuning

Large self-supervised models have excelled in various speech processing tasks, but their deployment on resource-limited devices is often impractical due to their substantial memory footprint. Previous studies have demonstrated the effectiveness of self-supervised pre-training for keyword spotting, even with constrained model capacity.
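The abstract does not spell out the quantization scheme, so the following is only a generic sketch of the forward pass used in quantization-aware training: weights are rounded to a low-bit uniform grid and mapped back to floats ("fake quantization") so the model sees quantization error during training. The uniform affine grid, the function names, and the memory helper are assumptions for illustration. The gradient handling used in practice (typically a straight-through estimator) is not shown.

```python
def fake_quantize(x, num_bits, x_min, x_max):
    """Quantize x to a uniform grid on [x_min, x_max], then dequantize."""
    levels = 2 ** num_bits - 1
    scale = (x_max - x_min) / levels
    clipped = min(max(x, x_min), x_max)       # saturate out-of-range values
    q = round((clipped - x_min) / scale)      # integer code in [0, levels]
    return x_min + q * scale                  # back to a float on the grid


def weight_memory_bytes(num_params, num_bits):
    """Storage needed for num_params weights at num_bits each."""
    return num_params * num_bits / 8
```

For example, `weight_memory_bytes` shows that moving from 32-bit to 8-bit weights cuts weight storage by a factor of four, which is the kind of footprint reduction that motivates quantization on resource-limited devices.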

final_v5.pdf

Categories: Resource constrained speech recognition (SPE-RCSR)

MDRT: MULTI-DOMAIN SYNTHETIC SPEECH LOCALIZATION

With recent advancements in generating synthetic speech, tools to generate high-quality synthetic speech impersonating any human speaker are easily available. Several incidents report misuse of high-quality synthetic speech for spreading misinformation and for large-scale financial frauds. Many methods have been proposed for detecting synthetic speech; however, there is limited work on localizing the synthetic segments within the speech signal. In this work, our goal is to localize the synthetic speech segments in a partially synthetic speech signal.

mdrt_v05.pdf

Categories: Multimedia Forensics
