Seminar Logs

Multimedia

All List

List of papers read in the multimedia seminar (2024/04-).

IEEE/CVF Computer Vision and Pattern Recognition (CVPR)

(CVPR 2024) Style Aligned Image Generation via Shared Attention
(CVPR 2024) One-step Diffusion with Distribution Matching Distillation
(CVPR 2024) VideoBooth: Diffusion-based Video Generation with Image Prompts
(CVPR 2024) FreeU: Free Lunch in Diffusion U-Net
(CVPR 2024) StyLitGAN: Prompting StyleGAN to Produce New Illumination Conditions
(CVPR 2024) Joint-task Regularization for Partially Labeled Multitask Learning
(CVPR 2024) Shadows Don’t Lie and Lines Can’t Bend! Generative Models don’t know Projective Geometry...for now
(CVPR 2024) Time-Efficient Light-Field Acquisition Using Coded Aperture and Events
(CVPR 2024) FINER: Flexible spectral-bias tuning in Implicit Neural Representation by Variable-periodic Activation Functions
(CVPR 2024) DS-NeRV: Implicit Neural Video Representation with Decomposed Static and Dynamic Codes
(CVPR 2024) Long-Tailed Anomaly Detection with Learnable Class Names
(CVPR 2024) Towards Backward-Compatible Continual Learning of Image Compression
(CVPR 2024) SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation
(CVPR 2024) YOLO-World: Real-Time Open-Vocabulary Object Detection
(CVPR 2024) InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models
(CVPR 2024) TextCraftor: Your Text Encoder Can be Image Quality Controller
(CVPR 2024) Beyond Textual Constraints: Learning Novel Diffusion Conditions with Fewer Examples
(CVPR 2024) SUGAR: Pre-training 3D Visual Representations for Robotics
(CVPR 2024) Mip-Splatting: Alias-free 3D Gaussian Splatting
(CVPR 2024) DETRs Beat YOLOs on Real-time Object Detection
(CVPR 2024) Compressed 3D Gaussian Splatting for Accelerated Novel View Synthesis
(CVPR 2024) FedAS: Bridging Inconsistency in Personalized Federated Learning
(CVPR 2024) PNeRV: Enhancing Spatial Consistency via Pyramidal Neural Representation for Videos
(CVPR 2024) Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering
(CVPR 2024) C3: High-performance and low-complexity neural compression from a single image or video
(CVPR 2024) SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation
(CVPR 2024) GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces
(CVPR 2024) Distribution Extrapolation Diffusion Model for Video Prediction
(CVPR 2024) GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models
(CVPR 2024) Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing
(CVPR 2024) Boosting Neural Representations for Videos with a Conditional Decoder
(CVPR 2024) Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
(CVPR 2024) Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
(CVPR 2023) MOSO: Decomposing MOtion, Scene and Object for Video Prediction
(CVPR 2023) Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models
(CVPR 2023) Learned Image Compression with Mixed Transformer-CNN Architectures
(CVPR 2023) ImageBind: One Embedding Space To Bind Them All
(CVPR 2022) Anomaly Detection via Reverse Distillation from One-Class Embedding
(CVPR 2021) End-to-End Object Detection with Fully Convolutional Network
(CVPR 2021) Checkerboard Context Model for Efficient Learned Image Compression

International Conference on Learning Representations (ICLR)

(ICLR 2024) Language Model Beats Diffusion - Tokenizer is Key to Visual Generation
(ICLR 2024) VDT: General-Purpose Video Diffusion Transformers via Mask Modeling
(ICLR 2024) Vision Transformers Need Registers
(ICLR 2024) Implicit Neural Representation Image Codec with Mixed Context for Fast Decoding
(ICLR 2023) DINO: DETR with Improved Denoising Anchor Boxes for End-to-End Object Detection
(ICLR 2023) Prompt-to-Prompt Image Editing with Cross Attention Control
(ICLR 2022) Language-driven Semantic Segmentation
(ICLR 2022) LoRA: Low-Rank Adaptation of Large Language Models
(ICLR 2022) Entroformer: A Transformer-based Entropy Model for Learned Image Compression
(ICLR 2019) DARTS: Differentiable Architecture Search
(ICLR 2018) Variational image compression with a scale hyperprior
(ICLR 2017) Neural Architecture Search with Reinforcement Learning
(ICLR 2021 workshop) COIN: COmpression with Implicit Neural representations

Annual Conference on Neural Information Processing Systems (NeurIPS)

(NeurIPS 2024) Video Diffusion Models are Training-free Motion Interpreter and Controller
(NeurIPS 2024) YOLOv10: Real-Time End-to-End Object Detection
(NeurIPS 2024) NVRC: Neural Video Representation Compression
(NeurIPS 2024) MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
(NeurIPS 2023) Idempotent Learned Image Compression with Right-Inverse
(NeurIPS 2023) Towards Efficient Image Compression Without Autoregressive Models
(NeurIPS 2022) Flexible Diffusion Modeling of Long Videos
(NeurIPS 2022) Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
(NeurIPS 2018) Joint autoregressive and hierarchical priors for learned image compression

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

(ICASSP 2024) Audio Match Cutting: Finding and Creating Matching Audio Transitions in Movies and Videos
(ICASSP 2023) Hybrid Neural Network with Cross- and Self-Module Attention Pooling for Text-Independent Speaker Verification
(ICASSP 2023) Improving Music Genre Classification from Multi-Modal Properties of Music and Genre Correlations Perspective
(ICASSP 2023) HiSSNet: Sound Event Detection and Speaker Identification via Hierarchical Prototypical Networks for Low-Resource Headphones
(ICASSP 2022) AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks
(ICASSP 2021) Image Coding for Machines: an End-To-End Learned Approach

IEEE/CVF International Conference on Computer Vision (ICCV)

(ICCV 2023) Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
(ICCV 2023) Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
(ICCV 2023) Adding Conditional Control to Text-to-Image Diffusion Models
(ICCV 2023) Video Object Segmentation-aware Video Frame Interpolation
(ICCV 2023) COOL-CHIC: Coordinate-based Low Complexity Hierarchical Image Codec (cool-chic v1.0)
(ICCV 2023) TransTIC: Transferring Transformer-based Image Compression from Human Perception to Machine Perception
(ICCV 2023) Semantically Structured Image Compression via Irregular Group-Based Decoupling
(ICCV 2021) Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

IEEE International Conference on Image Processing (ICIP)

(ICIP 2024) Image Coding For Machine Via Analytics-Driven Appearance Redundancy Reduction
(ICIP 2023) Frequency Disentangled Features in Neural Image Compression
(ICIP 2022) Bridging the Gap Between Image Coding for Machines and Humans
(ICIP 2022) Deep Feature Compression Using Rate-Distortion Optimization Guided Autoencoder
(ICIP 2021) An Efficient Image Compression Method Based on Neural Network: An Overfitting Approach
(ICIP 2018) Video Error Concealment Using Deep Neural Networks

European Conference on Computer Vision (ECCV)

(ECCV 2024) Fast Encoding and Decoding for Implicit Video Representation
(ECCV 2024) Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
(ECCV 2024) GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting
(ECCV 2024) ReNoise: Real Image Inversion Through Iterative Noising

International Conference on Machine Learning (ICML)

(ICML 2024) Fast Timing-Conditioned Latent Audio Diffusion
(ICML 2023) AudioLDM: Text-to-Audio Generation with Latent Diffusion Models

IEEE International Workshop on Multimedia Signal Processing (MMSP)

(MMSP 2023) Region of Interest Enabled Learned Image Coding for Machines
(MMSP 2023) Low-complexity Overfitted Neural Image Codec (cool-chic v2.0)

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

(WACV 2024) D4: Detection of Adversarial Diffusion Deepfakes Using Disjoint Ensembles
(WACV 2024) Controlling Rate, Distortion, and Realism: Towards a Single Comprehensive Neural Image Compression Model

IEEE International Conference on Multimedia & Expo (ICME)

(ICME 2020) Focus Your Distribution: Coarse-to-Fine Non-Contrastive Learning for Anomaly Detection and Localization

ACM Special Interest Group on Computer Graphics (SIGGRAPH)

(SIGGRAPH 2023) 3D Gaussian Splatting for Real-Time Radiance Field Rendering

ACM Multimedia Conference (ACMMM)

(MM 2023) ICMH-Net: Neural Image Compression Towards both Machine Vision and Human Vision

Others

(EUSIPCO 2024) Overfitted image coding at reduced complexity (cool-chic v3.2)
(SLT 2024) Speech Foundation Model Ensembles for the Controlled Singing Voice Deepfake Detection Challenge 2024
(ISSC 2023) A Comparison of Deep Learning MOS Predictors for Speech Synthesis Quality
(ICDCS 2023) Edge-Cloud Collaborated Object Detection via Difficult-Case Discriminator
(TMLR 2022) COIN++: Neural Compression Across Modalities
(ISM 2021) Learned Enhancement Filters for Image Coding for Machines
(ACSSC 2014) Weighted boundary matching error concealment for HEVC using block partition decisions
(VCIP 2023) Hybrid Implicit Neural Image Compression with Subpixel Context Model and Iterative Pruner
(ICPR 2021) PaDiM: a Patch Distribution Modeling Framework for Anomaly Detection and Localization
(Signal Processing Letters vol. 30) Enhanced Quantified Local Implicit Neural Representation for Image Compression

arXiv

(arXiv 2024) How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio?
(arXiv 2024) Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech
(arXiv 2024) Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
(arXiv 2024) ControlNeXt: Powerful and Efficient Control for Image and Video Generation
(arXiv 2023) SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models
(arXiv 2024) Long-form music generation with latent diffusion
(arXiv 2024) Integrating Text-to-Music Models with Language Models: Composing Long Structured Music Pieces

Log 2024/08/20

(NeurIPS 2023) Towards Efficient Image Compression Without Autoregressive Models
(CVPR 2021) Checkerboard Context Model for Efficient Learned Image Compression
(CVPR 2024) One-step Diffusion with Distribution Matching Distillation
(ICASSP 2023) HiSSNet: Sound Event Detection and Speaker Identification via Hierarchical Prototypical Networks for Low-Resource Headphones
(CVPR 2024) Mip-Splatting: Alias-free 3D Gaussian Splatting

log file: coming soon

Log 2024/08/27

(arXiv 2024) ControlNeXt: Powerful and Efficient Control for Image and Video Generation
(NeurIPS 2024) YOLOv10: Real-Time End-to-End Object Detection
(CVPR 2021) End-to-End Object Detection with Fully Convolutional Network
(CVPR 2024) FreeU: Free Lunch in Diffusion U-Net
(CVPR 2024) FedAS: Bridging Inconsistency in Personalized Federated Learning

log file: (2024/08/27)

Log 2024/09/03

(CVPR 2024) PNeRV: Enhancing Spatial Consistency via Pyramidal Neural Representation for Videos
(CVPR 2023) Learned Image Compression with Mixed Transformer-CNN Architectures
(CVPR 2024) Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering
(CVPR 2024) C3: High-performance and low-complexity neural compression from a single image or video

log file: (2024/09/03)

Log 2024/09/17

(ICLR 2022) Entroformer: A Transformer-based Entropy Model for Learned Image Compression
(ECCV 2024) ReNoise: Real Image Inversion Through Iterative Noising
(ICASSP 2024) Audio Match Cutting: Finding and Creating Matching Audio Transitions in Movies and Videos

log file: (2024/09/17)

Log 2024/09/24

(CVPR 2024) SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation
(CVPR 2024) GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces
(CVPR 2022) Anomaly Detection via Reverse Distillation from One-Class Embedding
(SLT 2024) Speech Foundation Model Ensembles for the Controlled Singing Voice Deepfake Detection Challenge 2024
(ICASSP 2022) AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks
(NeurIPS 2024) NVRC: Neural Video Representation Compression

log file: (2024/09/24)

Log 2024/10/01

(NeurIPS 2024) MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
(ICCV 2023) TransTIC: Transferring Transformer-based Image Compression from Human Perception to Machine Perception
(VCIP 2023) Hybrid Implicit Neural Image Compression with Subpixel Context Model and Iterative Pruner

log file: (2024/10/01)

Log 2024/10/08

(arXiv 2024) Long-form music generation with latent diffusion
(arXiv 2024) Integrating Text-to-Music Models with Language Models: Composing Long Structured Music Pieces

log file: (2024/10/08)

Log 2024/10/15

(ICCV 2023) Semantically Structured Image Compression via Irregular Group-Based Decoupling
(ICCV 2021) Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
(CVPR 2024) Distribution Extrapolation Diffusion Model for Video Prediction
(CVPR 2024) GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models
(CVPR 2024) Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing

log file: (2024/10/15)

Log 2024/10/22

(CVPR 2024) Boosting Neural Representations for Videos with a Conditional Decoder
(NeurIPS 2023) Idempotent Learned Image Compression with Right-Inverse
(ICLR 2024) Implicit Neural Representation Image Codec with Mixed Context for Fast Decoding
(ICPR 2021) PaDiM: a Patch Distribution Modeling Framework for Anomaly Detection and Localization

log file: (2024/10/22)

Log 2024/11/12

(CVPR 2024) Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
(CVPR 2024) Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
(CVPR 2023) ImageBind: One Embedding Space To Bind Them All
(ICIP 2024) Image Coding For Machine Via Analytics-Driven Appearance Redundancy Reduction

log file: (2024/11/12)

Log 2024/11/19

(NeurIPS 2024) Video Diffusion Models are Training-free Motion Interpreter and Controller
(ECCV 2024) Fast Encoding and Decoding for Implicit Video Representation
(ICIP 2023) Frequency Disentangled Features in Neural Image Compression

log file: (2024/11/19)

Log 2024/12/03

(ICME 2020) Focus Your Distribution: Coarse-to-Fine Non-Contrastive Learning for Anomaly Detection and Localization
(ICLR 2022) LoRA: Low-Rank Adaptation of Large Language Models
(WACV 2024) D4: Detection of Adversarial Diffusion Deepfakes Using Disjoint Ensembles
(Signal Processing Letters vol. 30) Enhanced Quantified Local Implicit Neural Representation for Image Compression

log file: (2024/12/03)

Log 2024/12/10

(ICIP 2022) Bridging the Gap Between Image Coding for Machines and Humans
(arXiv 2024) How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio?
(ICLR 2022) Language-driven Semantic Segmentation

log file: (2024/12/10)