All List
List of papers we read in the multimedia seminar (2024/04-).
IEEE/CVF Computer Vision and Pattern Recognition (CVPR)
- (CVPR 2024) Style Aligned Image Generation via Shared Attention
- (CVPR 2024) One-step Diffusion with Distribution Matching Distillation
- (CVPR 2024) VideoBooth: Diffusion-based Video Generation with Image Prompts
- (CVPR 2024) FreeU: Free Lunch in Diffusion U-Net
- (CVPR 2024) StyLitGAN: Prompting StyleGAN to Produce New Illumination Conditions
- (CVPR 2024) Joint-task Regularization for Partially Labeled Multitask Learning
- (CVPR 2024) Shadows Don’t Lie and Lines Can’t Bend! Generative Models don’t know Projective Geometry...for now
- (CVPR 2024) Time-Efficient Light-Field Acquisition Using Coded Aperture and Events
- (CVPR 2024) FINER: Flexible spectral-bias tuning in Implicit Neural Representation by Variable-periodic Activation Functions
- (CVPR 2024) DS-NeRV: Implicit Neural Video Representation with Decomposed Static and Dynamic Codes
- (CVPR 2024) Long-Tailed Anomaly Detection with Learnable Class Names
- (CVPR 2024) Towards Backward-Compatible Continual Learning of Image Compression
- (CVPR 2024) SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation
- (CVPR 2024) YOLO-World: Real-Time Open-Vocabulary Object Detection
- (CVPR 2024) InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models
- (CVPR 2024) TextCraftor: Your Text Encoder Can be Image Quality Controller
- (CVPR 2024) Beyond Textual Constraints: Learning Novel Diffusion Conditions with Fewer Examples
- (CVPR 2024) SUGAR: Pre-training 3D Visual Representations for Robotics
- (CVPR 2024) Mip-Splatting: Alias-free 3D Gaussian Splatting
- (CVPR 2024) DETRs Beat YOLOs on Real-time Object Detection
- (CVPR 2024) Compressed 3D Gaussian Splatting for Accelerated Novel View Synthesis
- (CVPR 2024) FedAS: Bridging Inconsistency in Personalized Federated Learning
- (CVPR 2024) PNeRV: Enhancing Spatial Consistency via Pyramidal Neural Representation for Videos
- (CVPR 2024) Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering
- (CVPR 2024) C3: High-performance and low-complexity neural compression from a single image or video
- (CVPR 2024) SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation
- (CVPR 2024) GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces
- (CVPR 2024) Distribution Extrapolation Diffusion Model for Video Prediction
- (CVPR 2024) GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models
- (CVPR 2024) Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing
- (CVPR 2024) Boosting Neural Representations for Videos with a Conditional Decoder
- (CVPR 2024) Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
- (CVPR 2024) Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
- (CVPR 2023) MOSO: Decomposing MOtion, Scene and Object for Video Prediction
- (CVPR 2023) Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models
- (CVPR 2023) Learned Image Compression with Mixed Transformer-CNN Architectures
- (CVPR 2023) ImageBind: One Embedding Space To Bind Them All
- (CVPR 2022) Anomaly Detection via Reverse Distillation from One-Class Embedding
- (CVPR 2021) End-to-End Object Detection with Fully Convolutional Network
- (CVPR 2021) Checkerboard Context Model for Efficient Learned Image Compression
International Conference on Learning Representations (ICLR)
- (ICLR 2024) Language Model Beats Diffusion - Tokenizer is Key to Visual Generation
- (ICLR 2024) VDT: General-Purpose Video Diffusion Transformers via Mask Modeling
- (ICLR 2024) Vision Transformers Need Registers
- (ICLR 2024) Implicit Neural Representation Image Codec with Mixed Context for Fast Decoding
- (ICLR 2023) DINO: DETR with Improved Denoising Anchor Boxes for End-to-End Object Detection
- (ICLR 2023) Prompt-to-Prompt Image Editing with Cross Attention Control
- (ICLR 2022) Language-driven Semantic Segmentation
- (ICLR 2022) LoRA: Low-Rank Adaptation of Large Language Models
- (ICLR 2022) Entroformer: A Transformer-based Entropy Model for Learned Image Compression
- (ICLR 2019) DARTS: Differentiable Architecture Search
- (ICLR 2018) Variational image compression with a scale hyperprior
- (ICLR 2017) Neural Architecture Search with Reinforcement Learning
- (ICLR 2021 workshop) COIN: COmpression with Implicit Neural representations
Annual Conference on Neural Information Processing Systems (NeurIPS)
- (NeurIPS 2024) Video Diffusion Models are Training-free Motion Interpreter and Controller
- (NeurIPS 2024) YOLOv10: Real-Time End-to-End Object Detection
- (NeurIPS 2024) NVRC: Neural Video Representation Compression
- (NeurIPS 2024) MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
- (NeurIPS 2023) Idempotent Learned Image Compression with Right-Inverse
- (NeurIPS 2023) Towards Efficient Image Compression Without Autoregressive Models
- (NeurIPS 2022) Flexible Diffusion Modeling of Long Videos
- (NeurIPS 2022) Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
- (NeurIPS 2018) Joint autoregressive and hierarchical priors for learned image compression
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
- (ICASSP 2024) Audio Match Cutting: Finding and Creating Matching Audio Transitions in Movies and Videos
- (ICASSP 2023) Hybrid Neural Network with Cross- and Self-Module Attention Pooling for Text-Independent Speaker Verification
- (ICASSP 2023) Improving Music Genre Classification from Multi-Modal Properties of Music and Genre Correlations Perspective
- (ICASSP 2023) HiSSNet: Sound Event Detection and Speaker Identification via Hierarchical Prototypical Networks for Low-Resource Headphones
- (ICASSP 2022) AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks
- (ICASSP 2021) Image Coding for Machines: an End-To-End Learned Approach
IEEE/CVF International Conference on Computer Vision (ICCV)
- (ICCV 2023) Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
- (ICCV 2023) Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
- (ICCV 2023) Adding Conditional Control to Text-to-Image Diffusion Models
- (ICCV 2023) Video Object Segmentation-aware Video Frame Interpolation
- (ICCV 2023) COOL-CHIC: Coordinate-based Low Complexity Hierarchical Image Codec (cool-chic v1.0)
- (ICCV 2023) TransTIC: Transferring Transformer-based Image Compression from Human Perception to Machine Perception
- (ICCV 2023) Semantically Structured Image Compression via Irregular Group-Based Decoupling
- (ICCV 2021) Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
IEEE International Conference on Image Processing (ICIP)
- (ICIP 2024) Image Coding For Machine Via Analytics-Driven Appearance Redundancy Reduction
- (ICIP 2023) Frequency Disentangled Features in Neural Image Compression
- (ICIP 2022) Bridging the Gap Between Image Coding for Machines and Humans
- (ICIP 2022) Deep Feature Compression Using Rate-Distortion Optimization Guided Autoencoder
- (ICIP 2021) An Efficient Image Compression Method Based on Neural Network: An Overfitting Approach
- (ICIP 2018) Video Error Concealment Using Deep Neural Networks
European Conference on Computer Vision (ECCV)
- (ECCV 2024) Fast Encoding and Decoding for Implicit Video Representation
- (ECCV 2024) Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
- (ECCV 2024) GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting
- (ECCV 2024) ReNoise: Real Image Inversion Through Iterative Noising
International Conference on Machine Learning (ICML)
- (ICML 2024) Fast Timing-Conditioned Latent Audio Diffusion
- (ICML 2023) AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
IEEE International Workshop on Multimedia Signal Processing (MMSP)
- (MMSP 2023) Region of Interest Enabled Learned Image Coding for Machines
- (MMSP 2023) Low-complexity Overfitted Neural Image Codec (cool-chic v2.0)
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
- (WACV 2024) D4: Detection of Adversarial Diffusion Deepfakes Using Disjoint Ensembles
- (WACV 2024) Controlling Rate, Distortion, and Realism: Towards a Single Comprehensive Neural Image Compression Model
IEEE International Conference on Multimedia & Expo (ICME)
- (ICME 2022) Focus Your Distribution: Coarse-to-Fine Non-Contrastive Learning for Anomaly Detection and Localization
ACM Special Interest Group on Computer Graphics (SIGGRAPH)
- (SIGGRAPH 2023) 3D Gaussian Splatting for Real-Time Radiance Field Rendering
ACM Multimedia Conference (ACMMM)
- (MM 2023) ICMH-Net: Neural Image Compression Towards both Machine Vision and Human Vision
Others
- (EUSIPCO 2024) Overfitted image coding at reduced complexity (cool-chic v3.2)
- (SLT 2024) Speech Foundation Model Ensembles for the Controlled Singing Voice Deepfake Detection Challenge 2024
- (ISSC 2023) A Comparison of Deep Learning MOS Predictors for Speech Synthesis Quality
- (ICDCS 2023) Edge-Cloud Collaborated Object Detection via Difficult-Case Discriminator
- (TMLR 2022) COIN++: Neural Compression Across Modalities
- (ISM 2021) Learned Enhancement Filters for Image Coding for Machines
- (ACSSC 2014) Weighted boundary matching error concealment for HEVC using block partition decisions
- (VCIP 2023) Hybrid Implicit Neural Image Compression with Subpixel Context Model and Iterative Pruner
- (ICPR 2021) PaDiM: a Patch Distribution Modeling Framework for Anomaly Detection and Localization
- (Signal Processing Letters vol. 30) Enhanced Quantified Local Implicit Neural Representation for Image Compression
arXiv
- (arXiv 2024) How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio?
- (arXiv 2024) Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech
- (arXiv 2024) Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
- (arXiv 2024) ControlNeXt: Powerful and Efficient Control for Image and Video Generation
- (arXiv 2023) SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models
- (arXiv 2024) Long-form music generation with latent diffusion
- (arXiv 2024) Integrating Text-to-Music Models with Language Models: Composing Long Structured Music Pieces
Log 2024/08/20
- (NeurIPS 2023) Towards Efficient Image Compression Without Autoregressive Models
- (CVPR 2021) Checkerboard Context Model for Efficient Learned Image Compression
- (CVPR 2024) One-step Diffusion with Distribution Matching Distillation
- (ICASSP 2023) HiSSNet: Sound Event Detection and Speaker Identification via Hierarchical Prototypical Networks for Low-Resource Headphones
- (CVPR 2024) Mip-Splatting: Alias-free 3D Gaussian Splatting
Log 2024/08/27
- (arXiv 2024) ControlNeXt: Powerful and Efficient Control for Image and Video Generation
- (NeurIPS 2024) YOLOv10: Real-Time End-to-End Object Detection
- (CVPR 2021) End-to-End Object Detection with Fully Convolutional Network
- (CVPR 2024) FreeU: Free Lunch in Diffusion U-Net
- (CVPR 2024) FedAS: Bridging Inconsistency in Personalized Federated Learning
Log 2024/09/03
- (CVPR 2024) PNeRV: Enhancing Spatial Consistency via Pyramidal Neural Representation for Videos
- (CVPR 2023) Learned Image Compression with Mixed Transformer-CNN Architectures
- (CVPR 2024) Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering
- (CVPR 2024) C3: High-performance and low-complexity neural compression from a single image or video
Log 2024/09/17
- (ICLR 2022) Entroformer: A Transformer-based Entropy Model for Learned Image Compression
- (ECCV 2024) ReNoise: Real Image Inversion Through Iterative Noising
- (ICASSP 2024) Audio Match Cutting: Finding and Creating Matching Audio Transitions in Movies and Videos
Log 2024/09/24
- (CVPR 2024) SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation
- (CVPR 2024) GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces
- (CVPR 2022) Anomaly Detection via Reverse Distillation from One-Class Embedding
- (SLT 2024) Speech Foundation Model Ensembles for the Controlled Singing Voice Deepfake Detection Challenge 2024
- (ICASSP 2022) AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks
- (NeurIPS 2024) NVRC: Neural Video Representation Compression
Log 2024/10/01
- (NeurIPS 2024) MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
- (ICCV 2023) TransTIC: Transferring Transformer-based Image Compression from Human Perception to Machine Perception
- (VCIP 2023) Hybrid Implicit Neural Image Compression with Subpixel Context Model and Iterative Pruner
Log 2024/10/08
- (arXiv 2024) Long-form music generation with latent diffusion
- (arXiv 2024) Integrating Text-to-Music Models with Language Models: Composing Long Structured Music Pieces
Log 2024/10/15
- (ICCV 2023) Semantically Structured Image Compression via Irregular Group-Based Decoupling
- (ICCV 2021) Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
- (CVPR 2024) Distribution Extrapolation Diffusion Model for Video Prediction
- (CVPR 2024) GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models
- (CVPR 2024) Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing
Log 2024/10/22
- (CVPR 2024) Boosting Neural Representations for Videos with a Conditional Decoder
- (NeurIPS 2023) Idempotent Learned Image Compression with Right-Inverse
- (ICLR 2024) Implicit Neural Representation Image Codec with Mixed Context for Fast Decoding
- (ICPR 2021) PaDiM: a Patch Distribution Modeling Framework for Anomaly Detection and Localization
Log 2024/11/12
- (CVPR 2024) Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
- (CVPR 2024) Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
- (CVPR 2023) ImageBind: One Embedding Space To Bind Them All
- (ICIP 2024) Image Coding For Machine Via Analytics-Driven Appearance Redundancy Reduction
Log 2024/11/19
- (NeurIPS 2024) Video Diffusion Models are Training-free Motion Interpreter and Controller
- (ECCV 2024) Fast Encoding and Decoding for Implicit Video Representation
- (ICIP 2023) Frequency Disentangled Features in Neural Image Compression
Log 2024/12/03
- (ICME 2022) Focus Your Distribution: Coarse-to-Fine Non-Contrastive Learning for Anomaly Detection and Localization
- (ICLR 2022) LoRA: Low-Rank Adaptation of Large Language Models
- (WACV 2024) D4: Detection of Adversarial Diffusion Deepfakes Using Disjoint Ensembles
- (Signal Processing Letters vol. 30) Enhanced Quantified Local Implicit Neural Representation for Image Compression
Log 2024/12/10
- (ICIP 2022) Bridging the Gap Between Image Coding for Machines and Humans
- (arXiv 2024) How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio?
- (ICLR 2022) Language-driven Semantic Segmentation
