CVPR'23: The Latest 89 Papers, Organized by Topic | Covering Video Object Detection ...

民智有待提高 · posted on 2023-7-18 15:18:35

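Edited by 极市平台
CVPR 2023 acceptance results have been announced: 2,360 papers were accepted this year, an acceptance rate of 25.78%. Ahead of the conference itself, and to help readers pick up cutting-edge computer vision work more quickly, 极市 is tracking the latest CVPR 2023 papers, including papers sorted by research direction, code roundups, and live-streamed technical talks on the papers.
The topic-organized list of CVPR 2023 papers is being updated continuously in the 极市 community and now covers 693 papers in total. Project link: https://www.cvmart.net/community/detail/7422
Below are the most recently added CVPR 2023 papers, spanning detection, segmentation, faces, video processing, medical imaging, neural network architecture, multimodal learning, few-shot learning, and other directions.
Bundled download link: https://www.cvmart.net/community/detail/7480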
2D Object Detection

[1]What Can Human Sketches Do for Object Detection?
paper：https://arxiv.org/abs/2303.15149

Video Object Detection

[1]Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies
paper：https://arxiv.org/abs/2303.14768 code：https://github.com/tencentyouturesearch/highlightdetection-clc

[2]3D Video Object Detection with Learnable Object-Centric Global Optimization
paper：https://arxiv.org/abs/2303.15416 code：https://github.com/jiaweihe1996/ba-det

3D Object Detection

[1]Learned Two-Plane Perspective Prior based Image Resampling for Efficient Object Detection
paper：https://arxiv.org/abs/2303.14311

[2]Adaptive Sparse Convolutional Networks with Global Context Enhancement for Faster Object Detection on Drone Images
paper：https://arxiv.org/abs/2303.14488 code：https://github.com/cuogeihong/ceasc

[3]Viewpoint Equivariance for Multi-View 3D Object Detection
paper：https://arxiv.org/abs/2303.14548 code：https://github.com/tri-ml/vedet

Camouflaged Object Detection

[1]Feature Shrinkage Pyramid for Camouflaged Object Detection with Transformers
paper：https://arxiv.org/abs/2303.14816 code：https://github.com/zhouhuang23/fspnet

Keypoint Detection

[1]Unified Keypoint-based Action Recognition Framework via Structured Keypoint Pooling
paper：https://arxiv.org/abs/2303.15270

Anomaly Detection

[1]WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation
paper：https://arxiv.org/abs/2303.14814

[2]SimpleNet: A Simple Network for Image Anomaly Detection and Localization
paper：https://arxiv.org/abs/2303.15140 code：https://github.com/donaldrr/simplenet

[3]Prompt-Guided Zero-Shot Anomaly Action Recognition using Pretrained Deep Skeleton Features
paper：https://arxiv.org/abs/2303.15167

Image Segmentation

[1]Parameter Efficient Local Implicit Image Function Network for Face Segmentation
paper：https://arxiv.org/abs/2303.15122

[2]EFEM: Equivariant Neural Field Expectation Maximization for 3D Object Segmentation Without Scene Supervision
paper：https://arxiv.org/abs/2303.15440

Panoptic Segmentation

[1]You Only Segment Once: Towards Real-Time Panoptic Segmentation
paper：https://arxiv.org/abs/2303.14651 code：https://github.com/hujiecpp/yoso

Semantic Segmentation

[1]Both Style and Distortion Matter: Dual-Path Unsupervised Domain Adaptation for Panoramic Semantic Segmentation
paper：https://arxiv.org/abs/2303.14360

[2]Instant Domain Augmentation for LiDAR Semantic Segmentation
paper：https://arxiv.org/abs/2303.14378

[3]Leveraging Hidden Positives for Unsupervised Semantic Segmentation
paper：https://arxiv.org/abs/2303.15014 code：https://github.com/hynnsk/hp

Instance Segmentation

[1]DoNet: Deep De-overlapping Network for Cytology Instance Segmentation
paper：https://arxiv.org/abs/2303.14373

[2]The Devil is in the Points: Weakly Semi-Supervised Instance Segmentation via Point-Guided Mask Representation
paper：https://arxiv.org/abs/2303.15062

Video Object Segmentation

[1]Spatio-Temporal Pixel-Level Contrastive Learning-based Source-Free Domain Adaptation for Video Semantic Segmentation
paper：https://arxiv.org/abs/2303.14361 code：https://github.com/shaoyuanlo/stpl

Dense Prediction

[1]Ensemble-based Blackbox Attacks on Dense Prediction
paper：https://arxiv.org/abs/2303.14304

[2]Ambiguity-Resistant Semi-Supervised Learning for Dense Object Detection
paper：https://arxiv.org/abs/2303.14960 code：https://github.com/PaddlePaddle/PaddleDetection

Video Processing

[1]Affordance Grounding from Demonstration Video to Target Image
paper：https://arxiv.org/abs/2303.14644 code：https://github.com/showlab/afformer

[2]Frame Flexible Network
paper：https://arxiv.org/abs/2303.14817 code：https://github.com/bespontaneous/ffn

[3]Joint Video Multi-Frame Interpolation and Deblurring under Unknown Exposure Time
paper：https://arxiv.org/abs/2303.15043 code：https://github.com/shangwei5/vidue

Human Parsing/Human Pose Estimation

[1]ScarceNet: Animal Pose Estimation with Scarce Annotations
paper：https://arxiv.org/abs/2303.15023 code：https://github.com/chaneyddtt/scarcenet

[2]Human Pose Estimation in Extremely Low-Light Conditions
paper：https://arxiv.org/abs/2303.15410

Super Resolution

[1]Learning Generative Structure Prior for Blind Text Image Super-resolution
paper：https://arxiv.org/abs/2303.14726 code：https://github.com/csxmli2016/marconet

[2]Learning to Zoom and Unzoom
paper：https://arxiv.org/abs/2303.15390

Image Restoration/Image Enhancement/Image Reconstruction

[1]Visual-Tactile Sensing for In-Hand Object Reconstruction
paper：https://arxiv.org/abs/2303.14498

[2]3D-Aware Multi-Class Image-to-Image Translation with NeRFs
paper：https://arxiv.org/abs/2303.15012 code：https://github.com/sen-mao/3di2i-translation

Image Shadow Removal/Image Reflection Removal

[1]Nighttime Smartphone Reflective Flare Removal Using Optical Center Symmetry Prior
paper：https://arxiv.org/abs/2303.15046 code：https://github.com/ykdai/BracketFlare

Image Denoising/Deblurring/Deraining/Dehazing

[1]Curricular Contrastive Regularization for Physics-aware Single Image Dehazing
paper：https://arxiv.org/abs/2303.14218 code：https://github.com/yuzheng9/c2pnet

[2]Spatially Adaptive Self-Supervised Learning for Real-World Image Denoising
paper：https://arxiv.org/abs/2303.14934 code：https://github.com/nagejacob/spatiallyadaptivessid

Face Generation/Face Synthesis/Face Reconstruction/Face Editing

[1]OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering
paper：https://arxiv.org/abs/2303.14662

[2]High-fidelity 3D Human Digitization from Single 2K Resolution Images
paper：https://arxiv.org/abs/2303.15108

[3]FaceLit: Neural 3D Relightable Faces
paper：https://arxiv.org/abs/2303.15437

Image & Video Retrieval/Video Understanding

[1]Zero-Shot Everything Sketch-Based Image Retrieval, and in Explainable Style
paper：https://arxiv.org/abs/2303.14348 code：https://github.com/buptlinfy/zse-sbir

[2]Selective Structured State-Spaces for Long-Form Video Understanding
paper：https://arxiv.org/abs/2303.14526

Action/Activity Recognition, Detection, Segmentation, and Localization

[1]3Mformer: Multi-order Multi-mode Transformer for Skeletal Action Recognition
paper：https://arxiv.org/abs/2303.14474

Person Re-Identification/Detection

[1]Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-identification
paper：https://arxiv.org/abs/2303.14481 code：https://github.com/zyk100/llcm

Medical Imaging

[1]Label-Free Liver Tumor Segmentation
paper：https://arxiv.org/abs/2303.14869 code：https://github.com/mrgiovanni/synthetictumors

[2]Image Quality-aware Diagnosis via Meta-knowledge Co-embedding
paper：https://arxiv.org/abs/2303.15038

Image Generation/Image Synthesis

[1]Unsupervised Domain Adaption with Pixel-level Discriminator for Image-aware Layout Generation
paper：https://arxiv.org/abs/2303.14377

[2]Freestyle Layout-to-Image Synthesis
paper：https://arxiv.org/abs/2303.14412 code：https://github.com/essunny310/freestylenet

Point Cloud

[1]Unsupervised Inference of Signed Distance Functions from Single Sparse Point Clouds without Learning Priors
paper：https://arxiv.org/abs/2303.14505

[2]NeuralPCI: Spatio-temporal Neural Field for 3D Point Cloud Multi-frame Non-linear Interpolation
paper：https://arxiv.org/abs/2303.15126 code：https://github.com/ispc-lab/neuralpci

[3]Recognizing Rigid Patterns of Unlabeled Point Clouds by Complete and Continuous Isometry Invariants with no False Negatives and no False Positives
paper：https://arxiv.org/abs/2303.15385

3D Reconstruction

[1]PAniC-3D: Stylized Single-view 3D Reconstruction from Portraits of Anime Characters
paper：https://arxiv.org/abs/2303.14587 code：https://github.com/shuhongchen/panic3d-anime-reconstruction

Scene Reconstruction/View Synthesis/Novel View Synthesis

[1]DyLiN: Making Light Field Networks Dynamic
paper：https://arxiv.org/abs/2303.14243

[2]FlexNeRF: Photorealistic Free-viewpoint Rendering of Moving Humans from Sparse Views
paper：https://arxiv.org/abs/2303.14368

[3]NeRF-DS: Neural Radiance Fields for Dynamic Specular Objects
paper：https://arxiv.org/abs/2303.14435 code：https://github.com/jokeryan/nerf-ds

[4]SUDS: Scalable Urban Dynamic Scenes
paper：https://arxiv.org/abs/2303.14536

[5]JAWS: Just A Wild Shot for Cinematic Transfer in Neural Radiance Fields
paper：https://arxiv.org/abs/2303.15427

Knowledge Distillation

[1]Generalization Matters: Loss Minima Flattening via Parameter Hybridization for Efficient Online Knowledge Distillation
paper：https://arxiv.org/abs/2303.14666

Neural Network Structure Design

[1]Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection
paper：https://arxiv.org/abs/2303.14404 code：https://github.com/akhtarvision/bpc_calibration

[2]Compacting Binary Neural Networks by Sparse Kernel Selection
paper：https://arxiv.org/abs/2303.14470

Graph Neural Networks (GNN)

[1]Mind the Label Shift of Augmentation-based Graph OOD Generalization
paper：https://arxiv.org/abs/2303.14859

Image Compression

[1]Learned Image Compression with Mixed Transformer-CNN Architectures
paper：https://arxiv.org/abs/2303.14978 code：https://github.com/jmliu206/lic_tcm

Model Training/Generalization

[1]Active Finetuning: Exploiting Annotation Budget in the Pretraining-Finetuning Paradigm
paper：https://arxiv.org/abs/2303.14382 code：https://github.com/yichen928/activeft

[2]CFA: Class-wise Calibrated Fair Adversarial Training
paper：https://arxiv.org/abs/2303.14460 code：https://github.com/pku-ml/cfa

Vision-Language

[1]VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining
paper：https://arxiv.org/abs/2303.14302

[2]Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
paper：https://arxiv.org/abs/2303.14369 code：https://github.com/jpthu17/HBI

[3]IFSeg: Image-free Semantic Segmentation via Vision-Language Model
paper：https://arxiv.org/abs/2303.14396 code：https://github.com/alinlab/ifseg

[4]Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective
paper：https://arxiv.org/abs/2303.14968 code：https://github.com/zwx8981/liqe

Dataset

[1]CelebV-Text: A Large-Scale Facial Text-Video Dataset
paper：https://arxiv.org/abs/2303.14717 code：https://github.com/CelebV-Text/CelebV-Text

[2]On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks
paper：https://arxiv.org/abs/2303.14840 code：https://github.com/junggy/hammer-dataset

[3]Towards Artistic Image Aesthetics Assessment: a Large-scale Dataset and a New Method
paper：https://arxiv.org/abs/2303.15166 code：https://github.com/dreemurr-t/baid

[4]Recovering 3D Hand Mesh Sequence from a Single Blurry Image: A New Dataset and Temporal Unfolding
paper：https://arxiv.org/abs/2303.15417 code：https://github.com/jaehakim97/blurhand_release

Few-shot Learning/Zero-shot Learning

[1]Hierarchical Dense Correlation Distillation for Few-Shot Segmentation
paper：https://arxiv.org/abs/2303.14652

[2]ZBS: Zero-shot Background Subtraction via Instance-level Background Modeling and Foreground Selection
paper：https://arxiv.org/abs/2303.14679 code：https://github.com/casia-iva-lab/zbs

[3]Learning Attention as Disentangler for Compositional Zero-shot Learning
paper：https://arxiv.org/abs/2303.15111 code：https://github.com/haoosz/ade-czsl

[4]Progressive Semantic-Visual Mutual Adaption for Generalized Zero-Shot Learning
paper：https://arxiv.org/abs/2303.15322 code：https://github.com/manliucoder/psvma

Continual Learning/Life-long Learning

[1]Preserving Linear Separability in Continual Learning by Backward Feature Projection
paper：https://arxiv.org/abs/2303.14595

Scene Graph Prediction

[1]VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud
paper：https://arxiv.org/abs/2303.14408 code：https://github.com/wz7in/cvpr2023-vlsat

Visual Localization/Pose Estimation

[1]Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention
paper：https://arxiv.org/abs/2303.15274

Visual Reasoning/VQA

[1]MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos
paper：https://arxiv.org/abs/2303.14933 code：https://github.com/zzc-1998/md-vqa

Transfer Learning/Domain Adaptation

[1]BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning
paper：https://arxiv.org/abs/2303.14773 code：https://github.com/changdaeoh/blackvip

Contrastive Learning

[1]Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens
paper：https://arxiv.org/abs/2303.14865

Semi-supervised/Weakly Supervised/Unsupervised/Self-supervised Learning

[1]Detecting Backdoors in Pre-trained Encoders
paper：https://arxiv.org/abs/2303.15180 code：https://github.com/giantseaweed/decree

Neural Network Interpretability

[1]IDGI: A Framework to Eliminate Explanation Noise from Integrated Gradients
paper：https://arxiv.org/abs/2303.14242 code：https://github.com/yangruo1226/idgi

Federated Learning

[1]The Resource Problem of Using Linear Layer Leakage Attack in Federated Learning
paper：https://arxiv.org/abs/2303.14868

Others

[1]DeepVecFont-v2: Exploiting Transformers to Synthesize Vector Fonts with Higher Quality
paper：https://arxiv.org/abs/2303.14585 code：https://github.com/yizhiwang96/deepvecfont-v2

[2]PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
paper：https://arxiv.org/abs/2303.14676

[3]Disentangling Writer and Character Styles for Handwriting Generation
paper：https://arxiv.org/abs/2303.14736 code：https://github.com/dailenson/sdt

[4]Continuous Intermediate Token Learning with Implicit Motion Manifold for Keyframe Based Motion Interpolation
paper：https://arxiv.org/abs/2303.14926

[5]DANI-Net: Uncalibrated Photometric Stereo by Differentiable Shadow Handling, Anisotropic Reflectance Modeling, and Neural Inverse Rendering
paper：https://arxiv.org/abs/2303.15101 code：https://github.com/lmozart/cvpr2023-dani-net

[6]Multi-Granularity Archaeological Dating of Chinese Bronze Dings Based on a Knowledge-Guided Relation Graph
paper：https://arxiv.org/abs/2303.15266 code：https://github.com/zhourixin/bronze-ding

[7]Handwritten Text Generation from Visual Archetypes
paper：https://arxiv.org/abs/2303.15269 code：https://github.com/aimagelab/vatr
