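Editor: 极市平台
The CVPR 2023 acceptance results are out: 2,360 papers were accepted this year, an acceptance rate of 25.78%. Ahead of the main conference, and to help readers find and learn about cutting-edge computer vision research more quickly, 极市 is tracking the latest CVPR 2023 papers, including papers grouped by research direction, code roundups, and live technical talks on the papers.
The by-topic collection of CVPR 2023 papers is being continuously updated in the 极市 community, with 693 papers added so far. Project page: https://www.cvmart.net/community/detail/7422
Below are the most recently added CVPR 2023 papers, covering detection, segmentation, faces, video processing, medical imaging, neural network architecture, multimodal learning, few-shot learning, and other areas.
Bundled download: https://www.cvmart.net/community/detail/7480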
2D Object Detection
[1]What Can Human Sketches Do for Object Detection?
paper:https://arxiv.org/abs/2303.15149
Video Object Detection
[1]Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies
paper:https://arxiv.org/abs/2303.14768 code:https://github.com/tencentyouturesearch/highlightdetection-clc
[2]3D Video Object Detection with Learnable Object-Centric Global Optimization
paper:https://arxiv.org/abs/2303.15416 code:https://github.com/jiaweihe1996/ba-det
3D Object Detection
[1]Learned Two-Plane Perspective Prior based Image Resampling for Efficient Object Detection
paper:https://arxiv.org/abs/2303.14311
[2]Adaptive Sparse Convolutional Networks with Global Context Enhancement for Faster Object Detection on Drone Images
paper:https://arxiv.org/abs/2303.14488 code:https://github.com/cuogeihong/ceasc
[3]Viewpoint Equivariance for Multi-View 3D Object Detection
paper:https://arxiv.org/abs/2303.14548 code:https://github.com/tri-ml/vedet
Camouflaged Object Detection
[1]Feature Shrinkage Pyramid for Camouflaged Object Detection with Transformers
paper:https://arxiv.org/abs/2303.14816 code:https://github.com/zhouhuang23/fspnet
Keypoint Detection
[1]Unified Keypoint-based Action Recognition Framework via Structured Keypoint Pooling
paper:https://arxiv.org/abs/2303.15270
Anomaly Detection
[1]WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation
paper:https://arxiv.org/abs/2303.14814
[2]SimpleNet: A Simple Network for Image Anomaly Detection and Localization
paper:https://arxiv.org/abs/2303.15140 code:https://github.com/donaldrr/simplenet
[3]Prompt-Guided Zero-Shot Anomaly Action Recognition using Pretrained Deep Skeleton Features
paper:https://arxiv.org/abs/2303.15167
Image Segmentation
[1]Parameter Efficient Local Implicit Image Function Network for Face Segmentation
paper:https://arxiv.org/abs/2303.15122
[2]EFEM: Equivariant Neural Field Expectation Maximization for 3D Object Segmentation Without Scene Supervision
paper:https://arxiv.org/abs/2303.15440
Panoptic Segmentation
[1]You Only Segment Once: Towards Real-Time Panoptic Segmentation
paper:https://arxiv.org/abs/2303.14651 code:https://github.com/hujiecpp/yoso
Semantic Segmentation
[1]Both Style and Distortion Matter: Dual-Path Unsupervised Domain Adaptation for Panoramic Semantic Segmentation
paper:https://arxiv.org/abs/2303.14360
[2]Instant Domain Augmentation for LiDAR Semantic Segmentation
paper:https://arxiv.org/abs/2303.14378
[3]Leveraging Hidden Positives for Unsupervised Semantic Segmentation
paper:https://arxiv.org/abs/2303.15014 code:https://github.com/hynnsk/hp
Instance Segmentation
[1]DoNet: Deep De-overlapping Network for Cytology Instance Segmentation
paper:https://arxiv.org/abs/2303.14373
[2]The Devil is in the Points: Weakly Semi-Supervised Instance Segmentation via Point-Guided Mask Representation
paper:https://arxiv.org/abs/2303.15062
Video Object Segmentation
[1]Spatio-Temporal Pixel-Level Contrastive Learning-based Source-Free Domain Adaptation for Video Semantic Segmentation
paper:https://arxiv.org/abs/2303.14361 code:https://github.com/shaoyuanlo/stpl
Dense Prediction
[1]Ensemble-based Blackbox Attacks on Dense Prediction
paper:https://arxiv.org/abs/2303.14304
[2]Ambiguity-Resistant Semi-Supervised Learning for Dense Object Detection
paper:https://arxiv.org/abs/2303.14960 code:https://github.com/PaddlePaddle/PaddleDetection
Video Processing
[1]Affordance Grounding from Demonstration Video to Target Image
paper:https://arxiv.org/abs/2303.14644 code:https://github.com/showlab/afformer
[2]Frame Flexible Network
paper:https://arxiv.org/abs/2303.14817 code:https://github.com/bespontaneous/ffn
[3]Joint Video Multi-Frame Interpolation and Deblurring under Unknown Exposure Time
paper:https://arxiv.org/abs/2303.15043 code:https://github.com/shangwei5/vidue
Human Parsing/Human Pose Estimation
[1]ScarceNet: Animal Pose Estimation with Scarce Annotations
paper:https://arxiv.org/abs/2303.15023 code:https://github.com/chaneyddtt/scarcenet
[2]Human Pose Estimation in Extremely Low-Light Conditions
paper:https://arxiv.org/abs/2303.15410
Super Resolution
[1]Learning Generative Structure Prior for Blind Text Image Super-resolution
paper:https://arxiv.org/abs/2303.14726 code:https://github.com/csxmli2016/marconet
[2]Learning to Zoom and Unzoom
paper:https://arxiv.org/abs/2303.15390
Image Restoration/Image Enhancement/Image Reconstruction
[1]Visual-Tactile Sensing for In-Hand Object Reconstruction
paper:https://arxiv.org/abs/2303.14498
[2]3D-Aware Multi-Class Image-to-Image Translation with NeRFs
paper:https://arxiv.org/abs/2303.15012 code:https://github.com/sen-mao/3di2i-translation
Image Shadow Removal/Image Reflection Removal
[1]Nighttime Smartphone Reflective Flare Removal Using Optical Center Symmetry Prior
paper:https://arxiv.org/abs/2303.15046 code:https://github.com/ykdai/BracketFlare
Image Denoising/Deblurring/Deraining/Dehazing
[1]Curricular Contrastive Regularization for Physics-aware Single Image Dehazing
paper:https://arxiv.org/abs/2303.14218 code:https://github.com/yuzheng9/c2pnet
[2]Spatially Adaptive Self-Supervised Learning for Real-World Image Denoising
paper:https://arxiv.org/abs/2303.14934 code:https://github.com/nagejacob/spatiallyadaptivessid
Face Generation/Face Synthesis/Face Reconstruction/Face Editing
[1]OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering
paper:https://arxiv.org/abs/2303.14662
[2]High-fidelity 3D Human Digitization from Single 2K Resolution Images
paper:https://arxiv.org/abs/2303.15108
[3]FaceLit: Neural 3D Relightable Faces
paper:https://arxiv.org/abs/2303.15437
Image & Video Retrieval/Video Understanding
[1]Zero-Shot Everything Sketch-Based Image Retrieval, and in Explainable Style
paper:https://arxiv.org/abs/2303.14348 code:https://github.com/buptlinfy/zse-sbir
[2]Selective Structured State-Spaces for Long-Form Video Understanding
paper:https://arxiv.org/abs/2303.14526
Action/Activity Recognition, Detection, Segmentation, and Localization
[1]3Mformer: Multi-order Multi-mode Transformer for Skeletal Action Recognition
paper:https://arxiv.org/abs/2303.14474
Person Re-Identification/Detection
[1]Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-identification
paper:https://arxiv.org/abs/2303.14481 code:https://github.com/zyk100/llcm
Medical Imaging
[1]Label-Free Liver Tumor Segmentation
paper:https://arxiv.org/abs/2303.14869 code:https://github.com/mrgiovanni/synthetictumors
[2]Image Quality-aware Diagnosis via Meta-knowledge Co-embedding
paper:https://arxiv.org/abs/2303.15038
Image Generation/Image Synthesis
[1]Unsupervised Domain Adaption with Pixel-level Discriminator for Image-aware Layout Generation
paper:https://arxiv.org/abs/2303.14377
[2]Freestyle Layout-to-Image Synthesis
paper:https://arxiv.org/abs/2303.14412 code:https://github.com/essunny310/freestylenet
Point Cloud
[1]Unsupervised Inference of Signed Distance Functions from Single Sparse Point Clouds without Learning Priors
paper:https://arxiv.org/abs/2303.14505
[2]NeuralPCI: Spatio-temporal Neural Field for 3D Point Cloud Multi-frame Non-linear Interpolation
paper:https://arxiv.org/abs/2303.15126 code:https://github.com/ispc-lab/neuralpci
[3]Recognizing Rigid Patterns of Unlabeled Point Clouds by Complete and Continuous Isometry Invariants with no False Negatives and no False Positives
paper:https://arxiv.org/abs/2303.15385
3D Reconstruction
[1]PAniC-3D: Stylized Single-view 3D Reconstruction from Portraits of Anime Characters
paper:https://arxiv.org/abs/2303.14587 code:https://github.com/shuhongchen/panic3d-anime-reconstruction
Scene Reconstruction/View Synthesis/Novel View Synthesis
[1]DyLiN: Making Light Field Networks Dynamic
paper:https://arxiv.org/abs/2303.14243
[2]FlexNeRF: Photorealistic Free-viewpoint Rendering of Moving Humans from Sparse Views
paper:https://arxiv.org/abs/2303.14368
[3]NeRF-DS: Neural Radiance Fields for Dynamic Specular Objects
paper:https://arxiv.org/abs/2303.14435 code:https://github.com/jokeryan/nerf-ds
[4]SUDS: Scalable Urban Dynamic Scenes
paper:https://arxiv.org/abs/2303.14536
[5]JAWS: Just A Wild Shot for Cinematic Transfer in Neural Radiance Fields
paper:https://arxiv.org/abs/2303.15427
Knowledge Distillation
[1]Generalization Matters: Loss Minima Flattening via Parameter Hybridization for Efficient Online Knowledge Distillation
paper:https://arxiv.org/abs/2303.14666
Neural Network Structure Design
[1]Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection
paper:https://arxiv.org/abs/2303.14404 code:https://github.com/akhtarvision/bpc_calibration
[2]Compacting Binary Neural Networks by Sparse Kernel Selection
paper:https://arxiv.org/abs/2303.14470
Graph Neural Networks (GNN)
[1]Mind the Label Shift of Augmentation-based Graph OOD Generalization
paper:https://arxiv.org/abs/2303.14859
Image Compression
[1]Learned Image Compression with Mixed Transformer-CNN Architectures
paper:https://arxiv.org/abs/2303.14978 code:https://github.com/jmliu206/lic_tcm
Model Training/Generalization
[1]Active Finetuning: Exploiting Annotation Budget in the Pretraining-Finetuning Paradigm
paper:https://arxiv.org/abs/2303.14382 code:https://github.com/yichen928/activeft
[2]CFA: Class-wise Calibrated Fair Adversarial Training
paper:https://arxiv.org/abs/2303.14460 code:https://github.com/pku-ml/cfa
Vision-Language
[1]VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining
paper:https://arxiv.org/abs/2303.14302
[2]Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
paper:https://arxiv.org/abs/2303.14369 code:https://github.com/jpthu17/HBI
[3]IFSeg: Image-free Semantic Segmentation via Vision-Language Model
paper:https://arxiv.org/abs/2303.14396 code:https://github.com/alinlab/ifseg
[4]Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective
paper:https://arxiv.org/abs/2303.14968 code:https://github.com/zwx8981/liqe
Dataset
[1]CelebV-Text: A Large-Scale Facial Text-Video Dataset
paper:https://arxiv.org/abs/2303.14717 code:https://github.com/CelebV-Text/CelebV-Text
[2]On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks
paper:https://arxiv.org/abs/2303.14840 code:https://github.com/junggy/hammer-dataset
[3]Towards Artistic Image Aesthetics Assessment: a Large-scale Dataset and a New Method
paper:https://arxiv.org/abs/2303.15166 code:https://github.com/dreemurr-t/baid
[4]Recovering 3D Hand Mesh Sequence from a Single Blurry Image: A New Dataset and Temporal Unfolding
paper:https://arxiv.org/abs/2303.15417 code:https://github.com/jaehakim97/blurhand_release
Few-shot Learning/Zero-shot Learning
[1]Hierarchical Dense Correlation Distillation for Few-Shot Segmentation
paper:https://arxiv.org/abs/2303.14652
[2]ZBS: Zero-shot Background Subtraction via Instance-level Background Modeling and Foreground Selection
paper:https://arxiv.org/abs/2303.14679 code:https://github.com/casia-iva-lab/zbs
[3]Learning Attention as Disentangler for Compositional Zero-shot Learning
paper:https://arxiv.org/abs/2303.15111 code:https://github.com/haoosz/ade-czsl
[4]Progressive Semantic-Visual Mutual Adaption for Generalized Zero-Shot Learning
paper:https://arxiv.org/abs/2303.15322 code:https://github.com/manliucoder/psvma
Continual Learning/Life-long Learning
[1]Preserving Linear Separability in Continual Learning by Backward Feature Projection
paper:https://arxiv.org/abs/2303.14595
Scene Graph Prediction
[1]VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud
paper:https://arxiv.org/abs/2303.14408 code:https://github.com/wz7in/cvpr2023-vlsat
Visual Localization/Pose Estimation
[1]Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention
paper:https://arxiv.org/abs/2303.15274
Visual Reasoning/VQA
[1]MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos
paper:https://arxiv.org/abs/2303.14933 code:https://github.com/zzc-1998/md-vqa
Transfer Learning/Domain Adaptation
[1]BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning
paper:https://arxiv.org/abs/2303.14773 code:https://github.com/changdaeoh/blackvip
Contrastive Learning
[1]Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens
paper:https://arxiv.org/abs/2303.14865
Semi-supervised/Weakly Supervised/Unsupervised/Self-supervised Learning
[1]Detecting Backdoors in Pre-trained Encoders
paper:https://arxiv.org/abs/2303.15180 code:https://github.com/giantseaweed/decree
Neural Network Interpretability
[1]IDGI: A Framework to Eliminate Explanation Noise from Integrated Gradients
paper:https://arxiv.org/abs/2303.14242 code:https://github.com/yangruo1226/idgi
Federated Learning
[1]The Resource Problem of Using Linear Layer Leakage Attack in Federated Learning
paper:https://arxiv.org/abs/2303.14868
Others
[1]DeepVecFont-v2: Exploiting Transformers to Synthesize Vector Fonts with Higher Quality
paper:https://arxiv.org/abs/2303.14585 code:https://github.com/yizhiwang96/deepvecfont-v2
[2]PDPP:Projected Diffusion for Procedure Planning in Instructional Videos
paper:https://arxiv.org/abs/2303.14676
[3]Disentangling Writer and Character Styles for Handwriting Generation
paper:https://arxiv.org/abs/2303.14736 code:https://github.com/dailenson/sdt
[4]Continuous Intermediate Token Learning with Implicit Motion Manifold for Keyframe Based Motion Interpolation
paper:https://arxiv.org/abs/2303.14926
[5]DANI-Net: Uncalibrated Photometric Stereo by Differentiable Shadow Handling, Anisotropic Reflectance Modeling, and Neural Inverse Rendering
paper:https://arxiv.org/abs/2303.15101 code:https://github.com/lmozart/cvpr2023-dani-net
[6]Multi-Granularity Archaeological Dating of Chinese Bronze Dings Based on a Knowledge-Guided Relation Graph
paper:https://arxiv.org/abs/2303.15266 code:https://github.com/zhourixin/bronze-ding
[7]Handwritten Text Generation from Visual Archetypes
paper:https://arxiv.org/abs/2303.15269 code:https://github.com/aimagelab/vatr