
Computer Vision

Description: Classic papers in the field of computer vision

Computer Vision Learning Roadmap

  1. Bookmark this | Hand-picked classic and top-conference papers in CV: all you need for research!

Image Recognition and Image Classification

  1. AlexNet(2012):Krizhevsky A, Sutskever I, Hinton G E. Imagenet classification with deep convolutional neural networks[C]//Advances in neural information processing systems. 2012: 1097-1105.
  2. ZFNet(2013):Zeiler M D, Fergus R. Visualizing and understanding convolutional networks[C]//European conference on computer vision. Springer, Cham, 2014: 818-833.
  3. VGG16 and VGG19(2014):Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint arXiv:1409.1556, 2014.
  4. GoogLeNet(2014):
    • Inception v1:Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 1-9.
    • Inception v2:Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[J]. arXiv preprint arXiv:1502.03167, 2015.
    • Inception v3:Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 2818-2826.
    • Inception v4:Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, inception-resnet and the impact of residual connections on learning[C]//AAAI. 2017, 4: 12.
  5. ResNet(2015):He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.
  6. DenseNet(2017):Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 2261-2269.
  7. MobileNet(2017):Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.
  8. ShuffleNet(2017):Zhang X, Zhou X, Lin M, et al. Shufflenet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 6848-6856.
  9. SENet(2018):Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 7132-7141.
  10. EffNet(2018):Freeman I, Roese-Koerner L, Kummert A. EffNet: An Efficient Structure for Convolutional Neural Networks[J]. arXiv preprint arXiv:1801.06434, 2018.
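  11. Starting from VGG, this overview covers GoogLeNet, ResNet, the Inception series, DenseNet, Xception, and SENet, plus lightweight networks such as MobileNet, ShuffleNet, and the IGCV series, and even the recently popular NASNet family. Each network comes with a paper link and links to several reimplementations.
  12. A roundup of 36 common CV image classification models, with full papers and code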

Classification + Localization

Description: classify the image and also give the location of the classified object.

  1. To be added

Object Detection

  1. SSD:Liu W, Anguelov D, Erhan D, et al. Ssd: Single shot multibox detector[C]//European conference on computer vision. Springer, Cham, 2016: 21-37.
    • SSD300 | SSD500
  2. R-CNN:Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2014: 580-587.
  3. Fast R-CNN:Girshick R. Fast r-cnn[C]//Proceedings of the IEEE international conference on computer vision. 2015: 1440-1448.
  4. Faster R-CNN:Ren S, He K, Girshick R, et al. Faster r-cnn: Towards real-time object detection with region proposal networks[C]//Advances in neural information processing systems. 2015: 91-99.
    • Faster R-CNN(VGG16) | Faster R-CNN(ZFNET)
  5. YOLO:
  6. Fast YOLO:
  7. Object detection: a summary of recent progress and outlook
  8. GitHub: the most comprehensive collection of object detection papers

Image Segmentation

  1. GitHub: the most comprehensive collection of image segmentation resources

Instance Segmentation

  1. To be added

Semantic Segmentation

  1. Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 3431-3440.

Panoptic Segmentation

Image Caption

  1. [Zhuanzhi Collection 08] The complete collection of Image Caption resources (introductory, advanced, papers, surveys, videos, experts, etc.)
  2. Vinyals O, Toshev A, Bengio S, et al. Show and tell: A neural image caption generator[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 3156-3164.
  3. Xu K, Ba J, Kiros R, et al. Show, attend and tell: Neural image caption generation with visual attention[C]//International conference on machine learning. 2015: 2048-2057.
  4. Karpathy A, Fei-Fei L. Deep visual-semantic alignments for generating image descriptions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 3128-3137.
  5. Bernardi R, Cakici R, Elliott D, et al. Automatic description generation from images: A survey of models, datasets, and evaluation measures[J]. Journal of Artificial Intelligence Research, 2016, 55: 409-442.
  6. Karpathy A. Connecting Images and Natural Language[D]. Ph. D. Dissertation. STANFORD UNIVERSITY, 2016.
  7. Soh M. Learning CNN-LSTM architectures for image caption generation[J]. 2016.
  8. Rennie S J, Marcheret E, Mroueh Y, et al. Self-critical sequence training for image captioning[C]//CVPR. 2017, 1(2): 3.
  9. Krause J, Johnson J, Krishna R, et al. A hierarchical approach for generating descriptive image paragraphs[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, USA: IEEE, 2017: 3337-3345.
    • Krause et al. use a hierarchical LSTM, so their model can generate paragraph-level image descriptions. Their work uses two LSTM-based language decoders: the first-stage LSTM captures the general information of the image and stores the context of each sentence in its hidden state; the second-stage LSTM then decodes those first-stage hidden states into the individual sentences of the paragraph.
  10. Lu J, Xiong C, Parikh D, et al. Knowing when to look: Adaptive attention via a visual sentinel for image captioning[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 375-383.
  11. "Show and Tell": a technical survey of the Image Caption task
  12. Chen L, Zhang H, Xiao J, et al. Sca-cnn: Spatial and channel-wise attention in convolutional networks for image captioning[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 6298-6306.
  13. Lu J, Yang J, Batra D, et al. Neural Baby Talk[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 7219-7228.
  14. Anderson P, He X, Buehler C, et al. Bottom-up and top-down attention for image captioning and visual question answering[C]//CVPR. 2018, 3(5): 6.
  15. Tao Mei: Deep learning builds a bridge between vision and language
  16. Image Caption tutorial
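The two-stage decoding described for Krause et al. (item 9) can be sketched in plain Python. This is a minimal, untrained illustration under assumptions that are not from the paper: weights are random, the sentence LSTM's hidden state is used directly as the sentence topic (the paper projects it), the number of sentences is fixed rather than predicted, the word LSTM is fed the topic at every step instead of previously generated words, and `generate_paragraph` and its vocabulary projection are hypothetical names. It only shows how the two LSTMs are wired together.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Minimal LSTM cell on plain Python lists; weights are random and untrained."""

    def __init__(self, input_size, hidden_size, seed):
        rng = random.Random(seed)
        cols = input_size + hidden_size + 1  # input, previous hidden state, bias
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (g).
        self.weights = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(hidden_size)]
            for gate in "ifog"
        }
        self.hidden_size = hidden_size

    def zero_state(self):
        return [0.0] * self.hidden_size, [0.0] * self.hidden_size

    def step(self, x, state):
        h, c = state
        z = list(x) + list(h) + [1.0]

        def affine(gate):
            return [sum(w * v for w, v in zip(row, z)) for row in self.weights[gate]]

        i = [sigmoid(a) for a in affine("i")]
        f = [sigmoid(a) for a in affine("f")]
        o = [sigmoid(a) for a in affine("o")]
        g = [math.tanh(a) for a in affine("g")]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def generate_paragraph(image_feature, num_sentences, words_per_sentence,
                       vocab_size=10, hidden_size=8):
    """Hypothetical two-level decoder: a sentence LSTM emits one topic per sentence,
    and a word LSTM decodes each topic into token ids by greedy argmax."""
    sentence_lstm = LSTMCell(len(image_feature), hidden_size, seed=1)
    word_lstm = LSTMCell(hidden_size, hidden_size, seed=2)
    rng = random.Random(3)
    # Hypothetical projection from word-LSTM hidden state to vocabulary scores.
    vocab_proj = [[rng.uniform(-1, 1) for _ in range(hidden_size)] for _ in range(vocab_size)]

    paragraph = []
    sent_state = sentence_lstm.zero_state()
    for _ in range(num_sentences):
        # Stage 1: the sentence LSTM keeps its state across sentences,
        # so its hidden state carries paragraph context; use it as the topic.
        sent_state = sentence_lstm.step(image_feature, sent_state)
        topic = sent_state[0]

        # Stage 2: a word LSTM restarted from zero decodes the topic into one sentence.
        word_state = word_lstm.zero_state()
        sentence = []
        for _ in range(words_per_sentence):
            word_state = word_lstm.step(topic, word_state)
            scores = [sum(w * h for w, h in zip(row, word_state[0])) for row in vocab_proj]
            sentence.append(max(range(vocab_size), key=scores.__getitem__))
        paragraph.append(sentence)
    return paragraph


paragraph = generate_paragraph([0.2, -0.1, 0.7, 0.4], num_sentences=3, words_per_sentence=5)
print(paragraph)  # 3 lists of 5 token ids each (meaningless, since weights are random)
```

The split of state is the point of the design: the word LSTM starts fresh for every sentence, while the sentence LSTM keeps its state across the whole paragraph.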

Image Sentiment Analysis

  1. Deep Learning for Sentiment Analysis: A Survey
  2. Hu A, Flaxman S. Multimodal Sentiment Analysis To Explore the Structure of Emotions[C]//Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2018: 350-358.

Visual Question Answering

  1. Antol S, Agrawal A, Lu J, et al. Vqa: Visual question answering[C]//Proceedings of the IEEE international conference on computer vision. 2015: 2425-2433.
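  2. Li Q. Research on image question answering based on deep neural networks and attention mechanisms[D]. University of Science and Technology of China, 2018.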

Text to Image

  1. Johnson J, Gupta A, Fei-Fei L. Image generation from scene graphs[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 1219-1228.
  2. Reed S, Akata Z, Yan X, et al. Generative adversarial text to image synthesis[J]. arXiv preprint arXiv:1605.05396, 2016.

Text-Based Image Retrieval

  1. Gu J, Cai J, Joty S R, et al. Look, imagine and match: Improving textual-visual cross-modal retrieval with generative models[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 7181-7189.

Image Translation (Image Translation / Image Style Transfer)

  1. Johnson J, Alahi A, Fei-Fei L. Perceptual losses for real-time style transfer and super-resolution[C]//European conference on computer vision. Springer, Cham, 2016: 694-711.
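    • While reading Image Translation papers such as pix2pix, Cycle-GAN, and pix2pix HD, I often saw them cite the Perceptual Loss from Perceptual Losses for Real-Time Style Transfer and Super-Resolution. Fei-Fei Li is also one of its authors, so the paper should be worth reading.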
  2. pix2pix
  3. Cycle-GAN
  4. pix2pix HD
  5. Gatys L A, Ecker A S, Bethge M. A neural algorithm of artistic style[J]. arXiv preprint arXiv:1508.06576, 2015.
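    • The idea behind Perceptual Loss, comparing images through pretrained CNN features rather than pixels, was first proposed in A Neural Algorithm of Artistic Style.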
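Perceptual Loss, as used by Johnson et al. (item 1), compares images through feature maps of a fixed pretrained network rather than raw pixels. The sketch below assumes the feature maps have already been extracted (Johnson et al. use VGG-16 layers) and are given as plain lists of C channels with H*W activations each; the function names are hypothetical. It follows the normalization in Johnson et al.: the feature-reconstruction loss is the squared Euclidean distance divided by C*H*W, and the style loss is the squared Frobenius norm of the difference between Gram matrices normalized by C*H*W. Gatys et al. (item 5) normalize their style loss differently.

```python
def gram_matrix(features):
    """features: C channels, each a flat list of H*W activations.
    Returns the C x C Gram matrix divided by C*H*W."""
    c = len(features)
    hw = len(features[0])
    return [[sum(a * b for a, b in zip(features[i], features[j])) / (c * hw)
             for j in range(c)] for i in range(c)]


def feature_reconstruction_loss(feat_out, feat_target):
    """Squared Euclidean distance between two same-layer feature maps, divided by C*H*W."""
    c, hw = len(feat_out), len(feat_out[0])
    return sum((x - y) ** 2
               for ch_o, ch_t in zip(feat_out, feat_target)
               for x, y in zip(ch_o, ch_t)) / (c * hw)


def style_loss(feat_out, feat_style):
    """Squared Frobenius norm of the difference between the two Gram matrices."""
    g_out, g_style = gram_matrix(feat_out), gram_matrix(feat_style)
    return sum((a - b) ** 2
               for row_o, row_s in zip(g_out, g_style)
               for a, b in zip(row_o, row_s))


content = [[1.0, 2.0], [3.0, 4.0]]  # 2 channels, 2 spatial positions
output = [[1.0, 2.0], [3.0, 6.0]]
print(gram_matrix(content))                           # [[1.25, 2.75], [2.75, 6.25]]
print(feature_reconstruction_loss(output, content))   # 1.0
print(style_loss(output, content))                    # 27.0
```

The Gram matrix discards spatial layout and keeps only channel co-activations, which is why it is used for style, while the feature-reconstruction term keeps spatial content.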

Autonomous Driving

  1. GitHub: the most comprehensive collection of lane detection resources

Training Tricks

  1. Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[J]. arXiv preprint arXiv:1502.03167, 2015.
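Batch normalization (also listed above as Inception v2) normalizes each feature over the mini-batch and then applies a learnable scale gamma and shift beta. Below is a minimal sketch in plain Python for a batch of feature vectors; `SimpleBatchNorm` is a hypothetical class, not a framework API. Tracking inference statistics with an exponential moving average (`momentum`) is an assumption borrowed from common framework practice, and it uses the biased batch variance for simplicity; see the paper for how it estimates population statistics for inference.

```python
import math


class SimpleBatchNorm:
    """Hypothetical batch norm over a batch of feature vectors (lists of floats)."""

    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        self.gamma = [1.0] * num_features  # learnable scale
        self.beta = [0.0] * num_features   # learnable shift
        self.running_mean = [0.0] * num_features
        self.running_var = [1.0] * num_features
        self.eps = eps
        self.momentum = momentum

    def __call__(self, batch, training=True):
        n, d = len(batch), len(batch[0])
        out = [[0.0] * d for _ in range(n)]
        for j in range(d):
            column = [row[j] for row in batch]
            if training:
                # Mini-batch statistics for feature j (biased variance).
                mean = sum(column) / n
                var = sum((x - mean) ** 2 for x in column) / n
                # Assumed moving-average update, used later at inference time.
                m = self.momentum
                self.running_mean[j] = (1 - m) * self.running_mean[j] + m * mean
                self.running_var[j] = (1 - m) * self.running_var[j] + m * var
            else:
                mean, var = self.running_mean[j], self.running_var[j]
            for i, x in enumerate(column):
                x_hat = (x - mean) / math.sqrt(var + self.eps)
                out[i][j] = self.gamma[j] * x_hat + self.beta[j]
        return out


bn = SimpleBatchNorm(num_features=2)
batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
print(bn(batch))                   # each column now has mean ~0 and variance ~1
print(bn(batch, training=False))   # uses the running statistics instead
```

With gamma = 1 and beta = 0 the output is just the standardized input; training gamma and beta lets the network undo the normalization where that helps.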