Computer Vision

Description: Classic papers in the field of computer vision

Computer Vision Learning Roadmap

Bookmark | A curated selection of classic and top-conference CV papers: for research, these are all you need!

Image Recognition and Image Classification

AlexNet(2012)：Krizhevsky A, Sutskever I, Hinton G E. Imagenet classification with deep convolutional neural networks[C]//Advances in neural information processing systems. 2012: 1097-1105.
ZFNet(2013)：Zeiler M D, Fergus R. Visualizing and understanding convolutional networks[C]//European conference on computer vision. Springer, Cham, 2014: 818-833.
VGG16 and VGG19(2014)：Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint arXiv:1409.1556, 2014.
- pdf
GoogLeNet(2014):
- Inception v1：Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 1-9.
- Inception v2：Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[J]. arXiv preprint arXiv:1502.03167, 2015.
- Inception v3：Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 2818-2826.
- Inception v4：Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, inception-resnet and the impact of residual connections on learning[C]//AAAI. 2017, 4: 12.
ResNet(2015)：He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.
- Some thoughts on the nature of ResNet (WeChat official account) | Some thoughts on the nature of ResNet (Zhihu)
DenseNet(2017)：Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 2261-2269.
MobileNet(2017)：Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.
ShuffleNet(2017)："ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices"
- Zhang X, Zhou X, Lin M, et al. Shufflenet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 6848-6856.
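SENet(2018)：Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 7132-7141.
- Attention mechanisms in convolutional networks: a summary of the principles behind the SENet and CBAM modules
- How would you evaluate SENet, the architecture Momenta used to win ImageNet 2017? - Zhihu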
EffNet(2018)：Freeman I, Roese-Koerner L, Kummert A. EffNet: An Efficient Structure for Convolutional Neural Networks[J]. arXiv preprint arXiv:1801.06434, 2018.
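A Roundup of 36 Common Image Classification Models in CV, with Complete Papers and Code
- Starting from VGG, it covers GoogLeNet, ResNet, the Inception series, DenseNet, Xception and SENet, plus lightweight networks such as MobileNet, ShuffleNet and the IGCV series, and even the recently popular NASNet series. Each network comes with a paper link and links to several reproduction implementations.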

Classification + Localization

Description: Classify the image and output the location of the classified object.
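A common way to train a model for this task is to attach two heads to one backbone: a classification head and a box-regression head, optimized with a joint loss. The sketch below assumes a multi-task loss similar in spirit to the one in Fast R-CNN (softmax cross-entropy for the class plus smooth L1 for the box). The function names, the (x, y, w, h) box encoding and the weight `lam` are illustrative assumptions, not taken from any specific paper in this list.

```python
import math
from typing import Sequence


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Negative log-probability of the true class, computed in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def smooth_l1(pred: Sequence[float], target: Sequence[float], beta: float = 1.0) -> float:
    """Smooth L1 loss summed over the box coordinates: quadratic near 0, linear far away."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total


def classification_localization_loss(class_logits, box_pred, label, box_target, lam=1.0):
    """Hypothetical multi-task loss: classify the image and regress a single box (x, y, w, h)."""
    return softmax_cross_entropy(class_logits, label) + lam * smooth_l1(box_pred, box_target)


loss = classification_localization_loss(
    class_logits=[2.0, 0.5, -1.0],      # scores for 3 hypothetical classes
    box_pred=[0.30, 0.40, 0.20, 0.25],  # predicted box, normalized to [0, 1]
    label=0,
    box_target=[0.32, 0.38, 0.22, 0.25],
)
print(round(loss, 4))
```

Smooth L1 is used for the box term because it behaves like L2 for small errors but grows only linearly for large ones, so a few badly localized samples do not dominate the gradient.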

To be added

Object Detection

SSD：Liu W, Anguelov D, Erhan D, et al. Ssd: Single shot multibox detector[C]//European conference on computer vision. Springer, Cham, 2016: 21-37.
- SSD300 | SSD500
R-CNN：Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2014: 580-587.
Fast R-CNN：Girshick R. Fast r-cnn[C]//Proceedings of the IEEE international conference on computer vision. 2015: 1440-1448.
Faster R-CNN：Ren S, He K, Girshick R, et al. Faster r-cnn: Towards real-time object detection with region proposal networks[C]//Advances in neural information processing systems. 2015: 91-99.
- Faster R-CNN(VGG16) | Faster R-CNN(ZFNet)
YOLO：
Fast YOLO：
Object Detection: A Summary of Recent Progress and Outlook
GitHub: The most comprehensive collection of object detection papers

Image Segmentation

GitHub: The most comprehensive collection of image segmentation resources
- GitHub-Awesome Semantic Segmentation

Instance Segmentation

1. To be added

Semantic Segmentation

Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 3431-3440.

Panoptic Segmentation

Image Caption

[Zhuanzhi Collection 08] A complete collection of Image Caption resources (introductory, advanced, papers, surveys, videos, experts, etc.)
Vinyals O, Toshev A, Bengio S, et al. Show and tell: A neural image caption generator[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 3156-3164.
- code | pdf
Xu K, Ba J, Kiros R, et al. Show, attend and tell: Neural image caption generation with visual attention[C]//International conference on machine learning. 2015: 2048-2057.
Karpathy A, Fei-Fei L. Deep visual-semantic alignments for generating image descriptions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 3128-3137.
Bernardi R, Cakici R, Elliott D, et al. Automatic description generation from images: A survey of models, datasets, and evaluation measures[J]. Journal of Artificial Intelligence Research, 2016, 55: 409-442.
Karpathy A. Connecting Images and Natural Language[D]. Ph. D. Dissertation. STANFORD UNIVERSITY, 2016.
Soh M. Learning CNN-LSTM architectures for image caption generation[J]. 2016.
Rennie S J, Marcheret E, Mroueh Y, et al. Self-critical sequence training for image captioning[C]//CVPR. 2017, 1(2): 3.
Krause J, Johnson J, Krishna R, et al. A hierarchical approach for generating descriptive image paragraphs[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, USA, 2017: 3337-3345.
- website | pdf
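- Krause et al. use a hierarchical LSTM whose model can generate paragraph-level image descriptions. Their work uses two LSTM-based language decoders: the first-stage LSTM captures the general information of the image and stores the context of each sentence in its hidden states; the second-stage LSTM then decodes those first-stage hidden states into the individual sentences of the paragraph.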
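The two-stage decoding described above can be sketched as follows. This is a minimal, untrained illustration under several assumptions: a plain tanh RNN cell stands in for the LSTM to keep the code short, the weights are random, decoding is greedy with a fixed sentence length (no end-of-sentence token), the first-stage hidden state simply initializes the word-level RNN, and a stop probability (assumed here) lets the first stage end the paragraph. `HierarchicalCaptioner` and all its parameters are hypothetical names.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def rand_matrix(rng: random.Random, rows: int, cols: int, scale: float = 0.5) -> Matrix:
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_step(w_in: Matrix, w_h: Matrix, x: Vector, h: Vector) -> Vector:
    """One step of a plain tanh RNN cell, used here in place of an LSTM cell."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_h, h))]


class HierarchicalCaptioner:
    def __init__(self, feat_dim: int, hidden: int, vocab: List[str], seed: int = 0):
        rng = random.Random(seed)
        self.vocab = vocab
        self.hidden = hidden
        # Stage 1: sentence-level RNN over the pooled image feature.
        self.s_in = rand_matrix(rng, hidden, feat_dim)
        self.s_h = rand_matrix(rng, hidden, hidden)
        self.stop_w = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        # Stage 2: word-level RNN; its input is the one-hot previous word.
        self.w_in = rand_matrix(rng, hidden, len(vocab))
        self.w_h = rand_matrix(rng, hidden, hidden)
        self.w_out = rand_matrix(rng, len(vocab), hidden)

    def decode_sentence(self, topic: Vector, max_words: int) -> List[str]:
        h = list(topic)  # the stage-1 hidden state seeds the word RNN
        prev = [0.0] * len(self.vocab)
        words = []
        for _ in range(max_words):
            h = rnn_step(self.w_in, self.w_h, prev, h)
            scores = matvec(self.w_out, h)
            idx = max(range(len(scores)), key=scores.__getitem__)  # greedy choice
            words.append(self.vocab[idx])
            prev = [0.0] * len(self.vocab)
            prev[idx] = 1.0
        return words

    def generate(self, image_feat: Vector, max_sentences: int = 3, max_words: int = 4) -> List[List[str]]:
        h = [0.0] * self.hidden
        paragraph = []
        for _ in range(max_sentences):
            h = rnn_step(self.s_in, self.s_h, image_feat, h)  # context for the next sentence
            paragraph.append(self.decode_sentence(h, max_words))
            p_stop = 1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(self.stop_w, h))))
            if p_stop > 0.5:  # stage 1 decides the paragraph is finished
                break
        return paragraph


captioner = HierarchicalCaptioner(feat_dim=4, hidden=8, vocab=["a", "dog", "runs", "on", "grass", "."])
paragraph = captioner.generate([0.2, -0.1, 0.4, 0.3])
for sentence in paragraph:
    print(" ".join(sentence))
```

Because the weights are random, the output is meaningless text; the point is the control flow: one sentence-level step per sentence, and a full word-level decode driven by each sentence-level hidden state.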
Lu J, Xiong C, Parikh D, et al. Knowing when to look: Adaptive attention via a visual sentinel for image captioning[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 375-383.
- paper website | pdf
"Show and Tell": A Technical Survey of the Image Caption Task
Chen L, Zhang H, Xiao J, et al. Sca-cnn: Spatial and channel-wise attention in convolutional networks for image captioning[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 6298-6306.
Lu J, Yang J, Batra D, et al. Neural Baby Talk[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 7219-7228.
Anderson P, He X, Buehler C, et al. Bottom-up and top-down attention for image captioning and visual question answering[C]//CVPR. 2018, 3(5): 6.
Tao Mei: Deep learning has built a bridge between vision and language
Image Caption Tutorial

Image Sentiment Analysis

Deep Learning for Sentiment Analysis: A Survey
- pdf
- Survey paper: Deep learning in sentiment analysis
Hu A, Flaxman S. Multimodal Sentiment Analysis To Explore the Structure of Emotions[C]//Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2018: 350-358.

Visual Question Answering

Antol S, Agrawal A, Lu J, et al. Vqa: Visual question answering[C]//Proceedings of the IEEE international conference on computer vision. 2015: 2425-2433.
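Li Q. Research on image question answering based on deep neural networks and attention mechanisms[D]. University of Science and Technology of China, 2018.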

Text to Image

Johnson J, Gupta A, Fei-Fei L. Image generation from scene graphs[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 1219-1228.
Reed S, Akata Z, Yan X, et al. Generative adversarial text to image synthesis[J]. arXiv preprint arXiv:1605.05396, 2016.
- pdf | GitHub-code

Text-Based Image Retrieval

Gu J, Cai J, Joty S R, et al. Look, imagine and match: Improving textual-visual cross-modal retrieval with generative models[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 7181-7189.

Image Translation (Image-to-Image Translation or Image Style Transfer)

Johnson J, Alahi A, Fei-Fei L. Perceptual losses for real-time style transfer and super-resolution[C]//European conference on computer vision. Springer, Cham, 2016: 694-711.
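- When reading Image Translation papers such as pix2pix, CycleGAN and pix2pixHD, one often sees references to the perceptual loss from Perceptual Losses for Real-Time Style Transfer and Super-Resolution. Fei-Fei Li is one of its authors, so the paper is likely a good read.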
pix2pix
CycleGAN
pix2pixHD
Gatys L A, Ecker A S, Bethge M. A neural algorithm of artistic style[J]. arXiv preprint arXiv:1508.06576, 2015.
- The feature-based losses underlying the perceptual loss were first proposed in A Neural Algorithm of Artistic Style.
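The perceptual loss compares images in the feature space of a fixed, pretrained network φ (a VGG-16 pretrained on ImageNet in Johnson et al.) rather than pixel by pixel. The sketch below shows the two terms from that paper: a feature reconstruction loss (squared distance between feature maps, normalized by C·H·W) and a style loss (squared Frobenius distance between Gram matrices). As an assumption for brevity, the feature maps are passed in as precomputed nested lists shaped `[channels][height][width]`; running the actual loss network is left out, and the helper names are hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # [channels][height][width], output of phi at one layer
Matrix = List[List[float]]


def _dims(phi: FeatureMap):
    return len(phi), len(phi[0]), len(phi[0][0])


def feature_loss(phi_out: FeatureMap, phi_target: FeatureMap) -> float:
    """Feature reconstruction loss: squared L2 distance between feature maps / (C*H*W)."""
    c, h, w = _dims(phi_out)
    sq = sum(
        (a - b) ** 2
        for ch_o, ch_t in zip(phi_out, phi_target)
        for row_o, row_t in zip(ch_o, ch_t)
        for a, b in zip(row_o, row_t)
    )
    return sq / (c * h * w)


def gram(phi: FeatureMap) -> Matrix:
    """Gram matrix G[c][c'] = sum over positions of phi[c] * phi[c'], divided by C*H*W."""
    c, h, w = _dims(phi)
    flat = [[v for row in ch for v in row] for ch in phi]  # flatten each channel
    return [[sum(x * y for x, y in zip(flat[i], flat[j])) / (c * h * w) for j in range(c)] for i in range(c)]


def style_loss(phi_out: FeatureMap, phi_target: FeatureMap) -> float:
    """Style reconstruction loss: squared Frobenius norm of the Gram-matrix difference."""
    g_out, g_tgt = gram(phi_out), gram(phi_target)
    return sum((a - b) ** 2 for r1, r2 in zip(g_out, g_tgt) for a, b in zip(r1, r2))


phi_out = [[[1.0, 2.0]], [[3.0, 4.0]]]     # toy 2-channel, 1x2 feature map
phi_target = [[[1.0, 2.0]], [[3.0, 5.0]]]
print(feature_loss(phi_out, phi_target), style_loss(phi_out, phi_target))
```

The Gram matrix discards spatial layout and keeps only channel co-activation statistics, which is why it captures "style" (texture, color) while the feature loss preserves content and structure.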

Autonomous Driving

GitHub: The most comprehensive collection of lane detection resources

Training Tricks

Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[J]. arXiv preprint arXiv:1502.03167, 2015.
