The Journal of China Universities of Posts and Telecommunications ›› 2016, Vol. 23 ›› Issue (3): 89-93.

• Others •

First-feed LSTM model for video description

Wang Yue, Wang Xiaojie, Mao Yuzhao

  1. Beijing University of Posts and Telecommunications
  • Received: 2016-02-11 Revised: 2016-03-17 Online: 2016-06-28 Published: 2016-07-05
  • Corresponding author: Wang Xiaojie E-mail: xjwang@bupt.edu.cn

Abstract: Video description aims to automatically generate descriptive natural language for videos. Owing to its successful implementations and broad range of applications, much work based on deep neural network (DNN) models has been put forward. This paper takes inspiration from an image captioning model and develops an end-to-end video description model using long short-term memory (LSTM). A single video feature is fed to the first unit of the LSTM decoder, and the subsequent words of the sentence are generated conditioned on the previously predicted words. Experimental results on two publicly available datasets demonstrate that the proposed model outperforms the baseline.
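The first-feed decoding scheme the abstract describes can be sketched in a few lines of numpy. This is a minimal illustration only, with toy dimensions and random (untrained) weights assumed for the example; the paper's actual model uses learned parameters and a CNN-derived video feature, and its hyper-parameters are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes -- not the paper's hyper-parameters.
FEAT, HID, VOCAB = 8, 16, 10   # video-feature dim, LSTM hidden dim, vocabulary size

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Single LSTM cell: all four gates computed from the concatenated [input, hidden].
W = rng.standard_normal((4 * HID, FEAT + HID)) * 0.1
b = np.zeros(4 * HID)
W_emb = rng.standard_normal((VOCAB, FEAT)) * 0.1   # word embeddings (same dim as input)
W_out = rng.standard_normal((VOCAB, HID)) * 0.1    # hidden state -> vocabulary logits

def lstm_step(x, h, c):
    z = W @ np.concatenate([x, h]) + b
    i, f, o, g = np.split(z, 4)                    # input, forget, output gates; candidate
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def describe(video_feature, max_len=5):
    h, c = np.zeros(HID), np.zeros(HID)
    # "First feed": the single video feature is the input to the first LSTM unit only.
    h, c = lstm_step(video_feature, h, c)
    words = []
    for _ in range(max_len):
        w = int(np.argmax(W_out @ h))              # greedy choice of the next word
        words.append(w)
        # Every subsequent step consumes the embedding of the word just predicted.
        h, c = lstm_step(W_emb[w], h, c)
    return words

sentence = describe(rng.standard_normal(FEAT))     # list of max_len word indices
```

The point of the sketch is the control flow: the video feature enters once, at step 0, and all later inputs are previously predicted words, which is what distinguishes this decoder from models that re-feed the video feature at every step.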

Abstract:

Video description aims to automatically generate descriptive natural language for videos. With its successful implementations and a broad range of applications, lots of work based on Deep Neural Network (DNN) models have been put forward by researchers. This paper takes inspiration from an image caption model and develops an end-to-end video description model using Long Short-Term Memory (LSTM). Single video feature is fed to the first unit of LSTM decoder, and subsequent words of sentence are generated on previous predicted words. Experimental results on two publicly available datasets demonstrate that the performance of the proposed model outperforms that of baseline.

Key words: computer vision, natural language processing, video description, deep neural network (DNN)