First-feed LSTM model for video description

Journal of China Universities of Posts and Telecommunications, 2016, Vol. 23, Issue (3): 89-93.

Received: 2016-02-11 Revised: 2016-03-17 Online: 2016-06-28 Published: 2016-07-05

Abstract:

Video description aims to automatically generate natural-language descriptions of videos. Because the task has a broad range of applications and has already been implemented successfully, researchers have proposed many approaches based on deep neural network (DNN) models. This paper takes inspiration from an image caption model and develops an end-to-end video description model using long short-term memory (LSTM). A single video feature is fed to the first unit of the LSTM decoder, and each subsequent word of the sentence is generated from the previously predicted words. Experimental results on two publicly available datasets show that the proposed model outperforms the baseline.
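The decoding scheme summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `FirstFeedLSTMDecoder`, the layer sizes, the start and end token ids, the greedy word selection, and the randomly initialized (untrained) weights are all assumptions made for illustration, and `video_feat` stands for whatever fixed-length feature a video encoder would supply.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class FirstFeedLSTMDecoder:
    """Toy LSTM decoder: the video feature is the input at the first step only."""

    def __init__(self, feat_dim, vocab_size, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden = hidden
        # Hypothetical parameters; a real model would learn these from data.
        self.W_feat = rng.normal(0, 0.1, (hidden, feat_dim))   # video feature -> LSTM input
        self.embed = rng.normal(0, 0.1, (vocab_size, hidden))  # word id -> LSTM input
        self.W = rng.normal(0, 0.1, (4 * hidden, 2 * hidden))  # all four gates, on [x, h]
        self.b = np.zeros(4 * hidden)
        self.W_out = rng.normal(0, 0.1, (vocab_size, hidden))  # hidden state -> word scores

    def step(self, x, h, c):
        """One standard LSTM cell update."""
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c

    def describe(self, video_feat, bos_id, eos_id, max_len=10):
        h = np.zeros(self.hidden)
        c = np.zeros(self.hidden)
        # First step: the only point where the video feature enters the decoder.
        h, c = self.step(self.W_feat @ video_feat, h, c)
        word, words = bos_id, []
        for _ in range(max_len):
            # Later steps: the input is the previously predicted word.
            h, c = self.step(self.embed[word], h, c)
            word = int(np.argmax(self.W_out @ h))  # greedy choice (an assumption)
            if word == eos_id:
                break
            words.append(word)
        return words


decoder = FirstFeedLSTMDecoder(feat_dim=8, vocab_size=5)
word_ids = decoder.describe(np.ones(8), bos_id=0, eos_id=1)
```

With untrained weights the returned word ids carry no meaning; the sketch only shows the data flow, in which the video conditions the decoder through its initial state and every later step sees only the previous word.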

Key words:

computer vision, natural language processing, video description, deep neural network (DNN)
