Abstract: Video caption aims to generate descriptive sentences about the video, and the most critical problem is how to achieve accurate word prediction with standardized and coherent syntax structure ...