Citation: Biao CHEN, Zhi-Ping ZHOU, Qin YE. Research on Image Captioning Generation Method Based on Optimized Attention Mechanism[J]. Journal of Nanchang Hangkong University (Natural Science Edition), 2021, 35(1): 28-34, 62. DOI: 10.3969/j.issn.2096-8566.2021.01.005

Research on Image Captioning Generation Method Based on Optimized Attention Mechanism

Abstract: In the task of image caption generation, the attention mechanism ignores the degree of association between objects in an image when processing image features, which degrades the quality of the generated captions. To address this problem, an attention optimization module (Optimize Attention, OA) is introduced on top of the conventional multi-head attention mechanism to measure the degree of association between objects in the image and thereby guide the caption generation process. OA combines the output of the multi-head attention mechanism with the current context (i.e., the query) to produce an “information vector” and an “attention gate”; the gate is applied to the information vector to construct a new attention, and this new attention is then multiplied element-wise with the information vector to yield attended information that strongly captures the association between objects and background in the image. The improved model was trained and validated on the MS COCO dataset, reaching 75.2% on BLEU-1 and 27.5% on METEOR.
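The data flow described in the abstract (multi-head attention output fused with the query to form an “information vector” and an “attention gate”, the gate applied to the information vector, and the result multiplied element-wise with the information vector again) can be sketched as follows. This is a minimal illustration in PyTorch, not the authors' implementation: the class name OptimizedAttention, the 512-dimensional features, the 8 attention heads, and the projection-layer names are assumptions made for the example.

```python
import torch
import torch.nn as nn


class OptimizedAttention(nn.Module):
    """Illustrative sketch of the OA module described in the abstract.

    Fuses the multi-head attention result with the query to form an
    "information vector" and an "attention gate", gates the information
    vector, then multiplies the gated result with the information vector
    again, element-wise. All layer names and sizes are assumptions.
    """

    def __init__(self, d_model: int = 512, num_heads: int = 8):
        super().__init__()
        self.mha = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        # Linear maps applied to the concatenation [attended result; query].
        self.info_proj = nn.Linear(2 * d_model, d_model)  # "information vector"
        self.gate_proj = nn.Linear(2 * d_model, d_model)  # "attention gate"

    def forward(self, query: torch.Tensor, features: torch.Tensor) -> torch.Tensor:
        # query:    (batch, len_q, d_model) -- current decoding context
        # features: (batch, len_k, d_model) -- image region features
        attended, _ = self.mha(query, features, features)
        fused = torch.cat([attended, query], dim=-1)
        info = self.info_proj(fused)                 # information vector i
        gate = torch.sigmoid(self.gate_proj(fused))  # attention gate g
        new_attn = gate * info                       # apply the gate to i
        return new_attn * info                       # element-wise product with i again


if __name__ == "__main__":
    oa = OptimizedAttention()
    q = torch.randn(2, 1, 512)       # one decoding step
    feats = torch.randn(2, 36, 512)  # e.g. 36 detected region features
    print(oa(q, feats).shape)        # torch.Size([2, 1, 512])
```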

     
