Abstract:
In image captioning, the standard attention mechanism ignores the degree of relatedness between objects in an image when processing image features, which degrades the quality of the generated captions. To address this problem, an attention optimization module (Optimize Attention, OA) is introduced on top of the conventional multi-head attention mechanism; it measures how strongly objects in the image are related and uses this to guide caption generation. OA combines the output of the multi-head attention mechanism with the current context (the query) to produce an "information vector" and an "attention gate". The gate is then applied to the information vector to construct a new attention result, which is multiplied with the information vector to yield image features that encode the degree of relatedness between objects and the background. The improved model was trained and evaluated on the MS COCO dataset, reaching 75.2% BLEU-1 and 27.5% METEOR.
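The gating step described above can be sketched as follows. This is a minimal illustration only, assuming a PyTorch implementation, sigmoid gating, and linear projections of size `d_model`; the class name `OptimizeAttention`, the layer shapes, and the exact form of the final re-weighting are assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn


class OptimizeAttention(nn.Module):
    """Sketch of an OA-style module: the multi-head attention output is
    gated using the current query, so that the attended features reflect
    how strongly objects in the image relate to each other and to the
    background. Layer sizes and the sigmoid gate are assumptions."""

    def __init__(self, d_model: int):
        super().__init__()
        # Both the "information vector" and the "attention gate" are
        # produced from the concatenation of the attention result and
        # the query, as described in the abstract.
        self.info_proj = nn.Linear(2 * d_model, d_model)
        self.gate_proj = nn.Linear(2 * d_model, d_model)

    def forward(self, attn_out: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
        # attn_out, query: (batch, seq_len, d_model)
        fused = torch.cat([attn_out, query], dim=-1)
        info = self.info_proj(fused)                  # "information vector"
        gate = torch.sigmoid(self.gate_proj(fused))   # "attention gate" in (0, 1)
        # The gate re-weights the information vector, yielding features
        # modulated by object/background relatedness.
        return gate * info
```

A usage sketch: the module would sit after the multi-head attention block in the caption decoder, e.g. `oa = OptimizeAttention(d_model=512)` followed by `features = oa(mha_output, query)` before the decoding step.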