Image Aesthetic Score Prediction using Image Captioning

Review Article

Volume 3 Issue 1

Aakash Pandit*, Animesh, Bhuvesh Kumar Gautam and Ritu Agarwal

June 29, 2023

Abstract

Different kinds of images induce different kinds of stimuli in humans, and certain types of images tend to activate specific parts of the brain. Professional photographers use techniques such as the rule of thirds and exposure control to capture an appealing photograph. Image aesthetics is a partially subjective topic, since some aspects of an image are more appealing to a person's eye than others. This paper presents a novel technique that generates a typical score for the quality of an image by using image captioning. The image captioning model is trained using a Convolutional Neural Network, Long Short Term Memory, Recurrent Neural Networks, and an Attention Layer. After a caption is generated for the image, a textual analysis of the caption is performed using an RNN-LSTM model consisting of an embedding layer, an LSTM layer, and a sigmoid function, and the aesthetic quality score of the image is then predicted.

Keywords: Image Aesthetic; Convolutional Neural Network; Long Short Term Memory; Recurrent Neural Networks; Attention Layer; Embedding Layer; Image Captioning
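
The second stage described in the abstract (caption text, then embedding layer, then LSTM layer, then sigmoid) can be illustrated with a minimal sketch. It is not the authors' implementation. The class name `CaptionScorer`, the vocabulary, the layer sizes, and the randomly initialised weights are all assumptions made for illustration. The captioning stage (CNN with attention) is omitted, so the caption is assumed to be already available as a string. With untrained weights the score carries no meaning; the sketch only shows how data flows from a caption to a value between 0 and 1.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class CaptionScorer:
    """Untrained embedding -> LSTM -> sigmoid scorer (hypothetical, illustrative only)."""

    def __init__(self, vocab, embed_dim=8, hidden_dim=6, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        # Index 0 is reserved for words that are not in the vocabulary.
        self.vocab = {word: i + 1 for i, word in enumerate(vocab)}
        self.hidden_dim = hidden_dim
        self.embedding = mat(len(vocab) + 1, embed_dim)
        # One weight matrix per gate (input, forget, output, candidate),
        # each acting on the concatenation [x_t ; h_{t-1}].
        self.gates = {g: mat(hidden_dim, embed_dim + hidden_dim) for g in "ifoc"}
        self.bias = {g: [0.0] * hidden_dim for g in "ifoc"}
        self.out_w = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
        self.out_b = 0.0

    def _gate(self, name, xh, activation):
        weights, bias = self.gates[name], self.bias[name]
        return [
            activation(sum(w * v for w, v in zip(row, xh)) + b)
            for row, b in zip(weights, bias)
        ]

    def score(self, caption):
        h = [0.0] * self.hidden_dim  # hidden state
        c = [0.0] * self.hidden_dim  # cell state
        for word in caption.lower().split():
            x = self.embedding[self.vocab.get(word, 0)]  # embedding lookup
            xh = x + h  # list concatenation: [x_t ; h_{t-1}]
            i = self._gate("i", xh, sigmoid)
            f = self._gate("f", xh, sigmoid)
            o = self._gate("o", xh, sigmoid)
            g = self._gate("c", xh, math.tanh)
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        # Final hidden state -> single logit -> sigmoid score in (0, 1).
        return sigmoid(sum(w * hj for w, hj in zip(self.out_w, h)) + self.out_b)


# Usage with a made-up vocabulary and caption.
scorer = CaptionScorer(vocab=["a", "sunset", "over", "the", "calm", "lake"])
score = scorer.score("a sunset over the calm lake")
```

In practice the weights would be learned from captions paired with aesthetic ratings, and the sigmoid output could be rescaled to whatever rating range the training data uses. Neither detail is specified in the abstract.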
