CNN-ViT Hybrid Aesthetic Evaluation Model Based on Quantification of Cognitive Features in Images

  • Soo-Eun Kim (김수은), College of IT Convergence, Gachon University
  • Joon-Shik Lim (임준식), College of IT Convergence, Gachon University
  • Received : 2024.09.05
  • Accepted : 2024.09.24
  • Published : 2024.09.30

Abstract

This paper proposes a CNN-ViT hybrid model that automatically evaluates the aesthetic quality of images by combining local and global features. A CNN extracts local features such as color and object placement, while a ViT captures global features to analyze the aesthetic value of the image. Two cognitive features are quantified and supplied to the model. Color composition is obtained by extracting the dominant colors of the input image, building a color palette from them, and passing the palette through the CNN. The Rule of Thirds is quantified as a score that measures how close the objects in the image lie to the intersection points of the thirds grid. These values give the model information about the color balance and spatial harmony of the image, two key factors in aesthetic evaluation. The model learns the relationship between these factors and the image score so that its predictions align closely with human judgments. In experiments on the AADB image database, the proposed model achieved a Spearman's rank correlation coefficient (SRCC) of 0.716, indicating more consistent rank predictions, and a Pearson linear correlation coefficient (LCC) of 0.72, which is 2-4% higher than existing models.
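
As an illustration of the Rule of Thirds quantification described above, the following Python sketch scores a single object by how close its center lies to the nearest thirds intersection. The function name `rule_of_thirds_score`, the use of the object's center point, and the normalization by the corner-to-intersection distance are assumptions made for this example; the abstract does not give the paper's exact formula.

```python
import math


def rule_of_thirds_score(cx, cy, width, height):
    """Hypothetical score in [0, 1]; 1.0 means the object center sits
    exactly on one of the four thirds intersections."""
    # The four points where the horizontal and vertical thirds lines cross.
    points = [(width * i / 3, height * j / 3) for i in (1, 2) for j in (1, 2)]
    # Distance from the object center to the nearest intersection.
    nearest = min(math.hypot(cx - px, cy - py) for px, py in points)
    # Assumed normalizer: the distance from an image corner to its closest
    # intersection, which is the largest nearest-distance inside the frame.
    worst = math.hypot(width / 3, height / 3)
    # Clamp so that centers outside the frame still map into [0, 1].
    return max(0.0, 1.0 - nearest / worst)


if __name__ == "__main__":
    # Object centered on the upper-left intersection of a 300x300 image.
    print(rule_of_thirds_score(100, 100, 300, 300))
    # Object centered in the middle of the frame.
    print(rule_of_thirds_score(150, 150, 300, 300))
```

Under this assumed normalization, an object on an intersection scores 1.0, an object at the image center scores about 0.5, and an object in a corner scores 0.0. A model could average or otherwise pool such scores over all detected objects, but that pooling step is likewise not specified in the abstract.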

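
The two evaluation metrics reported in the abstract can be computed as in the minimal pure-Python sketch below. LCC is the Pearson correlation of the raw scores, and SRCC is the Pearson correlation of their ranks, with tied values sharing the average rank. The helper names and the sample scores are hypothetical and are not taken from the AADB experiments.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (LCC) of two score lists.
    Assumes equal lengths and non-constant inputs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation coefficient (SRCC)."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical human ratings and model predictions for five images.
    human = [0.2, 0.5, 0.4, 0.9, 0.7]
    predicted = [0.25, 0.45, 0.5, 0.8, 0.75]
    print("SRCC:", round(spearman(human, predicted), 3))
    print("LCC:", round(pearson(human, predicted), 3))
```

In this made-up example the predicted ranking differs from the human ranking by a single adjacent swap, so the SRCC evaluates to 0.9. SRCC depends only on the ordering of the scores, whereas LCC also reflects how linearly the predicted values track the human ratings.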
