CNN-ViT Hybrid Aesthetic Evaluation Model Based on Quantification of Cognitive Features in Images

Soo-Eun Kim; Joon-Shik Lim

doi:10.7471/ikeee.2024.28.3.352

Journal of IKEEE (전기전자학회논문지)

Volume 28 Issue 3 / Pages 352-359 / 2024 / 1226-7244 (pISSN) / 2288-243X (eISSN)

Institute of Korean Electrical and Electronics Engineers (한국전기전자학회)

Soo-Eun Kim (College of IT Convergence, Gachon University);
Joon-Shik Lim (College of IT Convergence, Gachon University)

Received : 2024.09.05
Accepted : 2024.09.24
Published : 2024.09.30

Abstract

This paper proposes a CNN-ViT hybrid model that automatically evaluates the aesthetic quality of images by combining local and global features. A CNN extracts local features such as color and object placement, while a ViT captures global features to assess the aesthetic value of the image as a whole. Color composition is obtained by extracting the dominant colors of the input image into a color palette and passing that palette through the CNN. The Rule of Thirds is quantified by measuring how close the objects in the image lie to the thirds intersection points. These values give the model key information about the color balance and spatial harmony of the image, and the model learns how they relate to the aesthetic score so that its predicted scores follow the distribution of human ratings. On the AADB image database, the proposed model achieved a Spearman's rank correlation coefficient (SRCC) of 0.716, indicating more consistent rank predictions, and a Pearson linear correlation coefficient (LCC) of 0.72, improving on existing models by roughly 2 to 4%.
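
The abstract does not give exact formulas for its two hand-crafted inputs, so the following Python sketch only illustrates the kind of computation involved, under stated assumptions. The palette step uses median-cut quantization, a standard palette-extraction technique chosen here as a stand-in; the rule-of-thirds step scores one object center by its distance to the nearest of the four thirds intersection points, normalized by the largest possible such distance (a corner of the frame). The function names, the normalization, and the assumption that object centers come from a separate detector are hypothetical and not taken from the paper; the CNN that consumes the palette is not shown.

```python
"""Hypothetical sketch of the two hand-crafted inputs named in the abstract:
a dominant-color palette and a rule-of-thirds proximity score."""
from math import hypot


def _channel_range(box, channel):
    """Spread (max - min) of one RGB channel inside a box of pixels."""
    return max(p[channel] for p in box) - min(p[channel] for p in box)


def median_cut_palette(pixels, n_colors=4):
    """Return up to n_colors dominant RGB colors using median-cut quantization.

    pixels: list of (r, g, b) tuples with channels in 0-255.
    """
    if not pixels:
        return []
    boxes = [list(pixels)]
    while len(boxes) < n_colors:
        # Only boxes holding more than one pixel can be split.
        candidates = [i for i, b in enumerate(boxes) if len(b) > 1]
        if not candidates:
            break
        # Split the box with the widest single-channel spread.
        idx = max(candidates,
                  key=lambda i: max(_channel_range(boxes[i], c) for c in range(3)))
        box = boxes.pop(idx)
        channel = max(range(3), key=lambda c: _channel_range(box, c))
        box.sort(key=lambda p: p[channel])
        mid = len(box) // 2  # median split; both halves are non-empty
        boxes += [box[:mid], box[mid:]]
    # Each palette entry is the mean color of its box.
    return [tuple(round(sum(p[c] for p in b) / len(b)) for c in range(3))
            for b in boxes]


# The four intersection points of the thirds grid, in normalized coordinates.
THIRDS_POINTS = [(x, y) for x in (1 / 3, 2 / 3) for y in (1 / 3, 2 / 3)]
# Assumed normalizer: the farthest any point in the frame can be from its
# nearest intersection is at a corner, hypot(1/3, 1/3).
MAX_THIRDS_DIST = hypot(1 / 3, 1 / 3)


def rule_of_thirds_score(cx, cy, width, height):
    """Score in [0, 1]; 1.0 when the object center lies on an intersection.

    (cx, cy) is an object center in pixels, assumed to come from a separate
    detector or saliency step that is not shown here.
    """
    u, v = cx / width, cy / height
    d = min(hypot(u - px, v - py) for px, py in THIRDS_POINTS)
    return max(0.0, 1.0 - d / MAX_THIRDS_DIST)


if __name__ == "__main__":
    reds_and_blues = [(255, 0, 0)] * 10 + [(0, 0, 255)] * 10
    print(median_cut_palette(reds_and_blues, n_colors=2))
    print(rule_of_thirds_score(100, 100, 300, 300))
```

For an image with several objects, one plausible (again hypothetical) aggregate is the mean score over all detected objects.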

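
SRCC and LCC are the two agreement measures reported in the abstract. For reference, here is a minimal pure-Python sketch of both: LCC is the Pearson correlation between predicted and ground-truth scores, and SRCC is the Pearson correlation of their ranks, with tied values assigned the average of their positions. The helper names are illustrative only; in practice the library routines `scipy.stats.pearsonr` and `scipy.stats.spearmanr` would normally be used instead.

```python
"""Minimal sketch of the two agreement metrics reported in the abstract."""
from math import sqrt


def pearson_lcc(x, y):
    """Pearson linear correlation coefficient between two score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_srcc(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson_lcc(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    predicted = [0.1, 0.4, 0.35, 0.8]
    ground_truth = [0.2, 0.5, 0.3, 0.9]
    print("SRCC", spearman_srcc(predicted, ground_truth))
    print("LCC ", pearson_lcc(predicted, ground_truth))
```

In the toy example the predicted ordering matches the ground truth exactly, so SRCC is 1.0 while LCC falls slightly below 1.0 because the relationship is not perfectly linear. This is why the two metrics are reported separately: SRCC rewards correct ranking only, while LCC also penalizes nonlinear score distortions. The reported 0.716 and 0.72 come from the paper's AADB evaluation, not from this code.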
