Enhancing a Neural-Network-Based ISP Model through Positional Encoding

  • Received : 2024.06.15
  • Accepted : 2024.07.05
  • Published : 2024.07.25

Abstract

The Image Signal Processor (ISP) converts RAW images captured by the camera sensor into visually pleasing sRGB images. Although RAW images carry more information useful for image processing than sRGB images, they are rarely stored or shared because of their large size. Moreover, the actual ISP pipeline of a camera is not disclosed, which makes modeling its inverse process difficult. Consequently, research has been conducted on learning the conversion between sRGB and RAW images. Recently, ParamISP [1] was proposed, which advances earlier simple network architectures by directly incorporating camera parameters (exposure time, sensitivity, aperture size, and focal length) to mimic the behavior of a real camera ISP. However, existing studies, including ParamISP [1], do not account for the degradations caused by lens shading, optical aberration, and lens distortion, which limits their restoration performance. This study introduces positional encoding so that a camera ISP neural network can better handle lens-induced degradations. The proposed positional encoding method is well suited to camera ISP networks that are trained on image patches; by reflecting each patch's spatial context within the full image, it enables more precise image restoration than existing models.
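The abstract does not include code, but the idea can be illustrated with a minimal sketch. The snippet below assumes PyTorch and a NeRF-style sinusoidal encoding as in [2]; the function name, frequency count, and tensor layout are hypothetical illustrations, not taken from the paper. It encodes each patch pixel's global position within the full frame as extra input channels, so a patch-based ISP network can tell the image center from the periphery, where lens shading and aberrations differ.

```python
import math
import torch

def patch_positional_encoding(patch_xy, patch_size, image_size, num_freqs=6):
    """Sinusoidal encoding of a patch's global pixel coordinates.

    patch_xy:   (x0, y0) top-left corner of the patch in the full frame
    patch_size: (h, w) of the patch
    image_size: (H, W) of the full sensor frame
    Returns a (4 * num_freqs, h, w) tensor of extra input channels.
    """
    x0, y0 = patch_xy
    h, w = patch_size
    H, W = image_size

    # Normalize global coordinates to [-1, 1] so the encoding reflects where
    # the patch lies on the sensor (center vs. periphery).
    ys = (torch.arange(y0, y0 + h, dtype=torch.float32) / (H - 1)) * 2.0 - 1.0
    xs = (torch.arange(x0, x0 + w, dtype=torch.float32) / (W - 1)) * 2.0 - 1.0
    yy, xx = torch.meshgrid(ys, xs, indexing="ij")  # each of shape (h, w)

    feats = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        for coord in (xx, yy):
            feats.append(torch.sin(freq * coord))
            feats.append(torch.cos(freq * coord))
    return torch.stack(feats, dim=0)  # (4 * num_freqs, h, w)

# Example: a 128x128 patch whose top-left corner is at (x0, y0) = (1024, 512)
# in a 3000x4000 frame; the result would be concatenated to the patch along
# the channel axis, e.g.
#   patch = torch.cat([patch, enc.unsqueeze(0).expand(B, -1, -1, -1)], dim=1)
enc = patch_positional_encoding((1024, 512), (128, 128), (3000, 4000))
```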

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (2023R1A2C200494611) and by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant (No. 2019-0-01906, Artificial Intelligence Graduate School Program (POSTECH)).

References

  1. Woohyeok Kim, Geonu Kim, Junyong Lee, Seungyong Lee, Seung-Hwan Baek, and Sunghyun Cho. ParamISP: Learned Forward and Inverse ISPs using Camera Parameters. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
  2. Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. In Proceedings of the European Conference on Computer Vision (ECCV), 2020.
  3. Duc-Tien Dang-Nguyen, Cecilia Pasquini, Valentina Conotter, and Giulia Boato. RAISE: A raw images dataset for digital image forensics. In Proceedings of the 6th ACM Multimedia Systems Conference (MMSys), 2015.
  4. Abdelrahman Abdelhamed, Stephen Lin, and Michael S. Brown. A high-quality denoising dataset for smartphone cameras. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  5. Ben Mildenhall, Jonathan T. Barron, Jiawen Chen, Dillon Sharlet, Ren Ng, and Robert Carroll. Burst denoising with kernel prediction networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  6. Eli Schwartz, Raja Giryes, and Alex M. Bronstein. DeepISP: Toward learning an end-to-end image processing pipeline. IEEE Transactions on Image Processing (TIP), 28(2):912-923, 2018.
  7. Chen Chen, Qifeng Chen, Jia Xu, and Vladlen Koltun. Learning to see in the dark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  8. Yu-Lun Liu, Wei-Sheng Lai, Yu-Sheng Chen, Yi-Lung Kao, Ming-Hsuan Yang, Yung-Yu Chuang, and Jia-Bin Huang. Single-image HDR reconstruction by learning to reverse the camera pipeline. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
  9. Mahmoud Afifi, Abdelrahman Abdelhamed, Abdullah Abuolaim, Abhijith Punnappurath, and Michael S. Brown. CIE XYZ Net: Unprocessing images for low-level computer vision tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 44(9):4688-4700, 2021.
  10. Tim Brooks, Ben Mildenhall, Tianfan Xue, Jiawen Chen, Dillon Sharlet, and Jonathan T. Barron. Unprocessing images for learned raw denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  11. Marcos V. Conde, Steven McDonagh, Matteo Maggioni, Ales Leonardis, and Eduardo Perez-Pellitero. Model-based image signal processors via learnable dictionaries. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 36(1):481-489, 2022.
  12. Yazhou Xing, Zian Qian, and Qifeng Chen. Invertible image signal processing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
  13. Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. CycleISP: Real image restoration via improved data synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
  14. Nimrod Shabtay, Eli Schwartz, and Raja Giryes. PIP: Positional Encoding Image Prior.
  15. Qiaole Dong, Chenjie Cao, and Yanwei Fu. Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.