KAIST Image and Video Systems lab

Overview

Lab Introduction

The Image and Video Systems (IVY) Lab at KAIST was founded in 1997 and has been led by Prof. Ro since its establishment. Over the years, IVY Lab has conducted research across a wide spectrum of image and video systems topics, including image processing, computer vision, visual recognition, deep learning and machine learning, medical image processing, and video representation/compression. During this time, IVY Lab has produced about 110 journal papers and 250 conference papers. The collaborative lab environment and the enthusiasm of its members have kept the lab in close touch with the latest developments in standards and industry. For example, the lab developed the homogeneous texture descriptor for the MPEG standard, the ROI descriptor in SVC (Scalable Video Coding), and various description schemes for user characteristics, also as part of the MPEG standard. In recent years, the lab has also achieved several notable research results: deep-learning-based visual recognition, face recognition and facial expression recognition, color face recognition for degraded images, visual discomfort prediction and reduction for stereoscopic 3D content, semantic-concept-based near-duplicate video clip detection, and a computer-aided detection (CAD) system for digital mammograms. The lab continues to work hand in hand with industry to innovate and challenge the state of the art in multiple aspects of image and video systems. The lab's current research topics are:

- Deep learning and machine learning for image processing and computer vision
- High-performance face recognition and emotion recognition
- Automatic object and action detection/recognition
- Medical image processing and computer-aided diagnosis (CAD) systems
- 3D rendering/processing and stereoscopic 3D (S3D) quality measurement
- Video signature and video analysis
- Large-scale image/video retrieval

Basic Research in Human Perception

We aim to better understand the human visual system in order to achieve more efficient visual effects in image/video systems. Our work on human perception and its application to image/video systems is presented in the project demos. For example, we have proposed a texture feature based on the human visual system. We are also studying human visual perception more directly. Functional MRI (fMRI) can measure human brain function non-invasively, and there is strong evidence that brain responses can be observed directly with magnetic resonance imaging. IVY Lab focuses on this direct observation of the brain's visual function.

Deep Learning for Visual Image and Video Recognition

In recent years, advances in processing power and breakthroughs in learning algorithms have brought deep learning back to the forefront of machine learning. It has been applied effectively to various image and video applications, including action recognition and object detection and recognition. At IVY Lab, we design and develop new network architectures and learning methods that provide deep representations for interpreting visual content in images and videos. Some of the lab's activities in this area are briefly described below:

Deep Learning Analysis for Image/Video Understanding:

Understanding the context of images and videos is crucial in many applications, such as detecting and recognizing objects, summarizing video/image content, providing meaningful tags for retrieval, and even assisting physicians in interpreting medical images to aid diagnosis. While earlier hand-crafted methods could provide low- to mid-level interpretations of image and video context, deep learning techniques provide deeper representations of visual data at higher levels of abstraction, enabling a richer understanding of the visual content. At IVY Lab, we design deep architectures that provide fast and robust interpretations of visual content for real-world applications such as object detection, recognition, expression recognition, and medical imaging.

Deep Learning Architectures for Spatio-Temporal Data Analysis:

While still images carry a large amount of visual information, spatio-temporal data provide a richer basis for understanding video content. At IVY Lab, we design deep architectures that analyze spatio-temporal visual data in order to represent video content effectively. The lab is currently investigating the dynamics of objects and developing spatio-temporal representations of visual data for high-performance object identification.

3D Video Processing for Visually Realistic Experience

IVY Lab has long been interested in 3D content generation and processing. One of the main challenges the lab is currently working on is designing objective methods for assessing the overall quality of 3D content so that its visual quality can be improved efficiently. IVY Lab also works side by side with industry to develop state-of-the-art methods and technologies for free-viewpoint (360-degree) video generation, 360-degree video processing, and holography. Some of the lab's activities on 3D content generation and processing are briefly described below:

VR (Virtual Reality) and AR (Augmented Reality) Video Processing

To deliver immersive 360-degree video content to viewers, it is necessary to generate spatio-temporally consistent 4D videos while accounting for the factors that cause visual discomfort in the human visual system. IVY Lab has recently developed a novel multi-view generation method for VR and AR TV that enforces spatio-temporal consistency and binocular symmetry through global optimization. The lab is particularly interested in improving 4D video processing techniques, such as video stitching, and in minimizing visual distortion and VR/AR visual discomfort. To this end, we consider the characteristics of the human visual system as well as those of 4D VR/AR video content to improve the quality of the generated 360-degree VR/AR videos.

3D Video Quality Assessment

The increasing availability of 3D content and commercial products has increased the need for an enhanced and immersive viewing experience on autostereoscopic displays and free-viewpoint television. However, easy access to such content has also raised concerns about viewing safety when stereoscopic images are watched on stereoscopic 3D or free-viewpoint displays. At IVY Lab, we have been assessing the perceptual quality of 3D images and videos. IVY researchers have proposed visual comfort assessment metrics that take into consideration various discomfort factors, such as excessive screen disparity, excessive disparity difference, and window violations. We are also developing objective quality assessment methods for 360-degree and holographic videos.
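The discomfort factors listed above can be illustrated with a minimal sketch that derives three simple proxies from a per-pixel disparity map. This is not the lab's published metric: the function name `disparity_discomfort_features`, the sign convention (negative values taken as crossed disparity, in front of the screen), and the pixel thresholds are all hypothetical. Practical metrics typically convert pixel disparity into angular disparity using the viewing geometry; this sketch skips that step.

```python
def disparity_discomfort_features(disparity, max_disparity=20.0, max_jump=5.0):
    """Simple visual-discomfort proxies from a per-pixel disparity map.

    Illustrative sketch only. disparity: 2D list of screen disparities in
    pixels; negative values are ASSUMED to be crossed disparity (in front of
    the screen). Thresholds are hypothetical, not calibrated comfort limits.
    """
    height, width = len(disparity), len(disparity[0])
    values = [d for row in disparity for d in row]

    # 1) Excessive screen disparity: share of pixels beyond the limit.
    excessive_ratio = sum(1 for d in values if abs(d) > max_disparity) / len(values)

    # 2) Excessive disparity difference: jumps between 4-connected neighbours.
    jumps = []
    for r in range(height):
        for c in range(width):
            if c + 1 < width:
                jumps.append(abs(disparity[r][c + 1] - disparity[r][c]))
            if r + 1 < height:
                jumps.append(abs(disparity[r + 1][c] - disparity[r][c]))
    max_observed_jump = max(jumps, default=0.0)
    steep_ratio = sum(1 for j in jumps if j > max_jump) / len(jumps) if jumps else 0.0

    # 3) Window violation: crossed disparity touching the left or right frame edge.
    window_violation = any(row[0] < 0 or row[-1] < 0 for row in disparity)

    return {
        "excessive_disparity_ratio": excessive_ratio,
        "max_disparity_jump": max_observed_jump,
        "steep_jump_ratio": steep_ratio,
        "window_violation": window_violation,
    }


# Usage: a tiny map whose right column pops far out of the comfort range.
print(disparity_discomfort_features([[0, 0, 30], [0, 0, 30]]))
```

A learned or perceptually calibrated model would weight and combine such features; keeping them separate here simply makes each discomfort factor visible.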

Computer Generated Holography

Computer-generated holograms (CGHs) are holographic interference patterns that are computed numerically rather than recorded optically, and they can be embedded into various materials. Because mass production of computer-generated holograms is fairly affordable, CGH technologies have great potential in various industries and applications. However, the heavy computational cost of calculating the holographic interference patterns of 3D objects remains a major problem in this field. IVY Lab is analyzing and exploiting the sparsity of CGH to develop fast CGH generation algorithms.
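To make the computational cost described above concrete, the following is a minimal sketch of the classic point-source (point-cloud) CGH method, not IVY Lab's sparsity-based algorithm. Each object point emits a spherical wave; the waves are summed at every hologram pixel and interfered with an on-axis plane reference wave. The function name `point_cloud_hologram`, the unit reference amplitude, and the bipolar-intensity output are illustrative assumptions.

```python
import cmath
import math


def point_cloud_hologram(points, width, height, pitch, wavelength):
    """Naive point-source CGH (illustrative sketch, not IVY Lab's method).

    points: iterable of (x, y, z, amplitude); x, y, z in metres, z > 0 is the
    distance of the point from the hologram plane.
    Returns a height x width list of bipolar fringe intensities.
    Cost: len(points) * width * height complex exponentials.
    """
    k = 2.0 * math.pi / wavelength  # wavenumber
    hologram = []
    for row in range(height):
        y = (row - height / 2) * pitch  # pixel position in metres
        line = []
        for col in range(width):
            x = (col - width / 2) * pitch
            field = 0j
            # Superpose one spherical wave per object point.
            for px, py, pz, amp in points:
                r = math.sqrt((x - px) ** 2 + (y - py) ** 2 + pz ** 2)
                field += amp / r * cmath.exp(1j * k * r)
            # Bipolar intensity: the interference term 2*Re(O * conj(R))
            # with a unit-amplitude, on-axis plane reference wave R = 1.
            line.append(2.0 * field.real)
        hologram.append(line)
    return hologram


# Usage: one point 10 cm in front of an 8 x 8 hologram, 8 um pitch, green light.
holo = point_cloud_hologram([(0.0, 0.0, 0.1, 1.0)], 8, 8, 8e-6, 532e-9)
print(len(holo), len(holo[0]))
```

The triple loop makes the work grow with (number of object points) × (number of hologram pixels), so a full-resolution hologram of a dense 3D object quickly becomes expensive; this is the burden that fast CGH generation methods aim to reduce.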