Redefining Open Minds: How to Come Up with Research Topics (2): Combining Two or More Different Topics

How to come up with new research ideas?

Jia-Bin Huang

jbhuang0604@gmail.com

Latest update: April 3rd, 2010

To steal ideas from one person is plagiarism. To steal from many is research. - Wilson Mizner

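2. Combining two or more different topics    neXt = X + Y

B. When X and Y are problems

This kind of topic comes from combining two or more problems X and Y. Because the problems are related, solving them jointly can sometimes work better than solving each one separately. The key to using this approach is to analyze where the connection between the problems lies and to understand what goes wrong when each is solved on its own; only then is solving them together a meaningful combination.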

EX 1. High Dynamic Range Imaging + Image Deblurring

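High Dynamic Range (HDR) Imaging deals with photographing high-contrast scenes. A camera sensor cannot capture the full range of brightness in such a scene, so photos easily come out overexposed or underexposed. The usual approach is to mount the camera on a tripod, shoot several images at different exposure levels (e.g., exposure bracketing), and then merge them in software into one correctly exposed image.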
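The merging step can be sketched in a few lines. This is only a minimal illustration, not the method of any paper cited here: it assumes a linear sensor with pixel values in [0, 1] (so pixel ≈ radiance × exposure time until it clips), perfectly aligned frames, and a simple "hat" weight; the function name `merge_exposures` and the toy numbers are made up for the example.

```python
import numpy as np

def merge_exposures(images, exposure_times):
    """Merge bracketed exposures into one radiance estimate.

    Assumption: pixel values lie in [0, 1] and come from a linear
    sensor, i.e. pixel = clip(radiance * exposure_time, 0, 1).
    """
    numerator = np.zeros_like(images[0], dtype=np.float64)
    denominator = np.zeros_like(images[0], dtype=np.float64)
    for image, t in zip(images, exposure_times):
        image = np.asarray(image, dtype=np.float64)
        # Hat weight: trust mid-tones, ignore pixels at 0 or 1.
        weight = 1.0 - np.abs(2.0 * image - 1.0)
        numerator += weight * image / t
        denominator += weight
    # Guard against pixels that are clipped in every frame.
    return numerator / np.maximum(denominator, 1e-8)

# Toy "scene" whose radiance spans three orders of magnitude.
radiance = np.array([0.02, 0.5, 4.0, 20.0])
times = [1.0, 0.1, 0.01]
shots = [np.clip(radiance * t, 0.0, 1.0) for t in times]
hdr = merge_exposures(shots, times)
print(hdr)
```

No single shot records all four values without clipping, but the weighted merge recovers them. The sketch also shows why the tripod matters: the per-pixel average only makes sense if every frame is sharp and aligned.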
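Image Deblurring arises in poorly lit scenes: to get a correct exposure the camera needs a longer exposure time, and holding the camera by hand then easily blurs the image. Correctly estimating the blur kernel that corresponds to the camera shake is the key to successful image deblurring.
For a long time, algorithms for these two problems were developed separately, yet in practice the two are closely tied: when shooting HDR with a hand-held camera, the frames that need longer exposures tend to be blurred. Merging the frames into an HDR image first and deblurring afterwards makes deblurring harder; the alternative, deblurring each frame first and then merging, lets small errors in each frame turn into severe defects in the final HDR image. Considering the two problems together yields a practical new topic: HDR with a hand-held camera [1]. Compared with deblurring each frame on its own [3] and then merging, or with deblurring from a noisy/blurred image pair [2], handling HDR and deblurring jointly can produce better results.
Analyzing the connection: HDR already requires capturing several images to merge, and multiple images of the same scene happen to help deblurring a great deal, even when each image has a different blur kernel [4]. (From MAP estimation in estimation theory, the more observed data there is, the more accurate the estimate of the unknown kernel, especially when the image is much larger than the kernel [5].)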
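The "more observations, better kernel estimate" intuition can be checked on a toy problem. The sketch below is a deliberate simplification and not blind deconvolution: it assumes the sharp 1D signal is known and fits the kernel by ordinary least squares, whereas in real deblurring both the image and the kernel are unknown. The kernel, noise level, and helper names (`estimate_kernel`, `mean_error`) are assumptions made for illustration.

```python
import numpy as np

def estimate_kernel(sharp, blurred, kernel_size):
    """Least-squares estimate of a 1D blur kernel from a sharp/blurred pair."""
    n = len(sharp) - kernel_size + 1
    # Row i holds the sharp samples that produce the i-th 'valid' output,
    # reversed so that design @ kernel equals np.convolve(..., mode="valid").
    design = np.stack([sharp[i:i + kernel_size][::-1] for i in range(n)])
    kernel, *_ = np.linalg.lstsq(design, blurred, rcond=None)
    return kernel

rng = np.random.default_rng(0)
true_kernel = np.array([0.1, 0.2, 0.4, 0.2, 0.1])

def mean_error(num_samples, trials=50):
    """Average kernel error over random signals of a given length."""
    errors = []
    for _ in range(trials):
        sharp = rng.standard_normal(num_samples)
        blurred = np.convolve(sharp, true_kernel, mode="valid")
        blurred = blurred + 0.1 * rng.standard_normal(blurred.shape)
        estimate = estimate_kernel(sharp, blurred, len(true_kernel))
        errors.append(np.linalg.norm(estimate - true_kernel))
    return float(np.mean(errors))

error_few = mean_error(20)     # signal only 4x the kernel size
error_many = mean_error(2000)  # signal 400x the kernel size
print(error_few, error_many)
```

With the kernel size fixed, the error shrinks as the number of observed samples grows, which is the same reason several frames of one scene help when estimating blur kernels.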

[1] High Dynamic Range Image Reconstruction from Hand-held Cameras, CVPR 2009
[2] Image Deblurring with Blurred/Noisy Image Pairs, SIGGRAPH 2007
[3] Removing Camera Shake from a Single Photograph, SIGGRAPH 2006
[4] Two motion-blurred images are better than one, Pattern Recognition Letters 2005
[5] Understanding and evaluating blind deconvolution algorithms, CVPR 2009

EX 2. Human Body Understanding

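People are often the subject of greatest interest in computer vision: for example, how to obtain from an image a person's location (e.g., pedestrian detection), pose (e.g., pose estimation), or body shape (e.g., height and build). Combining these subproblems often gives better results than solving any one of them alone.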

[1] Estimating human shape and pose from a single image, ICCV 2009
[2] The Naked Truth: Estimating Body Shape Under Clothing, ECCV 2008
[3] Pictorial Structures Revisited: People Detection and Articulated Pose Estimation, CVPR 2009

EX 3. Image Understanding

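This example is a very broad topic. The ultimate goal is for computers to understand image or video content as completely as human vision does. Solving it would enable a great many useful applications (e.g., content-based image retrieval; object (face, car, pedestrian) detection, tracking, and recognition; surveillance; machine vision for industry; scene reconstruction).
Tackling this problem directly is too hard, however, so people usually split it into many simpler subproblems (e.g., detection, tracking, recognition, segmentation, and reconstruction) and study each one separately. In reality, Image Understanding is a single whole problem, and its subproblems are inevitably related to some degree. So once the subproblems were reasonably well understood, people began looking for ways to merge several of them into new problems. Below, I first introduce some traditional subproblems and then give recent examples of integrating them.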

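a. Object detection/localization:
  Find the location of a certain kind of object (e.g., a face or a car) in an image (e.g., expressed as a bounding box).
b. Object tracking:
  Given the initial location of an object in an image, stably track its subsequent locations through the video.
c. Object recognition/categorization:
  Identify the category of an object in an image (e.g., chair, table, airplane, mobile phone).
d. Image segmentation:
  Partition an image into meaningful regions based on its content.
e. Image reconstruction:
  Build a 3D model from one or more 2D images.
f. Scene understanding:
  Extract from a single image the information that human vision could plausibly obtain (e.g., Geometric Layout, Occlusion Relationships, Camera Viewpoint, Illumination, Geographic Properties, Object Attributes, Event).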

Looking at these simple definitions of subproblems a-f, can you already see many connections?

EX 3.1 Object Tracking + Detection (a+b)

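Object detection and tracking are inherently closely related problems [1]. Imagine that, while tracking, a detector could successfully find the object in every frame: tracking would then become easy. Conversely, if the object is tracked accurately in every frame, the variations it undergoes in the video (e.g., lighting, occlusion) can all serve as new training data for the detector. Consider the two problems together and a new topic emerges.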
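The first half of that argument, tracking reduced to linking per-frame detections, can be sketched as follows. This is a toy illustration under stated assumptions, not the approach of [1]: boxes are `(x1, y1, x2, y2)` tuples, a single object is followed, and a detection is linked to the track when its overlap (IoU) with the previous box passes a hypothetical threshold `min_iou`.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    intersection = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

def track_by_detection(initial_box, detections_per_frame, min_iou=0.3):
    """Follow one object by linking, frame by frame, the best-overlapping detection."""
    track = [initial_box]
    for detections in detections_per_frame:
        previous = track[-1]
        best = max(detections, key=lambda box: iou(previous, box), default=None)
        if best is not None and iou(previous, best) >= min_iou:
            track.append(best)
        else:
            # Detector missed the object: keep the last known position.
            track.append(previous)
    return track

# Three frames of made-up detector output; the last frame has no detections.
frames = [
    [(12, 10, 52, 50), (200, 200, 240, 240)],
    [(210, 205, 250, 245), (25, 12, 65, 52)],
    [],
]
track = track_by_detection((10, 10, 50, 50), frames)
print(track)
```

The boxes collected in `track` are exactly the kind of crops that could be fed back to the detector as extra training examples, which is the second half of the argument.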

[1] People-Tracking-by-Detection and People-Detection-by-Tracking, CVPR 2008

EX 3.2 Object Attribute + Recognition (c+f)

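When we describe an object, we often use its properties rather than an actual picture. For example, if I say "lives on land, large, an animal, has a very long nose", you can most likely tell that I mean an elephant. These object properties (attributes) are therefore a very useful intermediate representation, and in recent years they have often been combined with object recognition [1-5].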
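A toy version of the elephant example shows how attributes can act as the intermediate representation. It is a sketch only: the attribute table, the scores, and the function `classify_by_attributes` are invented for illustration, the attribute scores are treated as independent probabilities, and none of this reproduces a specific method from [1-5].

```python
# Hypothetical class descriptions: 1 = attribute present, 0 = absent.
CLASS_ATTRIBUTES = {
    "elephant": {"lives_on_land": 1, "large": 1, "animal": 1, "long_nose": 1},
    "whale":    {"lives_on_land": 0, "large": 1, "animal": 1, "long_nose": 0},
    "mouse":    {"lives_on_land": 1, "large": 0, "animal": 1, "long_nose": 0},
    "truck":    {"lives_on_land": 1, "large": 1, "animal": 0, "long_nose": 0},
}

def classify_by_attributes(attribute_scores, class_attributes=CLASS_ATTRIBUTES):
    """Pick the class whose attribute description best fits the scores.

    attribute_scores maps each attribute name to the (assumed) probability
    that the attribute is present in the image.
    """
    def likelihood(signature):
        p = 1.0
        for name, present in signature.items():
            score = attribute_scores[name]
            p *= score if present else 1.0 - score
        return p

    return max(class_attributes, key=lambda c: likelihood(class_attributes[c]))

# Made-up output of four per-attribute classifiers for one image.
scores = {"lives_on_land": 0.9, "large": 0.8, "animal": 0.95, "long_nose": 0.7}
label = classify_by_attributes(scores)
print(label)
```

Because a class is defined only by its row in the table, a new class can be added by describing it, without collecting training images for it.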

[1] Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer, CVPR 2009
[2] Describing Objects by Their Attributes, CVPR 2009
[3] Attribute and Simile Classifiers for Face Verification, ICCV 2009
[4] Learning Visual Attributes, NIPS 2007
[5] Joint learning of visual attributes, object classes and visual saliency, ICCV 2009

EX 3.3 Object recognition + detection (a+c)

Work of this kind combines object recognition and detection in order to achieve better image classification.

[1] Combining efficient object localization and image classification, ICCV 2009
[2] Fast concurrent object localization and recognition, CVPR 2009

EX 3.4 Image segmentation + Object recognition + Scene understanding (c+d+f)

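Methods of this kind broadly use the context in an image to understand its content, much as the surrounding text helps you understand meaning when reading an article. The context may be the relationships between objects, where an object sits in the scene, the geometric layout of the scene, and so on. There has been a great deal of work in this area in recent years [1-9]; only a small part is listed here.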
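As a small illustration of context at work, the sketch below rescores an ambiguous object label using how often each label co-occurs with the surrounding region. Everything in it is an assumption made for the example: the co-occurrence numbers are invented, the names `COOCCURRENCE` and `rescore_with_context` are hypothetical, and real systems in this area learn such relationships from data rather than hard-coding them.

```python
# Hypothetical co-occurrence strengths between an object label and
# the label of the region it sits on.
COOCCURRENCE = {
    ("boat", "water"): 0.9,
    ("car", "water"): 0.05,
    ("boat", "road"): 0.05,
    ("car", "road"): 0.9,
}

def rescore_with_context(appearance_scores, context_label, default=0.5):
    """Combine appearance scores with a context prior and return the best label."""
    rescored = {
        label: score * COOCCURRENCE.get((label, context_label), default)
        for label, score in appearance_scores.items()
    }
    return max(rescored, key=rescored.get)

# Appearance alone slightly prefers "car" for this blurry blob.
appearance = {"car": 0.55, "boat": 0.45}
on_water = rescore_with_context(appearance, "water")
on_road = rescore_with_context(appearance, "road")
print(on_water, on_road)
```

The same blob is labeled differently depending on its surroundings, which is the "reading in context" effect described above.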

[1] Discriminative Models for Multi-Class Object Layout, ICCV 2009
[2] TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation, ECCV 2006
[3] Contextual Priming for Object Detection, IJCV 2003
[4] Geometric Context from a Single Image, ICCV 2005
[5] Putting Objects in Perspective, CVPR 2006
[6] Decomposing a Scene into Geometric and Semantically Consistent Regions, ICCV 2009
[7] Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Unsupervised Framework, CVPR 2009
[8] Object Categorization using Co-Occurrence, Location and Appearance, CVPR 2008
[9] Scene Understanding Symposium


Thursday, April 22, 2010

How to Come Up with Research Topics (2): Combining Two or More Different Topics - Combining Different Problems
