1. Visual Media Retrieval
Visual media retrieval faces two key problems. The first is how to describe the semantic similarity of visual media and obtain optimal compact binary codes through hash mapping. The second is how to handle semantic polymorphism and visual polysemy so as to bridge the “semantic gap”, achieve semantic recognition consistent with human cognition, and thereby support efficient visual media retrieval.
Chenggang Yan has proposed a supervised hash coding method based on deep neural networks that addresses the incompatibility between hand-crafted feature descriptions and binary codes, and that maps images directly to high-quality binary codes. To meet the challenges posed by the semantic polymorphism and visual polysemy of internet images, he has also established a multi-task feature learning model and proposed a visual semantic recognition method, both based on cross-modal bridging and knowledge migration.
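To illustrate why compact binary codes make retrieval efficient, here is a minimal sketch. It is not the hash coding method described above: it simply binarizes real-valued feature vectors by sign thresholding (standing in for the output of a learned hash network) and ranks database items by Hamming distance. The function names `to_code`, `hamming`, and `retrieve`, and the toy feature values, are assumptions made for illustration.

```python
from typing import List, Sequence


def to_code(features: Sequence[float]) -> int:
    """Pack the sign pattern of a feature vector into an integer bit code."""
    code = 0
    for value in features:
        code = (code << 1) | (1 if value > 0 else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Count the bits in which two codes differ."""
    return bin(a ^ b).count("1")


def retrieve(query: int, database: List[int], top_k: int = 3) -> List[int]:
    """Return the indices of the top_k database codes nearest to the query."""
    order = sorted(range(len(database)), key=lambda i: hamming(query, database[i]))
    return order[:top_k]


# Hypothetical 4-dimensional network outputs for three database images.
db_features = [
    [0.9, -0.2, 0.4, -0.7],
    [-0.8, 0.3, -0.5, 0.6],
    [0.7, -0.1, -0.3, -0.9],
]
db_codes = [to_code(f) for f in db_features]
query_code = to_code([0.8, -0.4, 0.5, -0.6])
ranking = retrieve(query_code, db_codes, top_k=2)
```

Because each comparison is an XOR followed by a bit count, ranking scales to large databases far more cheaply than comparing real-valued descriptors, which is the practical motivation for learning high-quality binary codes.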
2. Text Analysis (Uyghur)
Text localization in complex images is a key technique for text analysis. For Chinese and English, researchers have proposed many text localization methods based on visual features. Because of the special characteristics of Uyghur characters, text localization methods for Uyghur differ from those for Chinese and English, which are relatively mature. The effectiveness and accuracy of existing text localization methods for Uyghur remain unsatisfactory. With more than 11 million people worldwide speaking Uyghur, it is important to study the detection of Uyghur text in images.
To address the challenge of Uyghur text localization in images with complex backgrounds, Chenggang Yan has studied the mechanism of character strokes and proposed a fast text localization method that combines the FASTroke keypoint detector with component similarity clustering. This method lowers the computational cost by reducing the proportion of non-text components. To enhance the robustness of text detection, a channel-enhanced maximally stable extremal regions (MSER) algorithm is designed, which greatly outperforms the traditional MSER algorithm.
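The idea of discarding non-text components and then grouping the rest by similarity can be sketched as follows. This is an illustrative sketch under stated assumptions, not the published method: it does not implement keypoint detection or region extraction, and instead starts from candidate bounding boxes that are assumed to be already available. The `Box` type, the geometric filter, the similarity rule, and every threshold are hypothetical choices.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Box:
    x: int
    y: int
    w: int
    h: int


def is_plausible_text(box: Box, min_size: int = 4, max_aspect: float = 8.0) -> bool:
    """Cheap geometric filter that discards obvious non-text components."""
    if box.w < min_size or box.h < min_size:
        return False
    aspect = max(box.w / box.h, box.h / box.w)
    return aspect <= max_aspect


def similar(a: Box, b: Box, height_ratio: float = 0.5, gap_factor: float = 1.5) -> bool:
    """Similar means: comparable height, vertical overlap, small horizontal gap."""
    if min(a.h, b.h) / max(a.h, b.h) < height_ratio:
        return False
    overlap = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    if overlap <= 0:
        return False
    gap = max(a.x, b.x) - min(a.x + a.w, b.x + b.w)
    return gap <= gap_factor * max(a.h, b.h)


def cluster(boxes: List[Box]) -> List[List[Box]]:
    """Filter components, then single-link cluster them with a union-find."""
    kept = [b for b in boxes if is_plausible_text(b)]
    parent = list(range(len(kept)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(kept)):
        for j in range(i + 1, len(kept)):
            if similar(kept[i], kept[j]):
                parent[find(i)] = find(j)

    groups: Dict[int, List[Box]] = {}
    for i, box in enumerate(kept):
        groups.setdefault(find(i), []).append(box)
    return list(groups.values())


# Hypothetical candidates: three adjacent strokes, one isolated component,
# one speck, and one long thin line.
boxes = [
    Box(10, 10, 12, 20), Box(26, 12, 10, 18), Box(40, 11, 14, 20),
    Box(300, 200, 10, 16), Box(200, 150, 2, 2), Box(120, 100, 90, 5),
]
clusters = cluster(boxes)
```

Filtering before clustering matters because the pairwise similarity test is quadratic in the number of components, so every non-text component removed early reduces the later work.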
3. Automatic Visual Content Description
Automatic visual content description allows computers to recognize the complex semantic information in visual content and then automatically generate textual descriptions that conform to natural human language habits. Unlike traditional visual semantic concept detection, automatic visual content description learns complex semantic recognition and description models from large-scale visual data and accurately identifies the complex semantic information in visual content. It discovers the intrinsic properties of, and external connections between, different semantics, and uses natural language processing techniques to generate sentence-level descriptions that are more consistent with natural human language.
To address the recognition errors and loss of detail in video description methods that rely on temporal attention alone, Chenggang Yan has proposed a spatial-temporal attention mechanism within the encoder-decoder neural network, overcoming the bottleneck of a single temporal attention mechanism. With the proposed mechanism, the decoder automatically selects the important regions in the most relevant time segments for word prediction. The mechanism significantly improves the overall performance of video description.
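The selection of important regions within the most relevant time segments can be sketched as two nested softmax attentions. The following is a minimal sketch under stated assumptions, not the published model: the relevance scores are passed in directly, whereas in a trained network they would be computed from the decoder state by learned layers. The names `softmax` and `attend` and all numeric values are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(
    features: List[List[List[float]]],
    temporal_scores: List[float],
    spatial_scores: List[List[float]],
) -> Tuple[List[float], List[float]]:
    """Build a context vector from features[t][r] (segment t, region r).

    Regions are weighted within each segment, segments are weighted across
    time, and the context is the doubly weighted sum of region features.
    """
    alpha = softmax(temporal_scores)  # weights over time segments
    dim = len(features[0][0])
    context = [0.0] * dim
    for t, segment in enumerate(features):
        beta = softmax(spatial_scores[t])  # weights over regions in segment t
        for r, vector in enumerate(segment):
            weight = alpha[t] * beta[r]
            for d in range(dim):
                context[d] += weight * vector[d]
    return context, alpha


# Two time segments, two regions each, 2-dimensional region features.
features = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 2.0], [4.0, 4.0]],
]
context, alpha = attend(features, [2.0, 0.0], [[0.0, 1.0], [1.0, 0.0]])
```

At each decoding step the context vector would be fed to the decoder to predict the next word, so segments and regions with low weights contribute little to that prediction.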