Professional Experience


Research


● 04.2015-12.2015 
● Multi-purpose dynamic classifier for document stream digitization 
○ Invited researcher at LIRIS ○ liris.cnrs.fr 
○ Ref: Veronique Eglin {veronique.eglin@insa-lyon.fr}

Description: 
Within the DIGIDOC project (ANR-10-CORD-0020) of the CONTenus et INTeractions (CONTINT) programme, our research was applied to several image-stream classification scenarios that correspond to real cases in digitization projects. Document processing is usually treated as a well-defined task: the classes (also called concepts) are defined and known before processing starts. In real industrial document workflows, however, the concepts frequently change over time. In document stream processing, both the information contained in the digitized pages and the user's judgment about what to do with the resulting classification can evolve. The goal of this application is to create a learning module for stream-based classification of massive document image collections (dedicated in particular to mass digitization processes) that adapts to different situations in intelligent scanning tasks: adding, extending, contracting, splitting, or merging classes while processing streaming data online.
Keywords: 
machine learning, intelligent scanner, dynamic classification, streaming data, one-class SVM, human-computer interaction.
Publications:
  1. Anh Khoi Ngo Ho, Nicolas Ragot, Jean-Yves Ramel, Veronique Eglin, A multi-purpose dynamic classifier: application to document classification, International Journal on Document Analysis and Recognition, accepted. 
  2. Anh Khoi Ngo Ho, Nicolas Ragot, Jean-Yves Ramel, Veronique Eglin, Multi one-class incremental SVM for document stream digitization, 12th IAPR International Workshop on Document Analysis Systems, Santorini, Greece, 2016. 

● 10.2011-03.2015 
● Incremental learning for images: application to an intelligent scanner for old documents 
○ Researcher at Computer Science Laboratory ○ li.univ-tours.fr 
○ Ref: Jean-Yves Ramel {jean-yves.ramel@univ-tours.fr}

Description: 
This research contributes to the field of dynamic learning and classification in both stationary and non-stationary environments. It addresses settings where the learning dataset is very small at the start of the process and where the classifier must adjust itself to the variability of the data arriving in a stream. For that purpose, we propose a solution based on a combination of independent one-class SVM classifiers, each with its own incremental learning procedure. Consequently, no classifier is sensitive to cross-influences arising from the configuration of the other classifiers' models. The originality of our proposal lies in reusing the prior knowledge kept in the SVM models (represented by all the support vectors found so far) and combining it with the new data arriving incrementally from the stream. The proposed classification model (mOC-iSVM) comes in three variants that differ in how the existing models are used at each time step. The mOC-iSVM.AP model selects previous support vectors according to their "age"; the mOC-iSVM.EP model selects support vectors according to their efficiency; and the mOC-iSVM.nB model selects vectors from the n best models in the history. Our contribution fills a gap in the state of the art, where no existing solution handles concept drift, the addition or deletion of concepts, and the merging or splitting of concepts at the same time while also offering a convenient way to interact with the user. Experiments in both stationary and non-stationary environments yield classification scores close to, or even better than, those obtained with the most successful incremental classifiers currently available. Furthermore, unlike our method, most other dynamic approaches are applicable only to particular environments. 
Keywords: 
dynamic classification, incremental learning, one-class SVM, stationary and non-stationary environments, document image classification, digitization.
Publications:
  1. Anh Khoi Ngo Ho, Nicolas Ragot, Jean-Yves Ramel, Veronique Eglin, Multi one-class incremental SVM with n-Best, Journal of Machine Learning, In progress.
  2. Anh Khoi Ngo Ho, Méthodes de classifications dynamiques et incrémentales : application à la numérisation cognitive d'images de documents, Doctoral dissertation, Ecole doctorale Mathématiques, Informatique, Physique Théorique et Ingénierie des Systèmes (Centre-Val de Loire), Tours, 2015. 
  3. Anh Khoi Ngo Ho, Nicolas Ragot, Jean-Yves Ramel, Veronique Eglin, Multi one-class incremental SVM for both stationary and non-stationary environment, CAp 2014, 16th Conférence Francophone sur l'Apprentissage Automatique, Saint-Etienne, France, 2014. 
  4. Anh Khoi Ngo Ho, Nicolas Ragot, Jean-Yves Ramel, Veronique Eglin, Nicolas Sidere, Document classification in a non-stationary environment: a one class SVM approach, ICDAR 2013, 12th International Conference on Document Analysis and Recognition, Washington DC, USA, 2013.

● 02.2011-07.2011 
● Panel and speech balloon extraction from comic books: application for mobile devices 
○ Researcher at Information, Image and Interaction Laboratory (L3I) ○ l3i.univ-larochelle.fr 
○ Ref: Jean-Christophe Burie {jean-christophe.burie@univ-lr.fr}

Description: 
Comic books represent an important cultural heritage in many countries. However, little research has been done on analysing the content of comics, such as panels, speech balloons, or characters. At first glance, the structure of a comic page may appear easy to determine. In practice, the configuration of the page and the size and shape of the panels can differ from one page to the next. Moreover, authors often draw extended content (speech balloons or comic art) that overlaps two or more panels, so in some situations panel extraction becomes a real challenge. Speech balloons are another important element of comics. Full-text indexing is only possible if the text can be extracted, but the text is usually embedded among graphic elements. Moreover, unlike in newspapers, the text layout in speech balloons can be irregular, so classic text extraction methods can fail. We propose a method based on region growing and mathematical morphology to automatically extract the panels of a comic page, and a method to detect speech balloons.
Keywords: 
comic book, comics panel extraction, comics page segmentation, region growing, mathematical morphology. 
Publications:
  1. Anh Khoi Ngo Ho, Jean-Christophe Burie, Jean-Marc Ogier, Panel and speech balloon extraction from comic books, DAS 2012, pp. 424-428, 10th IAPR International Workshop on Document Analysis Systems, Gold Coast, Queensland, Australia, 2012. 
  2. Anh Khoi Ngo Ho, Jean-Christophe Burie, Jean-Marc Ogier, Comics page structure analysis based on automatic panel extraction, GREC 2011, 9th International Workshop on Graphics Recognition, Seoul, Korea, 2011.

Software Development

● 01.2010-06.2010 
● Management system for investment projects of Can Tho (web service) 
○ Developer at Can Tho University ○ www.ctu.edu.vn 
○ Ref: Pham Thi Xuan Loc {ptxloc@cit.ctu.edu.vn} 

● 01.2009-06.2009 
● Restaurant Hybrid Management System (software, mobile app and web service) 
○ Developer at Cantho Software Park ○ www.csp.vn 
○ Ref: Tran Van Ut {uttv@csp.vn}