Doctoral thesis

Advances in deep learning for vision, with applications to industrial inspection : classification, segmentation and morphological extensions


165 p

Doctoral thesis: Università della Svizzera italiana, 2014

English
Learning features for object detection and recognition with deep learning has received increasing attention over the past several years and has recently attained widespread popularity. In this PhD thesis we investigate its application to the automatic surface inspection system of our industrial partner, ArcelorMittal, for classification and segmentation problems. The algorithms currently employed use fixed feature extractors, which are hard to tune and require extensive prior knowledge. Our work instead focuses on learnable systems that can improve recognition and detection without requiring hard-to-obtain, task-specific domain knowledge. For image classification we propose extensions to max-pooling convolutional networks, so that they can be applied to the general defect classification problem via new pooling and feature encoding schemes. State-of-the-art deep learning algorithms for object detection and segmentation reach outstanding performance given high-quality annotated data. Unfortunately, they do not meet the processing speeds required by the steel industry. We propose an architecture that does not suffer from the same computational bottleneck (1500-fold speed-up) while retaining equal performance. To further advance the field we study the learning of morphological operators, which are widely used in industry. Only a few attempts have been reported in the literature, and no approach has considered the problem in its full generality because it is hard to formulate. We tackle it from a different perspective and introduce a learnable framework that seamlessly integrates morphological operators, thereby bringing these powerful tools to deep learning for the first time. Re-engineering an industrial system takes time. To deliver an immediate return, we investigate metric learning problems to boost the performance of the features currently in use. Our multimodal similarity-sensitive hashing model scales well to web-scale datasets and, thanks to its binary representation, requires little storage and involves only a cheap distance computation. It outperforms previous state-of-the-art approaches without requiring additional resources.
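The abstract's remark that a binary representation needs little storage and allows a cheap distance computation can be illustrated with a minimal sketch. This is not the thesis's hashing model: it only shows, under the assumption that items have already been mapped to fixed-length bit codes, how such codes can be packed into bytes and compared by Hamming distance. The function names `pack_codes` and `hamming_distances` and the example bit patterns are hypothetical.

```python
import numpy as np


def pack_codes(bits):
    """Pack an (n, k) array of 0/1 bits into (n, ceil(k / 8)) bytes."""
    return np.packbits(np.asarray(bits, dtype=np.uint8), axis=1)


def hamming_distances(query, database):
    """Hamming distance between one packed code and every packed database code."""
    # XOR leaves a 1 wherever the two codes disagree; counting those 1s
    # gives the Hamming distance.
    differing = np.bitwise_xor(database, query)
    return np.unpackbits(differing, axis=1).sum(axis=1)


# Three illustrative 8-bit codes (made-up values, not learned hashes).
bits = np.array([
    [0, 1, 1, 0, 1, 0, 0, 1],
    [0, 1, 1, 0, 1, 0, 0, 0],  # differs from the first code in one bit
    [1, 0, 0, 1, 0, 1, 1, 0],  # bitwise complement of the first code
])

codes = pack_codes(bits)                    # one byte per item
dists = hamming_distances(codes[0], codes)  # distances to the first code
print(codes.nbytes, dists.tolist())
```

Each 8-bit code occupies a single byte here, and the comparison reduces to an XOR followed by a bit count, which is what makes binary codes attractive for large collections.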
  • English
Computer science