Defect detection and classification with AI
In industrial sectors, defective products are directly linked to negative consequences such as increased production costs, brand devaluation and compensation costs. Therefore, one of the main priorities of any industry is to minimize the shipment of defective products.
Many companies rely on ISR’s OIT systems for anomaly detection, enabling them to significantly reduce the number of defective products they ship. These systems use cameras and illuminated panels to identify product irregularities, adapting to a wide variety of shapes and surfaces.
With the advancement of artificial intelligence-based object detection and classification systems, ISR is updating its methodologies to integrate them. In simple terms, the strategy is to combine the high acquisition quality of OIT systems with artificial intelligence models, with the aim of further improving defect detection and classification capabilities.
From acquisition to prediction
The process of developing a competitive artificial intelligence model consists of several essential steps:
- Dataset creation.
- Model training and evaluation.
- Model selection and implementation.
A crucial aspect of setting up any system is having high quality data to work with. OIT systems allow ISR not only to acquire this data, but also to offer a quality that is unmatched by other competitors in the industry.
To train a model to be able to detect defects, it is necessary to provide it with clear examples of how to do so. Each acquired image must be assigned labels that define the behavior of the model. This procedure is known as “data labeling”, and the final performance of the model will depend mostly on the quality obtained in this phase.
This article will focus on describing the methodology adopted by ISR to optimize and simplify both the labeling process and its results. To this end, a centralized web platform has been developed that allows users to create and manage datasets at a higher level of abstraction, without the need to know the technical details of the underlying technologies.
System architecture
The application has been designed with a clear concept: to act as an intermediary between the labeling system and the data storage system. Users do not need to know how either system works internally, which simplifies and speeds up the labeling process considerably.
The labeling interface is a tool that lets the user label a dataset, either for classification (class assignment) or detection (segmentation) problems. There are several tools on the market that offer these functionalities, and ISR has chosen to deploy Label Studio on its servers. This application, widely used in the artificial intelligence sector, allows data labeling by multiple users, and stands out for the flexibility of its functionalities and its easy integration into Deep Learning workflows.
For data storage, a system based on Git repositories has been implemented, which allows efficient management of data versions.
The system has been developed in Python, using Streamlit for the user interface. It communicates with the labeling interface through the API provided by Label Studio, and with the storage system through an API developed specifically for it. Both Label Studio and Streamlit are open-source tools under the Apache 2.0 license, which allows for commercial use.
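As an illustration of this kind of integration, the sketch below shows how a batch of images might be sent to Label Studio through its REST import endpoint (`/api/projects/{id}/import`). It is not ISR's implementation: the function names, the `image` data key (which must match the project's labeling configuration) and the use of token-based authentication are assumptions made for the example.

```python
import json
import urllib.request


def build_tasks(image_urls, metadata=None):
    """Build Label Studio import tasks, one per image URL."""
    metadata = metadata or {}
    tasks = []
    for url in image_urls:
        # Assumption: the labeling configuration reads the image from "$image".
        task = {"data": {"image": url}}
        if url in metadata:
            # Optional, user-supplied metadata travels with the task.
            task["meta"] = metadata[url]
        tasks.append(task)
    return tasks


def import_tasks(base_url, token, project_id, tasks):
    """POST the tasks to a Label Studio project (needs a running server)."""
    request = urllib.request.Request(
        url=f"{base_url}/api/projects/{project_id}/import",
        data=json.dumps(tasks).encode("utf-8"),
        headers={
            # Assumption: the server accepts access tokens in this form.
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping the payload construction separate from the network call makes the first function easy to check without a running Label Studio instance.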
Docker has been used for deploying these tools on ISR’s servers. This technology allows the creation of isolated and reproducible environments, making the implementation and maintenance of each service easier, while also ensuring efficient integration between them.
System workflow
The workflow, from the incorporation of new data to obtaining a valid dataset for model training, consists of three main phases:
Data uploading
This is the starting point of the labeling process. From the time the user loads a new batch of data until it is incorporated into Label Studio and the storage system, the process follows three stages.
- Data selection. The batches uploaded by users must follow a predefined structure for each project. In this phase, the data to be included in the labeling process is selected.
- Metadata embedding. In this optional phase, the user can add additional metadata.
- Data filtering. After obtaining the images (and metadata, if added), the user can filter the data according to specific criteria. For example, one can choose to include only images containing a specific type of defect to balance the dataset. This phase is optional and will depend on the user’s criteria and the needs of the project.
Once these three stages have been completed, the data will be sent to the labeling interface and the storage system, ready to be labeled.
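The three upload stages can be pictured with a minimal sketch. Everything here is hypothetical: the `Sample` class, the function names, the assumption that a batch is a directory of image files, and the `defect` metadata key are illustrative choices, not the structure ISR actually uses.

```python
from dataclasses import dataclass, field
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


@dataclass
class Sample:
    path: Path
    metadata: dict = field(default_factory=dict)


def select_images(batch_dir):
    """Stage 1: collect the image files found in an uploaded batch."""
    return [
        Sample(path)
        for path in sorted(Path(batch_dir).rglob("*"))
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    ]


def embed_metadata(samples, metadata_by_name):
    """Stage 2 (optional): attach user-provided metadata by file name."""
    for sample in samples:
        sample.metadata.update(metadata_by_name.get(sample.path.name, {}))
    return samples


def filter_samples(samples, **criteria):
    """Stage 3 (optional): keep samples whose metadata matches all criteria."""
    return [
        s for s in samples
        if all(s.metadata.get(key) == value for key, value in criteria.items())
    ]
```

For example, `filter_samples(samples, defect="scratch")` would keep only the images tagged with that defect type, which is one way to rebalance a dataset before labeling.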
Data labeling
Label Studio allows registered users to label data according to the configuration defined for each project. Once the data is labeled, the annotations are converted, at the user’s request, from the native Label Studio format to a format suitable for the storage system.
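To give an idea of what such a conversion involves, the sketch below turns rectangle annotations from a Label Studio JSON export, where coordinates are stored as percentages of the image size, into pixel-space bounding boxes. The output layout and the function name are assumptions chosen for the example; the format actually used by the storage system is not described in this article. Box rotation is ignored for simplicity.

```python
def to_pixel_boxes(task):
    """Convert one exported Label Studio task into pixel-space boxes."""
    boxes = []
    for annotation in task.get("annotations", []):
        for result in annotation.get("result", []):
            if result.get("type") != "rectanglelabels":
                continue  # other annotation types are out of scope here
            value = result["value"]
            width = result["original_width"]
            height = result["original_height"]
            # Label Studio stores x, y, width and height as percentages.
            boxes.append({
                "label": value["rectanglelabels"][0],
                "x_min": round(value["x"] / 100 * width),
                "y_min": round(value["y"] / 100 * height),
                "x_max": round((value["x"] + value["width"]) / 100 * width),
                "y_max": round((value["y"] + value["height"]) / 100 * height),
            })
    # Assumption: the image reference lives under the "image" data key.
    return {"image": task["data"]["image"], "boxes": boxes}
```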
Versioning
Once the dataset has been correctly labeled and is ready to be used for model training, the user can create a version of it, defining the different subsets that compose it (train, validation and test).
It is essential to manage different versions of the dataset, as it may be necessary to modify the labeling approach according to the needs of the project or the customer’s requirements. For example, if the criteria for defining a defect are modified and the results of the new approach are not as satisfactory as the previous ones, versioning makes it possible to recover the previous state of the dataset. In this way, checkpoints are established to facilitate the use of different versions according to the needs of the project.
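One possible way to define the train, validation and test subsets of a version is a deterministic split based on a hash of each sample's identifier. This is only a sketch of one option, not the method used by ISR; the function names and default fractions are assumptions.

```python
import hashlib


def assign_subset(sample_id, val_fraction=0.1, test_fraction=0.1):
    """Map a sample identifier to 'train', 'validation' or 'test'."""
    digest = hashlib.sha256(sample_id.encode("utf-8")).digest()
    # First 8 bytes as a number in [0, 1): stable across runs and machines.
    position = int.from_bytes(digest[:8], "big") / 2**64
    if position < test_fraction:
        return "test"
    if position < test_fraction + val_fraction:
        return "validation"
    return "train"


def split_dataset(sample_ids, val_fraction=0.1, test_fraction=0.1):
    """Group sample identifiers into the three subsets of a version."""
    subsets = {"train": [], "validation": [], "test": []}
    for sample_id in sorted(sample_ids):
        subset = assign_subset(sample_id, val_fraction, test_fraction)
        subsets[subset].append(sample_id)
    return subsets
```

With this approach, a sample lands in the same subset in every version as long as the fractions are unchanged, so images do not drift from the training set into the test set when new data is added.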
Why is defect classification important in the industry?
Defect detection is the gateway to the concept of Specular Zero, developed in this previous article. Having defects classified is far more valuable than having them merely detected. No factory wants to detect defects; what it wants is not to have defects at all!
Once a production line has detected its scrap parts, the natural next step is to know why these parts are NOK. Classifying defects allows the production units to understand the origin of each defect: the root cause in the processes and the conditions that created it. Without classification, this information is out of reach and there is no basis for going further.
ISR Specular Vision is committed to taking quality control to the next level: understanding the origin of defects in order to eliminate them.
Summarizing the key points
Let’s sum up the article in the following points:
- The integration of artificial intelligence models into OIT systems represents a significant advance in ISR Specular Vision methodology, allowing more accurate and efficient defect detection on a wide variety of surfaces.
- Data labeling is a critical phase in creating high-quality datasets, which are essential for training high-performance machine learning models.
- The quality of the labeling directly influences the model performance. It is therefore essential to allocate the necessary resources to ensure that this process is carried out with the required accuracy.
- Having an efficient system for dataset generation and management not only speeds up the process of creating defect detection systems, but also ensures that they comply with customer specifications and requirements.
- The classification of detected defects is the gateway to the next level of quality control: Specular Zero.
Author: Daniel Gómez