Image Recognition Enables Complex Object-Handling Applications

AI and machine vision are essential to empowering robots to pick a wide range of objects.


Advanced algorithms can enable the rapid recognition of varied parcels.

Robots are being deployed in a growing number of applications, a trend driven by a steady stream of emerging technologies. To automate complex tasks that could previously only be performed manually, robots need to be able to see and understand. For instance, detecting, recognizing, and localizing mixed types of objects so that a robot can pick or otherwise handle them requires powerful 3D machine vision and artificial intelligence.

AI covers a wide spectrum of robotic capabilities, but the most popular class of models is neural networks, thanks to their ability to generalize. A trained neural network can adapt to new data and recognize objects that it has never seen before.

This capability is widely used in applications that require the recognition and handling of items that differ in shape, size, color, or material. Examples include automated picking of mixed objects from a bin, robotic unloading of pallets loaded with boxes of various shapes and sizes, and the singulation and sorting of all types of parcels.

The benefit of classical neural networks is that they can, in principle, learn almost anything. However, they have a significant limitation: they use an older, fully connected architecture, in which every neuron in one layer is connected to every neuron in the next layer. This increases the number of parameters that the network needs to learn. The larger the image, the greater the complexity, effort, and time required to train the neural network.
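To get a feel for how quickly a fully connected layer grows with image size, here is a minimal sketch in Python. The layer width of 1,000 hidden neurons and the image sizes are illustrative assumptions, not figures from any particular system.

```python
def dense_layer_params(n_inputs: int, n_outputs: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_outputs + n_outputs


def first_layer_params(height: int, width: int, channels: int, hidden_units: int) -> int:
    """Parameters needed to connect every pixel value to every hidden neuron."""
    return dense_layer_params(height * width * channels, hidden_units)


if __name__ == "__main__":
    # Assumed setup: RGB images and a first hidden layer of 1,000 neurons.
    for side in (32, 224, 1024):
        n = first_layer_params(side, side, channels=3, hidden_units=1000)
        print(f"{side}x{side} RGB image -> {n:,} parameters in the first layer")
```

Under these assumptions, a 32x32 image already needs about 3 million parameters in the first layer alone, and a 224x224 image needs roughly 150 million.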

Convolutional neural networks

The limitation described above can be overcome with convolutional neural networks (CNNs). Inspired by the visual cortex in the human brain, a CNN uses an architecture in which each neuron is connected only to the neurons in the next layer that are spatially close to it and carry related information. This significantly reduces the number of connections, and therefore the number of parameters the network needs to learn, making CNNs generally less complex than classical neural networks.
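The idea of local connections can be sketched in a few lines of plain Python. The example below slides one small, shared kernel over a toy image, so each output value depends only on a neighboring patch of pixels. The function names and the 3x3 kernel with 64 filters are assumptions chosen for illustration; as in most deep learning frameworks, the kernel is applied without flipping.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over a 2D image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            # Each output value only looks at a small local patch of pixels.
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv_layer_params(kernel_h, kernel_w, in_channels, n_filters):
    """Weights plus one bias per filter; does not depend on the image size."""
    return kernel_h * kernel_w * in_channels * n_filters + n_filters


if __name__ == "__main__":
    image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
    kernel = [[1, 1], [1, 1]]  # sums each 2x2 neighborhood
    for row in conv2d_valid(image, kernel):
        print(row)
    print(conv_layer_params(3, 3, in_channels=3, n_filters=64), "parameters")
```

Because the same kernel weights are reused at every position, the parameter count stays the same whether the input is a thumbnail or a high-resolution image.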

This can bring many benefits. CNNs can learn faster, need fewer samples, and can be used for the recognition of larger images. This makes CNNs great for analyzing visual imagery, including image classification and pattern recognition.

How to train a CNN and boost its performance

To use a CNN for applications that require complex image recognition and classification, it must be trained on a rich dataset of images. The aim is to make the CNN invariant to factors such as translation, viewpoint, size, and illumination.

If an image features a cat, the CNN needs to recognize it regardless of whether the cat is in the top left corner or in the middle of the picture. This can only be achieved by training the network on a sufficiently large dataset of images.

A very efficient way to enlarge the dataset is data augmentation: making slight alterations to the original images by rotating, flipping, cropping, scaling, or blurring them, among other techniques. Data augmentation helps the CNN become robust to such modifications of the input data and avoid learning irrelevant patterns. A zoomed or flipped cat will thus still be recognized as a cat.
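A minimal sketch of data augmentation, assuming images are stored as nested lists of pixel values, might look like the following. The helper names and the toy 3x3 "image" are hypothetical; production pipelines typically rely on an image library and a wider set of transformations.

```python
import random


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate the image clockwise by 90 degrees."""
    return [list(row) for row in zip(*image[::-1])]


def random_crop(image, crop_h, crop_w, rng):
    """Cut out a randomly placed window of size crop_h x crop_w."""
    top = rng.randint(0, len(image) - crop_h)
    left = rng.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def augment(image, label, rng, crop_h, crop_w):
    """Return the original plus three altered copies, all with the same label."""
    variants = [
        image,
        flip_horizontal(image),
        rotate_90(image),
        random_crop(image, crop_h, crop_w, rng),
    ]
    return [(variant, label) for variant in variants]


if __name__ == "__main__":
    cat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # toy 3x3 grayscale "image"
    for variant, label in augment(cat, "cat", random.Random(0), 2, 2):
        print(label, variant)
```

Each labeled image yields four training samples here, and the label never changes. A cropped copy is smaller than the original, so a real pipeline would usually resize it back to the network's input size.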

The amount of training data can be reduced through transfer learning. This allows an existing CNN that was trained on one type of object, such as dogs, to be used for recognizing a different type, such as cats. This is done by keeping certain filters of the original network, applying them in the new network, and modifying only the rest.

Transfer learning allows a CNN to recognize new types of objects, which can save a great amount of time, energy, and resources.
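The principle can be illustrated with a deliberately tiny sketch: a set of filters is kept frozen, and only a new output layer is trained on the new task. Everything here is an assumption made for illustration, including the two hand-written "pretrained" filters, the flattened 2x2 images, and the training settings. It is not how any specific product implements transfer learning.

```python
import math

# Hypothetical "pretrained" filters, standing in for layers learned on an
# earlier task. They are reused as-is and never updated below.
FROZEN_FILTERS = (
    (1.0, 1.0, -1.0, -1.0),   # fires when the top half is brighter
    (-1.0, -1.0, 1.0, 1.0),   # fires when the bottom half is brighter
)


def extract_features(pixels):
    """Apply the frozen filters, followed by a ReLU, to a flattened 2x2 image."""
    return [max(0.0, sum(w * p for w, p in zip(f, pixels))) for f in FROZEN_FILTERS]


def predict(head, pixels):
    """Probability of class 1 from the trainable head (logistic regression)."""
    weights, bias = head
    z = sum(w * f for w, f in zip(weights, extract_features(pixels))) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, epochs=200, lr=0.5):
    """Fit only the new head with gradient descent; the filters stay fixed."""
    weights, bias = [0.0] * len(FROZEN_FILTERS), 0.0
    for _ in range(epochs):
        for pixels, label in samples:
            error = predict((weights, bias), pixels) - label
            features = extract_features(pixels)
            weights = [w - lr * error * f for w, f in zip(weights, features)]
            bias -= lr * error
    return weights, bias


if __name__ == "__main__":
    new_task = [
        ((1.0, 1.0, 0.0, 0.0), 1),
        ((0.0, 0.0, 1.0, 1.0), 0),
        ((0.9, 0.8, 0.1, 0.2), 1),
        ((0.1, 0.0, 0.9, 1.0), 0),
    ]
    head = train_head(new_task)
    for pixels, label in new_task:
        print(label, round(predict(head, pixels), 3))
```

Only three numbers (two weights and a bias) are learned for the new task, which is why a handful of samples is enough in this toy case. In a real CNN, the frozen part would be the early convolutional layers and the retrained part the later layers.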

AI object recognition powers advanced automation

One company that uses CNNs for object localization and segmentation in its robot intelligence systems is Photoneo. The company develops its own AI algorithms and combines them with 3D machine vision to automate a wide range of industrial tasks. These smart systems are intended to help increase productivity and reduce monotonous and physically demanding manual processes.

To achieve high universality and flexibility, Photoneo trains its networks on huge datasets of objects. This enables its algorithms to recognize items of any shape, size, material, texture, color, orientation, or position. And because the algorithms constantly learn, their performance can improve over time.

For instance, the company’s bin-picking system enables robotic recognition and picking of various items placed within a container. These items may include any kind of tubes, cartons, ropes, objects as small as 1x1 cm, or even organic items such as bananas or fish. If trained on a specific type of object, the network’s performance gets even better.

Bin picking performance can improve with training on specific items. Source: Photoneo

Photoneo’s AI can also recognize bags, which is an especially tricky task because they can be shapeless, deformable, and full of wrinkles and irregularities.

The task gets even more complicated if the objects are transparent, because that makes it even more difficult to create a point cloud and for algorithms to recognize the boundaries between the objects.


Photoneo's algorithms can recognize bags, which can be challenging. Source: Photoneo

Logistics and e-commerce have turned to AI-powered automation to help process huge flows of parcels and boxes. That industry trend has become stronger over the past year, partly because of the COVID-19 pandemic.

For instance, Photoneo's depalletization system can unload pallets laden with mixed types of boxes. Its algorithms can recognize not only regularly shaped and neatly ordered boxes, but also those that are damaged, placed randomly, or tilted at an angle.

Similarly, the AI can recognize parcels that may be piled up in a container or coming on a conveyor belt. The algorithms can localize and segment each parcel and sort them according to the application’s requirements. Photoneo's systems can process up to 2,250 parcels or 1,000 boxes in one hour, depending on the setup.
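To put those throughput figures in perspective, the short calculation below converts an hourly rate into an average time budget per item. It assumes a steady, uninterrupted flow, which a real installation may not sustain; the function name is hypothetical.

```python
def seconds_per_item(items_per_hour: float) -> float:
    """Average time budget per item at a given hourly throughput."""
    return 3600.0 / items_per_hour


if __name__ == "__main__":
    for name, rate in (("parcels", 2250), ("boxes", 1000)):
        print(f"{rate} {name}/hour -> {seconds_per_item(rate):.1f} s per item")
```

At the quoted maximum rates, that works out to about 1.6 seconds per parcel and 3.6 seconds per box for recognition, localization, and handling combined.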

The range of applications that can benefit from robotics is truly huge and keeps expanding rapidly with advancements in AI. We can already see the vast space that's opening up with new approaches such as reinforcement learning, as automation gradually transforms all kinds of industrial processes.

About the authors:

Michal Maly is co-founder and director of AI at Photoneo, and Andrea Pufflerova is a public relations specialist at the Bratislava, Slovakia-based company.
