Artificial intelligence (AI) is a computer model or program capable of solving advanced tasks that would normally require human intelligence.
Machine learning explained
Machine learning (ML) is a subset of AI that enables machines to learn from data rather than being explicitly programmed by a human.
How does machine learning work?
Through continuous feedback loops, machine learning models identify patterns and repetitions in data, which they can then use to make inferences and take appropriate actions.
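The feedback loop above can be sketched with a deliberately tiny model: a single learned weight that repeatedly compares its predictions against the data and nudges itself to reduce the error. This is an illustrative toy, not a production ML pipeline; the data and learning rate are made up.

```python
# Minimal sketch of a feedback loop: a one-parameter model repeatedly
# compares its predictions with the data and nudges its weight to
# reduce the error (illustrative toy, not a production pipeline).

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # inputs x with targets y = 2x

w = 0.0              # the single learned parameter
learning_rate = 0.05

for step in range(200):                  # continuous feedback loop
    for x, y in data:
        error = w * x - y                # compare prediction with target
        w -= learning_rate * error * x   # feedback: adjust to shrink error

print(round(w, 3))  # the weight converges toward 2.0, the pattern in the data
```

Each pass through the loop shrinks the remaining error, which is exactly the "identify patterns through repeated feedback" idea in miniature.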
Neural networks explained
A neural network is a model, inspired by the brain, that is composed of layers (at least one of which is hidden) of mathematically connected units, or neurons.
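As a concrete sketch of the layered structure, here is a forward pass through a network with two inputs, one hidden layer of two neurons, and one output neuron. The weights are hand-picked for illustration, not learned.

```python
import math

# Toy forward pass through a network with one hidden layer.
# Weights are hand-picked for illustration, not learned.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, hidden_weights, output_weights):
    # Each hidden neuron combines all inputs through weighted connections.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)))
              for ws in hidden_weights]
    # The output neuron combines the hidden activations the same way.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))

x = [0.5, -1.0]                             # two input features
hidden_weights = [[1.0, -1.0], [0.5, 0.5]]  # two hidden neurons
output_weights = [2.0, -1.0]

y = forward(x, hidden_weights, output_weights)
print(0.0 < y < 1.0)  # a sigmoid output always lies in (0, 1)
```

The hidden layer is "hidden" in exactly this sense: its activations are intermediate values that neither the input nor the output exposes directly.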
Deep learning explained
Using neural networks, machines are able to learn from extensive data. Deep learning involves a training stage where data is ingested through an intense repetition of mathematical functions and processing, each time tweaking the model to improve the expected outcome. Due to the large and complex structures of deep neural networks, a considerable amount of data is required during the training phase.
AI reinforcement learning
AI reinforcement learning is a type of machine learning where the agent learns an optimal set of decisions that maximizes a reward function. The learning system attempts to solve issues through trial and error and uses the results from each event to influence its next decision. Like a child or pet, the algorithm learns what behavior leads to positive or negative rewards based on the reward function it is trying to optimize.
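The trial-and-error loop can be sketched with tabular Q-learning on a hypothetical toy environment: a five-state corridor where the agent earns a reward only at the rightmost state. The environment, states, and hyperparameters here are invented for illustration.

```python
import random

# Sketch of tabular Q-learning on a toy corridor: states 0..4, actions
# move left (-1) or right (+1), reward only at the rightmost state.
# The environment and hyperparameters are hypothetical.

random.seed(0)
n_states, goal = 5, 4
actions = [-1, +1]
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}

alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration

for episode in range(200):             # trial and error over many episodes
    s = 0
    while s != goal:
        if random.random() < epsilon:
            a = random.choice(actions)                    # explore
        else:
            a = max(actions, key=lambda act: Q[(s, act)]) # exploit
        s2 = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s2 == goal else 0.0   # the reward function to maximize
        # Feed the outcome of this event back into the value estimate.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions)
                              - Q[(s, a)])
        s = s2

# After learning, moving right should look better than left in state 0.
print(Q[(0, +1)] > Q[(0, -1)])
```

Like the child-or-pet analogy: the agent never sees the rules, it just observes which actions eventually lead to reward and adjusts its behavior accordingly.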
Learn how Bosch Rexroth studies the use of simulation-based reinforcement learning for industrial automation.
Sim to real
Sim to real refers to the transfer of a system or algorithm performing in simulation or with synthetic data to a system or algorithm in the real world or using real-world data.
Ground truth is the expected, error-free result. It is usually compared against measurements from a system with noise or inherent errors to determine the accuracy of a prediction or measurement.
Computer vision revolves around perceiving the world visually in the same way a person does, leading to an understanding of the environment the system is operating in. For machine learning computer vision applications, cameras capture image data that is then labeled and annotated. Through machine learning, a model learns to understand those images, extrapolating the important aspects labeled during the training process to visually detect objects and people and understand the environment.
Synthetic data is data created artificially that does not rely on real-world measurement or situations. Synthetic data not only cuts down the cost and time to collect data, but also offers ways to eliminate bias, increase performance, generate perfect labels, and diversify the data collected.
Image: Structured synthetic data (top), unstructured synthetic data (bottom)
When creating synthetic data, the environment that provides the context for the computer vision problem may not necessarily resemble a real-world environment. A structured environment usually represents the real-world environment meant to be simulated, such as a building or home interior. Unstructured environments include a highly randomized background with unrelated images or objects with a high degree of variation.
Domain randomization is a synthetic data technique that helps build performant computer vision models by programmatically varying parameters in a dataset. In each frame, the specific objects, their position and orientation, the lighting and camera angles, and many other parameters can vary. This ensures a diverse dataset that can better train your model to handle variations in environmental conditions and edge cases.
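A minimal sketch of the per-frame randomization described above: every frame draws a fresh combination of scene parameters. The parameter names and ranges are hypothetical, standing in for whatever a real synthetic data pipeline would expose.

```python
import random

# Sketch of domain randomization: every frame draws a fresh combination
# of scene parameters. Parameter names and ranges are hypothetical.

random.seed(42)

def randomize_frame():
    return {
        "object_position": (random.uniform(-1, 1), random.uniform(-1, 1)),
        "object_rotation_deg": random.uniform(0, 360),
        "light_intensity": random.uniform(0.2, 2.0),
        "camera_angle_deg": random.uniform(-30, 30),
    }

# Generate a small dataset of varied frames.
frames = [randomize_frame() for _ in range(100)]

# The lighting (like every other parameter) varies across the dataset.
intensities = {round(f["light_intensity"], 2) for f in frames}
print(len(intensities) > 1)
```

Because each frame is sampled independently, the resulting dataset covers many combinations of position, rotation, lighting, and camera angle, which is what makes the trained model more robust to variation and edge cases.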
Digital images are constructed of pixels each with red, green, and blue values (RGB) usually ranging from 0 to 255. The combination of these three RGB values represent a large number of colors and shades. RGB images are a common output of synthetic data generation.
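To make the pixel representation concrete, here is a tiny 2x2 RGB image as a grid of (R, G, B) tuples, plus a standard luminance conversion (the widely used Rec. 601 weights) showing how the three channel values combine into a single shade.

```python
# A digital image as a grid of (R, G, B) pixels with values 0-255.
# Minimal sketch: a 2x2 image and a standard luminance conversion.

image = [
    [(255, 0, 0), (0, 255, 0)],     # red pixel, green pixel
    [(0, 0, 255), (255, 255, 255)]  # blue pixel, white pixel
]

def to_grayscale(pixel):
    r, g, b = pixel
    # Common Rec. 601 luma weights for converting RGB to gray.
    return round(0.299 * r + 0.587 * g + 0.114 * b)

gray = [[to_grayscale(p) for p in row] for row in image]
print(gray)  # white maps to 255; pure colors map to their luma weights
```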
Annotations in computer vision are anything accompanying an image to aid in the understanding of the image or objects and actions in the image. For example, bounding boxes may be considered annotations, as they are not a part of an image itself – they are present to help a computer vision model understand the image.
Image: 2D bounding boxes (top), 3D bounding boxes (bottom)
Bounding boxes are rectangular (2D) or cuboid (3D) annotations placed around objects in images to track or identify those objects. There are two different types of bounding boxes:
- 2D bounding boxes precisely locate and label objects in screen space to help a computer vision model recognize them.
- 3D bounding boxes provide precise coordinates in the world space of object locations.
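A 2D bounding box in screen space can be sketched as a (x_min, y_min, x_max, y_max) tuple. The example below also computes intersection over union (IoU), a common measure of how well a predicted box overlaps an annotated one; the specific box coordinates are made up.

```python
# Sketch of 2D bounding boxes as (x_min, y_min, x_max, y_max) in screen
# space, with intersection over union (IoU), a common overlap measure
# used when matching or evaluating boxes. Coordinates are made up.

def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (zero-sized if the boxes do not intersect).
    ix = max(0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)

label_box = (10, 10, 50, 50)      # annotated ground-truth box
predicted_box = (30, 30, 70, 70)  # model's detection

print(round(iou(label_box, predicted_box), 3))  # partial overlap: 0.143
```

A 3D bounding box would carry world-space coordinates (and typically an orientation) instead of screen-space corners, but the idea of comparing annotated and predicted boxes is the same.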
Image segmentation is a method for marking objects more precisely than bounding boxes, achieved by dividing a digital image into separate segments. Since the labels are applied on a per-pixel basis, image segmentation is more precise and is commonly used in computer vision and digital image processing to improve machine learning. Image segmentation can take the form of semantic segmentation or instance segmentation.
Semantic segmentation, also known as class segmentation, provides a clear and precise mask to identify every instance of a class of objects in an image. For instance, all boxes are segmented in red in this image.
Instance segmentation separates and masks each labeled object uniquely. For example, all boxes are segmented in unique colors in this image.
Panoptic segmentation unifies semantic segmentation (assigning a class label to each pixel) and instance segmentation (detecting and segmenting each object instance). Panoptic segmentation tasks classify all the pixels in the image as belonging to a class label, yet also identify what instance of that class they belong to.
Panoptic segmentation is typically used for:
- Medical imagery, where instances as well as amorphous regions help shape the context.
- Self-driving cars and autonomous vehicles, as the vehicle needs to know not only what objects are around it, but also what surface it is driving on.
- Digital image processing software that needs to have pixel-wise comprehension of the people in the image as well as what comprises the background.
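The three segmentation types above can be sketched on a tiny 4x4 image of per-pixel labels: a semantic mask gives each pixel a class, an instance mask separates the two "box" objects, and combining them yields panoptic (class, instance) labels. The image and class numbering are invented for illustration.

```python
# Sketch of per-pixel labels on a tiny 4x4 image: a semantic mask gives
# each pixel a class, an instance mask separates the two "box" objects,
# and combining them yields panoptic (class, instance) labels.

semantic = [                 # 0 = background, 1 = box
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
instance = [                 # 0 = none; the two boxes are numbered 1 and 2
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 2],
    [0, 0, 0, 2],
]

panoptic = [
    [(c, i) for c, i in zip(crow, irow)]
    for crow, irow in zip(semantic, instance)
]

# Semantically both boxes share class 1; panoptically they stay distinct.
print(panoptic[0][1], panoptic[2][3])  # (1, 1) vs (1, 2)
```

In the semantic mask the two boxes are indistinguishable; the panoptic labels keep both the class of every pixel and which instance it belongs to, which is exactly the unification described above.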
Human keypoint labels
Human keypoint detection is a computer vision problem that involves simultaneously detecting people and localizing the keypoints (interest points). The keypoints describe certain landmarks on the human body, such as the location of the shoulders, wrists, hips, knees, etc. as viewed from the camera angle in the scene. Keypoints can semantically encapsulate the orientation or pose of the human body. By detecting the keypoints, the computer vision model can more easily recognize the pose, movements, and actions of humans. Keypoint labels can describe the 2D x,y image coordinates of the keypoints as viewed from the camera, or the 3D x,y,z spatial positions of the human keypoints in the scene with respect to the camera position or some other point of reference.
For 3D pose estimation, a machine learning model estimates the position and orientation of an object or person from an image or video by estimating the spatial locations of keypoints. Pose estimation can aid in tracking how objects will move in real-world simulations and is used widely across areas such as AR, animation, gaming, and robotics.
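A minimal sketch of 2D keypoint labels: each named landmark maps to (x, y) image coordinates, and simple pose facts can be derived from them. The keypoint names and coordinates are illustrative; note that y grows downward in image coordinates.

```python
# Sketch of 2D human keypoint labels: image coordinates (x, y) per
# landmark, plus a simple pose check derived from them. Names and
# coordinates are illustrative; y grows downward in image space.

keypoints = {
    "left_shoulder":  (120, 200),
    "right_shoulder": (180, 200),
    "left_wrist":     (100, 140),  # above the shoulder in the image
    "right_wrist":    (200, 320),
    "left_hip":       (130, 320),
    "right_hip":      (170, 320),
}

def wrist_raised(kp, side):
    # Smaller y means higher up in image coordinates.
    return kp[f"{side}_wrist"][1] < kp[f"{side}_shoulder"][1]

print(wrist_raised(keypoints, "left"), wrist_raised(keypoints, "right"))
# left wrist is raised, right is not
```

A 3D keypoint label would add a z coordinate per landmark relative to the camera or another reference point, which is what full 3D pose estimation recovers.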
Object detection describes the computer vision tasks that involve identifying and detecting certain classes of objects. For instance, in this image, a computer vision system identified this object as a smartphone.
See how the AI startup Neural Pocket improves object detection with synthetic data.
Check out Unity’s AI and machine learning products and learn how they can help you solve diverse problems.