Made with Unity: Creating and training a robot digital twin

Our Made with Unity: AI series showcases Unity projects made by creators for a range of purposes involving our artificial intelligence products. In this example, we feature a recent submission to the OpenCV Spatial AI competition that combines robotics, computer vision, reinforcement learning, and augmented reality in an impressive suite of Unity examples.
Unity has a world-class, real-time 3D engine. While the engine and tools we have created traditionally supported game developers, the AI@Unity group is building tools around areas like machine learning, computer vision, and robotics to enable applications outside of gaming, especially those that rely on artificial intelligence and real-time 3D environments.
Gerard Espona and the Kauda Team’s submission to the OpenCV Spatial AI competition utilized many of our AI tools and packages across multiple examples. They used our Perception Package to help train computer vision models and the ML-Agents toolkit to train their machine learning models and do a sim2real demonstration of a robotic arm. We interviewed Gerard to find out what inspired him to build this project. Read on to learn more about how he brought this project to life in Unity and in the real world.
Where did you get the Kauda Team name from?
Kauda Team is composed of Giovanni Lerda and myself (Gerard Espona), with the name coming from Kauda, the free and open-source, 3D-printable, desktop-sized 5-axis robotic arm that Giovanni created. It is a great desktop robot arm that anyone can build, and it allowed us to collaborate remotely on this project.

We developed Kauda Studio, a Unity application that powers the Kauda digital twin. It provides a fully functional, accurate simulation of Kauda with inverse kinematics (IK) control, a USB/Bluetooth connection to the real Kauda, and support for multiple OpenCV OAK-D cameras.
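The article doesn't detail the wire protocol between Kauda Studio and the arm, but a sim-to-real bridge like this typically streams the twin's joint state over the serial port. Here is a minimal C# sketch under that assumption; the port name, baud rate, and plain-text "J" command format are hypothetical, and the real Kauda firmware will expect something different:

```csharp
using System.IO.Ports;   // requires the .NET Framework API compatibility level in Unity
using UnityEngine;

// Hypothetical sim-to-real bridge: stream the digital twin's joint angles
// to the physical arm over a serial (USB) connection each physics step.
public class KaudaSerialBridge : MonoBehaviour
{
    public Transform[] joints;   // the twin's five joint transforms
    SerialPort port;

    void Start()
    {
        port = new SerialPort("COM3", 115200);   // hypothetical port and baud rate
        port.Open();
    }

    void FixedUpdate()
    {
        // Send one angle per joint, e.g. "J 12.5 -40.1 88.0 10.0 5.2"
        var msg = "J";
        foreach (var joint in joints)
            msg += " " + joint.localEulerAngles.z.ToString("F1");
        port.WriteLine(msg);
    }

    void OnDestroy() => port?.Close();
}
```

A Bluetooth connection using the Serial Port Profile typically surfaces as a serial port too, so the same component could cover both transports.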
The OAK-D camera combines a stereo depth camera pair and a 4K color camera with onboard processing (powered by an Intel Myriad X VPU) that runs a variety of vision tasks on-device. As part of the contest, we built a Unity plug-in for OAK devices, but we also wanted a digital twin of the camera in Unity. The OAK-D Unity digital twin provided a virtual 3D camera with an accurate simulation that could be used for synthetic data gathering, and it allows virtual images to be fed into the real device's pipeline. We were able to use the Unity Perception Package to collect synthetic data for training on custom objects with the virtual OAK-D camera.
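As a rough illustration of the data-gathering side, the Perception Package attaches labelers to a capture camera to record ground truth with every frame. A sketch along these lines (the label config asset is our assumption, not the project's, and this is usually configured in the Inspector rather than in code) could sit on the virtual OAK-D's color camera:

```csharp
using UnityEngine;
using UnityEngine.Perception.GroundTruth;

// Minimal sketch: turn a Unity camera into a Perception capture camera
// that emits 2D bounding box ground truth for labeled objects.
// Assumes this GameObject already has a Camera component.
public class SyntheticCaptureSetup : MonoBehaviour
{
    public IdLabelConfig labelConfig;   // maps labels (e.g. "kauda_part") to ids

    void Start()
    {
        var cam = gameObject.AddComponent<PerceptionCamera>();
        // Attach a 2D bounding box labeler so every captured frame gets
        // ground-truth boxes for labeled objects automatically.
        cam.AddLabeler(new BoundingBox2DLabeler(labelConfig));
    }
}
```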

Having a digital twin allowed us to enable additional features on Kauda. You can also use Unity's augmented reality (AR) features to interact with a virtual robot in the real world. One application is learning how to perform maintenance on a robot without needing the physical robot present. An accurate virtual representation of the robot also makes it possible to program sequential tasks with a no-code approach.
The digital twin enabled us to perform reinforcement learning (RL) training. RL is a time-consuming process that requires simulation for anything beyond extremely simple examples. With Kauda in Unity, we used the ML-Agents toolkit to perform RL training for control.
We also began testing human-machine collaboration and safety procedures by replicating the robot in Unity and using the cameras to measure where the human was inside the robot area. You can imagine doing this for a large robot that can cause injury to humans when errors occur. The simulation environment lets us test these scenarios without putting humans in danger.
We believe RL is a powerful framework for robotics, and Unity ML-Agents is a great toolkit that allows our digital twin to learn and perform complex tasks. Because of the limited time frame of the contest, the goal was to implement a simple RL “touch” task and transform the resulting model to run inference on the OAK-D device. With ML-Agents, the robot learned the optimal path, using IK control, to dynamically touch a detected 3D object.

To accomplish this, we first implemented a 3D object detector using spatial tiny YOLO. The RL model (trained with PPO) uses the resulting detection and the position of the IK control point as input observations. The output actions are the 3-axis movement of the IK control point. The reward system was based on a small penalty at each step and a big reward (1.0) when the robot touched the object. To speed up training, we ran multiple agents learning simultaneously, which required developing a virtual spatial tiny YOLO with the same outputs as the real spatial tiny YOLO.
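The article doesn't include the Kauda Team's agent code, but the setup described above maps naturally onto an ML-Agents `Agent` subclass. This is a minimal sketch under those assumptions; field names such as `ikTarget` and `detectedObject` are illustrative, not from the project:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

// Sketch of the "touch" task: observations are the detected object's
// position and the IK control point's position; actions move the control
// point on three axes; reward is a small per-step penalty plus 1.0 on touch.
public class TouchTaskAgent : Agent
{
    public Transform ikTarget;         // IK control point the policy moves
    public Transform detectedObject;   // position reported by the 3D detector
    public float moveSpeed = 0.5f;
    public float touchDistance = 0.02f;

    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(detectedObject.localPosition);   // 3 floats
        sensor.AddObservation(ikTarget.localPosition);         // 3 floats
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Three continuous actions: move the IK control point in x, y, z.
        var move = new Vector3(actions.ContinuousActions[0],
                               actions.ContinuousActions[1],
                               actions.ContinuousActions[2]);
        ikTarget.localPosition += move * moveSpeed * Time.fixedDeltaTime;

        AddReward(-1f / MaxStep);   // small step penalty (assumes Max Step is set)
        if (Vector3.Distance(ikTarget.position, detectedObject.position) < touchDistance)
        {
            AddReward(1f);          // big reward for touching the object
            EndEpisode();
        }
    }
}
```

With two Vector3 observations, the behavior's vector observation size is 6 and the action spec is 3 continuous actions; penalizing each step by 1/MaxStep keeps the accumulated penalty bounded so the +1.0 touch reward dominates.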
Once the model was trained, we converted it to OpenVINO IR and Myriad blob format using the OpenVINO toolkit so we could load it on an OAK-D device and run inference. The final pipeline is a spatial tiny YOLO plus the RL model. Thanks to our Unity plug-in, we were able to compare, side by side inside Unity, agents running inference through ML-Agents and on the OAK-D.
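For the Unity-side half of that comparison, ML-Agents exports policies as ONNX, which Unity's Barracuda package can run directly. Here is a hedged sketch of loading the exported model and querying an action; the tensor names "obs_0" and "continuous_actions" are assumptions, so inspect the exported model to confirm them:

```csharp
using System.Collections.Generic;
using Unity.Barracuda;
using UnityEngine;

// Minimal sketch of running the exported ONNX policy inside Unity with
// Barracuda, e.g. to compare its actions against the OAK-D pipeline.
public class OnnxPolicyRunner : MonoBehaviour
{
    public NNModel modelAsset;   // the .onnx file exported by ML-Agents
    IWorker worker;

    void Start()
    {
        var model = ModelLoader.Load(modelAsset);
        worker = WorkerFactory.CreateWorker(WorkerFactory.Type.CSharpBurst, model);
    }

    // Feed the same six observations the agent was trained on and read back
    // the three continuous actions (IK control point movement).
    public Vector3 Infer(Vector3 objPos, Vector3 ikPos)
    {
        using (var obs = new Tensor(1, 6, new[] {
            objPos.x, objPos.y, objPos.z, ikPos.x, ikPos.y, ikPos.z }))
        {
            worker.Execute(new Dictionary<string, Tensor> { { "obs_0", obs } });
            var act = worker.PeekOutput("continuous_actions");
            return new Vector3(act[0], act[1], act[2]);
        }
    }

    void OnDestroy() => worker?.Dispose();
}
```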
The first stage of our pipeline is a 3D object detector, which is a very common starting point for AI-based computer vision and robotics tasks. In our case, we used a pre-trained tiny YOLO v3 model, and thanks to the Unity Perception Package we were able to train a custom category. The package allowed us to generate a large synthetic dataset of 3D models with automatic ground-truth bounding box labeling in a matter of minutes. Usually, collection and labeling is a manual human process that is very time-consuming. Having the ability to generate a rich dataset with plenty of randomization options for different rotations, lighting conditions, texture variations, and more is a big step forward.
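For a sense of how those randomization options are expressed, Perception randomizers override `OnIterationStart` and act on tagged objects each scenario iteration. Here is a minimal rotation randomizer in that style (the class names are ours, not the project's):

```csharp
using System;
using UnityEngine;
using UnityEngine.Perception.Randomization.Parameters;
using UnityEngine.Perception.Randomization.Randomizers;
using UnityEngine.Perception.Randomization.Samplers;

// Sketch of a Perception randomizer that gives every tagged object a random
// yaw rotation at the start of each scenario iteration.
[Serializable]
[AddRandomizerMenu("Custom/Object Rotation Randomizer")]
public class ObjectRotationRandomizer : Randomizer
{
    public FloatParameter rotation = new FloatParameter
    {
        value = new UniformSampler(0f, 360f)
    };

    protected override void OnIterationStart()
    {
        // Rotate every object carrying our tag to a randomly sampled angle.
        foreach (var tag in tagManager.Query<ObjectRotationRandomizerTag>())
            tag.transform.rotation = Quaternion.Euler(0f, rotation.Sample(), 0f);
    }
}

// Attach this tag to any prefab that should be randomized.
public class ObjectRotationRandomizerTag : RandomizerTag { }
```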
The timing needed to sync the virtual and real-world items was a little off at times. We think this could be resolved by using ROS in the future, and it is nice that Unity officially supports ROS now.
Gerard has a full playlist of videos documenting the team's journey, with a few notable videos including a webinar with OpenCV and the final contest submission video. He has also released the OAK-D Unity plug-in on GitHub to help others get started on their projects.
We are excited to see our tools enable projects like this to come to life! If you are looking to add AI to your projects in Unity, we have many examples and tutorials to get you started! The Unity Perception Package allows you to easily gather synthetic data in Unity. The Unity Robotics Hub has tutorials and packages to get you started with ROS integration and robotics simulation. And the ML-Agents toolkit makes reinforcement learning simple with many environments to get started with.