Remotely operating a robot? Dive into TensorFlow for JavaScript

At Xylos Inspire 2018, we demonstrated how to operate a remote-controlled Lego Mindstorms robot using your own body movements. In this blog post, we’ll illustrate how we did this. 

How to program a robot: getting started with Lego Mindstorms

Prototyping a controllable robot is easy with Lego Mindstorms. By combining Lego bricks, a programmable EV3 brick, motors and sensors, you can build your own robot that can walk, talk, shoot and move in almost any way you want. The motors and sensors are connected to the programmable EV3 brick. Available sensors include infrared sensors to measure distance, colour sensors, touch sensors and gyroscopes.  

You can create your own design, but we decided to stick with one of the standard robots. The robot’s sensors are not used for this project. 

The EV3 brick comes installed with a default operating system which supports several standard robots designed by Lego. You could also link it to the EV3 software and start programming with the building blocks included with the kit, but the downside is that your options are fairly limited if you work this way.  

For our demo, we decided to start with a basic robot, so that we could fully control everything it does. We did this by leveraging ev3dev, a Debian Linux-based operating system which you flash onto an SD card and then insert into the brick. With the Debian operating system running on the robot, we could write our own code for it. We chose Python because it has the best open-source support (for a full list of supported programming languages, visit the ev3dev page). Being able to write our own Python code allowed us to send commands to the individual motors and connect to a remote WebSocket server to respond to external commands. If you're interested, you can find the open-source code in the GitHub repository linked at the end of this blog post. 

What about… video movements?

As the user moves in front of a camera, their movements are translated to control commands in real time. These commands are sent to a Lego Mindstorms robot as REST API calls. The process of capturing the video stream from a webcam, analysing the images and sending commands to the robot is written in JavaScript and runs entirely in the browser. The images are interpreted with TensorFlow, Google's popular deep learning framework. A neural network takes a single video frame as input and returns the position of a few body keypoints. These positions are then translated into commands. All this is done with TensorFlow’s awesome new JavaScript API. 
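
The loop described above — grab a frame, run it through the network, turn keypoints into a command — can be sketched as follows. This is an illustrative sketch, not the project's actual code: `posenet` and `tf` are the globals provided by the TensorFlow.js script tags (API as in recent @tensorflow-models/posenet releases), and `poseToCommand` and `sendCommand` are placeholder names for your own mapping and transport functions.

```javascript
// Hypothetical browser-side control loop: one pose estimate per video frame.
async function runControlLoop(video, poseToCommand, sendCommand) {
  // Load the pretrained PoseNet model once (downloads the weights).
  const net = await posenet.load();
  for (;;) {
    // Estimate the 17 body keypoints for the current frame.
    const pose = await net.estimateSinglePose(video, { flipHorizontal: true });
    // Translate keypoint positions into a robot command, if any.
    const command = poseToCommand(pose.keypoints);
    if (command) sendCommand(command);
    // Yield control back to the browser before processing the next frame.
    await tf.nextFrame();
  }
}
```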

To summarise, the solution has three components: 

  1. A Lego Mindstorms robot with an EV3 brick.
  2. A containerised API running on Kubernetes, which receives commands and forwards them to the robot over a WebSocket connection. 
  3. A JavaScript application that reads a webcam video stream, interprets the images and sends robot commands. 

What about… forwarding commands via REST API? 

For the client-to-robot communication, we decided to build an API to act as a middleman. The API is a container which consists of two parts. First, it handles all incoming requests from the user; there are multiple endpoints available, such as ‘forward’, ‘backward’ and ‘shoot’. Second, the container hosts a WebSocket server to send messages to the robot. The API transforms every request into a message, which is then picked up by the listening WebSocket clients. The main reason we took this logic out of the robot is the brick's limited hardware capabilities; since we already had a Kubernetes cluster running, the decision to quickly host it as a container was easily made. We're not going to explain the Kubernetes configuration and the deployment to the cluster in detail in this blog post, but if you're interested, you can find the deployment file and container file in the git repository. 

What about… sending commands with TensorFlow for JavaScript?

We used TensorFlow’s JavaScript API for this project. Let’s first talk about this new technology and why it matters. Until recently, TensorFlow only had a Python API, so if you wanted to build and train neural networks, you had to install Python. Training a model (a neural network) and inference (making new predictions with the model) were generally done on the server side. For training, user data had to be transferred to the server, since the model was trained there. When the trained model was put to work, new data had to be sent to a server in order to draw conclusions from it and possibly respond to it. 

Today, we have TensorFlow for JavaScript. We can now build, train, and use neural networks directly in the browser with JavaScript; this makes it possible to create intelligent, highly interactive web applications. The end user doesn’t have to install Python; all they need is a web browser, such as Google Chrome, Firefox or Edge. 
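
To give an idea of what that looks like, here is the canonical TensorFlow.js "hello world" — a one-neuron network fitted to the line y = 2x − 1, built and trained entirely in the browser. It assumes the global `tf` provided by the TensorFlow.js script tag; it is not part of the robot project itself.

```javascript
// Build, train and use a tiny model in the browser with TensorFlow.js.
async function trainTinyModel() {
  // A single dense layer: one input, one output.
  const model = tf.sequential();
  model.add(tf.layers.dense({ units: 1, inputShape: [1] }));
  model.compile({ optimizer: 'sgd', loss: 'meanSquaredError' });

  // Four training points on the line y = 2x - 1.
  const xs = tf.tensor2d([0, 1, 2, 3], [4, 1]);
  const ys = tf.tensor2d([-1, 1, 3, 5], [4, 1]);
  await model.fit(xs, ys, { epochs: 200 });

  // Predict y for x = 5; the result should approach 9.
  return model.predict(tf.tensor2d([5], [1, 1]));
}
```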

These browser applications don't just work on laptops, but also on smartphones. If granted permission, they have access to the device's built-in camera, accelerometer and other sensors. With TensorFlow running in the browser, the user's personal data doesn’t need to be forwarded for inference, which saves bandwidth and protects the user’s privacy. Deep learning tasks that would normally be performed with Python on the server side can now be done with JavaScript on the client side. 

1. Training on GPUs in Python

Training a deep learning model (or any model) requires a lot of data and computing power, so it’s usually done on powerful GPUs, not on a laptop or smartphone. Fortunately, you can still collect data on the server side and train a neural network in Python on a powerful GPU. After that, you can simply import the trained model into JavaScript. You could also import a general model which has been pretrained with data from different sources, and then retrain it locally with your own personal data. When retraining, the neural network keeps learning and is gradually fine-tuned with your own data, so that it becomes better at working for you specifically. 
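
Importing a Python-trained model is a single call once the model has been converted with the tensorflowjs converter, which turns a Keras model into a `model.json` file plus weight shards. The URL below is hypothetical, and the function name follows current TensorFlow.js versions (`tf.loadModel` in the older 0.x API).

```javascript
// Load a Keras model that was converted with the tensorflowjs converter.
// `tf` is the TensorFlow.js global; the URL is a placeholder.
async function loadPretrainedModel() {
  const model = await tf.loadLayersModel('https://example.com/models/model.json');
  return model; // ready for model.predict(...) in the browser
}
```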

2. Extracting body positions with PoseNet

To showcase the possibilities of TensorFlow for JavaScript, Google released PoseNet, an open-source pose estimation library for JavaScript. With PoseNet, any developer can take photos or videos of people and extract the body positions from the images. The images are fed to a pretrained neural network, which recognises the positions of 17 body keypoints as coordinates in the image.  
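
A PoseNet result is a plain JavaScript object: an overall score plus an array of 17 keypoints, each with a part name, a confidence score and pixel coordinates. The values below are made up for illustration, together with a small helper to look a keypoint up by name.

```javascript
// Shape of a PoseNet pose estimate (values invented for illustration).
const pose = {
  score: 0.92,
  keypoints: [
    { part: 'nose', score: 0.99, position: { x: 301.2, y: 120.5 } },
    { part: 'leftWrist', score: 0.95, position: { x: 210.0, y: 340.7 } },
    { part: 'rightWrist', score: 0.93, position: { x: 392.4, y: 338.1 } },
    // ...14 more keypoints: eyes, ears, shoulders, elbows, hips, knees, ankles
  ],
};

// Look up one keypoint's pixel coordinates by part name, or null if absent.
function keypointPosition(keypoints, part) {
  const kp = keypoints.find((k) => k.part === part);
  return kp ? kp.position : null;
}
```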

Using these coordinates, simple logic can be applied to formulate robot commands based on the user's pose. In our case, holding both hands down in a natural standing pose stops the robot, holding both arms out horizontally prompts it to move forward, and raising both hands above your head makes it drive backwards. You can change its direction by holding both arms out in a straight line and leaning to the left or right, like an airplane taking a turn. The JavaScript application sends commands to the remote API based on the pose, and the API forwards these commands to the robot over WebSockets. 
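
The kind of logic involved can be sketched as a pure function over the keypoints: compare the wrists to the shoulder line (remember that y grows downward in image coordinates). The thresholds, the function name and which lean maps to which turn are illustrative; the rules in the original project may differ.

```javascript
// Sketch of the pose-to-command mapping described above.
function poseToCommand(keypoints) {
  const pos = (part) => keypoints.find((k) => k.part === part).position;
  const lw = pos('leftWrist').y;
  const rw = pos('rightWrist').y;
  const shoulders = (pos('leftShoulder').y + pos('rightShoulder').y) / 2;

  const TILT = 60;  // wrist height difference (px) that counts as leaning
  const RAISE = 40; // distance from the shoulder line (px) that counts

  // Arms in a line but tilted: steer in the direction of the lean.
  if (Math.abs(lw - rw) > TILT) return lw < rw ? 'left' : 'right';
  // Both wrists well above the shoulders: drive backwards.
  if (lw < shoulders - RAISE && rw < shoulders - RAISE) return 'backward';
  // Both wrists roughly at shoulder height: drive forward.
  if (Math.abs(lw - shoulders) < RAISE && Math.abs(rw - shoulders) < RAISE) {
    return 'forward';
  }
  // Hands down in a natural standing pose: stop.
  return 'stop';
}
```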

3. Shooting projectiles with Logitech

Finally, as our robot can also shoot projectiles, the user should be able to trigger shots while controlling the robot. Our solution is a simple Logitech presentation remote: as it turns out, our JavaScript code can capture pressing the remote's Play button as a simple F5 or ESC keypress event. The button alternates between F5 and ESC each time it is clicked. This makes sense: as PowerPoint fans know, you can start a presentation with F5 and end it with ESC. The default behaviour of the F5 key in the browser (refreshing the page) is overridden. 
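
In code, this amounts to a keydown listener that treats either key as the trigger and suppresses the browser's default F5 behaviour. The mapping function and `sendCommand` are illustrative names; `'F5'` and `'Escape'` are the standard `KeyboardEvent.key` values for those keys.

```javascript
// Map the presenter remote's Play button (arriving as F5 or Escape) to a command.
function keyToCommand(key) {
  return key === 'F5' || key === 'Escape' ? 'shoot' : null;
}

// Browser wiring: intercept the keypress and fire the shoot command.
function installShootHandler(sendCommand) {
  window.addEventListener('keydown', (e) => {
    const command = keyToCommand(e.key);
    if (command) {
      e.preventDefault(); // stop F5 from refreshing the page
      sendCommand(command);
    }
  });
}
```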

For the demo, we decided to run the application locally and host the API, which accepts commands and forwards them to the robot, on Azure. With the API securely hosted on the internet, anyone with the necessary permissions could control the robot. 

We now have a JavaScript app on the client side, a robot which takes commands, and an API that forwards commands from the application to the robot using WebSockets. 
Interested in the source code for this project? You can find it here.  

Discover the world of IoT and get in touch with our experts to find out more about our offering.
