A simple way to describe what I've done in this project is to say that I am evaluating the efficacy of using Quaternion Neural Networks for Inverse Kinematics. (I'll get to what quaternion neural networks are in a moment; for now, just think of a regular neural network.)
If someone tells you they're attempting inverse kinematics with a neural network, the first thing that comes to mind is a network trained to predict the robot's joint parameters for a given desired position of its end-effector.
However, instead of predicting what the robot's joint parameters should be for the end-effector to reach a desired position, I intend to predict where the end of each link should be, such that the end-effector ends up in the desired position.
Just to sketch out what I mean -
The motivation for doing this is simple: joint parameters are a compact way to encode the configuration of a robot, but you lose a lot of meaning in the process.
For instance, knowing where the end of a link will end up for a rotational joint is impossible unless you also know the length of the link. As such, I do not believe asking a neural network to predict joint parameters is a good idea - joint parameters are just arbitrary numbers that only acquire meaning once you are given additional information about the robot, such as link lengths.
Also worth noting is that joint parameters, depending on the topology of their C-space, follow arbitrary rules like 2π being the same configuration as 4π - and such rules apply only to some of the joints (the rotational ones) and not to the others.
I think all of this means that joint parameters are simply not conducive to learning by a neural network, and any attempt to make it work would require a complex network that just memorizes or overfits the input-output mapping.
Let's just take a step back and think about what we want to do -
When I say a transformation matrix describing a movement, what I mean is a single object that captures both the rotation and the translation a link undergoes.
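In the usual homogeneous form (a standard construction, nothing specific to this project), such a matrix bundles a 3×3 rotation $R$ and a translation vector $t$, and moves a point $p$ to $p'$ -

$$ T = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}, \qquad \begin{bmatrix} p' \\ 1 \end{bmatrix} = T \begin{bmatrix} p \\ 1 \end{bmatrix} $$

The appeal of quaternions, described next, is that the rotational part of such a movement can be carried by a single 4-dimensional number instead of a 3×3 matrix.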
- Quaternions are 4-dimensional vectors - $a+x\hat{i}+y\hat{j}+z\hat{k}$
- $a$ is the real part, while $x\hat{i}+y\hat{j}+z\hat{k}$ is the imaginary part.
- For a rotation by an angle $\theta$ about a unit axis $(u_{x},u_{y},u_{z})$, the corresponding quaternion is $q = \cos(\frac{\theta}{2}) + \sin(\frac{\theta}{2})u_{x}\hat{i} + \sin(\frac{\theta}{2})u_{y}\hat{j} + \sin(\frac{\theta}{2})u_{z}\hat{k}$
Quaternions are so often associated with rotation that people tend to forget what they actually are - an extension of complex numbers to 4 dimensions, complete with their own algebra. The $q v q^{-1}$ sandwich we compute is just a trick that lets us rotate vectors in 3-dimensional space.
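To make that trick concrete, here is a minimal NumPy sketch (the helper names hamilton and rotate are mine, not from any library) that builds a unit quaternion from an axis and angle and uses it to rotate a 3D point -

import numpy as np

def hamilton(p, q):
    # Hamilton product of two quaternions stored as (w, x, y, z) arrays
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def rotate(v, axis, theta):
    # unit quaternion for a rotation of theta about the unit axis
    q = np.concatenate(([np.cos(theta / 2)], np.sin(theta / 2) * np.asarray(axis)))
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])    # conjugate = inverse for unit quaternions
    v_quat = np.concatenate(([0.0], v))               # embed the vector as a pure quaternion
    return hamilton(hamilton(q, v_quat), q_conj)[1:]  # q * v * q^-1, then drop the real part

# rotating the x unit vector by 90 degrees about z gives (0, 1, 0)
print(rotate(np.array([1.0, 0.0, 0.0]), [0.0, 0.0, 1.0], np.pi / 2))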
I realised during my project demonstration that what I am giving the neural network as input, and what I expect from it as output, is not obvious. As such, I'll use a very simple example to explain. Consider a single rotational joint attached to a fixed point -
Let's say this is our desired end-effector position -
Consider the movement we need to make here -
What we give the network as input is the co-ordinates of the end-point of each link - in this case just the one link - along with a quaternion describing the orientation of each link; in other words, where each link currently is in space. We also give it the desired co-ordinates of the end-effector. All positional co-ordinates are represented as quaternions with the real part set to zero.
We expect the neural network to tell us how to move each link so that the end-effector reaches the desired co-ordinates - by giving us the translational and rotational components of each link's movement separately.
Since we're currently in 2 dimensions, we can set the rotation component to $(1,0,0,0)$ (the identity, signifying no rotation), and the translation is $(dx, dy)$, which would be represented as the quaternion $(0, 0, dx, dy)$.
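In the general 3-dimensional case - which is what the dataset later in this post uses - the encoding looks like the sketch below. The helper name as_pure_quaternion and the example translation values are mine; the positions are taken from the sample data shown further down.

import numpy as np

def as_pure_quaternion(xyz):
    # a position or displacement (x, y, z) becomes the pure quaternion (0, x, y, z)
    return np.array([0.0, *xyz])

# input: where a link currently is, plus where we want the end-effector to go
link_position   = as_pure_quaternion([-0.0008749962, 1.0989358, 0.10087502])    # e.g. l0 below
link_rotation   = np.array([-0.49960187, -0.49999982, -0.49999982, 0.5003981])  # e.g. l0_rot below
target_position = as_pure_quaternion([-0.14984122, 0.09948231, -0.15365165])    # e.g. l3_final below

# output: the movement of each link, split into a translation and a rotation quaternion
link_translation     = as_pure_quaternion([0.0, 0.2, 0.0])  # hypothetical (0, dx, dy, dz)
link_rotation_change = np.array([1.0, 0.0, 0.0, 0.0])       # identity, i.e. no rotation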
Regular neural networks take in real-valued inputs, multiply them with real-valued weights, and propagate the result to the neurons in the next layer as input.
Quaternion neural networks are just networks that take in quaternion inputs, multiply them with quaternion weights (using the Hamilton product), and propagate the result to the neurons in the next layer.
Backpropagation is used for learning. Nonlinear activation functions are usually not analytic in the quaternion sense, so proper quaternion derivatives cannot be used; instead, the activation function is applied element-wise to each individual component of a quaternion (a so-called split activation).
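A minimal sketch of what a single quaternion "neuron" does under this scheme - the names hamilton_product and quaternion_neuron are mine, and this is only meant to illustrate the idea, not to mirror the actual implementation in quaternion_layers -

import torch

def hamilton_product(p, q):
    # Hamilton product of quaternions stored as (..., 4) tensors
    w1, x1, y1, z1 = p.unbind(-1)
    w2, x2, y2, z2 = q.unbind(-1)
    return torch.stack([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ], dim=-1)

def quaternion_neuron(inputs, weights, bias):
    # inputs, weights: (n_in, 4) quaternions; bias: (4,)
    # one quaternion output: sum of weight-times-input Hamilton products, plus a quaternion bias
    pre_activation = hamilton_product(weights, inputs).sum(dim=0) + bias
    # split activation: the real-valued nonlinearity acts on each of the 4 components separately
    return torch.nn.functional.elu(pre_activation)

out = quaternion_neuron(torch.randn(9, 4), torch.randn(9, 4), torch.zeros(4))
print(out.shape)  # torch.Size([4])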
import torch
import torch.nn as nn
import torchvision
# %pylab inline pulls numpy and matplotlib (e.g. the plot() used below) into the notebook namespace
%pylab inline
import numpy as np
import pandas as pd
import os
from tqdm import tqdm
from quaternion_layers import *  # provides QuaternionLinearAutograd
device = "cuda:0" if torch.cuda.is_available() else "cpu"
First let's take a look at the robot arm we'll be working with -
fixed [⚓]
torso_linear [↕+Z]
shoulder_yaw [⚙+Z]
elbow_yaw [⚙+Z]
wrist_yaw [⚙+Z]
- Fixed joint shown by ⚓
- Linear joint along the Z-axis shown by ↕+Z
- Rotational joint about the Z-axis shown by ⚙+Z
I had trouble coming up with a good configuration for a robot arm with enough "reach" to be able to produce a nice and varied dataset, and this was the best I could come up with at the time.
This is what it looks like - each cube represents the end-point of a link. The small black cube represents the fixed joint, the pink cube is the endpoint of the first link l0, red is the endpoint of the second link l1, and so on.
And this is how it can move -
To quote my project proposal, I had said -
A 2R, 3R, and 4R robotic arm would be simulated using ROS and Gazebo. The user would input either via console or through a web interface the desired co-ordinates for the end-effector.
The desired end-effector co-ordinates would be fed into a neural network in the form of a quaternion that describes the transformation required to go from the origin to the desired end-effector co-ordinates, along with the quaternions describing the current orientation of each link in the robotic arm.
As such, the data we need to gather is -
- the current co-ordinates of the end-point of each link,
- the current orientation of each link, as a quaternion, and
- the desired co-ordinates of the end-effector.
These would be the input to the network; for training, we also need the corresponding outputs, which would be -
- the translation each link must undergo, and
- the change in orientation of each link, both as quaternions.
I computed all of this data by writing a script in Rust, using a kinematics library called k. The computed data was written to a JSON file as an array of JSON objects, each representing one input-output pair. One such input-output pair is detailed below -
The script used to generate this data can be found here - https://github.com/DhruvDh/intelligent_robotics_project/blob/master/k/examples/my_arm.rs
{
// initially here means before moving the end effector,
// final pertains to values after the end-effector was placed at the desired co-ordinates
"l0": [-0.0008749962, 1.0989358, 0.10087502], // co-ordinates of l0 initially (input)
"l1": [-0.00087475777, 1.098697, 0.4008749], // co-ordinates of l1 initially
"l2": [0.047727853, 1.0989714, 0.10483825], // and so on
"l3": [-0.24934354, 1.0987016, 0.14665347],
"l0_rot": [-0.49960187, -0.49999982, -0.49999982, 0.5003981], // orientation of l0 initially (input)
"l1_rot": [-0.53895015, 0.45731235, 0.45810595, 0.5390149], // orientation of l1 initially
"l2_rot": [-0.04910004, -0.7051177, -0.7056401, 0.049701005], // and so on
"l3_rot": [-0.49960187, -0.49999973, -0.49999982, 0.50039816],
"l0_final": [-0.000079125166, 0.09939951, 0.100079186], // final co-ordinates for l0
"l1_final": [-0.00007888675, 0.09916064, 0.40007907], // final co-ordinates for l1
"l2_final": [0.009956747, 0.099407375, 0.100247085], // and so on
"l3_final": [-0.14984122, 0.09948231, -0.15365165],
"l0_rot_final": [-0.49960187, -0.49999982, -0.49999982, 0.5003981], // final orientation for l0
"l1_rot_final": [-0.5082875, 0.49116763, 0.49196374, 0.50830084], // final orientation for l1
"l2_rot_final": [0.34192163, -0.6186206, -0.6193856, -0.34170097],
"l3_rot_final": [-0.49960193, -0.4999997, -0.49999976, 0.50039816],
"l0_trans": [-0.00015924126, 0.19999987, 0.00015926361], // dx, dy, dz for l0 (change in co-ordinates)
"l1_trans": [-0.000079125166, 0.09939951, 0.100079186], // dx, dy, dz for l1 (change in co-ordinates)
"l2_trans": [-0.00007888675, 0.09916064, 0.40007907], // (output)
"l3_trans": [0.009956747, 0.099407375, 0.100247085],
"l0_rot_trans": [-0.49960187, -0.49999982, -0.49999982, 0.5003981], // change in orientation (output)
"l1_rot_trans": [-0.49960187, -0.49999982, -0.49999982, 0.5003981], // (output)
"l2_rot_trans": [-0.5082875, 0.49116763, 0.49196374, 0.50830084],
"l3_rot_trans": [0.34192163, -0.6186206, -0.6193856, -0.34170097],
"a_joint_pos": [0.7990161, 2.978866, -4.409823, 1.4309571], // initial joint parameters
"b_joint_pos": [-0.20052081, 3.1081338, -5.6879864, 2.5798528] // final joint parameters
}
The following piece of code reads the aforementioned JSON file and makes a pandas dataframe out of it, which we will then convert to PyTorch tensors and feed into the network.
data = pd.read_json('DATA.json')

# positions (and translations) become pure quaternions by prepending a zero real part;
# orientation columns are already quaternions and just need converting to numpy arrays
links = ['l0', 'l1', 'l2', 'l3']
position_cols = [f'{l}{s}' for l in links for s in ['', '_final', '_trans']]
rotation_cols = [f'{l}{s}' for l in links for s in ['_rot', '_rot_final', '_rot_trans']]

for col in position_cols:
    data[col] = data[col].apply(lambda x: np.array([0.0] + x))
for col in rotation_cols:
    data[col] = data[col].apply(np.array)
_data = data[ ['l0', 'l1', 'l2', 'l3', 'l3_final',
'l0_rot', 'l1_rot', 'l2_rot', 'l3_rot',
'l0_trans', 'l1_trans', 'l2_trans', 'l3_trans',
'l0_rot_trans', 'l1_rot_trans', 'l2_rot_trans', 'l3_rot_trans', 'b_joint_pos'] ]
_data.head()
The columns l0, l1, l2, l3, l0_rot, l1_rot, l2_rot, l3_rot, and l3_final will be given to the network as input, and we'll expect l0_trans, l1_trans, l2_trans, l3_trans, l0_rot_trans, l1_rot_trans, l2_rot_trans, and l3_rot_trans as output.
# flatten each row's quaternion columns into one long real-valued vector (18 columns x 4 = 72 values)
_data = np.array([np.hstack(list(x)) for x in list(_data.values)])
_data.shape
We have 136,446 input-output samples of data.
def train(_data, model):
    loss = torch.nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01, amsgrad=True)

    _data = torch.tensor(_data).float()
    dataset = torch.utils.data.TensorDataset(_data)
    train_batches = torch.utils.data.DataLoader(
        dataset,
        batch_size=4084,
        shuffle=True,
        pin_memory=True
    )

    model.to(device)
    print(model)

    train_errors = []
    for epoch in tqdm(range(100)):
        errors = []
        for batch in train_batches:
            batch = batch[0].to(device)
            x = batch.narrow(1, 0, 36)   # first 36 values: the 9 input quaternions
            y = batch.narrow(1, 36, 32)  # next 32 values: the 8 output quaternions
            optimizer.zero_grad()
            pred = model(x)
            error = loss(pred, y)
            if not epoch == 0:           # no update on epoch 0, so the first point is the untrained loss
                error.backward()
                optimizer.step()
            errors.append(error.data.item())
        train_errors.append(np.mean(errors))
    return train_errors, train_batches
A single quaternion linear layer (9 quaternions input -> 8 quaternions output) -
model = nn.Sequential(QuaternionLinearAutograd(36, 32))
loss_curve, _ = train(_data, model)
plot(loss_curve)
The final loss -
loss_curve[-1]
The real-valued counterpart (36 neurons input -> 32 neurons output) -
model = nn.Sequential(nn.Linear(36, 32))
loss_curve, _ = train(_data, model)
plot(loss_curve)
The final loss -
loss_curve[-1]
A two-layer quaternion network (9 quaternions input -> 10 quaternions -> 8 quaternions output) -
model = nn.Sequential(
QuaternionLinearAutograd(36, 40),
nn.ELU(),
QuaternionLinearAutograd(40, 32)
)
loss_curve, _ = train(_data, model)
plot(loss_curve)
The final loss -
loss_curve[-1]
A wider two-layer quaternion network (9 quaternions input -> 18 quaternions -> 8 quaternions output) -
model = nn.Sequential(
QuaternionLinearAutograd(36, 72),
nn.ELU(),
QuaternionLinearAutograd(72, 32)
)
loss_curve, _ = train(_data, model)
plot(loss_curve)
The final loss -
loss_curve[-1]
The real-valued counterpart (36 neurons input -> 72 neurons -> 32 neurons output) -
model = nn.Sequential(
nn.Linear(36, 72),
nn.ELU(),
nn.Linear(72, 32)
)
loss_curve, _ = train(_data, model)
plot(loss_curve)
The final loss -
loss_curve[-1]