Robotic Arm Manipulation using Quaternion Neural Networks.

  • Dhruv Dhamani
  • ddhamani@uncc.edu

Introduction

A simple way to describe what I've done in this project is to say that I am evaluating the efficacy of using Quaternion Neural Networks for Inverse Kinematics. (I'll get to what quaternion neural networks are in a moment; for now, just think of them as neural networks.)

So if someone tells you that they're attempting to do inverse kinematics with a neural network, the first thing that comes to mind is a network trained to predict the values of a robot's joint parameters for a given desired position of the robot's end-effector.

However, instead of predicting what the values of the robot's joint parameters should be for the end-effector to be in a desired position, I intend to predict what the position of the end of each link should be, such that the end-effector ends up in the desired position.

Just to sketch out what I mean -

What I mean

The motivation for doing this is simple: joint parameters are a compact way to encode the configuration of a robot, but you lose a lot of meaning in the process.

For instance, knowing where the end of the link will end up for a rotational joint is impossible unless you know the length of the link. As such, I do not believe asking a neural network to predict joint parameters is a good idea - the joint parameters are just arbitrary numbers that only have meaning once you are provided with additional information about the robot, such as link lengths.

Also worth noting is that joint parameters, depending on the topology of their C-space, follow somewhat arbitrary rules - like 2π being the same as 4π - and such rules apply only to some of the joints (the rotational ones), and not the others.

I think all of this means that joint parameters are just not conducive to neural network learning, and any attempt to make it work would require a complex network that simply memorizes or overfits the input-output mapping.

Okay, so you say joint parameters won't work for learning. Now what?

Let's just take a step back and think about what we want to do -

  • We want to figure out how to move the links in our robot, so that the end-effector of the robot ends up at a certain position.
  • Since we want to figure out how to move the links in our robot, we can also say that what we actually want is a transformation matrix describing the movement we need to make for the end of each link.

    When I say a transformation matrix describing a movement, what I mean is -

    movement

  • We cannot however simply ask a neural network to predict a transformation matrix.
  • But what we can do, is decompose the transformation matrix into components representing rotation, and translation.
  • Translations can be described by a 3-vector, and rotations can be described by a set of three Euler angles (yaw-pitch-roll), an orthogonal rotation matrix (SO(3)), or a quaternion.
  • I chose to represent the rotation component with a quaternion, which is the natural choice for a Quaternion Neural Network.
  • And I chose to represent the translation component as the imaginary part of a quaternion. Just to jog everyone's memory -
    • Quaternions are 4-dimensional vectors - $a+x\hat{i}+y\hat{j}+z\hat{k}$
    • $a$ is the real part, while $x\hat{i}+y\hat{j}+z\hat{k}$ is the imaginary part.
    • For a rotation by an angle $\theta$ about a unit axis $(u_{x},u_{y},u_{z})$, the corresponding quaternion is $q = \cos(\frac{\theta}{2}) + \sin(\frac{\theta}{2}) u_{x}\hat{i} + \sin(\frac{\theta}{2})u_{y}\hat{j} + \sin(\frac{\theta}{2})u_{z}\hat{k}$

    Quaternions are often associated with rotation, and people tend to forget what they actually are - an extension of complex numbers to 4 dimensions, complete with their own algebra. The $q * v * q^{-1}$ thing we do is just a trick that lets us rotate in 3-dimensional space.

  • As such I see nothing wrong with representing the 3-dimensional translation vector as a quaternion by setting the 4th dimension (the real part) to zero. If $\vec{v} = (v_{x}, v_{y}, v_{z})$ then the quaternion representation of $\vec{v}$ is $(0, v_{x}, v_{y}, v_{z})$. It's just a point in 4-dimensional space. (A short code sketch of this representation follows after this list.)
  • So instead of having a neural network predict joint parameters (which I think won't work out well), or a transformation matrix, I thought of having it predict the translation and rotation components separately instead.
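
To make this concrete, here is a minimal NumPy sketch of the representation. The helper names, the made-up values, and the $[w, x, y, z]$ storage order are my own choices for illustration - a rotation encoded as a quaternion, a translation embedded as a pure quaternion, and a point rotated with the $q * v * q^{-1}$ trick -

import numpy as np

def quat_mul(a, b):
    # Hamilton product of two quaternions stored as [w, x, y, z]
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([
        aw*bw - ax*bx - ay*by - az*bz,
        aw*bx + ax*bw + ay*bz - az*by,
        aw*by - ax*bz + ay*bw + az*bx,
        aw*bz + ax*by - ay*bx + az*bw,
    ])

def quat_conj(q):
    # conjugate; for unit quaternions this is also the inverse
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def axis_angle_to_quat(axis, theta):
    # q = cos(theta/2) + sin(theta/2) * (ux i + uy j + uz k)
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    return np.concatenate(([np.cos(theta / 2)], np.sin(theta / 2) * axis))

rotation = axis_angle_to_quat([0, 0, 1], np.pi / 2)  # 90 degrees about the Z axis
translation = np.array([0.0, 0.2, 0.0, 0.1])         # the vector (0.2, 0, 0.1) as a pure quaternion

point = np.array([0.0, 1.0, 0.0, 0.0])               # the point (1, 0, 0) as a pure quaternion
rotated = quat_mul(quat_mul(rotation, point), quat_conj(rotation))
# rotated is approximately (0, 0, 1, 0), i.e. the point (0, 1, 0)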

An Example.

I realised during my project demonstration that what I give to the neural network as input, and what I expect out of it as output, is not obvious. As such, I'll use a very simple example to explain. Consider a single rotational joint attached to a fixed point -

Example

Let's say this is our desired end-effector position -

End-effector

Consider the movement we need to make here -

change

What we will give to the network as input are the co-ordinates of the end-point of each link - in this case just the one link - along with a quaternion describing the orientation of each link. This is information on where the link currently is in space. We also give it the co-ordinates of the desired end-effector position. All positions are represented as quaternions with the real part set to zero.

We expect the neural network to tell us how to move each link so that the end-effector reaches the desired co-ordinates - by giving the translational and rotational components of the movement to be made separately.

Since we're currently working in 2 dimensions, we can set the rotation component to $(1,0,0,0)$ (the identity, signifying no rotation), and the translation is $(dx, dy)$, which would be represented as the quaternion $(0, 0, dx, dy)$.

What is a Quaternion Neural Network?

Regular neural networks take in real-valued inputs, multiply them with real-valued weights, and this output is then propagated to the neurons in the next layer as input.

Quaternion neural networks are simply networks that take in quaternion inputs, multiply them with quaternion weights (using the Hamilton product), and this output is then propagated to the neurons in the next layer.

Backpropagation is used to conduct learning. Nonlinear activation functions are usually non-analytic in the quaternion sense, so quaternion derivatives cannot be used; instead, the activation function is applied element-wise to each individual component of a quaternion.
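
To make this concrete, below is a stripped-down sketch of a quaternion linear layer in PyTorch. This is not the implementation inside the quaternion_layers module imported below (which has its own weight initialization and layout); it only writes out the Hamilton-product structure, under the assumption that the input vector groups all real parts first, then all $\hat{i}$, $\hat{j}$, and $\hat{k}$ parts. The class name is made up.

import torch
import torch.nn as nn

class QuaternionLinearSketch(nn.Module):
    # every weight is a quaternion, applied to every input quaternion via the Hamilton product
    def __init__(self, in_features, out_features):
        super().__init__()
        assert in_features % 4 == 0 and out_features % 4 == 0
        n_in, n_out = in_features // 4, out_features // 4
        # one real-valued matrix per component of the quaternion weights
        self.r = nn.Parameter(torch.randn(n_out, n_in) * 0.1)
        self.i = nn.Parameter(torch.randn(n_out, n_in) * 0.1)
        self.j = nn.Parameter(torch.randn(n_out, n_in) * 0.1)
        self.k = nn.Parameter(torch.randn(n_out, n_in) * 0.1)

    def forward(self, x):
        xr, xi, xj, xk = torch.chunk(x, 4, dim=-1)
        # the four components of the Hamilton product W * x, written out
        yr = xr @ self.r.t() - xi @ self.i.t() - xj @ self.j.t() - xk @ self.k.t()
        yi = xr @ self.i.t() + xi @ self.r.t() - xj @ self.k.t() + xk @ self.j.t()
        yj = xr @ self.j.t() + xi @ self.k.t() + xj @ self.r.t() - xk @ self.i.t()
        yk = xr @ self.k.t() - xi @ self.j.t() + xj @ self.i.t() + xk @ self.r.t()
        return torch.cat([yr, yi, yj, yk], dim=-1)

# e.g. 9 input quaternions -> 8 output quaternions, as in Experiment 1 below
layer = QuaternionLinearSketch(36, 32)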

In [1]:
import torch
import torch.nn as nn
import torchvision
%pylab inline

import numpy as np
import pandas as pd
import os
from tqdm import tqdm

from quaternion_layers import *

device = "cuda:0" if torch.cuda.is_available() else "cpu"
Populating the interactive namespace from numpy and matplotlib

The Robot

First let's take a look at the robot arm we'll be working with -

fixed [⚓]
    torso_linear [↕+Z]
        shoulder_yaw [⚙+Z]
            elbow_yaw [⚙+Z]
                wrist_yaw [⚙+Z]
  • Fixed joint shown by ⚓
  • Linear joint along the Z-axis shown by ↕+Z
  • Rotational joint along the Z-axis shown by ⚙+Z

I had trouble coming up with a good configuration for a robot arm with enough "reach" to be able to produce a nice and varied dataset, and this was the best I could come up with at the time.

This is what it looks like - each cube represents an end-point of a link. The small black cube represents the fixed joint, the pink cube is the endpoint of the first link l0, red is the endpoint of the second link l1, and so on.

The arm

And this is how it can move -

The moving arm
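
To make the joint layout concrete, here is a rough forward-kinematics sketch of this chain in Python. The link lengths, and the assumption that each link extends along its local X axis, are made up purely for illustration - the real offsets come from the robot definition used to generate the data.

import numpy as np

def rot_z(theta):
    # 3x3 rotation about the Z axis
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def link_endpoints(d, yaw_angles, link_lengths=(0.3, 0.3, 0.3)):
    # fixed -> torso_linear (prismatic +Z) -> shoulder_yaw -> elbow_yaw -> wrist_yaw (all +Z)
    points = []
    p = np.array([0.0, 0.0, d])   # torso_linear slides the whole arm along Z
    R = np.eye(3)
    points.append(p.copy())
    for theta, length in zip(yaw_angles, link_lengths):
        R = R @ rot_z(theta)                       # each revolute joint adds a yaw rotation
        p = p + R @ np.array([length, 0.0, 0.0])   # link assumed to extend along its local X axis
        points.append(p.copy())
    return points

# example: torso raised 0.1, shoulder/elbow/wrist at 30, -45, and 10 degrees
print(link_endpoints(0.1, np.deg2rad([30, -45, 10])))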

The Data

To quote my project proposal, I had said -

A 2R, 3R, and 4R robotic arm would be simulated using ROS and Gazebo. The user would input either via console or through a web interface the desired co-ordinates for the end-effector.

The desired end-effector co-ordinates would be fed into a neural network in the form of a quaternion that describes the transformation required to go from the origin to the desired end-effector co-ordinates, along with the quaternions describing the current orientation of each link in the robotic arm.

As such the data we need to gather is -

  • Desired end-effector co-ordinates
  • A vector for each link that goes from the origin to the end of that link as shown below (as a quaternion)

    drawing

  • A quaternion for the initial orientation of the link

These would be the input to the network; for training, we also need the corresponding outputs, which would be -

  • The translation needed to be made for each link, as a quaternion.
  • The rotation needed to be made for each link, as a quaternion (see the note just after this list).
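
One natural way to define both of these quantities - though not necessarily exactly how my data-generation script computes them - is as the difference between each link's final and initial pose:

$\Delta\vec{p} = \vec{p}_{final} - \vec{p}_{initial}$, and $\Delta q = q_{final} \, q_{initial}^{-1}$

with $\Delta\vec{p}$ then embedded as the pure quaternion $(0, \Delta p_{x}, \Delta p_{y}, \Delta p_{z})$.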

I computed all of this data by writing a script in Rust, using a kinematics library called k. The data computed was written to a JSON file, as an array of JSON objects, each representing one input-output pair. One such input-output pair is detailed below -

The script used to generate this data can be found here - https://github.com/DhruvDh/intelligent_robotics_project/blob/master/k/examples/my_arm.rs

{
    // initially here means before moving the end effector,
    // final pertains to values after the end-effector was placed at the desired co-ordinates

    "l0": [-0.0008749962, 1.0989358, 0.10087502], // co-ordinates of l0 initially (input)
    "l1": [-0.00087475777, 1.098697, 0.4008749],  // co-ordinates of l1 initially
    "l2": [0.047727853, 1.0989714, 0.10483825],   // and so on
    "l3": [-0.24934354, 1.0987016, 0.14665347],
    "l0_rot": [-0.49960187, -0.49999982, -0.49999982, 0.5003981], // orientation of l0 initially (input)
    "l1_rot": [-0.53895015, 0.45731235, 0.45810595, 0.5390149],   // orientation of l1 initially
    "l2_rot": [-0.04910004, -0.7051177, -0.7056401, 0.049701005], // and so on
    "l3_rot": [-0.49960187, -0.49999973, -0.49999982, 0.50039816],
    "l0_final": [-0.000079125166, 0.09939951, 0.100079186], // final co-ordinates for l0 
    "l1_final": [-0.00007888675, 0.09916064, 0.40007907],   // final co-ordinates for l1
    "l2_final": [0.009956747, 0.099407375, 0.100247085],    // and so on
    "l3_final": [-0.14984122, 0.09948231, -0.15365165],
    "l0_rot_final": [-0.49960187, -0.49999982, -0.49999982, 0.5003981], // final orientation for l0
    "l1_rot_final": [-0.5082875, 0.49116763, 0.49196374, 0.50830084],   // final orientation for l1
    "l2_rot_final": [0.34192163, -0.6186206, -0.6193856, -0.34170097],
    "l3_rot_final": [-0.49960193, -0.4999997, -0.49999976, 0.50039816],
    "l0_trans": [-0.00015924126, 0.19999987, 0.00015926361],  // dx, dy, dz for l0 (change in co-ordinates)
    "l1_trans": [-0.000079125166, 0.09939951, 0.100079186],   // dx, dy, dz for l1 (change in co-ordinates)
    "l2_trans": [-0.00007888675, 0.09916064, 0.40007907],     // (output)
    "l3_trans": [0.009956747, 0.099407375, 0.100247085],
    "l0_rot_trans": [-0.49960187, -0.49999982, -0.49999982, 0.5003981], // change in orientation (output)
    "l1_rot_trans": [-0.49960187, -0.49999982, -0.49999982, 0.5003981], // (output)
    "l2_rot_trans": [-0.5082875, 0.49116763, 0.49196374, 0.50830084],
    "l3_rot_trans": [0.34192163, -0.6186206, -0.6193856, -0.34170097],
    "a_joint_pos": [0.7990161, 2.978866, -4.409823, 1.4309571],    // initial joint parameters
    "b_joint_pos": [-0.20052081, 3.1081338, -5.6879864, 2.5798528] // final joint parameters
}

The following piece of code reads the aforementioned JSON file and makes a pandas dataframe out of it, which we will then convert to PyTorch tensors and feed into the network.

In [2]:
data = pd.read_json('DATA.json')

data['l0'] = data['l0'].apply(lambda x: np.array([0.0] + x))
data['l1'] = data['l1'].apply(lambda x: np.array([0.0] + x))
data['l2'] = data['l2'].apply(lambda x: np.array([0.0] + x))
data['l3'] = data['l3'].apply(lambda x: np.array([0.0] + x))
data['l0_rot'] = data['l0_rot'].apply(np.array)
data['l1_rot'] = data['l1_rot'].apply(np.array)
data['l2_rot'] = data['l2_rot'].apply(np.array)
data['l3_rot'] = data['l3_rot'].apply(np.array)

data['l0_final'] = data['l0_final'].apply(lambda x: np.array([0.0] + x))
data['l1_final'] = data['l1_final'].apply(lambda x: np.array([0.0] + x))
data['l2_final'] = data['l2_final'].apply(lambda x: np.array([0.0] + x))
data['l3_final'] = data['l3_final'].apply(lambda x: np.array([0.0] + x))
data['l0_rot_final'] = data['l0_rot_final'].apply(np.array)
data['l1_rot_final'] = data['l1_rot_final'].apply(np.array)
data['l2_rot_final'] = data['l2_rot_final'].apply(np.array)
data['l3_rot_final'] = data['l3_rot_final'].apply(np.array)

data['l0_trans'] = data['l0_trans'].apply(lambda x: np.array([0.0] + x))
data['l1_trans'] = data['l1_trans'].apply(lambda x: np.array([0.0] + x))
data['l2_trans'] = data['l2_trans'].apply(lambda x: np.array([0.0] + x))
data['l3_trans'] = data['l3_trans'].apply(lambda x: np.array([0.0] + x))
data['l0_rot_trans'] = data['l0_rot_trans'].apply(np.array)
data['l1_rot_trans'] = data['l1_rot_trans'].apply(np.array)
data['l2_rot_trans'] = data['l2_rot_trans'].apply(np.array)
data['l3_rot_trans'] = data['l3_rot_trans'].apply(np.array)

# data = data[ ['l0', 'l1', 'l2', 'l3', 'l3_final',
#               'l0_rot', 'l1_rot', 'l2_rot', 'l3_rot', 'l3_rot_final',
#             'l0_trans', 'l1_trans', 'l2_trans',
#              'l0_rot_trans', 'l1_rot_trans', 'l2_rot_trans',
#               'l0_trans', 'l1_trans', 'l2_trans',
#              'l0_rot_trans', 'l1_rot_trans', 'l2_rot_trans'] ]

_data = data[ ['l0', 'l1', 'l2', 'l3', 'l3_final',
              'l0_rot', 'l1_rot', 'l2_rot', 'l3_rot',
              'l0_trans', 'l1_trans', 'l2_trans', 'l3_trans',
             'l0_rot_trans', 'l1_rot_trans', 'l2_rot_trans', 'l3_rot_trans', 'b_joint_pos'] ]
_data.head()
Out[2]:
l0 l1 l2 l3 l3_final l0_rot l1_rot l2_rot l3_rot l0_trans l1_trans l2_trans l3_trans l0_rot_trans l1_rot_trans l2_rot_trans l3_rot_trans b_joint_pos
0 [0.0, -0.00019985437, 0.2510149, 0.10019991] [0.0, -0.00019961595999999998, 0.25077602, 0.4... [0.0, 0.034441292000000005, 0.25056633, 0.698193] [0.0, -0.24929011, 0.25026283, 0.7956424400000... [0.0, 0.14988443, -0.14983344, -0.103550166000... [-0.49960187, -0.49999982, -0.49999982, 0.5003... [-0.52768135, -0.4702704, -0.47022435, 0.52847... [-0.11611071, -0.6972231999999999, -0.69768584... [-0.4996019, -0.4999998, -0.4999998, 0.50039816] [0.0, -0.00015924125999999998, 0.19999987, 0.0... [0.0, 0.000119552016, -0.15011469, 0.09988052] [0.0, 0.000119790435, -0.15035355, 0.399880399... [0.0, 0.21400562, -0.15001574, 0.18951705] [-0.49960187, -0.49999982, -0.49999982, 0.5003... [-0.49960187, -0.49999982, -0.49999982, 0.5003... [-0.65424514, 0.26751652, 0.26825088, 0.65455294] [0.44339997000000003, -0.55045277, -0.5512444,... [-0.45003515, 2.3478916, -5.2740855, 2.9261942]
1 [0.0, 0.000119552016, -0.15011469, 0.09988052] [0.0, 0.000119790435, -0.15035355, 0.399880399... [0.0, 0.21400562, -0.15001574, 0.18951705] [0.0, 0.14988443, -0.14983344, -0.103550166000... [0.0, 0.24937618, -0.15022174, -0.10424779] [-0.49960187, -0.49999982, -0.49999982, 0.5003... [-0.65424514, 0.26751652, 0.26825088, 0.65455294] [0.44339997000000003, -0.55045277, -0.5512444,... [-0.49960193, -0.49999976, -0.4999997300000000... [0.0, -0.00015924125999999998, 0.19999987, 0.0... [0.0, 0.000119924545, -0.15058279, 0.09988013] [0.0, 0.000120162964, -0.15082166, 0.39988002] [0.0, 0.21847203, -0.15048397, 0.1941560700000... [-0.49960187, -0.49999982, -0.49999982, 0.5003... [-0.49960187, -0.49999982, -0.49999982, 0.5003... [-0.6570787, 0.26047919999999997, 0.26121026, ... [0.5251017, -0.473149, -0.47394422, -0.5251428] [-0.45050326, 2.3264256, -5.5712156, 3.2447903]
2 [0.0, 0.000119924545, -0.15058279, 0.09988013] [0.0, 0.000120162964, -0.15082166, 0.39988002] [0.0, 0.21847203, -0.15048397, 0.1941560700000... [0.0, 0.24937618, -0.15022174, -0.10424779] [0.0, -0.24936235, -0.050322868, -0.10379206] [-0.49960187, -0.49999982, -0.49999982, 0.5003... [-0.6570787, 0.26047919999999997, 0.26121026, ... [0.5251017, -0.473149, -0.47394422, -0.5251428] [-0.4996019, -0.49999976, -0.49999976, 0.50039... [0.0, -0.00015924125999999998, 0.19999987, 0.0... [0.0, 4.0054319999999995e-05, -0.0502865, 0.09... [0.0, 4.029274e-05, -0.050525367, 0.39995986] [0.0, -0.030634344, -0.05031219, 0.10153228] [-0.49960187, -0.49999982, -0.49999982, 0.5003... [-0.49960187, -0.49999982, -0.49999982, 0.5003... [-0.473769, 0.52454245, 0.52533764, 0.47372824] [0.26040164, -0.6571095, -0.6578402, -0.26008534] [-0.3502069, 3.24402, -5.568617, 2.3245971]
3 [0.0, 4.0054319999999995e-05, -0.0502865, 0.09... [0.0, 4.029274e-05, -0.050525367, 0.39995986] [0.0, -0.030634344, -0.05031219, 0.10153228] [0.0, -0.24936235, -0.050322868, -0.10379206] [0.0, -0.15061736, -0.05022552, -0.10425120600... [-0.49960187, -0.49999982, -0.49999982, 0.5003... [-0.473769, 0.52454245, 0.52533764, 0.47372824] [0.26040164, -0.6571095, -0.6578402, -0.26008534] [-0.49960196, -0.49999976, -0.4999998, 0.5003982] [0.0, -0.00015924125999999998, 0.19999987, 0.0... [0.0, 4.0039419999999996e-05, -0.0502681730000... [0.0, 4.027784e-05, -0.05050704, 0.3999599] [0.0, 0.06279153, -0.050223485000000005, 0.106... [-0.49960187, -0.49999982, -0.49999982, 0.5003... [-0.49960187, -0.49999982, -0.49999982, 0.5003... [-0.5497696400000001, 0.4442468, 0.44503862, 0... [0.26877823, -0.6537278999999999, -0.6544626, ... [-0.35018859999999996, 2.930865, -5.2810225, 2...
4 [0.0, 4.0039419999999996e-05, -0.0502681730000... [0.0, 4.027784e-05, -0.05050704, 0.3999599] [0.0, 0.06279153, -0.050223485000000005, 0.106... [0.0, -0.15061736, -0.05022552, -0.10425120600... [0.0, -0.05064021, -0.05022476, -0.10428929] [-0.49960187, -0.49999982, -0.49999982, 0.5003... [-0.5497696400000001, 0.4442468, 0.44503862, 0... [0.26877823, -0.6537278999999999, -0.6544626, ... [-0.4996019, -0.49999988, -0.49999988, 0.5003981] [0.0, -0.00015924125999999998, 0.19999987, 0.0... [0.0, 4.0099025e-05, -0.05034703, 0.09995994] [0.0, 4.0337443e-05, -0.050585896000000005, 0.... [0.0, 0.1344908, -0.050265312000000006, 0.1317... [-0.49960187, -0.49999982, -0.49999982, 0.5003... [-0.49960187, -0.49999982, -0.49999982, 0.5003... [-0.60160756, 0.37103929999999996, 0.37181413,... [0.30952302, -0.6354504999999999, -0.63620305,... [-0.35026747, 2.6768768, -5.1534133, 2.4765368]

The columns -

  • l0
  • l1
  • l2
  • l3
  • l0_rot
  • l1_rot
  • l2_rot
  • l3_rot
  • l3_final

will be given to the network as input, and we'll expect the following as output -

  • l0_trans
  • l1_trans
  • l2_trans
  • l3_trans
  • l0_rot_trans
  • l1_rot_trans
  • l2_rot_trans
  • l3_rot_trans
In [3]:
_data = np.array([np.hstack(list(x)) for x in list(_data.values)])
_data.shape
Out[3]:
(136446, 72)

We have 136,446 input-output samples of data.
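
Each 72-value row packs the columns of _data in order: the 9 input quaternions (36 values), then the 8 target quaternions (32 values), and finally the 4 joint values of b_joint_pos, which the network never sees. A quick sanity check on that layout, assuming the column order shown above -

n_input_quats, n_target_quats = 9, 8
assert 4 * n_input_quats + 4 * n_target_quats + 4 == _data.shape[1]  # 36 + 32 + 4 = 72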

In [4]:
def train(_data, model):
    loss = torch.nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01, amsgrad=True)
    
    _data = torch.tensor(_data).float()
    dataset = torch.utils.data.TensorDataset(_data)

    train_batches = torch.utils.data.DataLoader(
        dataset,
        batch_size=4084,
        shuffle=True,
        pin_memory=True
    )
    model.to(device)
    
    print(model)
    
    train_errors = []
    for epoch in tqdm(range(100)):
        errors = []
        for batch in train_batches:
            batch = batch[0].to(device)
            x = batch.narrow(1, 0, 36)   # first 36 values: the 9 input quaternions
            y = batch.narrow(1, 36, 32)  # next 32 values: the 8 target quaternions (b_joint_pos is unused)

            optimizer.zero_grad()
            pred = model(x)
            error = loss(pred, y)

            # epoch 0 only records the loss of the untrained model; no weight update is made
            if not epoch == 0:
                error.backward()
                optimizer.step()

            errors.append(error.data.item())

        train_errors.append(np.mean(errors))
    
    return train_errors, train_batches

Experiment - 1

  • A single Quaternion Linear Layer (9 quaternions input -> 8 quaternions output)
In [5]:
model = nn.Sequential(QuaternionLinearAutograd(36, 32))
loss_curve, _ = train(_data, model)
plot(loss_curve)
Sequential(
  (0): QuaternionLinearAutograd(in_features=9, out_features=8, bias=True, init_criterion=glorot, weight_init=quaternion, seed=552)
)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [02:35<00:00,  1.95s/it]
Out[5]:
[<matplotlib.lines.Line2D at 0x1ec11d28da0>]

The final loss -

In [6]:
loss_curve[-1]
Out[6]:
0.010426114915924914

Experiment - 2

  • A single Linear Layer (36 neurons input -> 32 neurons output)
In [7]:
model = nn.Sequential(nn.Linear(36, 32))
loss_curve, _ = train(_data, model)
plot(loss_curve)
Sequential(
  (0): Linear(in_features=36, out_features=32, bias=True)
)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [02:32<00:00,  1.69s/it]
Out[7]:
[<matplotlib.lines.Line2D at 0x1ec0b75b400>]

The final loss -

In [8]:
loss_curve[-1]
Out[8]:
0.0077298307270907305

Experiment - 3

  1. Quaternion Linear Layer (9 quaternions input -> 10 quaternions output)
  2. ELU activation.
  3. Quaternion Linear Layer (10 quaternions input -> 8 quaternions output)
In [9]:
model = nn.Sequential(
    QuaternionLinearAutograd(36, 40),
    nn.ELU(),
    QuaternionLinearAutograd(40, 32)
)
loss_curve, _ = train(_data, model)
plot(loss_curve)
Sequential(
  (0): QuaternionLinearAutograd(in_features=9, out_features=10, bias=True, init_criterion=glorot, weight_init=quaternion, seed=648)
  (1): ELU(alpha=1.0)
  (2): QuaternionLinearAutograd(in_features=10, out_features=8, bias=True, init_criterion=glorot, weight_init=quaternion, seed=1003)
)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [03:49<00:00,  2.77s/it]
Out[9]:
[<matplotlib.lines.Line2D at 0x1ec27feb4a8>]

The final loss -

In [10]:
loss_curve[-1]
Out[10]:
0.005530443877967841

Experiment - 4

  1. Quaternion Linear Layer (9 quaternions input -> 18 quaternions output)
  2. ELU activation.
  3. Quaternion Linear Layer (18 quaternions input -> 8 quaternions output)
In [11]:
model = nn.Sequential(
    QuaternionLinearAutograd(36, 72),
    nn.ELU(),
    QuaternionLinearAutograd(72, 32)
)
loss_curve, _ = train(_data, model)
plot(loss_curve)
Sequential(
  (0): QuaternionLinearAutograd(in_features=9, out_features=18, bias=True, init_criterion=glorot, weight_init=quaternion, seed=1068)
  (1): ELU(alpha=1.0)
  (2): QuaternionLinearAutograd(in_features=18, out_features=8, bias=True, init_criterion=glorot, weight_init=quaternion, seed=460)
)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [03:42<00:00,  2.27s/it]
Out[11]:
[<matplotlib.lines.Line2D at 0x1ec2b5d3a58>]

The final loss -

In [12]:
loss_curve[-1]
Out[12]:
0.004528026507400414

Experiment - 5

  1. Linear Layer (36 neurons input -> 72 neurons output)
  2. ELU activation.
  3. Linear Layer (72 neurons input -> 32 neurons output)
In [13]:
model = nn.Sequential(
    nn.Linear(36, 72),
    nn.ELU(),
    nn.Linear(72, 32)
)
loss_curve, _ = train(_data, model)
plot(loss_curve)
Sequential(
  (0): Linear(in_features=36, out_features=72, bias=True)
  (1): ELU(alpha=1.0)
  (2): Linear(in_features=72, out_features=32, bias=True)
)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [03:12<00:00,  2.25s/it]
Out[13]:
[<matplotlib.lines.Line2D at 0x1ec2b5d2748>]

The final loss -

In [14]:
loss_curve[-1]
Out[14]:
0.0034824326045482475