TensorFlow tutorial

In this tutorial we'll show how to build deep learning models in Tribuo, using Tribuo's TensorFlow interface. Tribuo uses TensorFlow-Java, which is built by the TensorFlow SIG-JVM group. Tribuo's development team are active participants in SIG-JVM, and we're trying to make TensorFlow work well for everyone on the Java platform, in addition to making it work well inside Tribuo.

Note that Tribuo's TensorFlow interface is not covered by the same stability guarantee as the rest of Tribuo. SIG-JVM has not released a 1.0 version of the TensorFlow Java API, and the API is currently in flux. When TensorFlow Java has API stability we'll be able to stabilize Tribuo's TensorFlow interface to provide the same guarantees as the rest of Tribuo.

We're going to train MLPs (Multi-Layer Perceptrons) for classification and regression, along with a CNN (Convolutional Neural Network) for classifying MNIST digits. We'll discuss loading in externally trained TensorFlow models and serving them alongside Tribuo's natively trained models. Finally we'll see how to export TensorFlow models trained in Tribuo into TensorFlow's SavedModelBundle format for interop with TensorFlow Serving and the rest of the TensorFlow ecosystem.

Unfortunately TensorFlow-Java has some non-determinism in its gradient calculations which we're working on fixing in the TensorFlow-Java project, so repeated runs of this notebook will not produce identical answers, which breaks some of Tribuo's provenance and reproducibility guarantees. When this is fixed upstream we'll apply any necessary fixes in Tribuo.

Setup

You'll need to get a copy of the MNIST dataset in the original IDX format. You may have this from the configuration tutorial, in which case you can skip this step.

First the training data:

wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz

Then the test data:

wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz

Tribuo's IDX loader natively reads gzipped files so you don't need to unzip them.

We'll also need to download the winequality dataset from UCI. Again, if you've followed the regression tutorial you might already have this, so you can skip this step.

wget https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv

Next we'll load the Tribuo TensorFlow jar and import the packages we'll need for the rest of the tutorial.

In [1]:
%jars ./tribuo-tensorflow-4.3.0-jar-with-dependencies.jar
In [2]:
import java.nio.file.Path;
import java.nio.file.Paths;
In [3]:
import org.tribuo.*;
import org.tribuo.data.csv.CSVLoader;
import org.tribuo.datasource.IDXDataSource;
import org.tribuo.evaluation.TrainTestSplitter;
import org.tribuo.classification.*;
import org.tribuo.classification.evaluation.*;
import org.tribuo.interop.tensorflow.*;
import org.tribuo.interop.tensorflow.example.*;
import org.tribuo.regression.*;
import org.tribuo.regression.evaluation.*;
import org.tribuo.util.Util;
In [4]:
import org.tensorflow.*;
import org.tensorflow.framework.initializers.*;
import org.tensorflow.ndarray.Shape;
import org.tensorflow.op.*;
import org.tensorflow.op.core.*;
import org.tensorflow.types.*;

Loading the data

This is the same process as in the configuration and regression tutorials: first we instantiate a DataSource for each dataset, then feed the data sources into datasets. We'll need to split the wine quality dataset into train & test sets as it doesn't have a predefined train/test split.

In [5]:
// First we load winequality
var regressionFactory = new RegressionFactory();
var regEval = new RegressionEvaluator();
var csvLoader = new CSVLoader<>(';',regressionFactory);
var wineSource = csvLoader.loadDataSource(Paths.get("winequality-red.csv"),"quality");
var wineSplitter = new TrainTestSplitter<>(wineSource, 0.7f, 0L);
var wineTrain = new MutableDataset<>(wineSplitter.getTrain());
var wineTest = new MutableDataset<>(wineSplitter.getTest());

// Now we load MNIST
var labelFactory = new LabelFactory();
var labelEval = new LabelEvaluator();
var mnistTrainSource = new IDXDataSource<>(Paths.get("train-images-idx3-ubyte.gz"),Paths.get("train-labels-idx1-ubyte.gz"),labelFactory);
var mnistTestSource = new IDXDataSource<>(Paths.get("t10k-images-idx3-ubyte.gz"),Paths.get("t10k-labels-idx1-ubyte.gz"),labelFactory);
var mnistTrain = new MutableDataset<>(mnistTrainSource);
var mnistTest = new MutableDataset<>(mnistTestSource);

Defining a TensorFlow graph

Tribuo's TensorFlow API operates on TensorFlow graphs. You can construct those using TensorFlow's Java API, load in ones already generated by another TensorFlow API, or use one of Tribuo's example graph generators. We're going to define a simple MLP for the wine quality regression task in the notebook, but we'll use Tribuo's example graph generators for classifying MNIST (to make this tutorial a little shorter).

TensorFlow Java is working on a higher level, layer-wise API (similar to Keras), but at the moment we have to define the graph using the low level ops. Once the layer API is available in TensorFlow Java, we'll add entry points so that those APIs can be used with Tribuo, making the next section of this tutorial a lot shorter. For the moment it'll be rather long, but hopefully it's not too hard to follow.

Tribuo's TensorFlow trainer will add the appropriate output node, loss function and gradient optimizer, so what you need to supply is the graph which emits the output (before any softmax, sigmoid or other output function), the name of the output op and the names of the input ops.

Building a regression model using an MLP

To solve this regression task we're going to build a 3 layer neural network, where each layer is a "dense" or "MLP" layer. We'll use a sigmoid as the activation function, but any activation function supported in TensorFlow will work. We'll need to know the number of input features and the number of output dimensions (i.e., the number of labels or regression dimensions), which is a little unfortunate as nothing else in Tribuo requires them, but they're needed to define the graph structure.

In [6]:
var wineGraph = new Graph();
// This object is used to write operations into the graph
var wineOps = Ops.create(wineGraph);
var wineInputName = "WINE_INPUT";
long wineNumFeatures = wineTrain.getFeatureMap().size();
var wineInitializer = new Glorot<TFloat32>(// Initializer distribution
                                           VarianceScaling.Distribution.TRUNCATED_NORMAL,
                                           // Initializer seed
                                           Trainer.DEFAULT_SEED
                                          );

// The input placeholder that we'll feed the features into
var wineInput = wineOps.withName(wineInputName).placeholder(TFloat32.class,
                Placeholder.shape(Shape.of(-1, wineNumFeatures)));
                
// Fully connected layer (numFeatures -> 30)
var fc1Weights = wineOps.variable(wineInitializer.call(wineOps,wineOps.array(wineNumFeatures, 30L),
                                                   TFloat32.class));
var fc1Biases = wineOps.variable(wineOps.fill(wineOps.array(30), wineOps.constant(0.1f)));
var sigmoid1 = wineOps.math.sigmoid(wineOps.math.add(wineOps.linalg.matMul(wineInput, fc1Weights),
                                             fc1Biases));

// Fully connected layer (30 -> 20)
var fc2Weights = wineOps.variable(wineInitializer.call(wineOps,wineOps.array(30L, 20L), 
                                                       TFloat32.class));
var fc2Biases = wineOps.variable(wineOps.fill(wineOps.array(20), wineOps.constant(0.1f)));
var sigmoid2 = wineOps.math.sigmoid(wineOps.math.add(wineOps.linalg.matMul(sigmoid1, fc2Weights), 
                                             fc2Biases));

// Output layer (20 -> 1)
var outputWeights = wineOps.variable(wineInitializer.call(wineOps,wineOps.array(20L, 1L), 
                                                          TFloat32.class));
var outputBiases = wineOps.variable(wineOps.fill(wineOps.array(1), wineOps.constant(0.1f)));
var outputOp = wineOps.math.add(wineOps.linalg.matMul(sigmoid2, outputWeights), outputBiases);

// Get the operation name to pass into the trainer
var wineOutputName = outputOp.op().name();

We can query the operation names by asking the various objects for their name(); Tribuo uses these names to supply the appropriate inputs and outputs to the graph during training and inference.

Now we have the graph, input name and output name, stored in wineGraph, wineInputName and wineOutputName respectively. Next we'll define the gradient optimization algorithm and its hyperparameters. These are separate from Tribuo's built-in gradient optimizers as they are part of the TensorFlow native library, but most of the same algorithms are available. We're going to use AdaGrad, setting its learning rate to 0.1f and the initial accumulator value to 0.01f.

In [7]:
var gradAlgorithm = GradientOptimiser.ADAGRAD;
var gradParams = Map.of("learningRate",0.1f,"initialAccumulatorValue",0.01f);

We also need to create an object to convert from Tribuo's feature representation to a TensorFlow Tensor, and an object that can convert between Tensor and Regressor. These are defined using the FeatureConverter and OutputConverter interfaces.

Converting Features into Tensors with FeatureConverter

Tribuo provides two implementations of FeatureConverter, one for dense inputs (like those used by MLPs) called DenseFeatureConverter and one for image shaped inputs (like those used by CNNs) called ImageConverter. If you need more specialised transformations (e.g., text) then you should implement the FeatureConverter interface and tailor it to your task's needs.

The FeatureConverter needs the name of the input placeholder which the features will be fed into, so it can produce the appropriate values in the Map that is fed into the TensorFlow graph.
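
As a quick sketch (not an executed cell in this notebook; the "MY_INPUT" placeholder name and the 28x28 single-channel image shape are just illustrative assumptions), the two built-in converters are constructed directly from the input placeholder name:

    // Dense inputs: the converter only needs the input placeholder name (hypothetical name here)
    var denseConverter = new DenseFeatureConverter("MY_INPUT");
    // Image inputs: placeholder name, image width, image height, number of colour channels
    var imageConverter = new ImageConverter("MY_INPUT", 28, 28, 1);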

Converting Outputs into Tensors (and back again) with OutputConverter

There are implementations of OutputConverter for Label, MultiLabel and Regressor, as those cover the main use cases for TensorFlow. You are free to implement these interfaces for more specialised use cases, though they should be thread-safe and idempotent. The OutputConverter contains the loss function and output function which are used to attach the appropriate training hooks to the graph. LabelConverter uses the softmax function to produce probabilistic outputs, and categorical cross-entropy as the loss for back-propagation. RegressorConverter uses the identity function to produce the output (as it's already producing a real value), and mean squared error as the loss function. MultiLabelConverter uses an independent sigmoid function for each label as the output, thresholded at 0.5, and binary cross-entropy as the loss function.
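
As a reference sketch (not executed here; we're assuming MultiLabelConverter follows the same no-argument constructor pattern as the two converters used later in this tutorial), the three built-in converters are created like this:

    var labelConverter = new LabelConverter();           // softmax output, categorical cross-entropy loss
    var regressorConverter = new RegressorConverter();   // identity output, mean squared error loss
    var multiLabelConverter = new MultiLabelConverter(); // per-label sigmoid output, binary cross-entropy loss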

In [8]:
var wineDenseConverter = new DenseFeatureConverter(wineInputName);
var wineOutputConverter = new RegressorConverter();

We're finally ready to build our first TensorFlowTrainer. We need to specify a few more parameters in the constructor, namely the training batch size, the test batch size, and the number of training epochs. We'll set the batch sizes to 16 for all experiments, and we use 100 epochs for the regression task (because it's a small dataset), 20 epochs for the MNIST MLP, and 3 for the MNIST CNN (as the CNN converges much faster than the MLP).

In [9]:
var wineTrainer = new TensorFlowTrainer<Regressor>(wineGraph,
                wineOutputName,
                gradAlgorithm,
                gradParams,
                wineDenseConverter,
                wineOutputConverter,
                16,   // training batch size
                100,  // number of training epochs
                16,   // test batch size of the trained model
                -1    // disable logging of the loss value
                );

// Now we close the original graph to free the associated native resources.
// The TensorFlowTrainer keeps a copy of the GraphDef protobuf to rebuild it when necessary.
wineGraph.close();

TensorFlowTrainer will accept a Graph, a GraphDef protobuf, or a path to a GraphDef protobuf on disk. The Graph should be closed after it's supplied to the trainer, to free the native resources associated with it. Tribuo manages a copy of the Graph inside the trainer so users don't need to worry about resource allocation. The trainer automatically adds the loss function, gradient update operations and the final output operation to the supplied graph.
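
For example, a trainer could also be built from a GraphDef protobuf stored on disk. This is just a sketch (the "./wine-mlp-graph.pb" path is hypothetical, and we're assuming the Path-accepting constructor takes the same remaining arguments as the Graph-based one shown above):

    var graphPath = Paths.get("./wine-mlp-graph.pb"); // hypothetical path to a serialized GraphDef
    var trainerFromDisk = new TensorFlowTrainer<Regressor>(graphPath,
                    wineOutputName,
                    gradAlgorithm,
                    gradParams,
                    wineDenseConverter,
                    wineOutputConverter,
                    16,   // training batch size
                    100,  // number of training epochs
                    16,   // test batch size of the trained model
                    -1    // disable logging of the loss value
                    );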

We can use this trainer the way we'd use any other Tribuo trainer: we call trainer.train() and pass it a dataset. In the case of TensorFlow it will throw an IllegalArgumentException if the number of features or outputs in the training dataset doesn't match what the trainer is expecting, as those parameters are coupled to the graph structure.

In [10]:
var wineStart = System.currentTimeMillis();
var wineModel = wineTrainer.train(wineTrain);
var wineEnd = System.currentTimeMillis();
System.out.println("Wine quality training took " + Util.formatDuration(wineStart,wineEnd));
Wine quality training took (00:00:02:255)

And we can evaluate it in the same way we evaluate other Tribuo regression models:

In [11]:
var wineEvaluation = regEval.evaluate(wineModel,wineTest);
var dimension = new Regressor("DIM-0",Double.NaN);
System.out.println(String.format("Wine quality evaluation:%n  RMSE %f%n  MAE %f%n  R^2 %f%n",
            wineEvaluation.rmse(dimension),
            wineEvaluation.mae(dimension),
            wineEvaluation.r2(dimension)));
Wine quality evaluation:
  RMSE 0.649654
  MAE 0.507282
  R^2 0.351000

We can see the MLP did OK there, fitting the task almost as well as the tree ensemble we showed in the regression tutorial. If we standardised the regression variable and performed further tuning of the architecture and gradient parameters we could improve on this, but let's move on to classification.

Building a classification model using an MLP

Building classification models using the TensorFlow interface is pretty similar to building regression models, thanks to Tribuo's common API for these tasks. The differences come in the choice of OutputConverter.

We're going to use Tribuo's MLPExamples and CNNExamples to build the networks for MNIST, as it's a bit shorter. These classes build simple predefined TensorFlow Graphs which are useful for demos, Tribuo's tests and getting started with deep learning. Currently there aren't many options in those classes, but we plan to expand them over time, and we welcome community contributions to do so. If you're interested in how the graphs are constructed you can check out the source for them on GitHub. For complex tasks we recommend that users build their own Graphs just as we did in the regression portion of the tutorial. TensorFlow-Java exposes a wide variety of operations for building graphs, and as the high level API improves it will become easier to specify complex structures.

Tribuo's graph building functions return a GraphDefTuple, which is a nominal tuple for a GraphDef along with the strings representing the necessary operation names. As Tribuo targets Java 8 and upwards it's not a java.lang.Record, but it will be one day.

In [12]:
var mnistInputName = "MNIST_INPUT";
var mnistMLPTuple = MLPExamples.buildMLPGraph(
                        mnistInputName, // The input placeholder name
                        mnistTrain.getFeatureMap().size(), // The number of input features
                        new int[]{300,200,30}, // The hidden layer sizes
                        mnistTrain.getOutputs().size() // The number of output labels
                        );
var mnistDenseConverter = new DenseFeatureConverter(mnistInputName);
var mnistOutputConverter = new LabelConverter();

This built an MLP with 3 hidden layers. The first maps from the feature space down to 300 dimensions, the second maps those 300 dimensions down to 200, and the third maps the 200 dimensions down to 30. Tribuo then adds an output layer mapping down from those 30 dimensions to the 10 output dimensions in MNIST, one per digit.

We'll use the same gradient optimiser as before, along with the same hyperparameters.

In [13]:
var mnistMLPTrainer = new TensorFlowTrainer<Label>(mnistMLPTuple.graphDef,
                mnistMLPTuple.outputName, // the name of the logit operation
                gradAlgorithm,            // the gradient descent algorithm
                gradParams,               // the gradient descent hyperparameters
                mnistDenseConverter,      // the input feature converter
                mnistOutputConverter,     // the output label converter
                16,  // training batch size
                20,  // number of training epochs
                16,  // test batch size of the trained model
                -1   // disable logging of the loss value
                );

And we train the model as before:

In [14]:
var mlpStart = System.currentTimeMillis();
var mlpModel = mnistMLPTrainer.train(mnistTrain);
var mlpEnd = System.currentTimeMillis();
System.out.println("MNIST MLP training took " + Util.formatDuration(mlpStart,mlpEnd));
MNIST MLP training took (00:01:08:593)

And evaluate it in the standard way:

In [15]:
var mlpEvaluation = labelEval.evaluate(mlpModel,mnistTest);
System.out.println(mlpEvaluation.toString());
System.out.println(mlpEvaluation.getConfusionMatrix().toString());
Class                           n          tp          fn          fp      recall        prec          f1
0                             980         960          20          94       0.980       0.911       0.944
1                           1,135       1,105          30          23       0.974       0.980       0.977
2                           1,032         922         110          39       0.893       0.959       0.925
3                           1,010         916          94         121       0.907       0.883       0.895
4                             982         930          52          61       0.947       0.938       0.943
5                             892         813          79          74       0.911       0.917       0.914
6                             958         914          44          36       0.954       0.962       0.958
7                           1,028         942          86          37       0.916       0.962       0.939
8                             974         893          81         107       0.917       0.893       0.905
9                           1,009         926          83          87       0.918       0.914       0.916
Total                      10,000       9,321         679         679
Accuracy                                                                    0.932
Micro Average                                                               0.932       0.932       0.932
Macro Average                                                               0.932       0.932       0.931
Balanced Error Rate                                                         0.068
               0       1       2       3       4       5       6       7       8       9
0            960       0       1       1       4       3       5       1       2       3
1              1   1,105       4       4       0       3       3       1      14       0
2             14       5     922      42      12       2       7       4      22       2
3             13       0       9     916       1      34       0       9      22       6
4              3       0       3       0     930       0      11       3       5      27
5             18       0       0      17       5     813       6       4      23       6
6             11       4       1       0       5      16     914       0       6       1
7             10       7      15       8       4       1       0     942       4      37
8             12       2       6      28      10      12       4       2     893       5
9             12       5       0      21      20       3       0      13       9     926

An MLP works pretty well on MNIST, but when working with images it's usually better to exploit the natural structure, and for that we use a Convolutional Neural Network.

Training a Convolutional Neural Network

This is an even smaller transition than the switch between regression and classification. All we need to do is supply an ImageConverter which knows the size and pixel depth of the images, and build an appropriately shaped CNN.

We'll use CNNExamples.buildLeNetGraph to build a version of the venerable LeNet-5 CNN. We specify the image shape (this method assumes images are square), the pixel depth and the number of outputs. So for MNIST that's 28 pixels across, a pixel depth of 255, and 10 output classes, one per digit. We'll also need the appropriate ImageConverter, which takes the name of the input placeholder, the width and height of the image (so allowing rectangular images), and the number of colour channels. MNIST is grayscale, so there's only a single colour channel.

In [16]:
var mnistCNNTuple = CNNExamples.buildLeNetGraph(mnistInputName,28,255,mnistTrain.getOutputs().size());
var mnistImageConverter = new ImageConverter(mnistInputName,28,28,1);

We can build the trainer and train in the same way as before, but we will train for fewer epochs as the CNN converges faster:

In [17]:
var mnistCNNTrainer = new TensorFlowTrainer<Label>(mnistCNNTuple.graphDef,
                mnistCNNTuple.outputName, // the name of the logit operation
                gradAlgorithm,            // the gradient descent algorithm
                gradParams,               // the gradient descent hyperparameters
                mnistImageConverter,      // the input feature converter
                mnistOutputConverter,     // the output label converter
                16, // training batch size
                3,  // number of training epochs
                16, // test batch size of the trained model
                -1  // disable logging of the loss value
                );
                
// Training the model
var cnnStart = System.currentTimeMillis();
var cnnModel = mnistCNNTrainer.train(mnistTrain);
var cnnEnd = System.currentTimeMillis();
System.out.println("MNIST CNN training took " + Util.formatDuration(cnnStart,cnnEnd));
MNIST CNN training took (00:01:59:597)

And evaluate it the standard way:

In [18]:
var cnnPredictions = cnnModel.predict(mnistTest);
var cnnEvaluation = labelEval.evaluate(cnnModel,cnnPredictions,mnistTest.getProvenance());
System.out.println(cnnEvaluation.toString());
System.out.println(cnnEvaluation.getConfusionMatrix().toString());
Class                           n          tp          fn          fp      recall        prec          f1
0                             980         968          12          22       0.988       0.978       0.983
1                           1,135       1,127           8          15       0.993       0.987       0.990
2                           1,032       1,011          21          59       0.980       0.945       0.962
3                           1,010         959          51          14       0.950       0.986       0.967
4                             982         977           5          35       0.995       0.965       0.980
5                             892         877          15          54       0.983       0.942       0.962
6                             958         934          24           9       0.975       0.990       0.983
7                           1,028         981          47          12       0.954       0.988       0.971
8                             974         931          43          29       0.956       0.970       0.963
9                           1,009         969          40          17       0.960       0.983       0.971
Total                      10,000       9,734         266         266
Accuracy                                                                    0.973
Micro Average                                                               0.973       0.973       0.973
Macro Average                                                               0.973       0.973       0.973
Balanced Error Rate                                                         0.027
               0       1       2       3       4       5       6       7       8       9
0            968       0       3       0       0       0       2       2       4       1
1              0   1,127       3       0       1       1       1       0       1       1
2              3       2   1,011       5       5       0       1       1       4       0
3              0       1      14     959       0      28       0       1       5       2
4              0       0       1       0     977       0       1       0       0       3
5              1       0       1       4       0     877       3       1       4       1
6             10       3       1       1       3       5     934       0       1       0
7              0       2      25       1       9       2       0     981       3       5
8              8       3      10       1       3      11       1       2     931       4
9              0       4       1       2      14       7       0       5       7     969

As we might expect, exploiting the structured nature of images lets us get better performance, with ~98% accuracy after only 3 epochs. There is a wide variety of different CNN architectures, each suited for different kinds of tasks. Some are even applied to sequential data like text.

Exporting and Importing TensorFlow models

TensorFlow's canonical model storage format is the SavedModelBundle. You can export TensorFlow models trained in Tribuo in this format by calling model.exportModel(String path), which writes a directory at that path containing the model as a SavedModel.

In [19]:
var outputPath = "./tf-cnn-mnist-model";
cnnModel.exportModel(outputPath);

Tribuo can also load in SavedModels and serve them as an ExternalModel. See the external models tutorial for more details on how Tribuo works with models built in other packages. The short version is that you need to specify the mapping from Tribuo's feature names into the id numbers the model expects, and from the output indices to Tribuo's output dimensions. We'll show how to load in the CNN that we just exported, and validate that it gives the same predictions as the original.

First we'll setup the feature and output mappings. This is easy in our case as we already have the relevant information, but in most cases this requires understanding how the features were prepared when the original model was trained. We discuss this in more detail in the external models tutorial.

In [20]:
var outputMapping = new HashMap<Label,Integer>();
for (var p : cnnModel.getOutputIDInfo()) {
    outputMapping.put(p.getB(),p.getA());
}
var featureIDMap = cnnModel.getFeatureIDMap();
var featureMapping = new HashMap<String,Integer>();
for (var info : featureIDMap) {
    featureMapping.put(info.getName(),featureIDMap.getID(info.getName()));
}

Now we build the TensorFlowSavedModelExternalModel using its factory, supplying the output factory, the feature mapping, the output mapping, the softmax output operation name, the image converter, the label converter and finally the path to the SavedModel on disk.

In [21]:
var externalModel = TensorFlowSavedModelExternalModel.createTensorflowModel(
                        labelFactory,             // the output factory
                        featureMapping,           // the feature mapping
                        outputMapping,            // the output mapping
                        cnnModel.getOutputName(), // the name of the *softmax* output
                        mnistImageConverter,      // the input feature converter
                        mnistOutputConverter,     // The label converter
                        outputPath.toString()     // path to the saved model
                        );

This model behaves like any other, so we pass it some test data and generate its predictions.

In [22]:
var externalPredictions = externalModel.predict(mnistTest);

Now let's compare the output predictions. It's a little convoluted, but we're going to compare each predicted probability distribution to make sure they are the same.

In [23]:
var isEqual = true;
for (int i = 0; i < cnnPredictions.size(); i++) {
    var tribuo = cnnPredictions.get(i);
    var external = externalPredictions.get(i);
    isEqual &= tribuo.getOutput().fullEquals(external.getOutput());
    isEqual &= tribuo.distributionEquals(external);
}
System.out.println("Predictions are " + (isEqual ? "equal" : "not equal"));
Predictions are equal

As we can see, the models produce identical predictions, which means that we've successfully exported all our model weights and managed to load them back in as an external model.

Conclusion

We saw how to build MLPs for regression and classification and a CNN for classification in Tribuo & TensorFlow, along with how to export Tribuo-trained models into TensorFlow's format, and import TensorFlow SavedModels into Tribuo.

By default Tribuo pulls in the CPU version of TensorFlow Java, but if you supply the GPU jar at runtime it will automatically run everything on a compatible Nvidia GPU. We'll look at exposing explicit GPU support from Tribuo as the relevant support matures in TensorFlow Java.