Class SparseVector

java.lang.Object
org.tribuo.math.la.SparseVector
All Implemented Interfaces:
Serializable, Iterable<VectorTuple>, SGDVector, Tensor, ProtoSerializable<org.tribuo.math.protos.TensorProto>

public class SparseVector extends Object implements SGDVector
A sparse vector. Stored as a sorted array of indices and an array of values.

Looking up a specific index uses binary search, so iterating over the values with the iterator is usually faster than repeated lookups.

This vector has immutable indices. New indices cannot be added after construction, and any operation that attempts to add one throws IllegalArgumentException.
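
The storage scheme described above can be sketched without the library. The class below is a standalone illustration, not Tribuo's implementation: the name SparseLookupSketch and its methods are hypothetical, and returning 0.0 for an inactive index is an assumption of the sketch.

```java
import java.util.Arrays;

// Standalone illustration of the storage scheme; this is not Tribuo's class.
final class SparseLookupSketch {
    private final int[] indices;   // sorted ascending, fixed after construction
    private final double[] values; // values[i] is the value stored at indices[i]

    SparseLookupSketch(int[] sortedIndices, double[] values) {
        this.indices = sortedIndices.clone();
        this.values = values.clone();
    }

    // A single lookup is a binary search over the index array.
    // Assumption of this sketch: an inactive index reads as 0.0.
    double get(int index) {
        int pos = Arrays.binarySearch(indices, index);
        return pos >= 0 ? values[pos] : 0.0;
    }

    // Only an index that was active at construction can be written.
    void set(int index, double value) {
        int pos = Arrays.binarySearch(indices, index);
        if (pos < 0) {
            throw new IllegalArgumentException("Index " + index + " is not active");
        }
        values[pos] = value;
    }

    // Iterating the parallel arrays visits each active element once,
    // with no search per element.
    double sumActive() {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }
}
```

Each get or set costs a binary search, while a pass over the arrays touches every active element directly, which is why iteration tends to be the cheaper way to read many values.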

  • Field Details

    • CURRENT_VERSION

      public static final int CURRENT_VERSION
      Protobuf serialization version.
    • indices

      protected final int[] indices
      The indices array.
    • values

      protected final double[] values
      The values array.
  • Constructor Details

    • SparseVector

      public SparseVector(int size, int[] indices, double value)
      Creates a sparse vector of the specified size, with the supplied value at each of the indices.
      Parameters:
      size - The vector size.
      indices - The active indices.
      value - The value for those indices.
  • Method Details

    • createSparseVector

      public static <T extends Output<T>> SparseVector createSparseVector(Example<T> example, ImmutableFeatureMap featureInfo, boolean addBias)
      Builds a SparseVector from an Example.

      Used in training and inference.

      Throws IllegalArgumentException if the Example contains NaN-valued features.

      Type Parameters:
      T - The type parameter of the example.
      Parameters:
      example - The example to convert.
      featureInfo - The feature information, used to calculate the dimension of this SparseVector.
      addBias - Add a bias feature.
      Returns:
      A SparseVector representing the example's features.
    • createSparseVector

      public static SparseVector createSparseVector(int dimension, int[] indices, double[] values)
      Defensively copies the inputs and checks that the indices are sorted, sorting them if they are not.

      Throws IllegalArgumentException if the arrays are not the same length, or if dimension is less than the max index.

      Parameters:
      dimension - The dimension of this vector.
      indices - The indices of the non-zero elements.
      values - The values of the non-zero elements.
      Returns:
      A SparseVector encapsulating the indices and values.
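
The copy, sort and validate steps described for this factory can be sketched in plain Java. This is a hypothetical standalone class (SortedPairsSketch), not the library's code. It rejects any index at or beyond dimension, which assumes zero-based indexing; the documentation above only states that a dimension less than the max index is rejected.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch of "copy, sort if needed, validate"; not Tribuo's code.
final class SortedPairsSketch {
    final int[] indices;
    final double[] values;

    SortedPairsSketch(int dimension, int[] inputIndices, double[] inputValues) {
        if (inputIndices.length != inputValues.length) {
            throw new IllegalArgumentException("indices and values must be the same length");
        }
        // Sort positions by index so each value travels with its index.
        Integer[] order = new Integer[inputIndices.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingInt(i -> inputIndices[i]));
        // Writing into fresh arrays doubles as the defensive copy.
        indices = new int[order.length];
        values = new double[order.length];
        for (int i = 0; i < order.length; i++) {
            indices[i] = inputIndices[order[i]];
            values[i] = inputValues[order[i]];
        }
        // After sorting, the last index is the largest one.
        // Assumption: indices are zero-based, so they must be below dimension.
        if (indices.length > 0 && indices[indices.length - 1] >= dimension) {
            throw new IllegalArgumentException("dimension is too small for the largest index");
        }
    }
}
```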
    • createSparseVector

      public static SparseVector createSparseVector(int dimension, Map<Integer,Double> indexMap)
      Builds a SparseVector from a map.

      Throws IllegalArgumentException if dimension is less than the max index.

      Parameters:
      dimension - The dimension of this vector.
      indexMap - The map from indices to values.
      Returns:
      A SparseVector.
    • deserializeFromProto

      public static SparseVector deserializeFromProto(int version, String className, com.google.protobuf.Any message) throws com.google.protobuf.InvalidProtocolBufferException
      Deserialization factory.
      Parameters:
      version - The serialized object version.
      className - The class name.
      message - The serialized data.
      Returns:
      The deserialized object.
      Throws:
      com.google.protobuf.InvalidProtocolBufferException - If the protobuf could not be parsed from the message.
    • serialize

      public org.tribuo.math.protos.TensorProto serialize()
      Description copied from interface: ProtoSerializable
      Serializes this object to a protobuf.
      Specified by:
      serialize in interface ProtoSerializable<org.tribuo.math.protos.TensorProto>
      Returns:
      The protobuf.
    • copy

      public SparseVector copy()
      Description copied from interface: SGDVector
      Returns a deep copy of this vector.
      Specified by:
      copy in interface SGDVector
      Specified by:
      copy in interface Tensor
      Returns:
      A copy of this vector.
    • getShape

      public int[] getShape()
      Description copied from interface: Tensor
      Returns an int array specifying the shape of this Tensor.
      Specified by:
      getShape in interface Tensor
      Returns:
      An int array.
    • reshape

      public Tensor reshape(int[] newShape)
      Description copied from interface: Tensor
      Reshapes the Tensor to the supplied shape. Throws IllegalArgumentException if the shape isn't compatible.
      Specified by:
      reshape in interface Tensor
      Parameters:
      newShape - The desired shape.
      Returns:
      A Tensor of the desired shape.
    • size

      public int size()
      Description copied from interface: SGDVector
      Returns the dimensionality of this vector.
      Specified by:
      size in interface SGDVector
      Returns:
      The dimensionality of the vector.
    • numActiveElements

      public int numActiveElements()
      Description copied from interface: SGDVector
      Returns the number of non-zero elements (on construction, an element could be set to zero and it would still remain active).
      Specified by:
      numActiveElements in interface SGDVector
      Returns:
      The number of non-zero elements.
    • equals

      public boolean equals(Object other)
      Equality is defined mathematically: two SGDVectors are equal iff they have the same indices and the same values at those indices.
      Overrides:
      equals in class Object
      Parameters:
      other - Object to compare against.
      Returns:
      True if this vector and the other vector contain the same values in the same order.
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • add

      public SGDVector add(SGDVector other)
      Adds other to this vector, producing a new SGDVector. If other is a SparseVector, the returned vector is also a SparseVector; otherwise it is a DenseVector.
      Specified by:
      add in interface SGDVector
      Parameters:
      other - The vector to add.
      Returns:
      A new SGDVector where each element value = this.get(i) + other.get(i).
    • subtract

      public SGDVector subtract(SGDVector other)
      Subtracts other from this vector, producing a new SGDVector. If other is a SparseVector, the returned vector is also a SparseVector; otherwise it is a DenseVector.
      Specified by:
      subtract in interface SGDVector
      Parameters:
      other - The vector to subtract.
      Returns:
      A new SGDVector where each element value = this.get(i) - other.get(i).
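
When both operands are sparse, the result's active indices are the union of the two index sets. A minimal sketch of that merge for addition follows, using a hypothetical standalone class (SparseAddSketch) over sorted parallel arrays; it is not the library's implementation. Subtraction follows the same pattern with the other vector's values negated.

```java
import java.util.Arrays;

// Hypothetical sketch of sparse + sparse addition; not Tribuo's code.
final class SparseAddSketch {
    final int[] indices;
    final double[] values;

    private SparseAddSketch(int[] indices, double[] values) {
        this.indices = indices;
        this.values = values;
    }

    // Both index arrays must be sorted ascending.
    static SparseAddSketch add(int[] aIdx, double[] aVal, int[] bIdx, double[] bVal) {
        int[] outIdx = new int[aIdx.length + bIdx.length];
        double[] outVal = new double[outIdx.length];
        int i = 0, j = 0, n = 0;
        while (i < aIdx.length && j < bIdx.length) {
            if (aIdx[i] == bIdx[j]) {
                // Index active in both: sum the two values.
                outIdx[n] = aIdx[i];
                outVal[n] = aVal[i] + bVal[j];
                i++;
                j++;
            } else if (aIdx[i] < bIdx[j]) {
                // Index active only in a: the other side is an implicit zero.
                outIdx[n] = aIdx[i];
                outVal[n] = aVal[i];
                i++;
            } else {
                // Index active only in b.
                outIdx[n] = bIdx[j];
                outVal[n] = bVal[j];
                j++;
            }
            n++;
        }
        // Copy whichever tail remains.
        for (; i < aIdx.length; i++, n++) {
            outIdx[n] = aIdx[i];
            outVal[n] = aVal[i];
        }
        for (; j < bIdx.length; j++, n++) {
            outIdx[n] = bIdx[j];
            outVal[n] = bVal[j];
        }
        return new SparseAddSketch(Arrays.copyOf(outIdx, n), Arrays.copyOf(outVal, n));
    }
}
```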
    • intersectAndAddInPlace

      public void intersectAndAddInPlace(Tensor other, DoubleUnaryOperator f)
      Description copied from interface: Tensor
      Updates this Tensor by adding all the values from the intersection with other.

      The function f is applied to all values from other before the addition.

      Each value is updated as value += f(otherValue).

      Specified by:
      intersectAndAddInPlace in interface Tensor
      Parameters:
      other - The other Tensor.
      f - A function to apply.
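
The update rule value += f(otherValue) over the intersection can be sketched for two sparse operands as follows. The class name IntersectAddSketch and the parallel-array representation are hypothetical, chosen to keep the example standalone; this is not the library's code.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch of an in-place intersect-and-add; not Tribuo's code.
final class IntersectAddSketch {
    // Updates val in place. Both index arrays must be sorted ascending.
    static void intersectAndAddInPlace(int[] idx, double[] val,
                                       int[] otherIdx, double[] otherVal,
                                       DoubleUnaryOperator f) {
        int i = 0, j = 0;
        while (i < idx.length && j < otherIdx.length) {
            if (idx[i] == otherIdx[j]) {
                // Shared index: apply f to the other value, then add.
                val[i] += f.applyAsDouble(otherVal[j]);
                i++;
                j++;
            } else if (idx[i] < otherIdx[j]) {
                i++; // active only in this vector, left unchanged
            } else {
                j++; // active only in other, ignored
            }
        }
    }
}
```

Passing x -> -x as f turns the same loop into an in-place subtraction over the shared indices.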
    • hadamardProductInPlace

      public void hadamardProductInPlace(Tensor other, DoubleUnaryOperator f)
      Description copied from interface: Tensor
      Updates this Tensor with the Hadamard product (i.e., a term by term multiply) of this and other.

      The function f is applied to all values from other before the multiplication.

      Each value is updated as value *= f(otherValue).

      Specified by:
      hadamardProductInPlace in interface Tensor
      Parameters:
      other - The other Tensor.
      f - A function to apply.
    • foreachInPlace

      public void foreachInPlace(DoubleUnaryOperator f)
      Applies a DoubleUnaryOperator elementwise to this SGDVector.

      Only applies the function to the elements which are present.

      If you need to operate over the whole vector then densify it first.

      Specified by:
      foreachInPlace in interface Tensor
      Parameters:
      f - The function to apply.
    • foreachIndexedInPlace

      public void foreachIndexedInPlace(ToDoubleBiFunction<Integer,Double> f)
      Applies a ToDoubleBiFunction elementwise to this SGDVector.

      The first argument to the function is the index, the second argument is the current value.

      Only applies the function to the elements which are present.

      If you need to operate over the whole vector then densify it first.

      Specified by:
      foreachIndexedInPlace in interface SGDVector
      Parameters:
      f - The function to apply.
    • scale

      public SparseVector scale(double coefficient)
      Description copied from interface: SGDVector
      Generates a new vector with each element scaled by coefficient.
      Specified by:
      scale in interface SGDVector
      Parameters:
      coefficient - The coefficient to scale the elements by.
      Returns:
      A new SGDVector.
    • add

      public void add(int index, double value)
      Description copied from interface: SGDVector
      Adds value to the element at index.
      Specified by:
      add in interface SGDVector
      Parameters:
      index - The index to update.
      value - The value to add.
    • dot

      public double dot(SGDVector other)
      Description copied from interface: SGDVector
      Calculates the dot product between this vector and other.
      Specified by:
      dot in interface SGDVector
      Parameters:
      other - The other vector.
      Returns:
      The dot product.
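
Only indices active in both vectors contribute to a dot product, so a sparse operand lets most of the work be skipped. A minimal standalone sketch, assuming sorted parallel arrays; the class SparseDotSketch is hypothetical and this is not the library's implementation:

```java
// Hypothetical sketch of sparse dot products; not Tribuo's code.
final class SparseDotSketch {
    // Sparse with sparse: walk both sorted index arrays in step.
    static double dot(int[] aIdx, double[] aVal, int[] bIdx, double[] bVal) {
        double sum = 0.0;
        int i = 0, j = 0;
        while (i < aIdx.length && j < bIdx.length) {
            if (aIdx[i] == bIdx[j]) {
                sum += aVal[i] * bVal[j];
                i++;
                j++;
            } else if (aIdx[i] < bIdx[j]) {
                i++;
            } else {
                j++;
            }
        }
        return sum;
    }

    // Sparse with dense: index the dense array at each active position.
    static double dot(int[] aIdx, double[] aVal, double[] dense) {
        double sum = 0.0;
        for (int i = 0; i < aIdx.length; i++) {
            sum += aVal[i] * dense[aIdx[i]];
        }
        return sum;
    }
}
```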
    • outer

      public Matrix outer(SGDVector other)
      Generates the outer product of this vector with another SparseVector.

      Throws IllegalArgumentException if other is a DenseVector.

      Specified by:
      outer in interface SGDVector
      Parameters:
      other - A vector.
      Returns:
      A DenseSparseMatrix representing the outer product.
    • sum

      public double sum()
      Description copied from interface: SGDVector
      Calculates the sum of this vector.
      Specified by:
      sum in interface SGDVector
      Returns:
      The sum.
    • twoNorm

      public double twoNorm()
      Description copied from interface: SGDVector
      Calculates the Euclidean norm for this vector.
      Specified by:
      twoNorm in interface SGDVector
      Specified by:
      twoNorm in interface Tensor
      Returns:
      The Euclidean norm.
    • oneNorm

      public double oneNorm()
      Description copied from interface: SGDVector
      Calculates the Manhattan norm for this vector.
      Specified by:
      oneNorm in interface SGDVector
      Returns:
      The Manhattan norm.
    • get

      public double get(int index)
      Description copied from interface: SGDVector
      Gets an element from this vector.
      Specified by:
      get in interface SGDVector
      Parameters:
      index - The index of the element.
      Returns:
      The value at that index.
    • set

      public void set(int index, double value)
      Description copied from interface: SGDVector
      Sets the index to the value.
      Specified by:
      set in interface SGDVector
      Parameters:
      index - The index to set.
      value - The value to set it to.
    • indexOfMax

      public int indexOfMax()
      Description copied from interface: SGDVector
      Returns the index of the maximum value. Requires probing the array.
      Specified by:
      indexOfMax in interface SGDVector
      Returns:
      The index of the maximum value.
    • maxValue

      public double maxValue()
      Description copied from interface: SGDVector
      Returns the maximum value. Requires probing the array.
      Specified by:
      maxValue in interface SGDVector
      Returns:
      The maximum value.
    • minValue

      public double minValue()
      Description copied from interface: SGDVector
      Returns the minimum value. Requires probing the array.
      Specified by:
      minValue in interface SGDVector
      Returns:
      The minimum value.
    • difference

      public int[] difference(SparseVector other)
      Generates an array of the indices that are active in this vector but are not present in other.
      Parameters:
      other - The vector to compare.
      Returns:
      An array of indices that are active only in this vector.
    • intersection

      public int[] intersection(SparseVector other)
      Generates an array of the indices that are active in both this vector and other.
      Parameters:
      other - The vector to intersect.
      Returns:
      An array of indices that are active in both vectors.
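
Because the indices are stored sorted, both difference and intersection can be computed in a single pass over the two index arrays. A standalone sketch under that assumption; IndexSetSketch is a hypothetical name and this is not the library's code:

```java
import java.util.Arrays;

// Hypothetical sketch of index-set operations on sorted arrays; not Tribuo's code.
final class IndexSetSketch {
    // Indices present in both a and b.
    static int[] intersection(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out[n++] = a[i];
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    // Indices present in a but not in b.
    static int[] difference(int[] a, int[] b) {
        int[] out = new int[a.length];
        int i = 0, j = 0, n = 0;
        while (i < a.length) {
            if (j >= b.length || a[i] < b[j]) {
                out[n++] = a[i]; // b has no matching index
                i++;
            } else if (a[i] == b[j]) {
                i++; // shared index, excluded from the difference
                j++;
            } else {
                j++; // advance b until it catches up with a
            }
        }
        return Arrays.copyOf(out, n);
    }
}
```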
    • normalize

      public void normalize(VectorNormalizer normalizer)
      Description copied from interface: SGDVector
      Normalizes the vector using the supplied vector normalizer.
      Specified by:
      normalize in interface SGDVector
      Parameters:
      normalizer - The kind of normalization to apply.
    • reduce

      public double reduce(double initial, DoubleUnaryOperator transform, DoubleBinaryOperator reduction)
      Description copied from interface: SGDVector
      Reduces the vector by applying the transformation to every value (including the implicit zeros) and combining the results with the supplied reduction operator. The left argument of the reduction operator is the transformed value and the right argument is the current reduction value. The reduction is seeded with the initial value.
      Specified by:
      reduce in interface SGDVector
      Parameters:
      initial - The initial value for the reduction.
      transform - The transformation operator.
      reduction - The reduction operator.
      Returns:
      The reduction of this vector.
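
The role of the implicit zeros in a reduction can be shown with a small standalone sketch. SparseReduceSketch is hypothetical and not the library's code. It visits the active values first and the implicit zeros afterwards, so it agrees with a position-order reduction only for order-insensitive operators such as sum or max; that visiting order is an assumption of the sketch.

```java
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch of a reduction that accounts for implicit zeros; not Tribuo's code.
final class SparseReduceSketch {
    static double reduce(int size, double[] activeValues, double initial,
                         DoubleUnaryOperator transform, DoubleBinaryOperator reduction) {
        double result = initial;
        // Left argument: transformed value. Right argument: running reduction.
        for (double v : activeValues) {
            result = reduction.applyAsDouble(transform.applyAsDouble(v), result);
        }
        // Every inactive position holds an implicit zero, which is transformed too.
        double transformedZero = transform.applyAsDouble(0.0);
        for (int i = activeValues.length; i < size; i++) {
            result = reduction.applyAsDouble(transformedZero, result);
        }
        return result;
    }
}
```

The implicit zeros matter whenever transform(0) is not the identity of the reduction. For example, the maximum of a vector whose active values are all negative is 0.0 if any position is inactive.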
    • euclideanDistance

      public double euclideanDistance(SGDVector other)
      Description copied from interface: SGDVector
      The l2 or Euclidean distance between this vector and the other vector.
      Specified by:
      euclideanDistance in interface SGDVector
      Parameters:
      other - The other vector.
      Returns:
      The Euclidean distance between them.
    • l1Distance

      public double l1Distance(SGDVector other)
      Description copied from interface: SGDVector
      The l1 or Manhattan distance between this vector and the other vector.
      Specified by:
      l1Distance in interface SGDVector
      Parameters:
      other - The other vector.
      Returns:
      The l1 distance.
    • distance

      public double distance(SGDVector other, DoubleUnaryOperator transformFunc, DoubleUnaryOperator normalizeFunc)
      Computes the distance between this vector and the other vector.
      Parameters:
      other - The other vector.
      transformFunc - The transformation function to apply to each paired dimension difference.
      normalizeFunc - The normalization to apply after summing the transformed differences.
      Returns:
      The distance between the two vectors.
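
The two function parameters describe a family of distances: transformFunc is applied to each per-dimension difference, the results are summed, and normalizeFunc is applied to the sum. The standalone sketch below (hypothetical class SparseDistanceSketch, not the library's code) shows that pattern for two sparse operands. It skips positions where both vectors are implicitly zero, which assumes transformFunc maps 0 to 0.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch of a parameterised distance; not Tribuo's code.
final class SparseDistanceSketch {
    // Both index arrays must be sorted ascending.
    // Assumption: transformFunc(0) == 0, so doubly-inactive positions add nothing.
    static double distance(int[] aIdx, double[] aVal, int[] bIdx, double[] bVal,
                           DoubleUnaryOperator transformFunc,
                           DoubleUnaryOperator normalizeFunc) {
        double sum = 0.0;
        int i = 0, j = 0;
        while (i < aIdx.length || j < bIdx.length) {
            double diff;
            if (j >= bIdx.length || (i < aIdx.length && aIdx[i] < bIdx[j])) {
                diff = aVal[i]; // active only in a, other side is zero
                i++;
            } else if (i >= aIdx.length || bIdx[j] < aIdx[i]) {
                diff = -bVal[j]; // active only in b
                j++;
            } else {
                diff = aVal[i] - bVal[j]; // active in both
                i++;
                j++;
            }
            sum += transformFunc.applyAsDouble(diff);
        }
        return normalizeFunc.applyAsDouble(sum);
    }
}
```

Under this pattern, squaring with a square-root normalizer gives the Euclidean distance, and absolute value with the identity normalizer gives the l1 distance.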
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • densify

      public DenseVector densify()
      Returns a dense copy of this sparse vector.
      Returns:
      A dense copy of this vector.
    • toDenseArray

      @Deprecated public double[] toDenseArray()
      Deprecated.
      Generates a dense array copy of this SparseVector.
      Returns:
      A dense array containing this vector along with the implicit zeros.
    • toArray

      public double[] toArray()
      Description copied from interface: SGDVector
      Returns an array containing all the values in the vector (including any implicit zeros).
      Specified by:
      toArray in interface SGDVector
      Returns:
      An array copy.
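
Expanding a sparse vector into a dense array means writing each active value at its index and leaving every other position at zero. A minimal standalone sketch; the class DensifySketch is hypothetical and this is not the library's code:

```java
// Hypothetical sketch of expanding sparse storage into a dense array; not Tribuo's code.
final class DensifySketch {
    static double[] toArray(int size, int[] indices, double[] values) {
        // Java zero-initialises the array, which supplies the implicit zeros.
        double[] dense = new double[size];
        for (int i = 0; i < indices.length; i++) {
            dense[indices[i]] = values[i];
        }
        return dense;
    }
}
```

The dense array always has size elements, so the memory cost grows with the dimension of the vector rather than with the number of active elements.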
    • variance

      public double variance(double mean)
      Description copied from interface: SGDVector
      Calculates the variance of this vector based on the supplied mean.
      Specified by:
      variance in interface SGDVector
      Parameters:
      mean - The mean of the vector.
      Returns:
      The variance of the vector.
    • iterator

      public VectorIterator iterator()
      Specified by:
      iterator in interface Iterable<VectorTuple>
    • transpose

      public static SparseVector[] transpose(SparseVector[] input)
      Transposes an array of sparse vectors from row-major to column-major or vice versa.
      Parameters:
      input - Input sparse vectors.
      Returns:
      A column-major array of SparseVectors.
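
A row-major to column-major transpose of sparse rows can be done in two passes: count the entries per column, then scatter each entry into its column. The sketch below uses nested arrays in place of SparseVector and takes the column count as a parameter; both choices, and the class name SparseTransposeSketch, are assumptions made to keep the example standalone.

```java
// Hypothetical sketch of a sparse transpose over parallel arrays; not Tribuo's code.
final class SparseTransposeSketch {
    final int[][] indices;   // indices[c] holds the row ids active in column c
    final double[][] values; // values[c][k] is the value at row indices[c][k]

    private SparseTransposeSketch(int[][] indices, double[][] values) {
        this.indices = indices;
        this.values = values;
    }

    static SparseTransposeSketch transpose(int numColumns, int[][] rowIdx, double[][] rowVal) {
        // Pass 1: count how many active entries fall in each column.
        int[] counts = new int[numColumns];
        for (int[] row : rowIdx) {
            for (int c : row) {
                counts[c]++;
            }
        }
        int[][] colIdx = new int[numColumns][];
        double[][] colVal = new double[numColumns][];
        for (int c = 0; c < numColumns; c++) {
            colIdx[c] = new int[counts[c]];
            colVal[c] = new double[counts[c]];
        }
        // Pass 2: scatter. Rows are visited in ascending order,
        // so each column's row ids come out sorted.
        int[] next = new int[numColumns];
        for (int r = 0; r < rowIdx.length; r++) {
            for (int k = 0; k < rowIdx[r].length; k++) {
                int c = rowIdx[r][k];
                colIdx[c][next[c]] = r;
                colVal[c][next[c]] = rowVal[r][k];
                next[c]++;
            }
        }
        return new SparseTransposeSketch(colIdx, colVal);
    }
}
```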
    • transpose

      public static <T extends Output<T>> SparseVector[] transpose(Dataset<T> dataset)
      Converts a dataset of row-major examples into an array of column-major sparse vectors.
      Type Parameters:
      T - The type of the dataset.
      Parameters:
      dataset - Input dataset.
      Returns:
      A column-major array of SparseVectors.
    • transpose

      public static <T extends Output<T>> SparseVector[] transpose(Dataset<T> dataset, ImmutableFeatureMap fMap)
      Converts a dataset of row-major examples into an array of column-major sparse vectors.
      Type Parameters:
      T - The type of the dataset.
      Parameters:
      dataset - Input dataset.
      fMap - The feature map to use. If it differs from the feature map used by the dataset, the behaviour is undefined.
      Returns:
      A column-major array of SparseVectors.