java.lang.Object

org.tribuo.math.la.SparseVector

All Implemented Interfaces: Serializable, Iterable<VectorTuple>, SGDVector, Tensor, ProtoSerializable<org.tribuo.math.protos.TensorProto>

public class SparseVector extends Object implements SGDVector

A sparse vector. Stored as a sorted array of indices and an array of values.

Looking up a specific index uses binary search, so iterating over the values with the iterator is usually faster than looking up each index in turn.

This vector has immutable indices. It cannot gain new indices after construction, and throws IllegalArgumentException if such an operation is attempted.
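
The storage layout described above can be illustrated with a minimal, self-contained sketch. The class `SortedSparseSketch` and its methods are hypothetical stand-ins written for this example, not Tribuo's implementation; returning zero for an inactive index is an assumption of the sketch.

```java
import java.util.Arrays;

/** Hypothetical stand-in for illustration; not Tribuo's SparseVector. */
final class SortedSparseSketch {
    private final int size;
    private final int[] indices;   // sorted, fixed after construction
    private final double[] values; // values[i] belongs to indices[i]

    SortedSparseSketch(int size, int[] sortedIndices, double[] values) {
        this.size = size;
        this.indices = Arrays.copyOf(sortedIndices, sortedIndices.length);
        this.values = Arrays.copyOf(values, values.length);
    }

    /** O(log n) lookup; the sketch reads inactive indices as an implicit zero. */
    double get(int index) {
        int pos = Arrays.binarySearch(indices, index);
        return pos >= 0 ? values[pos] : 0.0;
    }

    /** Updates an active index, and refuses to create a new one. */
    void set(int index, double value) {
        int pos = Arrays.binarySearch(indices, index);
        if (pos < 0) {
            throw new IllegalArgumentException("Index " + index + " is not active");
        }
        values[pos] = value;
    }

    /** One linear pass over the active elements, with no searching. */
    double sumActive() {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }
}
```

Calling `get` once per index costs a binary search each time, whereas a pass such as `sumActive` touches each stored value exactly once, which is why iteration tends to be the cheaper access pattern.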

Field Summary

Fields

Modifier and Type

Field

Description

static final int

CURRENT_VERSION

Protobuf serialization version.

protected final int[]

indices

The indices array.

protected final double[]

values

The values array.

Fields inherited from interface org.tribuo.protos.ProtoSerializable
DESERIALIZATION_METHOD_NAME, PROVENANCE_SERIALIZER

Constructor Summary

Constructors

Constructor

Description

SparseVector(int size, int[] indices, double value)

Creates a sparse vector of the specified size, with the supplied value at each of the indices.

Method Summary

Modifier and Type

Method

Description

void

add(int index, double value)

Adds value to the element at index.

SGDVector

add(SGDVector other)

Adds other to this vector, producing a new SGDVector.

SparseVector

copy()

Returns a deep copy of this vector.

static SparseVector

createSparseVector(int dimension, int[] indices, double[] values)

Defensively copies the input, and checks that the indices are sorted.

static SparseVector

createSparseVector(int dimension, Map<Integer,Double> indexMap)

Builds a SparseVector from a map.

static <T extends Output<T>> SparseVector

createSparseVector(Example<T> example, ImmutableFeatureMap featureInfo, boolean addBias)

Builds a SparseVector from an Example.

DenseVector

densify()

Returns a dense vector copying this sparse vector.

static SparseVector

deserializeFromProto(int version, String className, com.google.protobuf.Any message)

Deserialization factory.

int[]

difference(SparseVector other)

Generates an array of the indices that are active in this vector but are not present in other.

double

distance(SGDVector other, DoubleUnaryOperator transformFunc, DoubleUnaryOperator normalizeFunc)

Computes the distance between this vector and the other vector.

double

dot(SGDVector other)

Calculates the dot product between this vector and other.

boolean

equals(Object other)

Equals is defined mathematically; that is, two SGDVectors are equal iff they have the same indices and the same values at those indices.

double

euclideanDistance(SGDVector other)

The l2 or Euclidean distance between this vector and the other vector.

void

foreachIndexedInPlace(ToDoubleBiFunction<Integer,Double> f)

Applies a ToDoubleBiFunction elementwise to this SGDVector.

void

foreachInPlace(DoubleUnaryOperator f)

Applies a DoubleUnaryOperator elementwise to this SGDVector.

double

get(int index)

Gets an element from this vector.

int[]

getShape()

Returns an int array specifying the shape of this Tensor.

void

hadamardProductInPlace(Tensor other, DoubleUnaryOperator f)

Updates this Tensor with the Hadamard product (i.e., a term by term multiply) of this and other.

int

hashCode()

int

indexOfMax()

Returns the index of the maximum value.

void

intersectAndAddInPlace(Tensor other, DoubleUnaryOperator f)

Updates this Tensor by adding all the values from the intersection with other.

int[]

intersection(SparseVector other)

Generates an array of the indices that are active in both this vector and other.

VectorIterator

iterator()

double

l1Distance(SGDVector other)

The l1 or Manhattan distance between this vector and the other vector.

double

maxValue()

Returns the maximum value.

double

minValue()

Returns the minimum value.

void

normalize(VectorNormalizer normalizer)

Normalizes the vector using the supplied vector normalizer.

int

numActiveElements()

Returns the number of non-zero elements (on construction, an element could be set to zero and it would still remain active).

double

oneNorm()

Calculates the Manhattan norm for this vector.

Matrix

outer(SGDVector other)

Generates the outer product of this vector with another SparseVector.

double

reduce(double initial, DoubleUnaryOperator transform, DoubleBinaryOperator reduction)

Reduces the vector, applying the transformation to every value (including the implicit zeros) and reducing the output by applying the supplied reduction operator (where the right argument is the current reduction value, and the left argument is the transformed value).

Tensor

reshape(int[] newShape)

Reshapes the Tensor to the supplied shape.

SparseVector

scale(double coefficient)

Generates a new vector with each element scaled by coefficient.

org.tribuo.math.protos.TensorProto

serialize()

Serializes this object to a protobuf.

void

set(int index, double value)

Sets the index to the value.

int

size()

Returns the dimensionality of this vector.

SGDVector

subtract(SGDVector other)

Subtracts other from this vector, producing a new SGDVector.

double

sum()

Calculates the sum of this vector.

double[]

toArray()

Returns an array containing all the values in the vector (including any implicit zeros).

double[]

toDenseArray()

Deprecated.

String

toString()

static <T extends Output<T>> SparseVector[]

transpose(Dataset<T> dataset)

Converts a dataset of row-major examples into an array of column-major sparse vectors.

static <T extends Output<T>> SparseVector[]

transpose(Dataset<T> dataset, ImmutableFeatureMap fMap)

Converts a dataset of row-major examples into an array of column-major sparse vectors.

static SparseVector[]

transpose(SparseVector[] input)

Transposes an array of sparse vectors from row-major to column-major or vice versa.

double

twoNorm()

Calculates the Euclidean norm for this vector.

double

variance(double mean)

Calculates the variance of this vector based on the supplied mean.

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Methods inherited from interface java.lang.Iterable
forEach, spliterator

Methods inherited from interface org.tribuo.math.la.SGDVector
cosineDistance, cosineSimilarity, l2Distance, variance

Methods inherited from interface org.tribuo.math.la.Tensor
hadamardProductInPlace, intersectAndAddInPlace, scalarAddInPlace, scaleInPlace

Field Details
- CURRENT_VERSION
  public static final int CURRENT_VERSION
  
  Protobuf serialization version.
  
  See Also:
  
  Constant Field Values
- indices
  
  protected final int[] indices
  
  The indices array.
- values
  
  protected final double[] values
  
  The values array.

Constructor Details
- SparseVector
  
  public SparseVector(int size, int[] indices, double value)
  
  Creates a sparse vector of the specified size, with the supplied value at each of the indices.
  
  Parameters:
  
  size - The vector size.
  
  indices - The active indices.
  
  value - The value for those indices.

Method Details
- createSparseVector
  
  public static <T extends Output<T>> SparseVector createSparseVector(Example<T> example, ImmutableFeatureMap featureInfo, boolean addBias)
  
  Builds a SparseVector from an Example.
  Used in training and inference.
  Throws IllegalArgumentException if the Example contains NaN-valued features.
  
  Type Parameters:
  
  T - The type parameter of the example.
  
  Parameters:
  
  example - The example to convert.
  
  featureInfo - The feature information, used to calculate the dimension of this SparseVector.
  
  addBias - Add a bias feature.
  
  Returns:
  
  A SparseVector representing the example's features.
- createSparseVector
  
  public static SparseVector createSparseVector(int dimension, int[] indices, double[] values)
  
  Defensively copies the input, and checks that the indices are sorted. If not, it sorts them.
  Throws IllegalArgumentException if the arrays are not the same length, or if dimension is less than the max index.
  
  Parameters:
  
  dimension - The dimension of this vector.
  
  indices - The indices of the non-zero elements.
  
  values - The values of the non-zero elements.
  
  Returns:
  
  A SparseVector encapsulating the indices and values.
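
The copy-then-sort behaviour described for this factory can be sketched without the library. The class `SortedPairsSketch` below is hypothetical and only illustrates one way to sort indices while keeping each value attached to its index; the dimension check mirrors the documented condition and is not taken from Tribuo's source.

```java
import java.util.Arrays;

/** Hypothetical holder for illustration; not Tribuo's implementation. */
final class SortedPairsSketch {
    final int[] indices;
    final double[] values;

    private SortedPairsSketch(int[] indices, double[] values) {
        this.indices = indices;
        this.values = values;
    }

    /** Builds sorted copies of the inputs; the caller's arrays are never modified. */
    static SortedPairsSketch sortedCopy(int dimension, int[] indices, double[] values) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values differ in length");
        }
        // Sort the positions by the index they refer to, so the values can follow.
        Integer[] order = new Integer[indices.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (Integer a, Integer b) -> Integer.compare(indices[a], indices[b]));
        int[] outIdx = new int[indices.length];
        double[] outVal = new double[values.length];
        for (int i = 0; i < order.length; i++) {
            outIdx[i] = indices[order[i]];
            outVal[i] = values[order[i]];
        }
        // Documented condition: the dimension must not be less than the max index.
        if (outIdx.length > 0 && dimension < outIdx[outIdx.length - 1]) {
            throw new IllegalArgumentException("dimension is less than the max index");
        }
        return new SortedPairsSketch(outIdx, outVal);
    }
}
```

Because the sketch writes into fresh arrays, later changes to the caller's `indices` or `values` cannot affect the result, which is the point of a defensive copy.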
- createSparseVector
  
  public static SparseVector createSparseVector(int dimension, Map<Integer,Double> indexMap)
  
  Builds a SparseVector from a map.
  Throws IllegalArgumentException if dimension is less than the max index.
  
  Parameters:
  
  dimension - The dimension of this vector.
  
  indexMap - The map from indices to values.
  
  Returns:
  
  A SparseVector.
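
As a rough illustration of building sorted index and value arrays from a map, the following sketch copies the entries into a `TreeMap` so they are visited in ascending index order. `MapToSparseSketch` is a hypothetical name, and the approach is an assumption made for this example rather than a description of Tribuo's code.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical holder for illustration; not Tribuo's implementation. */
final class MapToSparseSketch {
    final int[] indices;
    final double[] values;

    private MapToSparseSketch(int[] indices, double[] values) {
        this.indices = indices;
        this.values = values;
    }

    static MapToSparseSketch fromMap(int dimension, Map<Integer, Double> indexMap) {
        // A TreeMap iterates its keys in ascending order, which gives sorted indices.
        TreeMap<Integer, Double> sorted = new TreeMap<>(indexMap);
        // Documented condition: the dimension must not be less than the max index.
        if (!sorted.isEmpty() && dimension < sorted.lastKey()) {
            throw new IllegalArgumentException("dimension is less than the max index");
        }
        int[] idx = new int[sorted.size()];
        double[] val = new double[sorted.size()];
        int n = 0;
        for (Map.Entry<Integer, Double> e : sorted.entrySet()) {
            idx[n] = e.getKey();
            val[n] = e.getValue();
            n++;
        }
        return new MapToSparseSketch(idx, val);
    }
}
```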
- deserializeFromProto
  
  public static SparseVector deserializeFromProto(int version, String className, com.google.protobuf.Any message) throws com.google.protobuf.InvalidProtocolBufferException
  
  Deserialization factory.
  
  Parameters:
  
  version - The serialized object version.
  
  className - The class name.
  
  message - The serialized data.
  
  Returns:
  
  The deserialized object.
  
  Throws:
  
  com.google.protobuf.InvalidProtocolBufferException - If the protobuf could not be parsed from the message.
- serialize
  
  public org.tribuo.math.protos.TensorProto serialize()
  
  Description copied from interface: ProtoSerializable
  
  Serializes this object to a protobuf.
  
  Specified by:
  
  serialize in interface ProtoSerializable<org.tribuo.math.protos.TensorProto>
  
  Returns:
  
  The protobuf.
- copy
  
  public SparseVector copy()
  
  Description copied from interface: SGDVector
  
  Returns a deep copy of this vector.
  
  Specified by:
  
  copy in interface SGDVector
  
  Specified by:
  
  copy in interface Tensor
  
  Returns:
  
  A copy of this vector.
- getShape
  
  public int[] getShape()
  
  Description copied from interface: Tensor
  
  Returns an int array specifying the shape of this Tensor.
  
  Specified by:
  
  getShape in interface Tensor
  
  Returns:
  
  An int array.
- reshape
  
  public Tensor reshape(int[] newShape)
  
  Description copied from interface: Tensor
  
  Reshapes the Tensor to the supplied shape. Throws IllegalArgumentException if the shape isn't compatible.
  
  Specified by:
  
  reshape in interface Tensor
  
  Parameters:
  
  newShape - The desired shape.
  
  Returns:
  
  A Tensor of the desired shape.
- size
  
  public int size()
  
  Description copied from interface: SGDVector
  
  Returns the dimensionality of this vector.
  
  Specified by:
  
  size in interface SGDVector
  
  Returns:
  
  The dimensionality of the vector.
- numActiveElements
  
  public int numActiveElements()
  
  Description copied from interface: SGDVector
  
  Returns the number of non-zero elements (on construction, an element could be set to zero and it would still remain active).
  
  Specified by:
  
  numActiveElements in interface SGDVector
  
  Returns:
  
  The number of non-zero elements.
- equals
  
  public boolean equals(Object other)
  
  Equals is defined mathematically; that is, two SGDVectors are equal iff they have the same indices and the same values at those indices.
  
  Overrides:
  
  equals in class Object
  
  Parameters:
  
  other - Object to compare against.
  
  Returns:
  
  True if this vector and the other vector contain the same values in the same order.
- hashCode
  
  public int hashCode()
  
  Overrides:
  
  hashCode in class Object
- add
  
  public SGDVector add(SGDVector other)
  
  Adds other to this vector, producing a new SGDVector. If other is a SparseVector, the returned vector is also a SparseVector; otherwise it is a DenseVector.
  
  Specified by:
  
  add in interface SGDVector
  
  Parameters:
  
  other - The vector to add.
  
  Returns:
  
  A new SGDVector where each element value = this.get(i) + other.get(i).
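
For the sparse-plus-sparse case, elementwise addition can be done with a single merge over the two sorted index arrays. The sketch below uses a hypothetical `SparseAddSketch` class and is only one possible way to realise `this.get(i) + other.get(i)`; it makes no claim about how Tribuo does it.

```java
import java.util.Arrays;

/** Hypothetical holder for illustration; not Tribuo's implementation. */
final class SparseAddSketch {
    final int[] indices;
    final double[] values;

    SparseAddSketch(int[] indices, double[] values) {
        this.indices = indices;
        this.values = values;
    }

    /** Merges two sorted sparse operands into a new one holding the union of indices. */
    static SparseAddSketch add(SparseAddSketch a, SparseAddSketch b) {
        int[] idx = new int[a.indices.length + b.indices.length];
        double[] val = new double[idx.length];
        int i = 0, j = 0, n = 0;
        while (i < a.indices.length || j < b.indices.length) {
            if (j == b.indices.length || (i < a.indices.length && a.indices[i] < b.indices[j])) {
                // Index only present in a.
                idx[n] = a.indices[i];
                val[n] = a.values[i];
                i++;
            } else if (i == a.indices.length || b.indices[j] < a.indices[i]) {
                // Index only present in b.
                idx[n] = b.indices[j];
                val[n] = b.values[j];
                j++;
            } else {
                // Index present in both, so the values are summed.
                idx[n] = a.indices[i];
                val[n] = a.values[i] + b.values[j];
                i++;
                j++;
            }
            n++;
        }
        return new SparseAddSketch(Arrays.copyOf(idx, n), Arrays.copyOf(val, n));
    }
}
```

Neither operand is modified, matching the "producing a new SGDVector" wording. Subtraction would follow the same pattern with the sign of the second operand's values flipped.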
- subtract
  
  public SGDVector subtract(SGDVector other)
  
  Subtracts other from this vector, producing a new SGDVector. If other is a SparseVector, the returned vector is also a SparseVector; otherwise it is a DenseVector.
  
  Specified by:
  
  subtract in interface SGDVector
  
  Parameters:
  
  other - The vector to subtract.
  
  Returns:
  
  A new SGDVector where each element value = this.get(i) - other.get(i).
- intersectAndAddInPlace
  
  public void intersectAndAddInPlace(Tensor other, DoubleUnaryOperator f)
  
  Description copied from interface: Tensor
  
  Updates this Tensor by adding all the values from the intersection with other.
  The function f is applied to all values from other before the addition.
  Each value is updated as value += f(otherValue).
  
  Specified by:
  
  intersectAndAddInPlace in interface Tensor
  
  Parameters:
  
  other - The other Tensor.
  
  f - A function to apply.
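
The update rule `value += f(otherValue)` over the intersection can be sketched on raw sorted arrays. `IntersectAddSketch` and its parameter names are hypothetical, and the two-pointer walk is an illustrative choice, not a statement about Tribuo's internals.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical helper for illustration; not Tribuo's implementation. */
final class IntersectAddSketch {
    /** Updates val in place at every index that also appears in otherIdx. */
    static void intersectAndAddInPlace(int[] idx, double[] val,
                                       int[] otherIdx, double[] otherVal,
                                       DoubleUnaryOperator f) {
        int i = 0, j = 0;
        while (i < idx.length && j < otherIdx.length) {
            if (idx[i] == otherIdx[j]) {
                // Shared index: apply f to the other value, then add.
                val[i] += f.applyAsDouble(otherVal[j]);
                i++;
                j++;
            } else if (idx[i] < otherIdx[j]) {
                i++; // only in this vector, left unchanged
            } else {
                j++; // only in other, ignored because indices are immutable
            }
        }
    }
}
```

Passing something like `x -> -0.5 * x` for `f` turns the call into a scaled subtraction, which is the shape of a gradient step.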
- hadamardProductInPlace
  
  public void hadamardProductInPlace(Tensor other, DoubleUnaryOperator f)
  
  Description copied from interface: Tensor
  
  Updates this Tensor with the Hadamard product (i.e., a term by term multiply) of this and other.
  The function f is applied to all values from other before the multiplication.
  Each value is updated as value *= f(otherValue).
  
  Specified by:
  
  hadamardProductInPlace in interface Tensor
  
  Parameters:
  
  other - The other Tensor.
  
  f - A function to apply.
- foreachInPlace
  
  public void foreachInPlace(DoubleUnaryOperator f)
  
  Applies a DoubleUnaryOperator elementwise to this SGDVector.
  Only applies the function to the elements which are present.
  If you need to operate over the whole vector then densify it first.
  
  Specified by:
  
  foreachInPlace in interface Tensor
  
  Parameters:
  
  f - The function to apply.
- foreachIndexedInPlace
  
  public void foreachIndexedInPlace(ToDoubleBiFunction<Integer,Double> f)
  
  Applies a ToDoubleBiFunction elementwise to this SGDVector.
  The first argument to the function is the index, the second argument is the current value.
  Only applies the function to the elements which are present.
  If you need to operate over the whole vector then densify it first.
  
  Specified by:
  
  foreachIndexedInPlace in interface SGDVector
  
  Parameters:
  
  f - The function to apply.
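
The "only the elements which are present" behaviour can be seen in a small sketch that walks the stored pairs and nothing else. `ForeachIndexedSketch` is a hypothetical name used only for this example.

```java
import java.util.function.ToDoubleBiFunction;

/** Hypothetical helper for illustration; not Tribuo's implementation. */
final class ForeachIndexedSketch {
    /** Replaces each stored value with f(index, value); implicit zeros are never visited. */
    static void foreachIndexedInPlace(int[] indices, double[] values,
                                      ToDoubleBiFunction<Integer, Double> f) {
        for (int i = 0; i < values.length; i++) {
            values[i] = f.applyAsDouble(indices[i], values[i]);
        }
    }
}
```

A function such as `(index, value) -> value + 1` therefore changes only the active positions; every other position would still read as zero afterwards, which is why densifying first is suggested when the whole vector must be transformed.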
- scale
  
  public SparseVector scale(double coefficient)
  
  Description copied from interface: SGDVector
  
  Generates a new vector with each element scaled by coefficient.
  
  Specified by:
  
  scale in interface SGDVector
  
  Parameters:
  
  coefficient - The coefficient to scale the elements by.
  
  Returns:
  
  A new SGDVector.
- add
  
  public void add(int index, double value)
  
  Description copied from interface: SGDVector
  
  Adds value to the element at index.
  
  Specified by:
  
  add in interface SGDVector
  
  Parameters:
  
  index - The index to update.
  
  value - The value to add.
- dot
  
  public double dot(SGDVector other)
  
  Description copied from interface: SGDVector
  
  Calculates the dot product between this vector and other.
  
  Specified by:
  
  dot in interface SGDVector
  
  Parameters:
  
  other - The other vector.
  
  Returns:
  
  The dot product.
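
For two sparse operands, only indices active in both can contribute to a dot product, so a two-pointer walk over the sorted index arrays is enough. The sketch below uses a hypothetical `SparseDotSketch` class and is an illustration of the idea, not Tribuo's code.

```java
/** Hypothetical helper for illustration; not Tribuo's implementation. */
final class SparseDotSketch {
    /** Dot product of two sparse vectors given as sorted index and value arrays. */
    static double dot(int[] aIdx, double[] aVal, int[] bIdx, double[] bVal) {
        double sum = 0.0;
        int i = 0, j = 0;
        while (i < aIdx.length && j < bIdx.length) {
            if (aIdx[i] == bIdx[j]) {
                sum += aVal[i] * bVal[j];
                i++;
                j++;
            } else if (aIdx[i] < bIdx[j]) {
                i++; // the other side is implicitly zero here
            } else {
                j++;
            }
        }
        return sum;
    }
}
```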
- outer
  
  public Matrix outer(SGDVector other)
  
  Generates the outer product of this vector with another SparseVector.
  Throws an IllegalArgumentException if used with a DenseVector.
  
  Specified by:
  
  outer in interface SGDVector
  
  Parameters:
  
  other - A vector.
  
  Returns:
  
  A DenseSparseMatrix representing the outer product.
- sum
  
  public double sum()
  
  Description copied from interface: SGDVector
  
  Calculates the sum of this vector.
  
  Specified by:
  
  sum in interface SGDVector
  
  Returns:
  
  The sum.
- twoNorm
  
  public double twoNorm()
  
  Description copied from interface: SGDVector
  
  Calculates the Euclidean norm for this vector.
  
  Specified by:
  
  twoNorm in interface SGDVector
  
  Specified by:
  
  twoNorm in interface Tensor
  
  Returns:
  
  The Euclidean norm.
- oneNorm
  
  public double oneNorm()
  
  Description copied from interface: SGDVector
  
  Calculates the Manhattan norm for this vector.
  
  Specified by:
  
  oneNorm in interface SGDVector
  
  Returns:
  
  The Manhattan norm.
- get
  
  public double get(int index)
  
  Description copied from interface: SGDVector
  
  Gets an element from this vector.
  
  Specified by:
  
  get in interface SGDVector
  
  Parameters:
  
  index - The index of the element.
  
  Returns:
  
  The value at that index.
- set
  
  public void set(int index, double value)
  
  Description copied from interface: SGDVector
  
  Sets the index to the value.
  
  Specified by:
  
  set in interface SGDVector
  
  Parameters:
  
  index - The index to set.
  
  value - The value to set it to.
- indexOfMax
  
  public int indexOfMax()
  
  Description copied from interface: SGDVector
  
  Returns the index of the maximum value. Requires probing the array.
  
  Specified by:
  
  indexOfMax in interface SGDVector
  
  Returns:
  
  The index of the maximum value.
- maxValue
  
  public double maxValue()
  
  Description copied from interface: SGDVector
  
  Returns the maximum value. Requires probing the array.
  
  Specified by:
  
  maxValue in interface SGDVector
  
  Returns:
  
  The maximum value.
- minValue
  
  public double minValue()
  
  Description copied from interface: SGDVector
  
  Returns the minimum value. Requires probing the array.
  
  Specified by:
  
  minValue in interface SGDVector
  
  Returns:
  
  The minimum value.
- difference
  
  public int[] difference(SparseVector other)
  
  Generates an array of the indices that are active in this vector but are not present in other.
  
  Parameters:
  
  other - The vector to compare.
  
  Returns:
  
  An array of indices that are active only in this vector.
- intersection
  
  public int[] intersection(SparseVector other)
  
  Generates an array of the indices that are active in both this vector and other.
  
  Parameters:
  
  other - The vector to intersect.
  
  Returns:
  
  An array of indices that are active in both vectors.
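
Both `difference` and `intersection` are set operations on sorted index arrays. The following sketch, with a hypothetical `IndexSetSketch` class, shows how each could be computed in one linear pass; it is written for illustration and does not reproduce Tribuo's source.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not Tribuo's implementation. */
final class IndexSetSketch {
    /** Indices present in both sorted arrays. */
    static int[] intersection(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int n = 0, i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out[n++] = a[i];
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    /** Indices present in a but absent from b. */
    static int[] difference(int[] a, int[] b) {
        int[] out = new int[a.length];
        int n = 0, j = 0;
        for (int x : a) {
            // Skip entries of b that are smaller than the current index of a.
            while (j < b.length && b[j] < x) {
                j++;
            }
            if (j == b.length || b[j] != x) {
                out[n++] = x;
            }
        }
        return Arrays.copyOf(out, n);
    }
}
```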
- normalize
  
  public void normalize(VectorNormalizer normalizer)
  
  Description copied from interface: SGDVector
  
  Normalizes the vector using the supplied vector normalizer.
  
  Specified by:
  
  normalize in interface SGDVector
  
  Parameters:
  
  normalizer - The kind of normalization to apply.
- reduce
  
  public double reduce(double initial, DoubleUnaryOperator transform, DoubleBinaryOperator reduction)
  
  Description copied from interface: SGDVector
  
  Reduces the vector, applying the transformation to every value (including the implicit zeros) and reducing the output by applying the supplied reduction operator (where the right argument is the current reduction value, and the left argument is the transformed value). The reduction operation is seeded with the initial value.
  
  Specified by:
  
  reduce in interface SGDVector
  
  Parameters:
  
  initial - The initial value for the reduction.
  
  transform - The transformation operator.
  
  reduction - The reduction operator.
  
  Returns:
  
  The reduction of this vector.
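
The role of the implicit zeros and the argument order of the reduction operator can be made concrete with a small sketch. `SparseReduceSketch` is hypothetical, and it processes the active values first and the implicit zeros afterwards, which is an assumption that only gives the same answer as an index-ordered pass when the reduction does not depend on order (for example sum, max or min).

```java
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical helper for illustration; not Tribuo's implementation. */
final class SparseReduceSketch {
    static double reduce(int size, double[] activeValues, double initial,
                         DoubleUnaryOperator transform, DoubleBinaryOperator reduction) {
        double acc = initial;
        for (double v : activeValues) {
            // Left argument: transformed value. Right argument: current reduction value.
            acc = reduction.applyAsDouble(transform.applyAsDouble(v), acc);
        }
        // Every inactive position contributes transform(0.0).
        double zeroTransformed = transform.applyAsDouble(0.0);
        for (int k = activeValues.length; k < size; k++) {
            acc = reduction.applyAsDouble(zeroTransformed, acc);
        }
        return acc;
    }
}
```

Including the zeros matters: for a vector of size 3 whose only active values are -1 and -2, a max reduction seeded with negative infinity yields 0.0 rather than -1.0.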
- euclideanDistance
  
  public double euclideanDistance(SGDVector other)
  
  Description copied from interface: SGDVector
  
  The l2 or Euclidean distance between this vector and the other vector.
  
  Specified by:
  
  euclideanDistance in interface SGDVector
  
  Parameters:
  
  other - The other vector.
  
  Returns:
  
  The Euclidean distance between them.
- l1Distance
  
  public double l1Distance(SGDVector other)
  
  Description copied from interface: SGDVector
  
  The l1 or Manhattan distance between this vector and the other vector.
  
  Specified by:
  
  l1Distance in interface SGDVector
  
  Parameters:
  
  other - The other vector.
  
  Returns:
  
  The l1 distance.
- distance
  
  public double distance(SGDVector other, DoubleUnaryOperator transformFunc, DoubleUnaryOperator normalizeFunc)
  
  Computes the distance between this vector and the other vector.
  
  Parameters:
  
  other - The other vector.
  
  transformFunc - The transformation function to apply to each paired dimension difference.
  
  normalizeFunc - The normalization to apply after summing the transformed differences.
  
  Returns:
  
  The distance between the two vectors.
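
The two function parameters can be read as "transform each paired difference, sum, then normalize". The sketch below, using a hypothetical `SparseDistanceSketch` class, follows that reading for two sparse operands; it assumes `transformFunc` maps zero to zero, so positions where both vectors are implicitly zero are skipped.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical helper for illustration; not Tribuo's implementation. */
final class SparseDistanceSketch {
    static double distance(int[] aIdx, double[] aVal, int[] bIdx, double[] bVal,
                           DoubleUnaryOperator transformFunc,
                           DoubleUnaryOperator normalizeFunc) {
        double sum = 0.0;
        int i = 0, j = 0;
        while (i < aIdx.length || j < bIdx.length) {
            double diff;
            if (j == bIdx.length || (i < aIdx.length && aIdx[i] < bIdx[j])) {
                diff = aVal[i++];            // b is implicitly zero at this index
            } else if (i == aIdx.length || bIdx[j] < aIdx[i]) {
                diff = -bVal[j++];           // a is implicitly zero at this index
            } else {
                diff = aVal[i++] - bVal[j++];
            }
            sum += transformFunc.applyAsDouble(diff);
        }
        return normalizeFunc.applyAsDouble(sum);
    }
}
```

Under this reading, squaring as the transform with `Math::sqrt` as the normalizer gives the l2 distance, and `Math::abs` with the identity function gives the l1 distance.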
- toString
  
  public String toString()
  
  Overrides:
  
  toString in class Object
- densify
  
  public DenseVector densify()
  
  Returns a dense vector copying this sparse vector.
  
  Returns:
  
  A dense copy of this vector.
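
Densifying amounts to scattering the stored values into a zero-filled array of the full dimension. The helper `DensifySketch` below is hypothetical and works on raw arrays rather than Tribuo's DenseVector type.

```java
/** Hypothetical helper for illustration; not Tribuo's implementation. */
final class DensifySketch {
    /** Expands sorted index and value arrays into a dense array of length size. */
    static double[] toDense(int size, int[] indices, double[] values) {
        double[] dense = new double[size]; // Java initialises every element to 0.0
        for (int i = 0; i < indices.length; i++) {
            dense[indices[i]] = values[i];
        }
        return dense;
    }
}
```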
- toDenseArray
  
  @Deprecated public double[] toDenseArray()
  
  Deprecated.
  
  Generates a dense array copy of this SparseVector.
  
  Returns:
  
  A dense array containing this vector along with the implicit zeros.
- toArray
  
  public double[] toArray()
  
  Description copied from interface: SGDVector
  
  Returns an array containing all the values in the vector (including any implicit zeros).
  
  Specified by:
  
  toArray in interface SGDVector
  
  Returns:
  
  An array copy.
- variance
  
  public double variance(double mean)
  
  Description copied from interface: SGDVector
  
  Calculates the variance of this vector based on the supplied mean.
  
  Specified by:
  
  variance in interface SGDVector
  
  Parameters:
  
  mean - The mean of the vector.
  
  Returns:
  
  The variance of the vector.
- iterator
  
  public VectorIterator iterator()
  
  Specified by:
  
  iterator in interface Iterable<VectorTuple>
- transpose
  
  public static SparseVector[] transpose(SparseVector[] input)
  
  Transposes an array of sparse vectors from row-major to column-major or vice versa.
  
  Parameters:
  
  input - Input sparse vectors.
  
  Returns:
  
  A column-major array of SparseVectors if the input was row-major, and a row-major array otherwise.
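
Transposing a collection of sparse rows can be done with a counting pass followed by a fill pass. The sketch below represents each row as a sorted index array plus a value array; `SparseTransposeSketch` and the `numCols` parameter are hypothetical names introduced for this example only.

```java
/** Hypothetical holder for illustration; not Tribuo's implementation. */
final class SparseTransposeSketch {
    final int[][] indices;
    final double[][] values;

    private SparseTransposeSketch(int[][] indices, double[][] values) {
        this.indices = indices;
        this.values = values;
    }

    static SparseTransposeSketch transpose(int[][] rowIdx, double[][] rowVal, int numCols) {
        // First pass: count how many active entries fall in each column.
        int[] counts = new int[numCols];
        for (int[] row : rowIdx) {
            for (int c : row) {
                counts[c]++;
            }
        }
        int[][] colIdx = new int[numCols][];
        double[][] colVal = new double[numCols][];
        for (int c = 0; c < numCols; c++) {
            colIdx[c] = new int[counts[c]];
            colVal[c] = new double[counts[c]];
        }
        // Second pass: rows are visited in ascending order, so each column stays sorted.
        int[] fill = new int[numCols];
        for (int r = 0; r < rowIdx.length; r++) {
            for (int k = 0; k < rowIdx[r].length; k++) {
                int c = rowIdx[r][k];
                int pos = fill[c]++;
                colIdx[c][pos] = r;
                colVal[c][pos] = rowVal[r][k];
            }
        }
        return new SparseTransposeSketch(colIdx, colVal);
    }
}
```

Applying the same routine to the result, with the original row count as `numCols`, recovers the input layout, which matches the "or vice versa" in the description.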
- transpose
  
  public static <T extends Output<T>> SparseVector[] transpose(Dataset<T> dataset)
  
  Converts a dataset of row-major examples into an array of column-major sparse vectors.
  
  Type Parameters:
  
  T - The type of the dataset.
  
  Parameters:
  
  dataset - Input dataset.
  
  Returns:
  
  A column-major array of SparseVectors.
- transpose
  
  public static <T extends Output<T>> SparseVector[] transpose(Dataset<T> dataset, ImmutableFeatureMap fMap)
  
  Converts a dataset of row-major examples into an array of column-major sparse vectors.
  
  Type Parameters:
  
  T - The type of the dataset.
  
  Parameters:
  
  dataset - Input dataset.
  
  fMap - The feature map to use. If it differs from the feature map used by the dataset, the behaviour is undefined.
  
  Returns:
  
  A column-major array of SparseVectors.
