Class SparseVector
All Implemented Interfaces:
Serializable, Iterable<VectorTuple>, SGDVector, Tensor, ProtoSerializable<org.tribuo.math.protos.TensorProto>
Lookups of a specific index use binary search, so iterating over the values with the iterator is usually faster than repeated indexed lookups.
The indices of this vector are immutable: it cannot gain new indices after construction, and it throws IllegalArgumentException if such an operation is attempted.
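The lookup trade-off and the fixed index set described above can be illustrated with a minimal sketch. SparseLookupSketch is a hypothetical stand-in, not Tribuo's implementation: it assumes sorted indices, assumes that inactive indices read as an implicit zero, and rejects writes to indices that were not active at construction.

```java
import java.util.Arrays;

// Hypothetical stand-in for a sparse vector with fixed, sorted indices.
final class SparseLookupSketch {
    private final int size;
    private final int[] indices;   // sorted ascending, fixed at construction
    private final double[] values; // values[i] is the value stored at indices[i]

    SparseLookupSketch(int size, int[] sortedIndices, double[] values) {
        this.size = size;
        this.indices = sortedIndices.clone();
        this.values = values.clone();
    }

    int size() {
        return size;
    }

    // O(log n) per call; inactive indices are assumed to read as zero.
    double get(int index) {
        int pos = Arrays.binarySearch(indices, index);
        return pos >= 0 ? values[pos] : 0.0;
    }

    // Only indices that were active at construction can be written.
    void set(int index, double value) {
        int pos = Arrays.binarySearch(indices, index);
        if (pos < 0) {
            throw new IllegalArgumentException("Index " + index + " is not active");
        }
        values[pos] = value;
    }

    // A single linear pass over the active values, with no searching.
    double sumActive() {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }
}
```

Calling get for every index costs a binary search each time, whereas a pass such as sumActive touches each active value once, which is the reason iteration is usually preferred.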
-
Field Summary
Modifier and Type  Field
    Description
static final int  CURRENT_VERSION
    Protobuf serialization version.
protected final int[]  indices
    The indices array.
protected final double[]  values
    The values array.
Fields inherited from interface org.tribuo.protos.ProtoSerializable
DESERIALIZATION_METHOD_NAME, PROVENANCE_SERIALIZER
-
Constructor Summary
Constructor
    Description
SparseVector(int size, int[] indices, double value)
    Creates a sparse vector of the specified size, with the supplied value at each of the indices.
-
Method Summary
Modifier and Type  Method
    Description
void  add(int index, double value)
    Adds value to the element at index.
SGDVector  add(SGDVector other)
    Adds other to this vector, producing a new SGDVector.
copy()
    Returns a deep copy of this vector.
static SparseVector  createSparseVector(int dimension, int[] indices, double[] values)
    Defensively copies the input, and checks that the indices are sorted.
static SparseVector  createSparseVector(int dimension, Map<Integer, Double> indexMap)
    Builds a SparseVector from a map.
static <T extends Output<T>> SparseVector  createSparseVector(Example<T> example, ImmutableFeatureMap featureInfo, boolean addBias)
    Builds a SparseVector from an Example.
densify()
    Returns a dense vector copying this sparse vector.
static SparseVector  deserializeFromProto(int version, String className, com.google.protobuf.Any message)
    Deserialization factory.
int[]  difference(SparseVector other)
    Generates an array of the indices that are active in this vector but are not present in other.
double  distance(SGDVector other, DoubleUnaryOperator transformFunc, DoubleUnaryOperator normalizeFunc)
    Computes the distance between this vector and the other vector.
double  dot(SGDVector other)
    Calculates the dot product between this vector and other.
boolean  equals(Object other)
    Equals is defined mathematically, that is two SGDVectors are equal iff they have the same indices and the same values at those indices.
double  euclideanDistance(SGDVector other)
    The l2 or euclidean distance between this vector and the other vector.
void  foreachIndexedInPlace(ToDoubleBiFunction<Integer, Double> f)
    Applies a ToDoubleBiFunction elementwise to this SGDVector.
void  foreachInPlace(DoubleUnaryOperator f)
    Applies a DoubleUnaryOperator elementwise to this SGDVector.
double  get(int index)
    Gets an element from this vector.
int[]  getShape()
    Returns an int array specifying the shape of this Tensor.
void  hadamardProductInPlace(Tensor other, DoubleUnaryOperator f)
    Updates this Tensor with the Hadamard product (i.e., a term by term multiply) of this and other.
int  hashCode()
int  indexOfMax()
    Returns the index of the maximum value.
void  intersectAndAddInPlace(Tensor other, DoubleUnaryOperator f)
    Updates this Tensor by adding all the values from the intersection with other.
int[]  intersection(SparseVector other)
    Generates an array of the indices that are active in both this vector and other.
iterator()
double  l1Distance(SGDVector other)
    The l1 or Manhattan distance between this vector and the other vector.
double  maxValue()
    Returns the maximum value.
double  minValue()
    Returns the minimum value.
void  normalize(VectorNormalizer normalizer)
    Normalizes the vector using the supplied vector normalizer.
int  numActiveElements()
    Returns the number of non-zero elements (on construction, an element could be set to zero and it would still remain active).
double  oneNorm()
    Calculates the Manhattan norm for this vector.
outer(SGDVector other)
    This generates the outer product when dotted with another SparseVector.
double  reduce(double initial, DoubleUnaryOperator transform, DoubleBinaryOperator reduction)
    Reduces the vector, applying the transformation to every value (including the implicit zeros) and reducing the output by applying the supplied reduction operator (where the right argument is the current reduction value, and the left argument is the transformed value).
reshape(int[] newShape)
    Reshapes the Tensor to the supplied shape.
scale(double coefficient)
    Generates a new vector with each element scaled by coefficient.
org.tribuo.math.protos.TensorProto  serialize()
    Serializes this object to a protobuf.
void  set(int index, double value)
    Sets the index to the value.
int  size()
    Returns the dimensionality of this vector.
SGDVector  subtract(SGDVector other)
    Subtracts other from this vector, producing a new SGDVector.
double  sum()
    Calculates the sum of this vector.
double[]  toArray()
    Returns an array containing all the values in the vector (including any implicit zeros).
double[]  toDenseArray()
    Deprecated.
String  toString()
static <T extends Output<T>> SparseVector[]  transpose(Dataset<T> dataset)
    Converts a dataset of row-major examples into an array of column-major sparse vectors.
static <T extends Output<T>> SparseVector[]  transpose(Dataset<T> dataset, ImmutableFeatureMap fMap)
    Converts a dataset of row-major examples into an array of column-major sparse vectors.
static SparseVector[]  transpose(SparseVector[] input)
    Transposes an array of sparse vectors from row-major to column-major or vice versa.
double  twoNorm()
    Calculates the euclidean norm for this vector.
double  variance(double mean)
    Calculates the variance of this vector based on the supplied mean.
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
Methods inherited from interface org.tribuo.math.la.SGDVector
cosineDistance, cosineSimilarity, l2Distance, variance
Methods inherited from interface org.tribuo.math.la.Tensor
hadamardProductInPlace, intersectAndAddInPlace, scalarAddInPlace, scaleInPlace
-
Field Details
-
CURRENT_VERSION
public static final int CURRENT_VERSION
Protobuf serialization version.
-
indices
protected final int[] indices
The indices array.
-
values
protected final double[] values
The values array.
-
-
Constructor Details
-
SparseVector
public SparseVector(int size, int[] indices, double value)
Creates a sparse vector of the specified size, with the supplied value at each of the indices.
Parameters:
size - The vector size.
indices - The active indices.
value - The value for those indices.
-
-
Method Details
-
createSparseVector
public static <T extends Output<T>> SparseVector createSparseVector(Example<T> example, ImmutableFeatureMap featureInfo, boolean addBias)
Builds a SparseVector from an Example.
Used in training and inference.
Throws IllegalArgumentException if the Example contains NaN-valued features.
Type Parameters:
T - The type parameter of the example.
Parameters:
example - The example to convert.
featureInfo - The feature information, used to calculate the dimension of this SparseVector.
addBias - Add a bias feature.
Returns:
A SparseVector representing the example's features.
-
createSparseVector
public static SparseVector createSparseVector(int dimension, int[] indices, double[] values)
Defensively copies the input, and checks that the indices are sorted. If not, it sorts them.
Throws IllegalArgumentException if the arrays are not the same length, or if dimension is less than the max index.
Parameters:
dimension - The dimension of this vector.
indices - The indices of the non-zero elements.
values - The values of the non-zero elements.
Returns:
A SparseVector encapsulating the indices and values.
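The copy, sort and validate behaviour described for this factory can be sketched as follows. SortedPairsSketch is a hypothetical holder, not the Tribuo class; it sorts a permutation so the caller's arrays are left untouched, and it mirrors the documented checks literally. How duplicate indices are handled is not stated in the description, so the sketch leaves them as they are.

```java
import java.util.Arrays;

// Hypothetical holder for paired index and value arrays.
final class SortedPairsSketch {
    final int[] indices;
    final double[] values;

    SortedPairsSketch(int dimension, int[] inIndices, double[] inValues) {
        if (inIndices.length != inValues.length) {
            throw new IllegalArgumentException("indices and values must have the same length");
        }
        int n = inIndices.length;
        // Sort a permutation of positions rather than the inputs themselves,
        // so the caller's arrays are never modified.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Integer.compare(inIndices[a], inIndices[b]));
        indices = new int[n];
        values = new double[n];
        for (int i = 0; i < n; i++) {
            indices[i] = inIndices[order[i]];
            values[i] = inValues[order[i]];
        }
        // Mirrors the documented check: reject a dimension smaller than the largest index.
        if (n > 0 && dimension < indices[n - 1]) {
            throw new IllegalArgumentException("dimension is less than the max index");
        }
    }
}
```

Keeping each value attached to its index during the sort is the important part; sorting the index array on its own would silently scramble the pairing.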
-
createSparseVector
public static SparseVector createSparseVector(int dimension, Map<Integer, Double> indexMap)
Builds a SparseVector from a map.
Throws IllegalArgumentException if dimension is less than the max index.
Parameters:
dimension - The dimension of this vector.
indexMap - The map from indices to values.
Returns:
A SparseVector.
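One way to turn an unordered map into the sorted parallel arrays a sparse vector needs is to pass it through a TreeMap. The sketch below is an illustration under that assumption, with a hypothetical class name; it is not taken from the Tribuo source.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical holder built from an index-to-value map.
final class MapToSparseSketch {
    final int[] indices;
    final double[] values;

    MapToSparseSketch(int dimension, Map<Integer, Double> indexMap) {
        // A TreeMap iterates its keys in ascending order, which gives sorted indices.
        TreeMap<Integer, Double> sorted = new TreeMap<>(indexMap);
        // Mirrors the documented check on the dimension.
        if (!sorted.isEmpty() && dimension < sorted.lastKey()) {
            throw new IllegalArgumentException("dimension is less than the max index");
        }
        indices = new int[sorted.size()];
        values = new double[sorted.size()];
        int i = 0;
        for (Map.Entry<Integer, Double> e : sorted.entrySet()) {
            indices[i] = e.getKey();
            values[i] = e.getValue();
            i++;
        }
    }
}
```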
-
deserializeFromProto
public static SparseVector deserializeFromProto(int version, String className, com.google.protobuf.Any message) throws com.google.protobuf.InvalidProtocolBufferException
Deserialization factory.
Parameters:
version - The serialized object version.
className - The class name.
message - The serialized data.
Returns:
The deserialized object.
Throws:
com.google.protobuf.InvalidProtocolBufferException - If the protobuf could not be parsed from the message.
-
serialize
public org.tribuo.math.protos.TensorProto serialize()
Description copied from interface: ProtoSerializable
Serializes this object to a protobuf.
Specified by:
serialize in interface ProtoSerializable<org.tribuo.math.protos.TensorProto>
Returns:
The protobuf.
-
copy
Description copied from interface: SGDVector
Returns a deep copy of this vector.
-
getShape
public int[] getShape()
Description copied from interface: Tensor
Returns an int array specifying the shape of this Tensor.
-
reshape
Description copied from interface: Tensor
Reshapes the Tensor to the supplied shape. Throws IllegalArgumentException if the shape isn't compatible.
-
size
public int size()
Description copied from interface: SGDVector
Returns the dimensionality of this vector.
-
numActiveElements
public int numActiveElements()
Description copied from interface: SGDVector
Returns the number of non-zero elements (on construction, an element could be set to zero and it would still remain active).
Specified by:
numActiveElements in interface SGDVector
Returns:
The number of non-zero elements.
-
equals
public boolean equals(Object other)
Equality is defined mathematically: two SGDVectors are equal iff they have the same indices and the same values at those indices.
-
hashCode
public int hashCode()
-
add
public SGDVector add(SGDVector other)
Adds other to this vector, producing a new SGDVector. If other is a SparseVector then the returned vector is also a SparseVector, otherwise it's a DenseVector.
-
subtract
public SGDVector subtract(SGDVector other)
Subtracts other from this vector, producing a new SGDVector. If other is a SparseVector then the returned vector is also a SparseVector, otherwise it's a DenseVector.
-
intersectAndAddInPlace
public void intersectAndAddInPlace(Tensor other, DoubleUnaryOperator f)
Description copied from interface: Tensor
Updates this Tensor by adding all the values from the intersection with other.
The function f is applied to all values from other before the addition.
Each value is updated as value += f(otherValue).
Specified by:
intersectAndAddInPlace in interface Tensor
Parameters:
other - The other Tensor.
f - A function to apply.
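The update rule value += f(otherValue) over the shared indices can be sketched with a two-pointer walk over two sorted index arrays. This is a minimal illustration on raw arrays with a hypothetical class name, assuming both index arrays are sorted ascending; it is not the Tribuo implementation.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper operating on raw parallel arrays.
final class IntersectAddSketch {
    // Updates values in place wherever an index appears in both vectors.
    static void intersectAndAddInPlace(int[] indices, double[] values,
                                       int[] otherIndices, double[] otherValues,
                                       DoubleUnaryOperator f) {
        int i = 0;
        int j = 0;
        while (i < indices.length && j < otherIndices.length) {
            if (indices[i] == otherIndices[j]) {
                // Shared index: apply f to the other value, then add.
                values[i] += f.applyAsDouble(otherValues[j]);
                i++;
                j++;
            } else if (indices[i] < otherIndices[j]) {
                i++; // active only in this vector, left unchanged
            } else {
                j++; // active only in other, ignored
            }
        }
    }
}
```

Passing x -> -x as f turns the same routine into an in-place subtraction over the intersection.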
-
hadamardProductInPlace
public void hadamardProductInPlace(Tensor other, DoubleUnaryOperator f)
Description copied from interface: Tensor
Updates this Tensor with the Hadamard product (i.e., a term by term multiply) of this and other.
The function f is applied to all values from other before the multiplication.
Each value is updated as value *= f(otherValue).
Specified by:
hadamardProductInPlace in interface Tensor
Parameters:
other - The other Tensor.
f - A function to apply.
-
foreachInPlace
public void foreachInPlace(DoubleUnaryOperator f)
Applies a DoubleUnaryOperator elementwise to this SGDVector.
Only applies the function to the elements which are present.
If you need to operate over the whole vector then densify it first.
Specified by:
foreachInPlace in interface Tensor
Parameters:
f - The function to apply.
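The difference between applying a function to the present elements only and densifying first matters whenever f(0) is not 0. The sketch below illustrates this on raw arrays; the class and helper names are hypothetical, and the densify helper is a simplified stand-in for the method of the same name.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical helpers operating on raw arrays.
final class ForeachActiveSketch {
    // Applies f to every stored value; implicit zeros are never visited.
    static void foreachInPlace(double[] storedValues, DoubleUnaryOperator f) {
        for (int i = 0; i < storedValues.length; i++) {
            storedValues[i] = f.applyAsDouble(storedValues[i]);
        }
    }

    // Expands sparse (indices, values) pairs into a dense array of length size.
    static double[] densify(int size, int[] indices, double[] values) {
        double[] dense = new double[size];
        for (int i = 0; i < indices.length; i++) {
            dense[indices[i]] = values[i];
        }
        return dense;
    }
}
```

With size 4, active indices {1, 3}, values {2, 4} and f = x + 1, applying f to the sparse values leaves the implicit zeros at 0, while densifying first and then applying f turns them into 1.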
-
foreachIndexedInPlace
public void foreachIndexedInPlace(ToDoubleBiFunction<Integer, Double> f)
Applies a ToDoubleBiFunction elementwise to this SGDVector.
The first argument to the function is the index, the second argument is the current value.
Only applies the function to the elements which are present.
If you need to operate over the whole vector then densify it first.
Specified by:
foreachIndexedInPlace in interface SGDVector
Parameters:
f - The function to apply.
-
scale
Description copied from interface: SGDVector
Generates a new vector with each element scaled by coefficient.
-
add
public void add(int index, double value)
Description copied from interface: SGDVector
Adds value to the element at index.
-
dot
public double dot(SGDVector other)
Description copied from interface: SGDVector
Calculates the dot product between this vector and other.
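For two sparse vectors, a dot product only needs the indices active in both, because every other term is multiplied by an implicit zero. The following is a minimal sketch on raw sorted arrays with a hypothetical class name; it is an illustration of the idea, not the library's code.

```java
// Hypothetical helper operating on raw parallel arrays.
final class SparseDotSketch {
    // Both index arrays are assumed to be sorted ascending.
    static double dot(int[] aIndices, double[] aValues, int[] bIndices, double[] bValues) {
        double total = 0.0;
        int i = 0;
        int j = 0;
        while (i < aIndices.length && j < bIndices.length) {
            if (aIndices[i] == bIndices[j]) {
                total += aValues[i] * bValues[j];
                i++;
                j++;
            } else if (aIndices[i] < bIndices[j]) {
                i++; // no matching index in b, contributes zero
            } else {
                j++; // no matching index in a, contributes zero
            }
        }
        return total;
    }
}
```

The walk visits each stored element at most once, so the cost depends on the number of active elements rather than on the dimensionality.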
outer
Generates the outer product of this vector and another SparseVector.
Throws an IllegalArgumentException if used with a DenseVector.
Specified by:
outer in interface SGDVector
Parameters:
other - A vector.
Returns:
A DenseSparseMatrix representing the outer product.
-
sum
public double sum()
Description copied from interface: SGDVector
Calculates the sum of this vector.
-
twoNorm
public double twoNorm()
Description copied from interface: SGDVector
Calculates the euclidean norm for this vector.
-
oneNorm
public double oneNorm()
Description copied from interface: SGDVector
Calculates the Manhattan norm for this vector.
-
get
public double get(int index)
Description copied from interface: SGDVector
Gets an element from this vector.
-
set
public void set(int index, double value)
Description copied from interface: SGDVector
Sets the index to the value.
-
indexOfMax
public int indexOfMax()
Description copied from interface: SGDVector
Returns the index of the maximum value. Requires probing the array.
Specified by:
indexOfMax in interface SGDVector
Returns:
The index of the maximum value.
-
maxValue
public double maxValue()
Description copied from interface: SGDVector
Returns the maximum value. Requires probing the array.
-
minValue
public double minValue()
Description copied from interface: SGDVector
Returns the minimum value. Requires probing the array.
-
difference
public int[] difference(SparseVector other)
Generates an array of the indices that are active in this vector but are not present in other.
Parameters:
other - The vector to compare.
Returns:
An array of indices that are active only in this vector.
-
intersection
public int[] intersection(SparseVector other)
Generates an array of the indices that are active in both this vector and other.
Parameters:
other - The vector to intersect.
Returns:
An array of indices that are active in both vectors.
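Both the intersection and the difference of two sorted index arrays can be computed with a single merge-style pass. The sketch below shows one way to do this on plain int arrays; the class name is hypothetical and the code is an illustration rather than the library's implementation.

```java
import java.util.Arrays;

// Hypothetical helpers over sorted index arrays.
final class IndexSetSketch {
    // Indices present in both a and b.
    static int[] intersection(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0;
        int j = 0;
        int n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out[n++] = a[i];
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    // Indices present in a but not in b.
    static int[] difference(int[] a, int[] b) {
        int[] out = new int[a.length];
        int i = 0;
        int j = 0;
        int n = 0;
        while (i < a.length) {
            if (j >= b.length || a[i] < b[j]) {
                out[n++] = a[i++]; // b has no entry for this index
            } else if (a[i] == b[j]) {
                i++;
                j++;
            } else {
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }
}
```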
-
normalize
public void normalize(VectorNormalizer normalizer)
Description copied from interface: SGDVector
Normalizes the vector using the supplied vector normalizer.
-
reduce
public double reduce(double initial, DoubleUnaryOperator transform, DoubleBinaryOperator reduction)
Description copied from interface: SGDVector
Reduces the vector by applying the transformation to every value (including the implicit zeros) and combining the results with the supplied reduction operator. The operator's left argument is the transformed value and its right argument is the current reduction value. The reduction is seeded with the initial value.
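The argument order and the treatment of implicit zeros in reduce are easy to get wrong, so a small sketch may help. It assumes the stored values are visited first and the implicit zeros afterwards, which is a simplification: for reductions that depend on element order the real method may behave differently. The class name is hypothetical.

```java
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper operating on the stored values of a sparse vector.
final class ReduceSketch {
    static double reduce(int size, double[] storedValues, double initial,
                         DoubleUnaryOperator transform, DoubleBinaryOperator reduction) {
        double current = initial;
        for (double v : storedValues) {
            // Left argument: transformed value. Right argument: current reduction value.
            current = reduction.applyAsDouble(transform.applyAsDouble(v), current);
        }
        // Every element without a stored value is an implicit zero and is still transformed.
        double transformedZero = transform.applyAsDouble(0.0);
        for (int i = storedValues.length; i < size; i++) {
            current = reduction.applyAsDouble(transformedZero, current);
        }
        return current;
    }
}
```

For a vector of size 5 with stored values {1, 2}, the transform x + 1 and a summing reduction seeded with 0 give (2 + 3) from the stored values plus 3 from the three implicit zeros.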
euclideanDistance
public double euclideanDistance(SGDVector other)
Description copied from interface: SGDVector
The l2 or euclidean distance between this vector and the other vector.
Specified by:
euclideanDistance in interface SGDVector
Parameters:
other - The other vector.
Returns:
The euclidean distance between them.
-
l1Distance
public double l1Distance(SGDVector other)
Description copied from interface: SGDVector
The l1 or Manhattan distance between this vector and the other vector.
Specified by:
l1Distance in interface SGDVector
Parameters:
other - The other vector.
Returns:
The l1 distance.
-
distance
public double distance(SGDVector other, DoubleUnaryOperator transformFunc, DoubleUnaryOperator normalizeFunc)
Computes the distance between this vector and the other vector.
Parameters:
other - The other vector.
transformFunc - The transformation function to apply to each paired dimension difference.
normalizeFunc - The normalization to apply after summing the transformed differences.
Returns:
The distance between the two vectors.
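The roles of transformFunc and normalizeFunc can be illustrated with a sketch over dense arrays. Dense inputs and the class name are assumptions made to keep the example short; a sparse implementation would walk the stored indices instead. Under this reading, the euclidean distance corresponds to squaring each difference and taking the square root of the sum, and the l1 distance to taking absolute values with no normalization.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper using dense arrays for brevity.
final class DistanceSketch {
    static double distance(double[] a, double[] b,
                           DoubleUnaryOperator transformFunc, DoubleUnaryOperator normalizeFunc) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            // Transform each per-dimension difference, then accumulate.
            sum += transformFunc.applyAsDouble(a[i] - b[i]);
        }
        // Normalize once, after the sum is complete.
        return normalizeFunc.applyAsDouble(sum);
    }

    static double euclidean(double[] a, double[] b) {
        return distance(a, b, d -> d * d, Math::sqrt);
    }

    static double l1(double[] a, double[] b) {
        return distance(a, b, Math::abs, DoubleUnaryOperator.identity());
    }
}
```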
-
toString
public String toString()
-
densify
Returns a dense vector copying this sparse vector.
Returns:
A dense copy of this vector.
-
toDenseArray
public double[] toDenseArray()
Deprecated.
Generates a dense array copy of this SparseVector.
Returns:
A dense array containing this vector along with the implicit zeros.
-
toArray
public double[] toArray()
Description copied from interface: SGDVector
Returns an array containing all the values in the vector (including any implicit zeros).
-
variance
public double variance(double mean)
Description copied from interface: SGDVector
Calculates the variance of this vector based on the supplied mean.
-
iterator
Specified by:
iterator in interface Iterable<VectorTuple>
-
transpose
public static SparseVector[] transpose(SparseVector[] input)
Transposes an array of sparse vectors from row-major to column-major or vice versa.
Parameters:
input - Input sparse vectors.
Returns:
A column-major array of SparseVectors.
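Transposing sparse rows into sparse columns can be done in two passes: count the entries per column, then fill. The sketch below uses a hypothetical SparseRowsSketch holder with jagged arrays in place of SparseVector objects, and takes the number of columns as an explicit argument; both choices are assumptions made for illustration.

```java
// Hypothetical holder: indices[r] and values[r] describe the stored entries of row r.
final class SparseRowsSketch {
    final int[][] indices;
    final double[][] values;

    SparseRowsSketch(int[][] indices, double[][] values) {
        this.indices = indices;
        this.values = values;
    }

    static SparseRowsSketch transpose(SparseRowsSketch in, int numColumns) {
        // Pass 1: count how many stored entries fall in each column.
        int[] counts = new int[numColumns];
        for (int[] row : in.indices) {
            for (int c : row) {
                counts[c]++;
            }
        }
        int[][] outIdx = new int[numColumns][];
        double[][] outVal = new double[numColumns][];
        for (int c = 0; c < numColumns; c++) {
            outIdx[c] = new int[counts[c]];
            outVal[c] = new double[counts[c]];
        }
        // Pass 2: rows are visited in order, so each column's indices come out sorted.
        int[] fill = new int[numColumns];
        for (int r = 0; r < in.indices.length; r++) {
            for (int k = 0; k < in.indices[r].length; k++) {
                int c = in.indices[r][k];
                outIdx[c][fill[c]] = r;
                outVal[c][fill[c]] = in.values[r][k];
                fill[c]++;
            }
        }
        return new SparseRowsSketch(outIdx, outVal);
    }
}
```

Applying the same routine to the result, with the original row count as numColumns, recovers the input, which matches the "or vice versa" in the description.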
-
transpose
public static <T extends Output<T>> SparseVector[] transpose(Dataset<T> dataset)
Converts a dataset of row-major examples into an array of column-major sparse vectors.
Type Parameters:
T - The type of the dataset.
Parameters:
dataset - Input dataset.
Returns:
A column-major array of SparseVectors.
-
transpose
public static <T extends Output<T>> SparseVector[] transpose(Dataset<T> dataset, ImmutableFeatureMap fMap)
Converts a dataset of row-major examples into an array of column-major sparse vectors.
Type Parameters:
T - The type of the dataset.
Parameters:
dataset - Input dataset.
fMap - The feature map to use. If it differs from the feature map used by the dataset then the behaviour is undefined.
Returns:
A column-major array of SparseVectors.
-