org.debellor.core
Class Sample

java.lang.Object
  extended by org.debellor.core.DataObject
      extended by org.debellor.core.Sample

public final class Sample
extends DataObject

Sample of data, also known as an instance/object/vector, the basic unit of data transfer between cells (see Cell.Stream.next()). Sample is composed of input data and an associated decision (output data). Samples are constant (immutable), like String objects, so you may freely share them without risk of accidental modification.
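The immutability described above can be illustrated with a minimal sketch. All names here (Payload, Num, ImmutableSample, withData, withDecision) are hypothetical and are not Debellor's classes. The sketch also assumes, without confirmation from this page, that "setter" methods on an immutable class return a new instance. That is a common pattern and is consistent with setData() and setDecision() returning Sample.

```java
import java.util.Objects;

// Hypothetical stand-in for DataObject; NOT the Debellor class.
abstract class Payload {}

// Hypothetical concrete payload holding a single double value.
final class Num extends Payload {
    final double value;
    Num(double value) { this.value = value; }

    @Override public boolean equals(Object o) {
        return o instanceof Num && Double.compare(((Num) o).value, value) == 0;
    }
    @Override public int hashCode() { return Double.hashCode(value); }
    @Override public String toString() { return Double.toString(value); }
}

// Hypothetical immutable data/decision pair: all fields are final,
// and the "setters" return a modified copy instead of mutating this object.
final class ImmutableSample {
    final Payload data;      // may be null
    final Payload decision;  // may be null

    ImmutableSample(Payload data, Payload decision) {
        this.data = data;
        this.decision = decision;
    }

    // Returns a new sample; this instance is left unchanged.
    ImmutableSample withData(Payload newData) {
        return new ImmutableSample(newData, decision);
    }

    // Returns a new sample; this instance is left unchanged.
    ImmutableSample withDecision(Payload newDecision) {
        return new ImmutableSample(data, newDecision);
    }

    // Value-based equality, null-safe for both fields.
    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ImmutableSample)) return false;
        ImmutableSample s = (ImmutableSample) o;
        return Objects.equals(data, s.data) && Objects.equals(decision, s.decision);
    }
    @Override public int hashCode() { return Objects.hash(data, decision); }
    @Override public String toString() { return data + " -> " + decision; }
}
```

Because nothing can change after construction, one instance can be handed to several consumers safely, and equal samples hash identically.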

In contrast to some other data mining systems, e.g. Weka, Debellor's samples may contain various types of data and decisions, not necessarily vectors. The data and decision fields are declared as references to the base DataObject class, so new data types can be added by defining new subclasses of DataObject. When a cell receives a sample, it usually has to manually downcast the contained DataObject instances to the specific subclasses it expects before it can process the sample.
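A minimal sketch of the manual downcast follows. It uses a hypothetical class hierarchy (Payload, VectorPayload, LabelPayload, SummingCell) rather than Debellor's real classes. DataObject itself provides asDataVector(), asNumericFeature() and asSymbolicFeature() (see the inherited methods listed below), but their exact semantics are not documented on this page, so the sketch uses a plain instanceof check and cast instead.

```java
// Hypothetical base type standing in for DataObject.
abstract class Payload {}

// Hypothetical vector payload, loosely analogous to DataVector.
final class VectorPayload extends Payload {
    private final double[] values;
    VectorPayload(double... values) { this.values = values.clone(); }
    int length() { return values.length; }
    double get(int i) { return values[i]; }
}

// Hypothetical symbolic payload, e.g. a class label.
final class LabelPayload extends Payload {
    final String label;
    LabelPayload(String label) { this.label = label; }
}

// Hypothetical cell whose contract requires vector data.
final class SummingCell {
    static double sumOfData(Payload data) {
        // Check the runtime type first, so a contract violation produces a clear error
        // instead of a bare ClassCastException.
        if (!(data instanceof VectorPayload)) {
            throw new IllegalArgumentException("expected VectorPayload, got "
                + (data == null ? "null" : data.getClass().getSimpleName()));
        }
        VectorPayload v = (VectorPayload) data;  // the manual downcast
        double sum = 0;
        for (int i = 0; i < v.length(); i++) sum += v.get(i);
        return sum;
    }
}
```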

It is up to the cell which fields (data, decision) of the sample it actually uses. The cell may read and/or write both, only one, or none of them. This depends on the type of the cell (is it a decision system? a preprocessing algorithm?), on its parameters (e.g., a cell could take a parameter that controls whether the processing is applied to data or decision), and on whether the sample is presented at the input or generated at the output of the cell. Every cell should define a contract that specifies what type of samples are expected at its input and what type of samples are generated at its output.
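The parameter-controlled case mentioned above can be sketched as follows. ScaleCell, its Target parameter, and the Payload/Num/Pair classes are all hypothetical and do not reflect Debellor's Cell API. The sketch only shows a cell whose contract says "the selected field must be numeric, the other field passes through unchanged".

```java
// Hypothetical stand-in for DataObject.
abstract class Payload {}

// Hypothetical numeric payload.
final class Num extends Payload {
    final double value;
    Num(double value) { this.value = value; }
}

// Hypothetical immutable data/decision pair standing in for Sample.
final class Pair {
    final Payload data;
    final Payload decision;
    Pair(Payload data, Payload decision) { this.data = data; this.decision = decision; }
}

// Hypothetical preprocessing cell: a parameter selects which field it rewrites.
final class ScaleCell {
    enum Target { DATA, DECISION }

    private final Target target;
    private final double factor;

    ScaleCell(Target target, double factor) {
        this.target = target;
        this.factor = factor;
    }

    // Contract: the selected field must be a non-null Num;
    // the other field is passed through unchanged (same reference).
    Pair process(Pair in) {
        if (target == Target.DATA) {
            return new Pair(scale(in.data), in.decision);
        }
        return new Pair(in.data, scale(in.decision));
    }

    private Payload scale(Payload p) {
        if (!(p instanceof Num)) {
            throw new IllegalArgumentException("ScaleCell expects a Num in the selected field");
        }
        return new Num(((Num) p).value * factor);
    }
}
```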

If the cell wants to know in advance what type of samples will be generated by Stream.next() of its input stream, it may read the Sample.SampleType from the Cell.Stream.sampleType field. Its value is available immediately after the stream is opened, so the cell may prepare internal structures as necessary for a given data type, e.g., arrays of appropriate length if the data will be composed of vectors.
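A sketch of "prepare internal structures once the input type is known" is given below. VectorSampleType is a hypothetical, heavily simplified stand-in for Sample.SampleType that records only a vector length. MeanCell and its prepare() hook are likewise hypothetical and do not correspond to a documented Debellor method.

```java
// Hypothetical, simplified stand-in for Sample.SampleType: records only the vector length.
final class VectorSampleType {
    final int dataLength;
    VectorSampleType(int dataLength) { this.dataLength = dataLength; }
}

// Hypothetical consumer that sizes its buffers as soon as the input type is known.
final class MeanCell {
    private double[] sums;
    private long count;

    // Hypothetical hook, called once right after the input stream is opened.
    void prepare(VectorSampleType inputType) {
        sums = new double[inputType.dataLength];  // allocated once, reused for every sample
        count = 0;
    }

    // Accumulates one vector; rejects vectors that contradict the declared type.
    void accept(double[] vector) {
        if (vector.length != sums.length) {
            throw new IllegalArgumentException(
                "vector length " + vector.length + " != declared " + sums.length);
        }
        for (int i = 0; i < sums.length; i++) sums[i] += vector[i];
        count++;
    }

    // Component-wise mean of all accepted vectors.
    double[] mean() {
        double[] m = new double[sums.length];
        for (int i = 0; i < m.length; i++) m[i] = sums[i] / count;
        return m;
    }
}
```

Allocating up front avoids per-sample resizing and lets the cell detect samples that do not match the declared type.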

On the other hand, before the cell starts generating output samples, it should create a sampleType object describing the samples to be produced as precisely as possible. This object should be returned from the overridden Cell.onOpen(). Providing a meaningful (non-null) sampleType object is not obligatory, but otherwise the usability of the cell is low, because most cells that could be connected to it as consumers would fail at runtime due to an unhandled type of input data.
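Why a precise output description matters can be sketched with a producer that declares its output type before emitting anything and a consumer that checks it up front. OutputType, NormalizerCell.open() and LinearModelCell.accepts() are hypothetical. open() only plays the role of an overridden Cell.onOpen() and is not Debellor's signature.

```java
// Hypothetical description of produced samples; stands in for Sample.SampleType.
final class OutputType {
    enum Kind { NUMERIC_VECTOR, SYMBOLIC }
    final Kind dataKind;
    final int dataLength;  // meaningful only for NUMERIC_VECTOR

    OutputType(Kind dataKind, int dataLength) {
        this.dataKind = dataKind;
        this.dataLength = dataLength;
    }
}

// Hypothetical producer: declares what it will emit before emitting anything.
final class NormalizerCell {
    private final int width;
    NormalizerCell(int width) { this.width = width; }

    // Plays the role of an overridden onOpen(): describe the output as precisely as possible.
    OutputType open() {
        return new OutputType(OutputType.Kind.NUMERIC_VECTOR, width);
    }
}

// Hypothetical consumer that validates its input description before processing.
final class LinearModelCell {
    private final int expectedLength;
    LinearModelCell(int expectedLength) { this.expectedLength = expectedLength; }

    boolean accepts(OutputType in) {
        // A null description gives the consumer nothing to check against, so it is rejected.
        return in != null
            && in.dataKind == OutputType.Kind.NUMERIC_VECTOR
            && in.dataLength == expectedLength;
    }
}
```

With a declared type, an incompatible connection can be rejected when the stream is opened rather than failing on some later sample.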

Algorithms from Weka and Rseslib libraries operate on samples whose data field is a DataVector composed of NumericFeature or SymbolicFeature objects, while the decision is a single feature object.
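The sample layout used by the Weka and Rseslib wrappers (a vector of numeric or symbolic features as data, a single feature as decision) can be modelled roughly as below. Feature, NumFeature, SymFeature, FeatureVector and TabularSample are hypothetical mirrors, not the real DataVector, NumericFeature or SymbolicFeature classes, whose constructors are not documented on this page.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical mirror of a single feature; not Debellor's class.
abstract class Feature {}

// Hypothetical numeric feature.
final class NumFeature extends Feature {
    final double value;
    NumFeature(double value) { this.value = value; }
}

// Hypothetical symbolic feature.
final class SymFeature extends Feature {
    final String symbol;
    SymFeature(String symbol) { this.symbol = symbol; }
}

// Hypothetical data field: an ordered, unmodifiable list of features.
final class FeatureVector {
    final List<Feature> features;
    FeatureVector(Feature... features) {
        this.features = Collections.unmodifiableList(Arrays.asList(features.clone()));
    }
}

// Hypothetical Weka/Rseslib-style sample: feature vector as data, one feature as decision.
final class TabularSample {
    final FeatureVector data;
    final Feature decision;

    TabularSample(FeatureVector data, Feature decision) {
        this.data = data;
        this.decision = decision;
    }

    // Counts how many features of the data vector are numeric.
    long numericCount() {
        return data.features.stream().filter(f -> f instanceof NumFeature).count();
    }
}
```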

Author:
Marcin Wojnarski
See Also:
Cell.Stream.next(), Cell.onNext()

Nested Class Summary
static class Sample.SampleType
          Describes common properties of all Sample objects in a given data Cell.Stream.
 
Field Summary
 DataObject data
          Input data on which data processing algorithms will primarily work.
 DataObject decision
          Decision (also known as target/prediction/output value) associated with the data.
 
Constructor Summary
Sample(DataObject data, DataObject decision)
           
 
Method Summary
 boolean equals(java.lang.Object obj)
          Must be implemented by every subclass.
 int hashCode()
          Must be implemented by every subclass.
 Sample setData(DataObject data)
           
 Sample setDecision(DataObject decision)
           
 java.lang.String toString()
           
 
Methods inherited from class org.debellor.core.DataObject
asDataVector, asNumericFeature, asSymbolicFeature
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

data

public final DataObject data
Input data on which data processing algorithms will primarily work. Can be null for some or all samples in a data set. May have an associated decision.


decision

public final DataObject decision
Decision (also known as target/prediction/output value) associated with the data. Either assigned by a supervisor (ground truth / target) or predicted by a decision system (prediction / output value). Can be null for some or all samples in a data set.

Constructor Detail

Sample

public Sample(DataObject data,
              DataObject decision)

Method Detail

setData

public Sample setData(DataObject data)

setDecision

public Sample setDecision(DataObject decision)

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object

equals

public boolean equals(java.lang.Object obj)
Description copied from class: DataObject
Must be implemented by every subclass.

Specified by:
equals in class DataObject

hashCode

public int hashCode()
Description copied from class: DataObject
Must be implemented by every subclass.

Specified by:
hashCode in class DataObject