org.debellor.core
Class Cell

java.lang.Object
  extended by org.debellor.core.Cell
Direct Known Subclasses:
ArffReader, BatchOfSamples, Buffer, Distortion, EvaluatorCell, FlattenVectors, KMeans, RseslibClassifier, SingleSample, TimeWindows, WekaClassifier, WekaFilter

public class Cell
extends java.lang.Object

Base class for all Data Processing Cells - elementary units which realize all kinds of data processing in Debellor. Basic properties of cells:

  1. Cell may be a data producer - generates a stream of samples when requested. While the producer is generating samples, it is said to be operating.
  2. Cell may be a data consumer - needs input data for operating or learning. The data are pulled from a producer that must be connected to the consumer beforehand. When a producer and consumer are connected together, they are called source and receiver cell, respectively.
  3. Cell may be trainable - must learn on supplied training data before it can start operating. Non-trainable cells are called fixed - they can operate immediately after construction.
  4. Cell may be parameterized - exposes a number of parameters that can be set by the user to control its behavior during learning or operating.

It is not obligatory for a cell to implement all of these functionalities. What exactly is implemented depends on the type of the cell. See next sections for more details.

Data in Debellor are represented and transferred as a Cell.Stream of Samples, accompanied by a Sample.SampleType which defines the structure of samples in a given stream.

Guidelines for using cells

Cell may be a data producer. In this case, to retrieve samples generated by the cell you must open the cell's communication stream with open() and retrieve consecutive samples using Cell.Stream.next() on the returned Cell.Stream object, or next() on the cell itself. At the end, you must close the stream with Cell.Stream.close() on the stream or close() on the cell.

Cell may be a data consumer. In this case, before the cell can be used, it must be informed where to find input data, by calling setSource(Cell) with the source cell as an argument. In this way, the two cells become connected. A number of interconnected cells form a graph called Data Processing Network (DPN). DPN can be executed as a whole by calling method learn or open/next/close on the last cell.
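The pull-based protocol described above (connect cells, then drive only the last one with open/next/close) can be sketched with a toy model. This is not Debellor code: MiniCell, RangeSource, Doubler and drain are hypothetical stand-ins, samples are plain Integer values, and the sketch assumes that next() returns null at the end of the stream.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a pull-based Data Processing Network. None of these types are
// Debellor classes; they only mimic the open/next/close calling convention.
final class PullPipelineSketch {

    // Hypothetical stand-in for a cell that produces Integer "samples".
    interface MiniCell {
        void open();
        Integer next();   // assumed convention: null signals end of stream
        void close();
    }

    // Producer: emits 0, 1, ..., count-1.
    static final class RangeSource implements MiniCell {
        private final int count;
        private int position;
        RangeSource(int count) { this.count = count; }
        public void open() { position = 0; }
        public Integer next() { return position < count ? position++ : null; }
        public void close() { }
    }

    // Consumer and producer at once: pulls from its source and doubles each sample.
    static final class Doubler implements MiniCell {
        private MiniCell source;
        void setSource(MiniCell source) { this.source = source; }
        public void open() { source.open(); }
        public Integer next() {
            Integer in = source.next();
            if (in == null) {
                return null;
            }
            return in * 2;
        }
        public void close() { source.close(); }
    }

    // Executes the whole network by driving only its last cell.
    static List<Integer> drain(MiniCell last) {
        List<Integer> out = new ArrayList<>();
        last.open();
        try {
            for (Integer s = last.next(); s != null; s = last.next()) {
                out.add(s);
            }
        } finally {
            last.close();   // always close, even if next() throws
        }
        return out;
    }

    public static void main(String[] args) {
        Doubler doubler = new Doubler();
        doubler.setSource(new RangeSource(4));
        System.out.println(drain(doubler));   // prints [0, 2, 4, 6]
    }
}
```

The client touches only the last cell; each call to next() propagates backwards through the chain, which is the sense in which data are "pulled" from the source.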

Cell may be trainable: before its use it must be trained how to process data. Training of a cell consists typically of connecting it with a source of training data (setSource(Cell)), setting parameter values (setParameters(Parameters) or set(String, String) or its variants for int/double/boolean value types) and invoking its learning procedure, learn(). The learning procedure and the list of available parameters are specific to the Cell's subclass actually used.

The term "learning procedure" has a very wide meaning in Debellor. It covers not only building a decision system that is later used for data processing, but also any other data-driven operation that only accumulates information (knowledge) inside the cell and does not generate a stream of output samples. For example, a learning procedure may train an internal decision model, read and buffer input data, calculate an evaluation measure of another cell, or calculate data-driven parameters of a preprocessing algorithm (e.g. attribute means for a normalization algorithm).

Knowledge gained by the cell during learning can be erased by call to erase(). After erasure the cell can be trained again.
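The life cycle implied by learn(), erase(), open() and close() can be pictured as a small state machine. The sketch below is a toy model, not Debellor code: LifecycleSketch and MiniCell are hypothetical, only the state names EMPTY, CLOSED and OPEN come from this documentation, and the transition from OPEN back to CLOSED on close() is an assumption.

```java
// Toy model of the cell life cycle. Not Debellor code: LifecycleSketch and its
// members are hypothetical; only the state names come from the documentation.
final class LifecycleSketch {

    enum State { EMPTY, CLOSED, OPEN }

    static final class MiniCell {
        private State state;
        private final boolean trainable;

        MiniCell(boolean trainable) {
            this.trainable = trainable;
            // Trainable cells start EMPTY; fixed cells are usable at once.
            this.state = trainable ? State.EMPTY : State.CLOSED;
        }

        State state() { return state; }

        void learn() { move(State.EMPTY, State.CLOSED, "learn"); }

        void erase() {
            if (!trainable) {
                throw new IllegalStateException("a fixed cell cannot be erased");
            }
            move(State.CLOSED, State.EMPTY, "erase");
        }

        void open() { move(State.CLOSED, State.OPEN, "open"); }

        // Assumption: closing the stream returns the cell to CLOSED.
        void close() { move(State.OPEN, State.CLOSED, "close"); }

        // Performs a transition only if the cell is in the required state.
        private void move(State from, State to, String method) {
            if (state != from) {
                throw new IllegalStateException(
                        method + "() requires state " + from + " but was " + state);
            }
            state = to;
        }
    }

    public static void main(String[] args) {
        MiniCell cell = new MiniCell(true);
        cell.learn();   // EMPTY -> CLOSED
        cell.open();    // CLOSED -> OPEN
        cell.close();   // OPEN -> CLOSED
        cell.erase();   // CLOSED -> EMPTY, ready to be trained again
        cell.learn();
        System.out.println(cell.state());   // prints CLOSED
    }
}
```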

Cell may be parameterized. In this case, you may call setParameters(Parameters) or set(String, String) to pass parameter values before training or using the cell. To find out what parameters are exposed by the cell, call getAvailableParams(). Usually, if you do not pass some required parameter, its default value will be used - it depends, however, on implementation of the particular Cell subclass.

DPN can be executed concurrently in several threads. To declare that some part of DPN should be executed in a separate thread, before execution call newThread() on the cell of DPN which should lie on the boundary between threads. Then, this cell and all its preceding (source) cells will run in a separate thread. Call newThread() on several cells to create more than 2 threads.
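To make the idea of a thread boundary concrete, here is a minimal sketch of one way a producer side could run in its own thread while the consumer pulls samples in order. How Debellor implements newThread() internally is not described in this documentation; the bounded queue, the END sentinel and the names ThreadBoundarySketch and pullAcrossThreads are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy model of a thread boundary between a producer and a consumer.
// Not Debellor code; the queue-based hand-off is an assumed design.
final class ThreadBoundarySketch {

    // Sentinel marking end of stream; compared by identity, never by value.
    private static final Object END = new Object();

    // Runs the producer side in its own thread and returns the samples as
    // seen by the consumer thread, in their original order.
    static List<Integer> pullAcrossThreads(Iterator<Integer> producer, int capacity)
            throws InterruptedException {
        BlockingQueue<Object> queue = new ArrayBlockingQueue<>(capacity);

        Thread worker = new Thread(() -> {
            try {
                while (producer.hasNext()) {
                    queue.put(producer.next());   // blocks while the queue is full
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        List<Integer> received = new ArrayList<>();
        for (Object item = queue.take(); item != END; item = queue.take()) {
            received.add((Integer) item);
        }
        worker.join();
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> samples = List.of(1, 2, 3, 4, 5);
        System.out.println(pullAcrossThreads(samples.iterator(), 2));   // prints [1, 2, 3, 4, 5]
    }
}
```

The bounded queue lets the producer run ahead of the consumer by at most capacity samples. Error propagation from the worker thread is deliberately left out to keep the sketch short.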

Guidelines for writing new cells

To implement a new data processing algorithm, write a subclass of Cell and override some or all of the protected methods named "on...": onLearn(), onOpen(), onNext(), onClose(), onErase(). They are called during calls to the similarly named public methods (learn, open, ...). If you do not need a method, keep its default implementation, which throws an exception when called. Do not call super in overriding methods. You can also override the public method toString(). Other methods cannot be overridden.

If your cell represents a decision system (classifier, clusterer etc.), the most important methods will be onLearn() and onNext(). The training algorithm of the decision system is implemented in onLearn(), while onNext() applies the trained system to the next input sample. You will also have to override onOpen() and onClose() to open and close the input stream before and after calls to onNext(). Optionally, you may also override onErase() to erase the trained decision model without deallocating the whole cell.

In your implementation of on...() methods, input data can be accessed through the call to openInputStream(), which opens input stream and returns it to the caller. The caller may retrieve consecutive samples with Cell.Stream.next(). At the end, the caller must invoke Cell.Stream.close() to close the stream. It is possible to call openInputStream() again after the stream has been closed.

If the cell is not trainable (e.g., it implements data reading from file), you must inform the base class about this fact through the call to Cell(boolean) with argument false in the cell's constructor.
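The division of labour between the final public methods and the overridable "on..." hooks can be sketched as follows. This is a toy model, not Debellor code: MiniCell, MeanCentering and the Double-valued "samples" are hypothetical, the input is a plain List rather than a source cell, and the same data are used for learning and for operation only to keep the example short.

```java
import java.util.Iterator;
import java.util.List;

// Toy model of the "on..." hook pattern. Not Debellor code: MiniCell,
// MeanCentering and the double-valued "samples" are hypothetical stand-ins.
final class HookPatternSketch {

    static abstract class MiniCell {
        private List<Double> sourceData;
        void setSource(List<Double> data) { this.sourceData = data; }

        // Public entry points are final; subclasses customize only the hooks.
        final void learn() { onLearn(); }
        final void open() { onOpen(); }
        final Double next() { return onNext(); }
        final void close() { onClose(); }

        // Default hooks throw, mirroring the "not implemented" behaviour.
        protected void onLearn() { throw new UnsupportedOperationException("onLearn"); }
        protected void onOpen() { throw new UnsupportedOperationException("onOpen"); }
        protected Double onNext() { throw new UnsupportedOperationException("onNext"); }
        protected void onClose() { throw new UnsupportedOperationException("onClose"); }

        // Counterpart of openInputStream(): a fresh pass over the input.
        protected final Iterator<Double> openInputStream() { return sourceData.iterator(); }
    }

    // Learns the mean of the training data, then subtracts it from each sample.
    static final class MeanCentering extends MiniCell {
        private double mean;
        private Iterator<Double> input;

        @Override protected void onLearn() {
            double sum = 0;
            int n = 0;
            Iterator<Double> in = openInputStream();
            while (in.hasNext()) {
                sum += in.next();
                n++;
            }
            mean = n == 0 ? 0 : sum / n;
        }

        @Override protected void onOpen() { input = openInputStream(); }

        @Override protected Double onNext() {
            if (!input.hasNext()) {
                return null;   // assumed end-of-stream convention
            }
            return input.next() - mean;
        }

        @Override protected void onClose() { input = null; }
    }

    public static void main(String[] args) {
        MeanCentering cell = new MeanCentering();
        cell.setSource(List.of(1.0, 2.0, 6.0));
        cell.learn();                       // mean = 3.0
        cell.open();
        for (Double s = cell.next(); s != null; s = cell.next()) {
            System.out.println(s);          // prints -2.0, -1.0, 3.0 on separate lines
        }
        cell.close();
    }
}
```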

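In subclasses you can use the protected members of Cell: the fields parameters and random, and the methods openInputStream() and setAvailableParams(ParametersInfo).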

Public methods should not be called in subclasses. They are reserved for clients.

Author:
Marcin Wojnarski

Nested Class Summary
static class Cell.CellMethod
          IDs of Cell's methods, used for error reporting.
static class Cell.State
          Constants for defining state of the cell.
static class Cell.Stream
          Represents a stream of samples flowing between two cells.
 
Field Summary
protected  Parameters parameters
          Values of parameters passed from the client, to be used in learn() and/or during activity of the cell (open(), Cell.Stream.next(), Cell.Stream.close()).
protected  java.util.Random random
          Random number generator that should be used by the subclass instead of the global one, so that the user can control (non)determinism of the cell's behavior.
 
Constructor Summary
protected Cell()
          Creates a trainable cell. For a fixed (non-trainable) cell, use Cell(boolean) with argument false.
protected Cell(boolean isTrainable)
           
 
Method Summary
 void close()
          Closes the currently open output stream.
 void erase()
          Erases the content of the cell created by learn() and brings the cell from Cell.State.CLOSED to Cell.State.EMPTY state.
 ParametersInfo getAvailableParams()
          Returns all available parameters with their default values and description of their meaning.
 Parameters getParameters()
          Returns currently set parameters of the cell.
 void learn()
          Invokes cell's learning procedure onLearn().
 void newThread()
          Declares that method Cell.Stream.next() of this cell will run in a separate thread.
 void newThread(boolean b)
          Sets whether method Cell.Stream.next() of this cell will run in a separate thread.
 Sample next()
          Returns next sample from the currently open output stream.
protected  void onClose()
          Called by Cell.Stream.close().
protected  void onErase()
          Called by erase().
protected  void onLearn()
          Learning procedure of the cell.
protected  Sample onNext()
          Called by Cell.Stream.next().
protected  Sample.SampleType onOpen()
          Called by open().
 Cell.Stream open()
          Opens the stream of samples generated by this cell.
protected  Cell.Stream openInputStream()
          To be used by the subclass in order to open input stream from the source cell.
 void set(java.lang.String name, boolean b)
          Sets value of the parameter to string representation of boolean b.
 void set(java.lang.String name, double x)
          Sets value of the parameter to string representation of real number x.
 void set(java.lang.String name, int k)
          Sets value of the parameter to string representation of integer k.
 void set(java.lang.String name, java.lang.String value)
          Sets value of a single parameter of the cell.
protected  void setAvailableParams(ParametersInfo availParams)
          To be used by the subclass in order to set information about parameters (names, default values, description) that will be returned to the user by method getAvailableParams().
 void setParameters(Parameters parameters)
          Sets parameters of this cell, to be used by the cell during learning or data processing.
 void setRandomSeed(long seed)
          Sets seed of the random number generator used by this cell.
 void setSource(Cell source)
          Connects this cell with another cell that will serve as a source of input data when this cell starts learning or operating.
 Cell.State state()
           
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

parameters

protected Parameters parameters
Values of parameters passed from the client, to be used in learn() and/or during activity of the cell (open(), Cell.Stream.next(), Cell.Stream.close()). Should be used directly in the subclass to read parameter values.

The subclass may freely modify this object. Modifications will not be seen outside - getParameters() will return initial parameter values. Moreover, when learn() (for trainable cell) or open() (for fixed cell) is called again, the initial values of parameters will be automatically restored.


random

protected java.util.Random random
Random number generator that should be used by the subclass instead of the global one, so that the user can control (non)determinism of the cell's behavior. It is desirable that the cell's behavior be fully repeatable under a given random seed and sequence of input samples, e.g., for the user to be able to enforce that "random" initialization of the training process is always done in the same way. Such ability is very useful in many common experiments.
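The repeatability argument can be illustrated with plain java.util.Random: two generators created with the same seed yield the same sequence. The class SeededRandomSketch and the method initWeights are hypothetical names used only for this illustration; a real subclass would draw from the inherited random field rather than create its own generator.

```java
import java.util.Arrays;
import java.util.Random;

// Illustration of seed-controlled repeatability using only the JDK.
final class SeededRandomSketch {

    // Draws n pseudo-random weights from a generator seeded with the given value.
    static double[] initWeights(long seed, int n) {
        Random random = new Random(seed);
        double[] weights = new double[n];
        for (int i = 0; i < n; i++) {
            weights[i] = random.nextGaussian();
        }
        return weights;
    }

    public static void main(String[] args) {
        double[] first = initWeights(42L, 3);
        double[] second = initWeights(42L, 3);
        System.out.println(Arrays.equals(first, second));   // prints true
    }
}
```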

Constructor Detail

Cell

protected Cell()
Creates a trainable cell. For a fixed (non-trainable) cell, use Cell(boolean) with argument false.


Cell

protected Cell(boolean isTrainable)
Parameters:
isTrainable - True if the cell object being created is trainable, which means that its initial state is Cell.State.EMPTY and method learn() can be called on this cell. Otherwise, the cell is fixed: it does not require learning and cannot be erased.
See Also:
learn()
Method Detail

state

public final Cell.State state()

getAvailableParams

public final ParametersInfo getAvailableParams()
Returns all available parameters with their default values and description of their meaning. Execute:

System.out.print( cell.getAvailableParams() );

to have all information about parameters printed on console.

Returns:
ParametersInfo with information about available parameters. null if this information was not provided by the subclass (in such case, consult documentation of the cell class). Empty ParametersInfo if there are no parameters that can be set by the user.

setParameters

public final void setParameters(Parameters parameters)
                         throws CellAccessException
Sets parameters of this cell, to be used by the cell during learning or data processing.

It is recommended as a general rule that if some required parameter has not been set before the cell starts learning and/or processing data, the subclass implementation will use its default value. However, clients should be aware that some subclasses may not strictly follow this rule.

Parameters can be set only if the cell is in Cell.State.EMPTY (for trainable cells) or Cell.State.CLOSED (for fixed cells).

Throws:
CellAccessException
See Also:
learn(), set(String, String)

set

public final void set(java.lang.String name,
                      java.lang.String value)
               throws CellAccessException
Sets value of a single parameter of the cell.

Throws:
CellAccessException
See Also:
Parameters.set(String, String), setParameters(Parameters), learn()

set

public final void set(java.lang.String name,
                      int k)
               throws CellAccessException
Sets value of the parameter to string representation of integer k.

Throws:
CellAccessException

set

public void set(java.lang.String name,
                double x)
         throws CellAccessException
Sets value of the parameter to string representation of real number x. Caution: x is a real value and its conversion to String may introduce rounding errors, so the value decoded later on may slightly differ from the value passed to this method.

Throws:
CellAccessException
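Whether a double survives a round trip through a String depends on how it is formatted. This documentation does not say which conversion Debellor applies, so the following is only a general illustration using JDK calls; DoubleRoundTripSketch, viaToString and viaFixedDecimals are hypothetical names.

```java
import java.util.Locale;

// Compares two ways of encoding a double as text and decoding it again.
final class DoubleRoundTripSketch {

    // Encodes with Double.toString, which emits enough digits to identify
    // the value uniquely among doubles.
    static double viaToString(double x) {
        return Double.parseDouble(Double.toString(x));
    }

    // Encodes with a fixed number of decimals, which can lose precision.
    static double viaFixedDecimals(double x, int decimals) {
        String text = String.format(Locale.ROOT, "%." + decimals + "f", x);
        return Double.parseDouble(text);
    }

    public static void main(String[] args) {
        double x = 1.0 / 3.0;
        System.out.println(viaToString(x) == x);           // prints true
        System.out.println(viaFixedDecimals(x, 4) == x);   // prints false
    }
}
```

If exact values matter, comparing the decoded parameter with a tolerance is safer than testing for equality.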

set

public void set(java.lang.String name,
                boolean b)
         throws CellAccessException
Sets value of the parameter to string representation of boolean b.

Throws:
CellAccessException

getParameters

public final Parameters getParameters()
Returns currently set parameters of the cell. If no parameters have been set, empty list of parameters is returned.


setRandomSeed

public final void setRandomSeed(long seed)
                         throws CellAccessException
Sets seed of the random number generator used by this cell. Use this if the cell works in a non-deterministic way but you want to enforce deterministic behavior, e.g. when you repeat the same experiment several times and the cell's output must be exactly the same every time because it is passed to another algorithm that you are analyzing.

Throws:
CellAccessException

newThread

public final void newThread(boolean b)
                     throws CellAccessException
Sets whether method Cell.Stream.next() of this cell will run in a separate thread.

Throws:
CellAccessException

newThread

public final void newThread()
                     throws CellAccessException
Declares that method Cell.Stream.next() of this cell will run in a separate thread.

Throws:
CellAccessException

setSource

public final void setSource(Cell source)
                     throws CellAccessException
Connects this cell with another cell that will serve as a source of input data when this cell starts learning or operating. To disconnect the cell pass null to this method.

Throws:
CellAccessException

learn

public final void learn()
                 throws CellException
Invokes cell's learning procedure onLearn(). Causes transition of the cell from Cell.State.EMPTY to Cell.State.CLOSED state. Usually the learning procedure creates internal content of the cell, for example: a decision model (classifier, clusterer etc.); a set of buffered samples (in the case of a buffer cell); results of an evaluation (e.g. TrainAndTest) etc. Only after training can the cell process data in its open(), Cell.Stream.next() and Cell.Stream.close() methods.

Most cells require that the source of training data is provided before learning is invoked. This can be done by a call to setSource(Cell). The cell may also expose a number of parameters that can be set by the user in setParameters(Parameters) or set(String, String) to control behavior of learn().

Some cells may not implement learning, in which case they are ready to process data just after construction. Such cells are said to be fixed, in contrast to the cells that require learning, which are said to be trainable. Initial state of a fixed cell after construction is Cell.State.CLOSED. Fixed cell cannot be erased and can never move to Cell.State.EMPTY.

Throws:
CellAccessException - if the cell is not in Cell.State.EMPTY state.
CellMethodNotImplementedException - if method is not implemented in subclass.
CellInternalException - if method fails for some other reason.
CellException
See Also:
onLearn(), erase()

erase

public final void erase()
                 throws CellException
Erases the content of the cell created by learn() and brings the cell from Cell.State.CLOSED to Cell.State.EMPTY state. This is the reverse of learn(). After erase() the cell can be trained again. Note that only the content is erased; other fields of the cell, such as parameter values or the source link, are not changed. Implementations in subclasses should guarantee that after erasure the cell behaves exactly like a newly allocated one with the same parameters.

Throws:
CellException
See Also:
learn()

open

public final Cell.Stream open()
                       throws CellException
Opens the stream of samples generated by this cell. Note that this may automatically trigger generation of samples, even before the first call to next(), because the stream may work in a multithreaded way. In this case, samples are generated in advance by a separate thread, before they are requested.

Returns:
the stream.
Throws:
CellAccessException - if the cell is not in Cell.State.CLOSED.
CellMethodNotImplementedException - if method is not implemented in subclass.
CellInternalException - if method fails for some other reason.
CellException

next

public final Sample next()
                  throws CellException
Returns next sample from the currently open output stream.

Returns:
next sample or null
Throws:
CellException
See Also:
Cell.Stream.next()

close

public final void close()
                 throws CellException
Closes the currently open output stream.

Throws:
CellException
See Also:
Cell.Stream.close()

openInputStream

protected final Cell.Stream openInputStream()
                                     throws CellException
To be used by the subclass in order to open input stream from the source cell.

Throws:
CellException

setAvailableParams

protected final void setAvailableParams(ParametersInfo availParams)
To be used by the subclass in order to set information about parameters (names, default values, description) that will be returned to the user by method getAvailableParams(). In the future, this information will also be used to validate parameters supplied by the user (setParameters(Parameters), set(String, String)) and to automatically assign default values to parameters not assigned by the user.


onLearn

protected void onLearn()
                throws java.lang.Exception
Learning procedure of the cell. For example, may train the internal decision model; read and buffer input data; calculate an evaluation measure of another cell; calculate data-driven parameters of a preprocessing algorithm (e.g. attribute means for normalization algorithm) etc. Called by learn().

Must be overridden in all subclasses that implement trainable cells. If your cell is not trainable, you must provide this information to the Cell base class by calling Cell(boolean) instead of Cell() in your constructor. Overriders may safely assume that the cell is in Cell.State.EMPTY state when onLearn is called - this is guaranteed by implementation of learn().

Throws:
java.lang.Exception

onErase

protected void onErase()
                throws java.lang.Exception
Called by erase(). Performs the actual erasure of cell content, while erase checks only against access violation and handles exceptions. Must be overridden in subclasses if erasure is to be used. Overriders may assume that the cell is in Cell.State.CLOSED state.

Throws:
java.lang.Exception

onOpen

protected Sample.SampleType onOpen()
                            throws java.lang.Exception
Called by open(). Performs the actual opening of communication session, while open checks only against access violation and handles exceptions. Must be overridden in subclasses if open is to be used. Overriders may assume that the cell is in Cell.State.CLOSED state.

Throws:
java.lang.Exception

onNext

protected Sample onNext()
                 throws java.lang.Exception
Called by Cell.Stream.next(). Performs the actual generation of the next output sample, while Stream.next() checks only against access violation and handles exceptions. Must be overridden in the subclass if next is to be used, i.e. if the subclass should generate some output data. Overriders may assume that the cell is in Cell.State.OPEN state.

Throws:
java.lang.Exception

onClose

protected void onClose()
                throws java.lang.Exception
Called by Cell.Stream.close(). Performs the actual closing of communication session, while close checks only against access violation and handles exceptions. Must be overridden in subclasses if close is to be used. Usually the overrider will use onClose to release resources, to let them be garbage-collected. Overriders may assume that the cell is in Cell.State.OPEN state.

Throws:
java.lang.Exception

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object