Binary Filing

In Smalltalk, whenever you save your image, your entire world is saved to an Image File so that when it is reloaded, everything is miraculously, and conveniently, still there. However, it can be useful to save objects to disk independently of the 'all or nothing' image file. For instance, you may want to copy objects to another image, perhaps to that of a customer, or maybe you need to 'clone' an object a number of times when the original is not present in the image.

If you encounter these or similar situations, consider making use of Dolphin's Smalltalk Binary (STB) Filing classes. These classes provide a mechanism for serialising objects into a compact binary data structure that can reside either in a ByteArray or on a disk file. Dolphin uses STB itself in the Package system and as the storage format for its Resources to achieve both object delivery and cloning.

Storing objects
Storing globals
Retrieving objects
Mixing STB with other data

Customising storage
How to use a proxy
Deferred fixup
Overriding a proxy
Making use of the filer context

Converting STB data after layout changes

Binary filing classes
Debugging
Exceptions


Storing objects

When objects are stored they include any objects contained within their named and indexed instance variables. The translation to the compact binary format is performed by an instance of STBOutFiler. This routes its output via an attached, binary-aware stream which you must create first (typically a writeable FileStream in byte mode). Once the STBOutFiler is instantiated on the stream, objects can be output using the filer's #nextPut: method.

S := FileStream write: 'Test.STB' text: false.
(STBOutFiler on: S)
	nextPut: X;
	nextPut: Y;
	nextPut: Z.
S close.

To store the objects to a ByteArray rather than a file, simply replace the binary-aware FileStream with a WriteStream on a ByteArray.

Storing globals

Do you need to make a filed-out object into a Smalltalk global variable automatically when it is filed back? Are you sure? If so, then you can use STBOutFiler>>#register:asGlobal: before filing out the object (or any references to it) to achieve this. The first parameter is the object which is to be made global and the second is its global symbol. #register:asGlobal: produces no output itself. Instead it marks the object so that, once it has been written with #nextPut: and is later loaded, it is installed as a global with the given symbol.

You may register a global whose value is nil provided that nil is subsequently #nextPut:. Similarly, it is also OK to register the same object with several different global names provided the object is subsequently #nextPut: the corresponding number of times.
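
As a minimal sketch of the sequence described above, the following registers an object as a global and then files it out. The file name, the contents of the collection and the global name #AppSettings are assumptions made purely for illustration.

```smalltalk
| stream filer settings |
"settings stands in for whatever object you want to become a global on load."
settings := OrderedCollection new.
settings add: 'English'; add: 42.

stream := FileStream write: 'Settings.STB' text: false.
filer := STBOutFiler on: stream.

"Register first: this writes nothing, it only marks the object."
filer register: settings asGlobal: #AppSettings.

"The object must still be output for the registration to take effect."
filer nextPut: settings.
stream close.
```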

Retrieving objects

This is, as you might expect, just the reverse of storing. You need to create a binary-aware stream on the data and attach it to an STBInFiler. Then each send of the #next method will recreate and answer the corresponding object that was #nextPut: into the data stream by the STBOutFiler.

S := FileStream read: 'Test.STB' text: false.
STB := STBInFiler on: S.
X := STB next.
Y := STB next.
Z := STB next.
S close.

To retrieve the objects from a ByteArray rather than a file, simply substitute a ReadStream on the byte array for the file stream.
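
The ByteArray variant can be sketched as a complete round trip, which is also a convenient way to clone an object. The variable names and the sample collection are illustrative only.

```smalltalk
| original stream bytes copy |
original := OrderedCollection new.
original add: 'hello'; add: 42; add: #world.

"Store: a WriteStream on a ByteArray takes the place of the FileStream."
stream := WriteStream on: ByteArray new.
(STBOutFiler on: stream) nextPut: original.
bytes := stream contents.

"Retrieve: a ReadStream on the same bytes takes the place of the FileStream."
copy := (STBInFiler on: (ReadStream on: bytes)) next.

"copy is a newly created object, not the original itself."
```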

Mixing STB with other data

It is possible to store any number of STB 'sessions' on a single stream, or even to mix STB data with other data. The start of an STB session is marked with an STB header, which is read or written at the current position of the stream when the STBInFiler or STBOutFiler is created. If the STB data does not start at the very beginning of your stream then you must position the stream appropriately before creating the filer on it. You may subsequently reposition the filer for a new session on the same stream using the #position: method.

There is an #atEnd method for determining whether the end of the stream has been reached but this is really only useful in single-session streams because it does not indicate the end of an STB session, just the end of the underlying Stream. While reading a mixed/multiple session stream you will need to know how many objects were stored. In this case you can either store the data in a collection and #nextPut: that, or #nextPut: the count of objects at the start followed by a #nextPut: for each of the objects themselves.
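
The second approach, writing a count ahead of the objects, might look like the following sketch. It uses an in-memory ByteArray so that it is self-contained, and the sample strings are placeholders.

```smalltalk
| items stream outFiler inFiler count loaded |
items := #('one' 'two' 'three').

"Write the number of objects first, then each object in turn."
stream := WriteStream on: ByteArray new.
outFiler := STBOutFiler on: stream.
outFiler nextPut: items size.
items do: [:each | outFiler nextPut: each].

"Read the count back and use it, rather than #atEnd, to know when to stop."
inFiler := STBInFiler on: (ReadStream on: stream contents).
count := inFiler next.
loaded := OrderedCollection new.
count timesRepeat: [loaded add: inFiler next].
```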


Customising storage

Sometimes it is necessary to perform some special action on an object as it is being read from an STB data stream. For instance, a Symbol object cannot just be constructed from the data; Smalltalk must be made aware of its existence by 're-interning' it. Another example is a Collection subclass whose elements are hashed; they must be properly created with respect to their hash values or they will fail to work properly. If the object you are saving contains a reference to some object that must not be re-created, you may instead want to find the counterpart of that object in the target image on load.

In all of these cases you can achieve the required objective by using or creating a subclass of STBProxy. You simply store an instance of the proxy in place of the original object; the proxy contains sufficient methods and information to properly resolve itself back into the original object when re-loaded.

How to use a proxy

You indicate that you wish to change the way an object is filed by adding an #stbSaveOn: instance method to its class. This method controls the STBOutFiler, which is passed as a parameter, while the instance is being filed. Your #stbSaveOn: method must perform one of the following operations:

  1. To let the filer output the receiver in the normal way
    (this is the default inherited from Object):
    STBOutFiler>>#saveObject: self.
    or
    STBOutFiler>>#saveObject: self as: 0. 
  2. To output nil instead of the receiver:
    STBOutFiler>>#saveObject: self as: nil. 
  3. To output a proxy in place of the receiver:
    STBOutFiler>>#saveObject: self as: anSTBProxy.
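
As a sketch of how the first two options might appear in practice, here is an #stbSaveOn: instance method that outputs nil for some instances and files the rest normally. The #isTransient test is a hypothetical method of your own class and is not part of Dolphin.

```smalltalk
stbSaveOn: anSTBOutFiler
	"Output nil in place of transient instances (#isTransient is a
	hypothetical test); otherwise let the filer output the receiver
	in the normal way."

	self isTransient
		ifTrue: [anSTBOutFiler saveObject: self as: nil]
		ifFalse: [anSTBOutFiler saveObject: self]
```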
    

You need to create a new subclass of STBProxy with instance variables to hold the crucial data. The proxy cannot, of course, contain the object that it is trying to represent as this would cause an endless recursion. There are two main types of proxy, those that:

  1. generate a new object,
  2. represent a way of finding an existing object.

If the proxy generates a new object then its #fixup:at: should instantiate a new object from its stored data and use #become: to swap it for the proxy, thus resolving any other references to the proxy. In most cases you can just implement a #value method which answers the new instance and inherit the #become: behaviour from STBProxy.
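
A proxy of this first kind could be sketched as follows. Every name here is an assumption made for illustration: Temperature is a hypothetical class that can be rebuilt from a single number, and STBTemperatureProxy is a hypothetical subclass of STBProxy with one instance variable, kelvin.

```smalltalk
"Instance methods of the hypothetical proxy class STBTemperatureProxy"

kelvin: aNumber
	"Record the only data needed to rebuild the original object."
	kelvin := aNumber

value
	"Answer a new instance built from the stored data. The inherited
	fixup behaviour then swaps it for the receiver."
	^Temperature kelvin: kelvin

"Instance method of the hypothetical class Temperature"

stbSaveOn: anSTBOutFiler
	"Output a proxy holding the reading in place of the receiver."
	anSTBOutFiler
		saveObject: self
		as: (STBTemperatureProxy new kelvin: self kelvin)
```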

If the proxy represents a way of finding an existing object then using #become: is going to have disastrous consequences for those objects that refer to the existing object. In this case your proxy will need to re-implement the #fixup:at: method, sending STBInFiler>>#fixup:to: to fix up the filer to the desired object. Finally, answer the new object. E.g.

fixup: anSTBInFiler at: proxyIndex
	"Private - Tell anSTBInFiler to replace in its
	map of read objects the entry at proxyIndex
	(the receiver) by the object we specify.
	We must answer the new object."

	| newObject |
	newObject := Magic find: key.
	anSTBInFiler fixup: proxyIndex to: newObject.
	^newObject

STBInFiler>>#fixup:to: only replaces the proxy itself by the new object; it cannot fix up any other references to the proxy, so such a proxy should not contain any references to itself. However, if self-references by the proxy are unavoidable then you can use #oneWayBecome: to fix up the nested references, but note that this method is slower than #become:.

Deferred fixup

It is not always possible to complete the fixup operation until the STBInFiler>>#next has completed. For example, a hierarchy of windows must be created parent before child. The STBInFiler offers a #deferredAction: method which will perform the given niladic valuable (a MessageSend perhaps) on completion of the STBInFiler>>#next. So, in our hypothetical window hierarchy, the proxy representing the top parent window can defer the creation of the real windows until all proxies have been loaded. If there is more than one deferred action then they are performed in the order in which they were issued.

Overriding a proxy

It is sometimes desirable to override the whole proxy mechanism for particular instances on a per output session basis. For instance, you may have a hierarchy of objects and you want to output from a certain point down, i.e. treating the crucial parent reference as nil. You can, of course, change the object before the filing process commences and then change it back again afterwards, or perhaps get #stbSaveOn: to file out different instances of the same class in different ways, but these techniques may prove inconvenient.

You can achieve a temporary proxy override using STBOutFiler>>#override:with:. Before filing out the object that needs special attention (or any references to it) you:

...
outFiler override: theObject with: thisProxy.
...

where thisProxy is similar in concept to the as: parameter of STBOutFiler>>#saveObject:as:, i.e. nil to output nil in place of the object, or a proxy to output in its place.
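
For the hierarchy case mentioned above, an override might be set up as in this sketch. The variable branch and its #parent accessor are hypothetical; branch is assumed to already hold the object from which you want to output downwards.

```smalltalk
| stream outFiler |
stream := WriteStream on: ByteArray new.
outFiler := STBOutFiler on: stream.

"For this session only, file out nil wherever the parent is referenced."
outFiler override: branch parent with: nil.

"Now output the branch; its parent reference is written as nil."
outFiler nextPut: branch.
```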

Making use of the filer context

Both STBInFilers and STBOutFilers contain a single, user-specified instance variable which can be accessed using #context: and #context. These are typically set before a filing session commences and then queried by the filed objects or proxies to determine the 'context' of the session. The filer itself puts no interpretation on this information; it just makes it available.

For example, it is sometimes useful to 'snip out' an object from its environment before you save it so that you don't save its entire neighbourhood. When the object is subsequently loaded it initially lives nowhere, but its proxy can query STBInFiler>>#context to ascertain the new neighbourhood. Use STBInFiler>>#context: to specify the context information at any time prior to the send of #next that requires the information.
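
The loading side of such an arrangement might be sketched like this. The file name, the stand-in neighbourhood object and the Child class>>parent: constructor are all assumptions; only #context:, #context, #next and #fixup:to: come from the description above.

```smalltalk
"Workspace code: supply the context before the #next that needs it."
| stream inFiler newNeighbourhood child |
newNeighbourhood := OrderedCollection new.	"stands in for the real environment"
stream := FileStream read: 'Child.STB' text: false.
inFiler := STBInFiler on: stream.
inFiler context: newNeighbourhood.
child := inFiler next.
stream close.

"Instance method of a hypothetical proxy class: query the context on load."
fixup: anSTBInFiler at: proxyIndex
	| newObject |
	newObject := Child parent: anSTBInFiler context.	"hypothetical constructor"
	anSTBInFiler fixup: proxyIndex to: newObject.
	^newObject
```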


Converting STB data after layout changes

This can be a difficult job so it is worth stating right at the start that if you never rely on old STB data then you don't need to bother yourself with STB conversion issues!

When an object's data is written to the stream, its instance variables are written out in the order they appear in its definition, followed by any indexed variables. If the order or number of instance variables is changed then STB data stored before the change will either have the wrong meaning in a new instance, will not fit a new instance or both. Loading old-format instances is possible using the STB versioning mechanism.

All object data is prefixed by a reference to its basic class credentials including the STB version number of the class when the data was written. If a class undergoes a change in its format and it is necessary to load its old format from STB data then you need to write a conversion method to translate the old data to the new format or indeed to an instance of some other class.

You need to make essentially two changes:

  1. Increment the STB version number of the class
  2. Provide a conversion method to translate the layout

Override the class method #stbVersion to answer the new format version number. By default this method answers 0 so for a first change a new version number of 1 is required - be careful to ascertain the impact of the version number on subclasses.

Then you need to provide or amend the #stbConvertFrom: class method, which is passed an instance of STBClassFormat describing the format and version number of the old object. This method should answer nil if it can't convert from the version identified, which causes the STBInFiler to signal an STBError.

The method should answer a monadic valuable (generally a block) that answers a new, current format instance initialised from the old data. It is important that the block answer a new object because the STBInFiler uses #become: to swap the temporary old data object with the new object. The old data is represented by an Array or ByteArray passed as the block's single parameter. You can calculate the basicSize (indexed size) of the old object by querying STBClassFormat>>#instSize.
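
For a single layout change, the two class methods might look like the following sketch. It assumes a hypothetical non-indexed class whose version 0 had the instance variables (name age) and whose version 1 appends a third, email, so the old values can be copied across by position.

```smalltalk
stbVersion
	"Answer the current STB format version of this class."
	^1

stbConvertFrom: anSTBClassFormat
	"Answer a block converting version 0 data to the current layout,
	or nil if the stored version is not one we know how to convert."

	anSTBClassFormat version = 0 ifFalse: [^nil].
	^[:data | | newInstance |
		newInstance := self basicNew.
		"Copy the old named variables; the new trailing one stays nil."
		1 to: data size do: [:i |
			newInstance instVarAt: i put: (data at: i)].
		newInstance]
```

Note that the block answers a freshly created instance rather than modifying the data array, as the text above requires.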

Where several version changes occur it may be advisable to re-use previous converters using the following technique.

stbConvertFrom: anSTBClassFormat
	"Private - Answer a block to convert the given data array to
	an instance of the current version."

	^[:data | | array newInstance |
		array := data.
		"Apply each single-step converter in turn. The class is expected to
		implement #convertFromVersion0:, #convertFromVersion1:, and so on,
		each answering the data array upgraded by one version."
		anSTBClassFormat version to: self stbVersion - 1 do: [:version | | selector |
			selector := ('convertFromVersion', version asString, ':') asSymbol.
			array := self perform: selector with: array].

		newInstance := self basicNew.
		1 to: self instSize do: [:i |
			newInstance instVarAt: i put: (array at: i)].
		newInstance]

Note that the old object could be replaced by an object of an entirely different class.


Binary filing classes

STBClassFiler can be used to store a Smalltalk class in STB format. When saved to disk a compiled class file is usually given the extension STC. The compiled classes are not saved in their entirety:

They DO contain:

They DO NOT contain:

A binary class is loaded using an ordinary STBInFiler. What you get from #next is a 'private' class, i.e. it is not global and it does not appear in the subclasses collection of its superclass. These private classes form part of Dolphin's scheme for deploying an applet over the web.

Binary classes can be created by the Class Hierarchy Browser Class/Save Binary menu or by the Package Browser File/Save Binary Classes menu. But beware - each class is in a separate file so there is a possibility that you can end up with a set of circular class references which would be unloadable. This situation is detected only at load time.


Debugging

If you should have trouble creating a proxy, loading from an STB stream or simply want to know more about STB then try using an STBDebugger in place of an STBInFiler. This simply writes a text representation of what is being read to the Transcript.

Each object read from the stream outputs a line of text starting with the offset within the stream. Then, after indenting to indicate nesting of the object, the object's prefix is shown in square brackets. Note that some objects such as a SmallInteger live entirely within their own prefix. If the prefix contains only a quoted string then the prefix identifies the first instance of the named class; subsequent instances are prefixed by the class name without the quotes. If the prefix contains something in angle brackets then it represents a reference to that object. For objects which have more than a prefix, their indexed size follows even for non-indexed classes (whoops). Finally, if the object is a byte object, its literal representation is printed.
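
Swapping in the debugger might look like the sketch below. It assumes that STBDebugger is created with #on: in the same way as an STBInFiler, which the "in place of" wording suggests but does not state, and it reuses the file name from the earlier examples.

```smalltalk
| stream object |
stream := FileStream read: 'Test.STB' text: false.

"Same protocol as STBInFiler, with a trace written to the Transcript."
object := [(STBDebugger on: stream) next]
	ensure: [stream close].
```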

Exceptions

Both the underlying stream object and the STB filers themselves may signal exceptions during the saving or loading of objects. The STBInFiler and STBOutFiler raise non-resumable exceptions of class STBError. Where the filers are used with a FileStream the user code should trap any such exceptions to ensure that the stream is properly closed.
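
One way to arrange this is sketched below, using the standard #on:do: and #ensure: messages. The file name and the decision to answer nil after reporting the error are illustrative choices only.

```smalltalk
| stream object |
stream := FileStream read: 'Test.STB' text: false.

object := [
	[(STBInFiler on: stream) next]
		on: STBError
		do: [:ex |
			"Report the problem and answer nil in place of the object."
			Transcript show: 'STB load failed: ', ex description; cr.
			nil]]
	ensure: [stream close].	"runs whether or not an exception occurred"
```

The #ensure: block also closes the stream if the stream itself signals an error that is not an STBError, which the handler here does not trap.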