Pachyderm Object Database


Pachyderm is an easy-to-use object database for Dolphin Smalltalk, not necessarily an elephant as the name implies.

Highlights

The messages that implement Pachyderm revolve around a memory model, for example #remember:as:, #forget and other messages.

The intention was to not bother the developer (probably you) with the details of making your model instances persistent. I hope that has been achieved. Persistence starts from what are called root entries. A root is created by asking Pachyderm to #remember: an instance #as: a string. Anything that can be reached from a root (that is, "owned by" it) is also made persistent automatically.

Of course, you will probably want to get these "remembered" instances back at some point. Just ask Pachyderm to #recall: the name you gave it in the #remember:as: message. The returned value will be the same instance that you had Pachyderm remember earlier. If you ask Pachyderm to #recall: a name that it doesn’t know about, nil will be your answer. Any changes you make to the recalled instance will be carried forward to the persistent store as soon as you tell Pachyderm to #commit your changes. If you change your mind you can also ask Pachyderm to #forget your changes. Elephants can forget if you ask them nicely.

There are a few miscellaneous things that you will occasionally want to do. To back up your files it is recommended that you tell Pachyderm to #collectGarbage. Not only will this make copies of your database (see the section on Backups) but it will also compress and optimize the current files after it backs them up.

Simple interface

The goal from the start was to keep the interface simple. In fact it was to keep it completely transparent. Well, I didn’t exactly achieve completely transparent. But the value of Pachyderm is well hidden in its simplicity.

Unobtrusive in your model classes

There is no required sub-classing in order to use Pachyderm. There are no required methods for you to write (except for a few where versioning is concerned). Simply design your model using your favorite techniques and let Pachyderm sort it all out to disk by itself. Be recursive, be verbose, be terse, be complex; Pachyderm will handle it.

Handles changes in class definitions (versioning)

Of course we all need to change our models from time to time. Go right ahead. The next time Pachyderm sees the class it will "morph" the objects with class changes. This isn’t quite all magic; see the section on "Class Versions" for details.

Transaction support

A group of updates to the persistent objects can be treated as one update. This update can be committed or forgotten as a whole. See the section on "Committing and Forgetting your transaction" for details.

Objects retain their identity

Objects that compared with "==" before they were #remembered will compare the same way after they are #recalled. This is just as you would expect; no surprises here.
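
A minimal workspace sketch of this guarantee, assuming a database has already been opened. The root name 'Shared Example' and the variable names are made up for illustration:

```smalltalk
| shared root |
shared := OrderedCollection new.
"Both slots of the root array refer to the very same object."
Pachyderm remember: (Array with: shared with: shared) as: 'Shared Example'.
Pachyderm commit.

"Later, for instance in another session after the database is opened again:"
root := Pachyderm recall: 'Shared Example'.
"According to the identity guarantee described above, this should answer true."
(root at: 1) == (root at: 2)
```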

Handles recursive structures

Pachyderm has no problems storing recursive structures. Think of an ordered collection that has itself as one of its elements. This is recursion in a simple form. There can be much more complexity as well and still not create a problem for Pachyderm. A case in point: The root directory (an internal dictionary) is stored in the root directory. How do you like that for recursion!
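
The self-containing collection mentioned above could be set up roughly like this. It is only a sketch that assumes an open database; the root name 'Loop Example' is hypothetical:

```smalltalk
| loop |
loop := OrderedCollection new.
loop add: loop.   "the collection is now one of its own elements"
Pachyderm remember: loop as: 'Loop Example'.
Pachyderm commit.
```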

Garbage collection extended to the virtual cache

Persistent objects can become garbage just as easily as transient objects. When an object can no longer be reached from a (named) root entry it becomes garbage. Pachyderm’s garbage collector will remove the garbage from the disk file where the objects are stored. The pre-collected version of the object data is backed up prior to collection.

Efficient retrieval from disk

Pachyderm is intended for storing a lot of information on disk. Therefore much attention has been given to optimizing the conversion of the objects to their disk equivalents. As little disk space as possible has been taken to store the information.

User selectable location on disk

This is accomplished through the #open: message to Pachyderm. You can specify the entire path and file name of the database. If it exists it will be opened and if it doesn’t it will be created.  


Opening and Closing your Database

Just like any external file, a Pachyderm database must be opened (#open:) and closed (#shutdown). Errors will be returned if you forget to open the database.

#open: aPathFileName

This method allows you to create and/or open an object database. Specify the name of the database with any appropriate path, but leave the extension off. If you do supply an extension it will be ignored. This allows you to have several databases for various applications on the same hard drive.

Example:

Pachyderm open: 'u:\Databases\App1\DB'.

#reopen

This is very similar to the #open: message except that it has no parameter. It will use the parameter of the last #open: message to open the database. This is handy when you need to #shutdown the database for some reason and then reopen it. You don’t need to retain the name of the database in order to do this.

Example:

Pachyderm reopen.   

#shutdown

This method causes the database to be closed. Any uncommitted work will be forgotten. This is done automatically when the image is saved or shut down.

Example:

Pachyderm shutdown. 

Remembering your objects

When objects are remembered they are written to disk. Well, not exactly. The #commit message actually causes the disk to be updated.
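
The two steps might look like this in a workspace. This is a sketch that assumes an open database; the root name 'Notes' is hypothetical:

```smalltalk
| notes |
notes := OrderedCollection new.
Pachyderm remember: notes as: 'Notes'.   "creates the root entry"
notes add: 'first entry'.                "reachable from the root, so also persistent"
Pachyderm commit.                        "this is what actually updates the disk"
```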

#remember:as:

This method allows you to create a "root" entry. A root entry is what makes your objects accessible in the persistent store. Any objects that can be reached from the root object are also accessible. Any other objects will be garbage and collected when necessary. For instance, if you remember a collection as 'My Collection' and add a bunch of instances to it, they are all persistent. If you later tell Pachyderm to #forget: 'My Collection', the root collection and all of its entries will be inaccessible from that point on.

Example:

Pachyderm remember: OrderedCollection new as: 'My Collection'.

Retrieving your objects

#recall:

This method gives you access to an instance that was previously remembered with #remember:as:. Provide the same string that was used to remember it and the same instance will be returned.

Example:

Pachyderm recall: 'My Collection'.
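
Because #recall: answers nil for a name Pachyderm does not know about, it can be worth checking the result before using it. A minimal sketch, assuming an open database and a root name ('No Such Root') that was never remembered:

```smalltalk
| found |
found := Pachyderm recall: 'No Such Root'.
found isNil
	ifTrue: [Transcript show: 'Nothing remembered under that name'; cr]
	ifFalse: [Transcript show: 'Recalled: ', found printString; cr].
```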

Forgetting your objects

#forget:

This method is what you use to delete an instance from the persistent storage area. It is only used for named entries, i.e. root entries. Other instances that are persistent because they are owned by a root entry are forgotten simply by no longer referencing them, just as you would expect. The real reason for root entries is so that the instances in the persistent store remain reachable while the image is down.

Example:

Pachyderm forget: 'My Collection'.

Garbage Collection on the virtual level

#collectGarbage

This method must be called directly for now. Garbage collection for Pachyderm means starting at all of the root entries and scanning the entire database. Any objects that can be reached are retained. Any objects that cannot be reached are garbage and are thus collected and removed from the files. Executing this method will essentially shut down Pachyderm, and any outstanding transactions will be forgotten. Remember to commit your work first.

Example:

Pachyderm collectGarbage 
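
Since outstanding transactions are forgotten, a cautious sequence commits first. This sketch assumes an open database; whether the database has to be opened again afterwards is not stated here, so that step is left out:

```smalltalk
Pachyderm commit.           "save pending work before it can be forgotten"
Pachyderm collectGarbage.   "backs up the files, then removes unreachable objects"
```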

Backups

The backup method is private, although that may change in the future. Backups are a side effect of sending #collectGarbage. Backups are stored in the same directory as the database. They have the same file names as the original database with the phrase ‘Backup #’ prepended to the name, where # is a number from 1 to 3: 1 is the most recent backup and 3 is the oldest.

When a new backup is made the #3 is rolled off to the big bit-bucket in the sky. #2 is rolled forward into the #3 slot. #1 into the #2 slot. And the current files are copied into the #1 backup slot.

If necessary, all of the files in any particular backup set may be copied into the original database files. It is required that all three be moved together. Be careful here; data can be lost if you copy the wrong files to the wrong places.


Committing and Forgetting your transaction

Pachyderm is a multi-transaction object database. Instances are locked when their state has changed. Until the changes are #commit’ed no other transaction can have access to them. A ‘locked’ error will be produced for these circumstances.

It is not necessary to explicitly use the begin/end transaction messages. When Pachyderm starts it creates a transaction that will be used when no other transaction is active. It is not named. The effect of using Pachyderm in this way is that all updates to persistent objects are maintained in a single transaction. This is much like how Pachyderm worked before transaction processing was implemented.

For all transactions that are created with either of the #beginTransaction messages it is necessary to end them. The #endTransaction message is provided for this. The #endTransaction implicitly sends the #forget message to remove any locks that may be left hanging.

Transactions can be named with the #beginTransaction: message. This will allow you to make non-contiguous use of a transaction. If the same parameter is sent to a second #beginTransaction: before the first one is ended, then instead of creating a new one Pachyderm will simply continue the first one.

Transactions are maintained in a ‘stack-like’ implementation. For each #beginTransaction (even same named ones) an entry is pushed on the stack. For each #endTransaction an entry is removed. Symmetry is important.
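
One way to keep begin and end symmetrical is to wrap the work in a block and use #ensure:, which is standard Smalltalk block protocol rather than part of Pachyderm. A sketch, assuming an open database:

```smalltalk
Pachyderm beginTransaction.
[
	"... update persistent objects here ..."
	Pachyderm commit
] ensure: [Pachyderm endTransaction].   "runs even if the block is cut short by an error"
```

If the block fails before the #commit, the #endTransaction still runs and its implicit #forget discards the pending updates.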

#beginTransaction

Send this message to Pachyderm when you need to begin a new unit of work. The locks on instances will be maintained within this transaction separately from all other transactions. This creates an unnamed transaction that cannot be directly referenced later. It must be ended like all transactions. A #commit of a transaction does not affect any pending updates in any other transaction.

Example:

Pachyderm beginTransaction. 

#beginTransaction:

Send this message to Pachyderm when you need to begin a new unit of work. The locks on instances will be maintained within this transaction separately from all other transactions. This creates a named transaction that can be directly referenced later in another #beginTransaction: message. It must be ended like all transactions. A #commit of a transaction does not affect any pending updates in any other transaction.

Any #commits and #forgets affect the named transaction as a whole. Any updates that are pending under the named transaction are handled as a single transaction. That is because it is, in fact, a single transaction.

Example:

Pachyderm beginTransaction: 'My favorite transaction'.
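
A sketch of continuing a named transaction, based on the stack behaviour described earlier. It assumes an open database, and the transaction name 'Order Entry' is hypothetical:

```smalltalk
Pachyderm beginTransaction: 'Order Entry'.
"... first batch of updates ..."
Pachyderm beginTransaction: 'Order Entry'.   "continues the same transaction; a second stack entry is pushed"
"... more updates ..."
Pachyderm commit.           "commits everything pending under 'Order Entry'"
Pachyderm endTransaction.   "matches the second begin"
Pachyderm endTransaction.   "matches the first begin"
```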

#endTransaction

Send this message to Pachyderm when you are completely done with a transaction. An implicit #forget is sent before the transaction is removed. It is wise to not rely on this behavior and explicitly #commit or #forget the transaction before ending it.

If you are ending a transaction that is named and is still being used elsewhere it will not be removed. But any #commit or #forget will affect all uses of the named transaction. For the record, even though named transactions may not be removed when they are ended, all #endTransaction messages force a #forget.

Example:

Pachyderm endTransaction. 

#commit

Send this message to Pachyderm when you want all of the changes made within the current transaction since your last commit to be written to disk. Once committed, the objects are permanently changed on disk.

Example:

Pachyderm commit 

#forget

This is the opposite of #commit. It will cause all the updates within the current transaction since the last #commit (or #forget) to be rolled back. They will be forgotten and the original instances will be refreshed as if none of the updates had taken place.

Example:

Pachyderm forget. 
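
A rollback might look like this in practice. The sketch assumes an open database in which a collection was previously remembered as 'My Collection':

```smalltalk
| list |
list := Pachyderm recall: 'My Collection'.
list add: 'tentative entry'.
Pachyderm forget.   "roll back; per the description above, the instance is refreshed as if the add never happened"
```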

Class Versions

Our class models change over time. Therefore it is a requirement that any object database be able to handle this scenario. When an object is remembered not only is the data remembered but so is the definition of the class. Each time an instance of a class is loaded into memory its definition is compared to the current image definition for that class. It is when the definitions don’t match that the old one (the one in the object database) needs to be migrated to the current definition.

#migrateFrom:with:

This is one of the exceptions that prove the rule. You are required to write few methods other than #migrateFrom:with:, and, to be complete, even this method is not always required. The purpose of the method is to convert an older class definition to the current class definition.

All of the processing described in this section occurs when Pachyderm starts up. It checks all classes currently in the database against the classes in the image. For any that are different Pachyderm makes a call to the #migrateFrom:with: method for that class. The call to #migrateFrom:with: comes with two parameters. The first is an integer representing the version number found in the object database. The second is a dictionary of <variable name> / <value> associations that match the old definition.

Typically, the method should do three things. First, it must create an instance of the class. Second, it must fill in the instance variables using the values in the supplied dictionary, providing default values for any new instance variables. Third, it must return the instance. This method should be like #new in most ways.

All classes are assigned a version number when they are first #remembered into the database. This number is always 1. Eventually you may need to change the definition of a class, for example to add or remove an instance variable. As far as Pachyderm is concerned this is a new class with the same name. Therefore it will assign the version number 2. Each additional change to the class will increase the version number by one.

The second parameter that comes into the #migrateFrom:with: is the dictionary. The keys of the dictionary are the instance variable names of the old format of the class; the one in the database. Each of the keys has a value in the dictionary (of course) that is the value for the named instance variable as it is in the database.

Example:

Instance variable name (key)    Instance variable value (value)
'name'                          'Chris'
'eMail'                         'cdegreef@ix.netcom.com'
 

It is the task of this method to move the values into the proper place in the newer version of the class. Most of the time this means lines of code like:

Example:

migrateFrom: oldVer with: values
	| newInstance |
	newInstance := self new.
	newInstance firstName: (values at: 'name').
	newInstance eMail: (values at: 'eMail').
	^newInstance

This example shows that the newer version of the class renamed the instance variable ‘name’ to ‘firstName’. ‘eMail’ stayed the same, but its value must be transferred to the new instance anyway. If the only changes are that new instance variables were added (and nil is acceptable in them) or old instance variables were removed, then you don’t need to provide this method at all. The ‘super’ method will handle this kind of conversion automatically.

Here are some more examples of how this method can work. 

Example:

migrateFrom: ver with: values 
	| me | 
	me:= super migrateFrom: ver with: values. 
	me thatNewVariable: String new. 
	^me 

I know what I am going to say now is a little less than the OO way to do it. This is mostly because only one version of the named class can exist at a time in any image. So the current version of the class must be aware of prior versions in the sense that it knows how to convert from them to itself. It is advisable to use the ver parameter to condition how the new instance is to be set up. I suggest checking something like this:

(ver = 1) 
	ifTrue: ["convert from ver 1 to the current model"] 
	ifFalse: [(ver = 2) 
		ifTrue: ["convert from ver 2 to the current model"]]. 
		…  
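
Put together, a version-aware method might look like the following sketch. It is written for a hypothetical class that is now at version 3: version 1 stored ‘name’, version 2 renamed it to ‘firstName’, and version 3 added ‘phone’. The accessors #firstName: and #phone: are assumptions, and the method is assumed to live on the class side, as the use of `self new` in the earlier example suggests:

```smalltalk
migrateFrom: ver with: values
	"Sketch only: let the inherited method carry over unchanged variables,
	then patch up whatever each older version lacks."
	| me |
	me := super migrateFrom: ver with: values.
	ver = 1 ifTrue: [me firstName: (values at: 'name')].   "renamed after version 1"
	ver <= 2 ifTrue: [me phone: String new].               "added in version 3"
	^me
```

Testing the version cumulatively like this, rather than with one isolated branch per version, means each new class change adds only one line to the method.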

#highestVersionNumberSupported

This is another method that is to be written by you when the definition of the class changes. It belongs on the class side of your class. It must return the highest version number (an integer) that your class can support migrations to.

The inherited method returns 1. Since all classes start out at version 1 this means that all classes, by default, can support version 1 as the highestVersionNumberSupported. When you change the definition of the class you need to override this method, adding one to the version number that was there. The first time, that means that the returned number will be 2.

Don’t forget to update the #migrateFrom:with: method to actually support the class definition. Pachyderm will fail to start up if it finds a newer version of a class in the image and the new version number is greater than the value returned by #highestVersionNumberSupported for that class.

Example:

highestVersionNumberSupported 
	^2 

This tells Pachyderm that the #migrateFrom:with: method can handle the conversion to version 2 and below. If you make another change to this class and forget to increase this number to 3, Pachyderm will not start up and an error will be generated.


StartUp

It is required that you #open: the database prior to issuing commands to Pachyderm. Only one database may be open at a time. Therefore, if a database is already open when you send the #open: message, it will be closed and all outstanding transactions will be forgotten. #commit your work first if you wish it to be saved.


Shutdown

This method should not normally be used. In fact it is automatically called whenever the image is saved or Smalltalk is shut down. The effect of sending #shutdown to Pachyderm is that the files are immediately closed. Any pending transaction updates are forgotten. It is necessary to send the #open: message before you can access the database after a #shutdown. See #reopen for a shortcut.


Development issues

Pachyderm normally takes care of all of its necessary shutdown procedures when the image shuts down. This is normally a good thing for development and most appropriate for runtime. But in development it can be nice to have Pachyderm stop doing its thing. This will allow you to control the changes made to your classes and other such things. Pachyderm should not be running when you make changes to the definition of a class that has persistent instances. Tell Pachyderm to #shutdown. See the section on #shutdown to see what to do after you make your class modifications.
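
A development session following this advice might run as sketched below. It assumes the database was opened earlier with #open:, so that #reopen has a name to work with:

```smalltalk
Pachyderm commit.     "save anything pending; #shutdown forgets uncommitted work"
Pachyderm shutdown.
"... change the class definition here, then update #migrateFrom:with:
 and #highestVersionNumberSupported to match ..."
Pachyderm reopen.     "reopens the database named in the last #open:"
```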


The Future

I have big plans for Pachyderm. But future enhancements depend mostly on user requests. So this list is truly tentative depending on your comments. In other words, these enhancements are "vapor"-ware. And we all know what Pachyderm vapor can be like!

Multiple name-spaces

This will allow you to have more than one object database open at a time. You will be able to specify the path where it will be stored and the file names that will contain the data.

ODBC support!

This is a big one: an ODBC driver that can read the Pachyderm files. This will provide for those times when a relational ODBC tool is required for the system.

Transaction Logging

This enhancement will log the changes to the persistent store, possibly allowing forward / backward recovery of the object database!

Multi-User support

This will allow multiple images to access the same ODB. This will work even across the Internet!


Requirements

Dolphin Smalltalk (of course!)

Pachyderm.Pac loaded into your image

PachydermWait.ani in your executable directory