Large Objects - Part 3 Serialising Java Objects
In parts 1 and 2, we looked at how to load and retrieve large objects using Teradata BLOBs and CLOBs. In part 1, I put forward my unbreakable golden rule that you shouldn’t simply store Object Models as Large Objects. Rather, always map your objects’ attributes to columns in a table; don’t just serialize a bunch of classes and store them in a BLOB.
In part 3, I present another of my unbreakable golden rules; which is “Rules were made to be broken”. The focus of this article is how to serialize and persist (store) an Object Model into a BLOB on Teradata and reinstate it.
My golden rules
Given rule number 1, why would we want to serialise an Object model and store it in a BLOB which essentially makes it unusable in terms of SQL operations? The answer is that sometimes you will be in a situation where you need to manage a whole bunch of data that really is not useful to query and possibly not convenient to convert to its component elements and store in relational attributes. By way of example, an image is simply a sequence of bytes that when appropriately interpreted can be displayed or printed as a picture. You could; if you really, really wanted to store the bytes of an image stored in a table consisting of one row per row byte in the image. Most people don't store an image as a series of integers in a table. I wouldn't do that either, I like most, would store it as a BLOB. But you could do that if you really wanted to do so.
Consider the diagram. It is a diagram of an airfield (airport). This diagram could be stored as an image in a BLOB column. In this web page, it is just that, an image - a GIF image to be precise. However, the diagram is intended to be part of an application that allows the user to interact with the it. One example is of a user being given taxi instructions could enter them and have the taxi path highlighted on the diagram. This would not be easy to do if the diagram was stored as an image.
A custom object model is required to maintain the diagram and support the interactivity. This custom object model is the Airfield object model.
The Airfield object model needs to be able to be stored and retrieved to/from my database. I could store the Airfield Object Model as individual attributes in a multitude of tables or simply serialise it and store it as a BLOB. In this article I will attempt to justify and do the latter.
About the Airfield object model.
The Airfield object model consists of a number of classes representing all aspects of the diagram plus some additional features not shown (e.g. Radio Services).
In this initial version of the object model there are 14 classes representing runways, taxiways, aprons and other aspects of the airfield. Once could assume from this that to store the airfield class structure in a database, up to 14 different tables would be required, one for each class.
The actual number of tables required to store an airfield in a Teradata would likely be different to the number of classes in the Object model. The reasons are twofold:
There is some inheritance and some of the classes would be stored as attributes in one of the other tables. For example there is a class PreparedSurface which is subclassed to Runway, Taxiway and Apron. Unless the data model is constructed using subtypes we wouldn’t bother with mapping PreparedSurface to a table, rather, the attributes provided by PreparedSurface would be repeated in Runway, Taxiway and Apron tables. This reduces the number of required tables and joins.
- Some of the classes have subordinate data structures consisting of lists of data. These lists of data would have to be stored as either repeating groups in the “parent table” (which would not be practical) or in one or more dependant tables. For example, an Apron (where the planes park) consists of an arbitrary number of points defining its shape. Each of these bounding points would need to be stored as a row in a table with a 1:M relationship between Apron’s table and it’s dependent table (e.g. Apron Bounding Points or something like that). Additional attributes (not required in the object model) would be required to ensure that the points are maintained in the correct sequence.
There is nothing in the object model that would prevent the mapping from classes to tables and attributes to columns for storage in a 3NF (or denormalised) data model. The number of records that would need to be maintained (i.e. save and load) would however be considerable. Similarly the amount of code required to be built (whether O-R mapping configuration files or SQL queries loading an object model) would also not be insignificant. Once we are done, we would have a data model full of attributes which, quite frankly, wouldn't be useful for analytical work (and certainly not useful for the applications I have in mind). Following are two examples that hopefully better explain this point.
Consider a runway. It consists of designators (the numbers at the end of the runway) and points that define the runway’s ends relative to the airfield’s reference point. None of these are terribly useful to be used in a query. For example, it is unlikely that anyone is going to run a query listing airfields that consist of a runway that has an end point a certain distance from the origin or runway 35. Of course there will always be the exception, for example, someone concerned with the integrity of the data may wish to do this, but for the applications I have in mind, such a query does not make sense.
Taxiways also consist of designators (the letter identifying the taxiway) and a series of points relative to the airfield’s reference point that map out the path of the taxiway. In most, if not all, countries the naming of taxiways has not been formalized. As such taxiway designations can be somewhat arbitrary. Consequently, none of the taxiway attributes are particularly useful to use in queries. For example a query listing airfields that consist of taxiway “A” will pretty much return all airfields in the database; which is not terribly useful. Additionally, the location of each designator and the points that map out the path of the taxiway would need to be stored as individual entries in dependent tables having a 1:M relationship with Taxiway.
What is useful to use in a query is the airports IATA code (or ICAO code) and/or location (latitude/longitude). The IATA code is an abbreviation allocated to airfields around the world. Here are some examples of IATA codes that you may be familiar with together with the ICAO equivalent code:
- LAX – Los Angeles International (ICAO code: KLAX)
- SYD – Kingsford Smith (Sydney International) (ICAO code: YSSY)
- LYR – London Heathrow (ICAO code: EGLL)
So the IATA code will be one of the attributes that we store in the table containing our BLOB.
Persisting the object model
For the reasons above, I’ve decided that I will serialize my object model and store it as a Binary Large Object in Teradata. Obviously I will also need to restore my object model when reading it back from the database.
As we’ve seen in the other 2 articles, storing a large object from a file into Teradata is pretty easy. The problem that we have here is that any object model we want to work with isn’t one of the Java types that we can store in a BLOB. That is, the Airfield Object Model is not an array of byte (byte ) nor is it a Blob (which at the end of the day is basically just a wrapper for an array of byte).
So the trick is to convert our object model into a form that we can use; that is we need to convert our object model to an array of byte.
Note that in order to serialise an Object model, each and every class in the model must implement the marker interface java.io.Serializable.
The only way that I know how to convert an object model to an array of bytes is to serialize our object then get an InputStream to that serialized data. This (2 step process) sounds very convoluted because most people would think of serializing in terms of saving to a file which must then be read back in – not to mention figuring out how to keep track of a location to store the file, cleanup afterwards and so on!
Fortunately there are 2 classes that make the conversion of an object model to a byte array and back again very straightforward. These are java.io. ByteArrayOutputStream and java.io.ByteArrayInputStream. Consider the following code snippet which converts a Blob retrieved from the database into an Airfield Object structure:
Blob airfieldBlob = rs.getBlob("diagram"); InputStream is = airfieldBlob.getBinaryStream(); ObjectInputStream ois = new ObjectInputStream (is); Airfield airfield = (Airfield) ois.readObject(); ois.close (); is.close ();
At lines 1 and 2 we extract the airfield BLOB from our result set and obtain its InputStream. At this point it is just an array of bytes that aren’t terribly useful.
At line 3, we use the Blob’s InputStream to construct an ObjectInputStream. This is the class that does all the magic of converting the collection of bytes into a useable Airfield object (line 4). Finally we close all of the InputStreams and we are done. Simple!
Apart from I/O errors, the only things that really can go wrong with this are Class Cast Exceptions and Invalid Class Exceptions. The former (Class Cast Exception) will occur if the BLOB isn’t an instance of an Airfield object. The later (Invalid Class Exception) will occur if the signature of the classes has been changed. The Invalid Class Exception can be managed by placing SerialVersionUID in your class definitions. Refer to the serialver command in the JDK for information on generating these - your IDE may have a convenient plugin to generate them for you (mine does).
Saving the object model is essentially the reverse of the retrieval:
Airfield airfield = /* obtain an airfield from somewhere */ ByteArrayOutputStream bos = new ByteArrayOutputStream (); ObjectOutputStream oos = new ObjectOutputStream (bos); oos.writeObject(airfield); oos.flush (); oos.close (); bos.close (); byte  airfieldData = bos.toByteArray(); PreparedStatement pstmt = connection.prepareStatement(sqlText); .../* setting of other attributes */ pstmt.setObject (8, airfieldData);
To begin the save process we need to obtain a reference to our airfield which is performed at line 1.
Next we convert the object to a byte array using an ObjectOutputStream. This is performed at lines 3-8. Note that there is a need to flush the output stream (line 6). If you do not, it is possible that you will end up with an empty byte array; an empty byte array would be undesirable.
The final step is to extract the byte array and store it in our BLOB. This is achieved at lines 10-13. Note that the extraction of the byte array (line 10) and the setting of the Object (line 13) could be combined into a single call if you so desire. I’ve just separated it to hopefully make the overall process a little clearer.
The sample program
The attached sample program is an extremely simple program that prompts you for the airfield’s ICAO code. It then retrieves the airfield from the database, converts it to an object and passes it to the airfield renderer which draws the airfield.
To get the program working, you will need to:
- Create the table and load the sample data. The attached files may be used with SQL Assistant to do this (refer to part 1 and/or SQL Assistant help for details on how to use it to do the load).
- Create a project in your favourite IDE. Add the Teradata JDBC driver and the Airfield class library to your project’s class path (the class library requires JRE 6.0).
- Modify the sample program’s JDBC URL and user credentials. Then compile and run.
When you run the program, it will prompt you for the code of the airfield to retrieve. It will then attempt to connect to the database, retrieve and display the diagram. There may be a delay between entry of the code and display of the diagram while the database connection is established. Any exceptions that occur will be printed to System.out.
The airfield editor is quite a large application and I haven’t included it here. Consequently the save logic isn’t included in the test application. To explore saving Large Objects, you could use the sample application to modify an airfield’s attribute and save it. Alternatively you could use your own object model and save it in the database.
When the airfield is loaded from Teradata, the sample program overwrites certain attributes in the object model. The reason for overwriting them is that someone may modify these values that are exposed as columns in the table using SQL. For example, someone may modify the value of the elevation attribute using SQL. To provide some degree of data integrity, the values of the exposed attributes (city, state, latitude, longitude and elevation) have precedence to the values stored in the BLOB. Should you do this for your application? I can’t answer that because the answer would depend upon factors specific to what you are trying to achieve and how you design your object model and your data model (but you probably should).
In consideration of my two rules at the top of this article, should you serialize objects and store them in a BLOB? As with most things in life there is a very clear and definitive answer which is: it depends.
Well perhaps “it depends” isn’t definitive, but it is clear. You will need to evaluate all the pros and cons of storing objects as BLOB’s or in tables/columns and make a decision that works best for you.
So with the airfield model in mind, what are the attributes of serializing the Airfield Object or any object and storing it in a BLOB?
- The data model is simplified as only one table with 2 columns (a key and the BLOB) is all that is required (you could put more attributes in the data model like I did if necessary).
- Serialising the Object and saving it to the database is relatively fast as only one row needs to be recorded in the database.
- Serialising the object is easy and requires only a few lines of code.
- Attributes within the Serialised object are typically not easily used in SQL queries. This limits flexibility. For example, with this design, I couldn’t write an SQL query that returned nearby airfields that had a runway roughly aligned with the prevailing wind direction if such a query was needed.
- Since attributes in the serialized object cannot be queried (using SQL), those that make sense to be used in queries must be replicated in table attributes.
- There is a risk of inconsistency as a user can modify one of the exposed attributes using SQL. This won’t be reflected in the serialized object (e.g. if you change values in the airfields table, this is not automatically reflected in the Airfield BLOB). To modify values in the BLOB, an external program (e.g. an AirfieldEditor program) would be required. This has the benefit that there is no way for users to corrupt the diagram (i.e. the taxiway paths, runway ends etc) by modifying data that they should't be modifying using SQL.
By way of alternative, some of the attributes of storing the Airfield object in multiple tables (and not using a BLOB) are:
- The airfield contains many different types of Objects (Runways, taxiways etc) that each require different table structures. As such multiple tables will be required.
- Multiple Inserts (or Updates) will be required to save the data model – possibly numbering in the hundreds as more detail is added. Each point in a taxiway’s path; the location of the designators and the apron’s edges would each correspond to a row in a table. This means "save operations" could take a long time.
- Storing in a table is relatively easy, but would require lots of code will need to be written to save and retrieve objects (either OR mapping configuration or code using the java.sql classes).
- SQL Queries could be written to extract airfields using any or all of an airfield’s attributes. This provides maximum flexibility for SQL based users.
- There is no risk of inconsistent data since all attributes are stored just once in the data model. However, this leads to the risk that people modify data via SQL which may corrupt the structure. This risk exists because they can modify “internal data” (such as the points defining a taxiway path) without the need for the AirfieldEditor program (which theoretically constrains them to sensible changes that do not corrupt the structure).
Hopefully the above check list of factors will help you to decide the correct approach for you.
As a rule of thumb, I would say that if the data you are storing is complex and largely graphical in nature (e.g. an airfield diagram, 3D coordinate data for parts, 3D coordinate information for how parts assemble into modules etc) and requires a dedicated tool to manipulate them via point/click and drag and drop types of user interaction, then you might consider serializing the object model and storing it in a BLOB.
On the other hand if the data you are storing is simple (including simple graphics data - e.g. photos) or primarily data that can be displayed and edited as text; including things like account numbers and amounts displayed using digits (which is also just text) it should be stored as attributes in one or more tables.
I hope this article gives you some options (as opposed to created confusion) that may help you in building applications around Teradata that leverage complex data structures.
The diagrams and data supplied in this article are for illustration purposes only.
As such, the diagrams and data supplied in this article should not be used for operational purposes.
For operational purposes, you should consult current versions of the appropriate approved operational documentation such as ERSA.