Managing Multimedia and Unstructured Data in the Oracle Database
上QQ阅读APP看书,第一时间看更新

Why store unstructured data in a database?

As the business imperative grows for companies to start managing and then publishing their digital image assets, the issue about where those image assets are stored is raised. More recently, multiple analysts have estimated that data will grow 800 percent over the next five years. The unstructured information accounts for more than 70–80 percent of all data in organizations and is growing 10–50 times more than the structured data(21).

This then grows to include any type of data. The initial choice is to store it on a disk file system. This can be seen as the quickest and simplest approach. Another, better, alternative is to store the images in the database.

A database is not normally considered to be an ideal repository for multimedia or any form of unstructured data. Historically they are known to have had issues with performance with large volume data retrieval. There has also been a noticeable lack of support with third-party tools, leaving any data in the database well and truly locked in. With the Oracle database this was seen in the older Oracle7 release and the use of long fields.

Only recently the possibility of storing multimedia in a database has become realistic. Though the capability has been around for some time, with the increase in disk capacity, and introduction of low-cost, high volume SANs, there has been a greater push towards moving any multimedia from the filesystem into the database. In the past five years, with changes in database technology and improvements in disk performance and storage, the rules have changed and it now makes business sense to use the Oracle database to store and manage all of an organizations digital assets.

Most companies are also now recognizing that large amounts of corporate knowledge and assets are stored within their filesystems. Accessing them is difficult and most do not follow standards for managing and dealing with them. As such, most are now looking to acquire or build some sort of digital asset management system.

Currently the type of unstructured data is limited to those classified as digital assets, which includes multimedia and some other forms of data. The notion of storing a whole operating system inside the database is yet to be reached due to the logistics of adhoc retrieval versus any perceived performance issues. Given time the question will be raised, should the database be the operating system? Having the database as the operating system changes the mentality for its use. It already has security, auditing, extensible programming languages, schemas, backup and recovery, diagnostic management, and a built-in web server. Though such a scenario does not exist, it's plausible and would definitely appeal to a niche market that only uses the database on its server. It would not replace a Windows or Mac PC, it might replace a Unix or Windows server.

The following are the strengths an Oracle database can offer over traditional filesystem storage.

Manageability

Images stored in the database can be directly linked with metadata. In the one transaction an image can be manipulated, a thumbnail of that image created, and all associated metadata modified. Related information is kept in sync. If an image is stored in a file system, it is possible for external processes to delete or modify that image, causing the image itself to either become orphaned or lose synchronicity with its corresponding relational data. Another common issue is web quality images losing their associated thumbnails, meaning web page displays become broken.

Oracle multimedia, which extends control over images, allows images to be manipulated inside the database. They can be resized, copied, converted, and rotated. This simplifies management of them and allows for the one programming environment (PL/SQL or Java).

Moving multimedia is simplified as only one object is being moved. When deleting any multimedia all associated thumbnails and metadata is deleted. Management becomes simpler and less prone to error, especially on recovery and when doing general database maintenance.

Security

If all images are stored in a directory, fine grained control is not possible. That is, it is not possible to restrict access of the images to individual users. Once users can gain access to an image in the directory they can access all of them (this is based on the assumption of using Digest Authentication).

By storing an image in the database, fine grained security becomes possible. Access to an image can be restricted to individual users and it also becomes possible to achieve the following:

  • Attach a timeout to access the image
  • Include check in/check out capabilities
  • Audit who accessed the image and when
  • Offer image exclusivity (one user accesses an image for a set period of time during which no one else can access it)

Using Oracle security it becomes possible to attach roles to images and introduce fine grained access on them. Security can be configured so that a user can access a thumbnail but not the original. Full auditing of who accessed each image and how they accessed it can be tracked. Auditing can also be included to keep track of network capacity used per user, making it possible to track and then charge for network usage.

Backup/recovery

The one backup program that is used to backup the database will also backup the images. This simplifies the backup process. In the event of failure the whole database can be recovered to the last committed transaction.

The traditional behavior for backing up a filesystem is to back it up daily or weekly. This means in the event of failure the filesystem on recovery will be out of sync with the database by at least one day.

So by having the images in the database only one backup program is required and in the event of failure only one recovery procedure is needed.

Another advantage that can be seen comes when using some of the more advanced database features such as standby databases and replication. Images are automatically replicated if the advanced replication option is used, and for disaster recovery situations, image data is automatically transferred to a standby database.

The Oracle database is designed to handle backing up and recovering very large volumes of data. Using RMAN, incremental backups ensure only the changed blocks are backed up. Backup and recovery can be done in parallel. The database supports full rollback. Recovery is done until the last committed transaction, meaning no data is lost. This is important as it ensures that when the database is recovered all multimedia and associated metadata match. If a filesystem was used to store the multimedia and a database to store the metadata, when failure occurs it becomes possible for the metadata to become out of sync with the files recovered on the filesystem.

Though the latest generation of SANs can do high volume and high-speed backups of data, there is still no way to guarantee complete consistency between the database on the SAN. With changes in technology, a number of SANs now support real-time block level replication and can ensure consistency. This capability is vendor and database specific.

Integration

All the data is in one location. The digital image becomes an object and can be accessed using the one query. Client server and web applications can access the one image and retrieve its associated metadata using the one SQL statement. Image management is also subject to the same transactional rules as relational data. Using PLSQL and/or Java, the one query can access a variety of multimedia types in the one query. For example, a photo with its associated metadata and video can be retrieved in the one simple query. It's also possible to do a query that not only retrieves the metadata, but also performs a spatial query to do analysis of it.

The simplistic nature of these highly complex tasks makes it a powerful option to use.

Extensibility

An image stored in the database can be indexed. If an image is a document it can be thematically searched and gists (summaries) can be extracted from it.

An image can be converted from one format to another. Metadata can be extracted from it. It can be copied, re-sized, and the image quality controlled.

Flexibility

When it comes to managing and controlling the images in the database, the Oracle database offers the greatest in flexibility. Sets of images can be deleted, updated, or copied as easy as it is to write a query.

Images can be linked together and metadata can be easily attached to them. All data related to an image or set of images can logically co-exist.

This adds flexibility, which gives a DBA and developer greater control over managing and working with the images.

Features

The Oracle database has various built-in features, which when used makes it easier to manage and deliver multimedia:

  • The Oracle database can extract metadata from a number of multimedia types. The metadata is stored in XML format making it easier to manipulate and control. For images, metadata can also be saved back into it. Using Oracle's built-in XML handling capabilities, accessing the XML data is easily done.
  • Transportable tablespaces: Multimedia can be migrated en-masse by copying them to a transport tablespace and then moving this tablespace to the new location. This is useful when firewalls are involved. It also allows for the large scale copying of image databases.
  • Database links: Multimedia can be copied between databases directly using a database link.
  • Oracle supports streaming of videos directly from the database.
  • Photos can be processed within the database. They can be rotated, re-sized, watermarked, or translated from one type to another.
  • Embedded gateway: Web access, including multimedia loading and retrieval is simply done using the built-in HTTP gateway.