The Mostly Color Channel: Little structured data

Tuesday, March 22, 2011

Little structured data

Today we are mostly interested in large data sets, like the megaimages we mentioned recently. Moreover, we are happy with flat unstructured data, which we comb and mine as needed. Personally, I prefer navigation and structure, but that is a matter of taste. Anyway, what is the trend for little data?

When we roll back the time machine to the days of computer-assisted instruction (CAI), one of the early systems was PLATO. A reason for its failure was that students tended to get lost in its unstructured graph of lessons. This problem was solved in Nievergelt's XS-1 by using a tree data structure, which has a root and exactly one path between any two nodes.

In those days, structure was particularly important in documents, because the typewriter had let authors write in free style, while for technical documents a regimented style is much more efficient. This was first achieved through style sheets, as in TeX and Cedar's Tioga, then through more comprehensive systems like SGML, where content was decoupled from appearance, with a dictionary encoding the enforced document structure. Most computer manuals were written in SGML, often using the FrameMaker application.

Then came the Web with HTML. The marriage of SGML with HTML begot XML. When the researchers at Sun were looking for a system-independent means to exchange data among Java applications running on a heterogeneous network of machines, they chose XML, and Java was quickly endowed with rich XML manipulation libraries.

It did not take long to realize that an XML Document Type Definition (DTD) bears a strong similarity to the schema of a relational database, and the jump from XML with a DTD to XML with a schema allowed for the system-independent exchange of databases.

Unfortunately, this requires substantial parsing and can be slow on the mobile devices that are the platform of choice today. An alternative method for exchanging a database is to dump it as a sequence of SQL commands, which can be fetched from a completely different system and loaded by simply executing the dump file.
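
As a minimal sketch of the dump-and-reload idea, here is how it could look with Python's built-in sqlite3 module. The `swatch` table, its rows, and the two helper functions are hypothetical and only serve as illustration; the dump is kept in a string rather than written to a file to keep the example short.

```python
import sqlite3


def dump_to_sql(conn: sqlite3.Connection) -> str:
    """Serialize the whole database as a script of SQL commands."""
    return "\n".join(conn.iterdump())


def load_from_sql(script: str) -> sqlite3.Connection:
    """Rebuild a database by executing a dump script in a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(script)
    return conn


# Hypothetical sample data, not taken from any real application.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE swatch (name TEXT PRIMARY KEY, hex TEXT NOT NULL)")
source.executemany(
    "INSERT INTO swatch VALUES (?, ?)",
    [("crimson", "#DC143C"), ("teal", "#008080")],
)
source.commit()

# The dump is plain text, so it could be saved, sent, and executed elsewhere.
script = dump_to_sql(source)
copy = load_from_sql(script)
rows = copy.execute("SELECT name, hex FROM swatch ORDER BY name").fetchall()
print(rows)
```

Because the dump is ordinary SQL text, the receiving side needs only a database engine that can execute it, although dialect differences between engines may call for small adjustments.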

The issue for mobile devices like smartphones is footprint, because large software components eat too much battery power. Today's relational database system du jour is SQLite. It was invented in 2000 by Richard Hipp of General Dynamics, who implemented it on HP-UX for deployment on board guided missile destroyer ships.

With a footprint of only about 275 KB, it can fit anywhere a little structured data is needed, and with the whole database in a single platform-independent file, the data is easy to share, especially when you dump it as a sequence of SQL commands.
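
The single-file property can be illustrated with a short sketch, again in Python with the built-in sqlite3 module. The file names, the `note` table, and the helper functions are assumptions made up for this example. Sharing the database amounts to copying one file, which is safe here because no connection is open while the copy is made.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def create_database(path: Path) -> None:
    """Create a small database; everything ends up in the single file at `path`."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE note (id INTEGER PRIMARY KEY, body TEXT)")
        conn.execute("INSERT INTO note (body) VALUES (?)", ("little structured data",))
        conn.commit()
    finally:
        conn.close()


def read_notes(path: Path) -> list:
    """Open a database file and return the stored note bodies in insertion order."""
    conn = sqlite3.connect(path)
    try:
        return [row[0] for row in conn.execute("SELECT body FROM note ORDER BY id")]
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "notes.db"      # hypothetical file name
    shared = Path(workdir) / "notes_copy.db"   # hypothetical file name
    create_database(original)
    shutil.copyfile(original, shared)          # sharing is just a file copy
    notes = read_notes(shared)
    size_bytes = shared.stat().st_size

print(notes, size_bytes)
```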

No wonder it is now showing up under the hood in so many places. Here is a small sample:

Mozilla applications for storing bookmarks, cookies, contacts etc.
Lightroom and Aperture for storing the photograph metadata
Skype, iTunes, Apple Mail, Opera, McAfee Antivirus, etc.

SQLite is so small that it has been integrated in many systems, so it is available without requiring libraries and software installations: HP's webOS, Apple's Core Data, Adobe's Integrated Runtime (AIR), PHP, Python, Symbian, Maemo, Android, BlackBerry, and many more. It is open source and can be downloaded from the SQLite website.

1 comment:

Dimitris Mylonas, 28 March, 2011 12:00
nice one, thanks for sharing

About this blog

The Internet is an amalgam of forms blurred under epistemological pressures. In Søren Kierkegaard’s words, under this flat shower of leveled information, where everybody is interested in everything and nothing is too trivial or too important, people just accumulate information and postpone decisions indefinitely, i.e., nobody takes action and nobody is responsible for truth — there is no mastery, just gossip. He called this the æsthetic sphere of existence, exhorting us to evolve to the ethical sphere, where we do not just accumulate information but take action and make commitments. Blogs are instruments to overcome flatness by creating opportunities for vertical activities. In this sense this blog is a view from my window — a collection of tidbits I judged relevant to computational color science and in general to the promotion of scientific excellence in areas of strategic importance for the future of research, economy and society.
