Over at Hyperallergic this week, I discuss the proposed addition of over 2,000 Hieroglyphs to Unicode by 2020 or 2021. If you are a classicist, then you know how important the Unicode movement has been in standardizing the display of Greek texts in particular. But the non-profit Unicode Consortium encodes many other ancient and endangered languages as well. This is a pivotal act of digital preservation for such scripts, one that is partially funded by the National Endowment for the Humanities (NEH) and spearheaded by the Script Encoding Initiative (SEI) lab at UC-Berkeley (founded in 2002).
As I note in the piece, Unicode had its nascence in 1987, when Xerox employee Joe Becker teamed up with Apple’s Lee Collins and Mark Davis. For his part, Becker had already written a seminal 1984 paper for Scientific American called “Multilingual Word Processing.” In it, he addressed how “complex scripts” like Japanese, Hebrew, and Arabic could be better served through the creation of a broadly defined, “universal notion of ‘text.'” That is where Unicode came from: the notion that text could be made universal by assigning a standardized number to each distinct character.
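That numbering scheme is easy to see in practice. A brief sketch in Python shows how each character corresponds to a single code point, whether it is a Greek letter or an Egyptian Hieroglyph (U+13000, EGYPTIAN HIEROGLYPH A001, is the first character of the block added in Unicode 5.2):

```python
# Every Unicode character is assigned a number, its "code point".
# ord() returns that number; chr() maps a number back to its character.
alpha = "α"  # GREEK SMALL LETTER ALPHA
print(f"U+{ord(alpha):04X}")  # prints U+03B1

# The same scheme reaches the Egyptian Hieroglyphs block,
# which begins at U+13000 (EGYPTIAN HIEROGLYPH A001).
glyph = chr(0x13000)
print(ord(glyph) == 0x13000)  # prints True
```

The notation U+03B1 is simply the code point written in hexadecimal, which is how the Unicode charts themselves identify characters.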
To understand Unicode, we must start from the fact that computers ultimately understand only numbers. Many early digital humanists had encoded Greek in something known as Beta Code, which followed the standard of the American Standard Code for Information Interchange (ASCII). The transition to Unicode allowed for standardization across operating systems and also made it possible to encode manuscripts successfully. UTF-8, a variable-width character encoding built from 8-bit bytes, was designed from the outset to be backward compatible with ASCII. After that, Unicode standards developed rapidly and were readily adopted by Classicists.
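That backward compatibility is worth seeing concretely. In the sketch below, ASCII characters encode to the very same single bytes in UTF-8, while Greek and Hieroglyphic characters expand to two and four bytes respectively:

```python
# ASCII characters (code points below 128) encode to the identical
# single byte in UTF-8 -- this is what made the transition painless.
print("A".encode("utf-8"))   # b'A'
print("A".encode("ascii"))   # b'A', the same byte

# Higher code points take two to four bytes in UTF-8.
print("α".encode("utf-8"))                 # b'\xce\xb1' (two bytes)
print(len(chr(0x13000).encode("utf-8")))   # 4 (an Egyptian Hieroglyph)
```

Because a pure-ASCII file is already valid UTF-8, old Beta Code-era data could coexist with Unicode text during the transition.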
For paleographers, Unicode has been a pivotal building block for the XML encoding of texts, a method that allows for the dissemination of manuscripts, inscriptions, and handwritten texts and makes the resulting transcriptions easy to search and reuse. The Text Encoding Initiative (TEI) points out Unicode’s benefits: “Unicode is distinguished from other coded character sets by its (current and potential) size and scope; its built-in provision for (in practical terms) limitless expansion; the range and quality of linguistic and computational expertise on which it draws; the stability, authority, and accessibility it derives from its status as an international public standard; and, perhaps most importantly, the fact that today it is implemented by almost every provider of hardware and software platforms worldwide.”
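To illustrate why this combination matters, here is a minimal sketch: an invented TEI-style fragment (the element names follow TEI conventions, but the Greek line and its markup are my own illustration) parsed with Python's standard library. Because the transcription is Unicode text inside ordinary XML, any standard tool can extract and index it:

```python
import xml.etree.ElementTree as ET

# An invented, minimal TEI-style fragment: Unicode Greek text
# wrapped in standard XML markup.
tei = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p xml:lang="grc">μῆνιν ἄειδε θεὰ</p>
  </body></text>
</TEI>"""

ns = {"tei": "http://www.tei-c.org/ns/1.0"}
root = ET.fromstring(tei)
for p in root.findall(".//tei:p", ns):
    print(p.text)  # the Greek line, ready for search or reuse
```

Nothing here is specific to Greek: the same pipeline works unchanged once Hieroglyphic characters have Unicode code points of their own.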
The creation of Unicode not only led to the standardization of emoji (the feature for which most people know it); it also revolutionized the fields of Classics and Digital Humanities by allowing texts to move from the stone to the screen in a standardized manner. And that work continues today with the Consortium's focus on Hieroglyphs. Unicode remains a seminal form of digital preservation through the encoding of ancient and endangered scripts, but it isn't without cost. As I have long preached, digital preservation is a necessary but not cheap role played by today's libraries and by consortia like Unicode. Funding for these initiatives can and does often come from university library budgets or university departments, but for larger projects, humanists must often turn to the National Endowment for the Humanities and other federal agencies, agencies that are increasingly under attack. That is why it is important not only to use Unicode, but also to give back if you can.