Over at Hyperallergic this week, I discuss the proposed addition of over 2,000 Hieroglyphs to Unicode by 2020 or 2021. If you are a classicist, then you know how important the Unicode movement has been in standardizing the display of Greek texts in particular. But the non-profit Unicode Consortium encodes many other ancient and endangered languages as well. This is a pivotal act of digital preservation for such scripts, one that is partially funded by the National Endowment for the Humanities (NEH) and spearheaded by the Script Encoding Initiative (SEI) lab at UC-Berkeley (founded in 2002).
As I note in the piece, Unicode had its nascence in 1987, when Xerox employee Joe Becker teamed up with Apple’s Lee Collins and Mark Davis. For his part, Becker had already written a seminal 1984 paper for Scientific American called “Multilingual Word Processing.” In it, he addressed how “complex scripts” like Japanese, Hebrew, and Arabic could be better served through the creation of a broadly defined, “universal notion of ‘text.'” That is where Unicode came from: the notion that text could be made universal by assigning a standardized number to each distinct character.
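That numbering scheme is easy to see in practice. A brief sketch in Python shows how each character corresponds to a single code point, whether it is a Greek letter or an Egyptian Hieroglyph (U+13000, EGYPTIAN HIEROGLYPH A001, is the first character of the block added in Unicode 5.2):

```python
# Every Unicode character is assigned a number, its "code point".
# ord() returns that number; chr() maps a number back to its character.
alpha = "α"  # GREEK SMALL LETTER ALPHA
print(f"U+{ord(alpha):04X}")  # prints U+03B1

# The same scheme reaches the Egyptian Hieroglyphs block,
# which begins at U+13000 (EGYPTIAN HIEROGLYPH A001).
glyph = chr(0x13000)
print(ord(glyph) == 0x13000)  # prints True
```

The notation U+03B1 is simply the code point written in hexadecimal, which is how the Unicode charts themselves identify characters.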
To understand Unicode, we must start from the fact that computers ultimately understand only numbers. Many early digital humanists had encoded Greek in something known as Beta Code, which followed the standard of the American Standard Code for Information Interchange (ASCII). The transition to Unicode allowed for standardization across operating systems and also made it possible to encode manuscripts successfully. UTF-8, a variable-width character encoding built from 8-bit bytes, was designed from the outset to be backward compatible with ASCII. After that, Unicode standards developed rapidly and were readily adopted by Classicists.
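That backward compatibility is worth seeing concretely. In the sketch below, ASCII characters encode to the very same single bytes in UTF-8, while Greek and Hieroglyphic characters expand to two and four bytes respectively:

```python
# ASCII characters (code points below 128) encode to the identical
# single byte in UTF-8 -- this is what made the transition painless.
print("A".encode("utf-8"))   # b'A'
print("A".encode("ascii"))   # b'A', the same byte

# Higher code points take two to four bytes in UTF-8.
print("α".encode("utf-8"))                 # b'\xce\xb1' (two bytes)
print(len(chr(0x13000).encode("utf-8")))   # 4 (an Egyptian Hieroglyph)
```

Because a pure-ASCII file is already valid UTF-8, old Beta Code-era data could coexist with Unicode text during the transition.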
For paleographers, Unicode has been a pivotal building block for the XML encoding of texts, a method that allows for the dissemination of manuscripts, inscriptions, and handwritten texts and makes the resulting transcriptions easy to search and reuse. The Text Encoding Initiative (TEI) points out Unicode’s benefits: “Unicode is distinguished from other coded character sets by its (current and potential) size and scope; its built-in provision for (in practical terms) limitless expansion; the range and quality of linguistic and computational expertise on which it draws; the stability, authority, and accessibility it derives from its status as an international public standard; and, perhaps most importantly, the fact that today it is implemented by almost every provider of hardware and software platforms worldwide.”
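To illustrate why this combination matters, here is a minimal sketch: an invented TEI-style fragment (the element names follow TEI conventions, but the Greek line and its markup are my own illustration) parsed with Python's standard library. Because the transcription is Unicode text inside ordinary XML, any standard tool can extract and index it:

```python
import xml.etree.ElementTree as ET

# An invented, minimal TEI-style fragment: Unicode Greek text
# wrapped in standard XML markup.
tei = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p xml:lang="grc">μῆνιν ἄειδε θεὰ</p>
  </body></text>
</TEI>"""

ns = {"tei": "http://www.tei-c.org/ns/1.0"}
root = ET.fromstring(tei)
for p in root.findall(".//tei:p", ns):
    print(p.text)  # the Greek line, ready for search or reuse
```

Nothing here is specific to Greek: the same pipeline works unchanged once Hieroglyphic characters have Unicode code points of their own.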
The creation of Unicode not only led to the standardization of emoji (the feature for which most people know it); it also revolutionized the fields of Classics and Digital Humanities by allowing texts to move from the stone to the screen in a standardized manner. And that work continues today with the Consortium's focus on Hieroglyphs. Unicode remains a seminal form of digital preservation through the encoding of ancient and endangered scripts, but it isn't without cost. As I have long preached, digital preservation is a necessary but not cheap role played by today's libraries and by consortia like Unicode. Funding for these initiatives can and does often come from university library budgets or university departments, but for larger projects, humanists must often turn to the National Endowment for the Humanities and other federal agencies, agencies that are increasingly under attack. That is why it is important not only to use Unicode, but also to give back if you can.