Tuesday, March 24, 2015

Literal Culture World: Beyond the Culture-Agnostic Evolution of Literature



As mentioned in the previous posts, the Dig into Data Project (DiDP) is mainly about modeling and understanding the evolution of literal culture across apparently different cultures. Such modeling not only allows us to spot and highlight the key literal works and writers of a specific culture in a specific period of its evolution, it also provides a means to show that all literal cultures are to some degree “the same”, at least from the perspective of the evolution of literature (enlightenment). We frame this similarity as a culture-agnostic literature evolution mechanism that could be projected onto all cultures along their enlightenment periods.

The outcome of the DiDP would be a great achievement because it provides a new angle on understanding literal cultures, and especially because it hints that all cultures are in essence the same. This would lead to implicit but significant conclusions that could have applications beyond the DiDP.

Despite these applications, the goal of the DiDP would stay highly scientific because in the real world no culture evolves in isolation. Although in the past the level and speed of communication among cultures were much lower than they are today, a single spark, such as the translation of an influential manuscript from one culture into another culture that is ready and eager for it, would be sufficient to set off a significant evolution in the receiving literal culture. Such an ignition could be seen as a big bang effect. Observing such interactions at the level of cultures is critical to understanding how to accelerate literal evolution and co-existence across the globe. The significant challenges would be in aligning the evolution periods of different cultures and in modeling interactions across literal domains.

The DiDP database is rich in terms of cultures and literal domains. However, the domains, the periods of time, and the geographical locations it covers are so scattered that the chance of capturing a big bang event would be very small. Enlarging the scope of the data to cover all literal data across the globe would be interesting and is an ultimate goal. However, considering the resources available to the project, this seems infeasible. What seems to be a good test bed for evaluating the concept of Inter-Culture Evolution is a small world: a finite number of literal cultures that have interacted intensively through various exchanges and migration, and that have shared the same greater geographical area for a considerable period of time. If the available data for such a small world span a period long enough to capture a few inter-cultural big bang events, we can spot and highlight those inter-cultural evolutions using a moderate amount of research resources. Those highly intense inter-cultural interactions could then be studied and analyzed in depth and in detail in order to develop robust models of inter-cultural exchanges and phenomena. Such models can be leveraged at various levels toward a sustainable future of co-existing but highly evolving cultures across the globe.

Interestingly, in another project in which we have been involved, the Indian Ocean World Project (IOWP), a very clear example of a small world has been studied for a long time. Several cultures around the Indian Ocean, such as Arabic, African, Chinese, Indian, Japanese, and Persian, have shared the same small world of ecology, economy, and people. This Indian Ocean World (IOW) has been highly dependent on significant climatic phenomena, such as the tropical monsoon climate, and therefore some sort of ‘synchronization’ could be expected among its cultures.

Thanks to the IOWP, a considerable amount of data, not limited to literature, has been collected and hosted by the organizations participating in the project, particularly McGill University. Such a large body of data could be combined with the robust methodologies developed in the DiDP to open new horizons of understanding and insight into the cultural evolution of this region. This could then be leveraged toward homogeneous development across the IOW and across the globe, despite the continuously increasing shortage and scarcity of resources such as water.

We use the term Literal Culture World to name this combination of the DiDP outcomes and the IOWP data. Literal Culture World could be seen as a step beyond both Culture-Agnostic Evolution of Literature and also Global Economy of the IOW. Although we will start with the literal data, such a combination will not be limited only to literature. 


The Indian Ocean World Project

The Indian Ocean World Centre (IOWC) is a research initiative and resource base at McGill University. It has been established to promote the study of the history, economy and cultures of the lands and peoples of the Indian Ocean world (IOW). The IOW ranges from China to Southeast and South Asia, the Middle East and Africa.

For the official website, please visit:

http://indianoceanworldcentre.com/


What is the Dig into Data of Global Literature Project about?



The Dig into Data project focuses on developing analysis and understanding approaches that are culture/language agnostic. In other words, it pursues the idea that literal cultures evolve according to somewhat global mechanisms that are independent of the actual language, culture and time period of the evolution. It is worth mentioning that there is another agnostic facet to this project, in the sense that the developed methodologies are also expected to be transferable across languages/cultures in a straightforward way.

The database of the project comprises several collections of manuscripts, each one restricted to a specific language/culture and period of time. The period of time of each collection is selected in such a way that it highlights a transitional/evolutionary period of the associated literal culture.

The objectives of the project are approached using methodologies that leverage the singularities in the manuscripts at various levels. In this way, the developed methodologies should be transferable among collections with minimal re-modeling effort. Starting from the main collection of the project, the German European Enlightenment Collection, footnote objects were chosen as the singular events across the manuscripts. A complete set of document image processing methods has been developed to address the challenges of processing this collection. The methods range from preprocessing and denoising, to layout analysis and correction, to typeface identification, to footnote marker and body detection and extraction, and to retrieval of the titles of the manuscripts cited in the footnotes. For many of these steps, such as preprocessing and layout analysis, our in-house state-of-the-art methodologies have been generalized and modified to address the new collection. On the whole collection, which consists of more than 1,300 manuscripts, a set of more than 37,000 footnotes was detected and extracted, and this set is now being used to build a high-level understanding of the relations among the manuscripts within the collection. It is worth mentioning that the amount of data provided by the singular features, i.e., the footnote objects, could be negligible compared to the total amount of visual data that the collection carries from an understanding perspective. As a proof of the generalizability of the approach, the methodologies were transferred in a straightforward manner from the German collection to the Chinese collection, the Collection of Chinese Women’s Writing from the Ming-Qing Dynasties, to detect and extract annotation markers and other singular features present on the manuscript pages of that collection.
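To make the idea of building relations among manuscripts from extracted footnotes a bit more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the project's actual pipeline: it assumes the footnote extraction step has already produced pairs of (citing manuscript, cited title), and the function names and the sample titles are hypothetical.

```python
from collections import Counter, defaultdict


def build_citation_graph(footnotes):
    """Map each citing manuscript to the set of titles it cites.

    `footnotes` is an iterable of (citing, cited) pairs, assumed to be
    the output of an earlier footnote extraction step (hypothetical format).
    """
    graph = defaultdict(set)
    for citing, cited in footnotes:
        if citing != cited:  # ignore self-citations
            graph[citing].add(cited)
    return dict(graph)


def most_cited(graph, top_n=3):
    """Rank titles by the number of distinct manuscripts citing them."""
    counts = Counter()
    for cited_titles in graph.values():
        counts.update(cited_titles)
    return counts.most_common(top_n)


# Hypothetical sample data, for illustration only.
footnotes = [
    ("Manuscript A", "Manuscript C"),
    ("Manuscript A", "Manuscript C"),  # repeated citation, counted once
    ("Manuscript B", "Manuscript C"),
    ("Manuscript B", "Manuscript A"),
    ("Manuscript D", "Manuscript C"),
]

graph = build_citation_graph(footnotes)
ranking = most_cited(graph)
print(ranking)
```

In this toy example the title cited by the most distinct manuscripts rises to the top of the ranking, which is one simple way a key work of a period could be highlighted. A real analysis would also need to deal with noisy or ambiguous title matching, which this sketch ignores.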

New visual methodologies are being developed, based on novel representations and modeling, in order to digest the whole set of document images in the form of a big, complex network of relations among the rich objects that represent fractional, incomplete but complementary parts of the image-content data encoded in the manuscripts of a collection. In particular, spatial-patch graphs, error-bounded sparse representations, and multi-state (and quantum-state) state machines, among other approaches, are on our road map toward partially addressing the challenges of understanding the documented human heritage.

Introduction

This blog is for my ideas on the "Global Currents: Cultures of Literary Networks, 1050-1900" project, also known as the "Digging into Data: Global Currents" project. The project is led by Prof Andrew Piper as the co-Lead PI along with Prof Mohamed Cheriet, Prof Elaine Treharne, and Prof Lambert Schomaker. This project is a Digging into Data Research Project with McGill University, ETS, Stanford University, and the University of Groningen. For the official websites of the project, please visit:

http://globalcurrents.ca/2014/04/19/a-brief-introduction-to-the-global-currents-project/

http://txtlab.org/?cat=43

http://txtlab.org/?p=234

http://www.mcgill.ca/newsroom/channels/news/taking-big-data-challenge-232451