Every researcher starting their first digital archiving project will run into a series of roadblocks posed by the web and its technology. Technology is what allows us to view Latin texts and old manuscripts no longer in print, line by line and letter by letter, and even translated into English. Yet archiving decisions become difficult once one runs into some of the common dilemmas of web technologies. At some point, the archivist realizes that the user isn’t quite receiving the real experience of the original manuscript, or of whatever thing is being archived. It feels like an inherently Derridean paradox: we constantly circle around the thing we wanted, the original, the signified, but never actually obtain it, because we are always looking at the signifier. We archivists then realize that we are dancing in a performance that 20th-century theorists outlined for us. It’s something we learn all too well in English 500 when we study theory, but don’t always observe in the real world. The internet is this space, and people who archive participate in this performance endlessly as we attempt to encapsulate the real within the virtual.

As you can imagine, Susan and I ran into many problems at the outset of this project. Without traveling to Special Collections, you could not look at our website and smell the musty aroma of the pulps, gauge the fragility of the paper by the light touch of your fingers, or even get a sense of the magazines’ dimensions and of the experience someone reading them in the 1930s would have had. John Lavagnino’s chapter on digital and analog texts in A Companion to Digital Literary Studies addresses exactly this. He writes, in the section “The Nature of Texts”:

The idea that text is in this sense a digital medium, a perfect-copy medium, is now widespread. And this view also fits many of our everyday practices in working with texts: we assume that the multiple copies of a printed book are all the same, and in an argument at any level about the merits of a recent novel it would not be persuasive if I objected, “But you didn’t read my copy of it.” In the world of art history, though, that kind of argument does have force: it is assumed that you need to see originals of paintings, and that someone who only saw a copy was seeing something significantly different. In discussions of texts we also assume that it is possible to quote from a text and get it right; you could make a mistake, but in principle it can be done and in frequent practice it is done. The digital view of text offers an explanation of what’s behind these standard practices. (Siemens and Schreibman)

Since the pulp magazines contain art and can be considered art objects in their own right because of their historical value, a fundamental issue with this project involves aesthetics. The user must trust that the archivist transcribed the copy accurately and provided the best possible images, color-corrected to represent the originals faithfully. Karen Hunter outlines these issues in “Digital Archiving” as a list of questions:

• What should be archived?

• In what format?

• How many copies of the archive are needed?

• Who holds those copies?

• What is the access to the archive and who controls that access?

• How does licensing affect archive building?

• What can the scholarly community afford? (Hunter 62)

These decisions are usually made collaboratively by librarians or conservators and their institutions, and access to the resulting collections often depends on subscriptions. We would like to avoid this issue by providing the pulps not through a database but through a website free to all. Nevertheless, all of these questions bear on our approach, and they remain active in our planning for the longevity and identity of the project.

Another Problem for Researchers/Archivers

I will first explain how I discovered this problem in digital archiving, since it illuminates our methodology and approach for the pulp project.

Recently I was writing a paper on The Epic of Gilgamesh and needed to locate and confirm a few of the 200 sources in The Two Babylons, because the volume is criticized for its faulty citations and inaccurate paraphrasing. First, the physical edition of the Latin text being cited is no longer obtainable, and the citation is more than a century old, since The Two Babylons was written in 1853. I could not find the Latin text in any local or campus library, and it was not even searchable on the Library of Congress website. I did, however, find the full text online. The person who had graciously posted the Latin text even included a button to read it in English. Here is where the problem arose. I could press Command+F to search the page for keywords, but I could not find the passage the citation pointed to. The reason? The web copy was divided into sections with no page numbers, and the text ran as a single block column down the width of the page, completely altered from its original form. Nothing on the page corresponded to the page numbers in the citation. The author of The Two Babylons cited page numbers, and most scholarship still follows MLA format, which relies on page numbers that a web copy like this one does not provide. Unable to locate the passage, I had to go back to the text or rely on other authors who mention the same passage in their articles. It was impossible for me to confirm the citation myself. The quality of my research suffered, and my argument might now have holes. I felt frustrated, anxious about the upcoming conference where I would present my research, and upset about the time I had spent locating a web document that turned out to be useless to me.

I did not know what to do, and then it hit me: this is one of many problems in the field of digital archiving, and the decisions archivists make have a profound impact on the production of new research.

Our Archiving Methodology

Our methodology with the pulps emerged from contemplating this problem, the problem of not having the real thing in hand. At the beginning of this project, I chose WordPress as a landing space for our sci-fi and pulp project because of its expandability. By expandability I mean how easily the site can be migrated to a paid hosting plan, where plugins can be installed and code can be adapted to the needs of the web developer (usually depending on the amount of content on the site). WordPress also works as a small content management system in which posts can be categorized and organized by author, so having more than one writer for this project became manageable: I could see who posted what, when, and where. This seemed appropriate, although limited, for the project we were embarking on. That said, I wanted to make sure that whatever content we uploaded would be viewable in its original form and would be of high quality. Although TIFF is the standard archival format and offers higher image quality, I chose to export each issue as a PDF so that users could download it and view it in its original form, the way we held it in Special Collections, and get as close as possible to the original experience of these manuscripts (smell aside).

What We Selected and Why

It is also worth noting that our approach to selecting the pieces has been strategic. Jacqueline Wernimont’s study of feminist archives is instructive here. She considers how the selection of items for an archive shapes the identity of that collection, much in the same way the museums of London reinforce or recreate a national identity. These collections must be shaped with care.

How We Did It

We attempted to use the scanning machine in Special Collections, at the direction of Patricia Prestinary, Archivist, but discovered that although it let us set the file type to TIFF, it offered little flexibility when exporting. The scanner captured only one page at a time and exported each as a large TIFF file. This method was not practical for the project because each issue runs to approximately 122 pages. The scanner’s layout also required us to place each issue face down, which was a problem for many of the issues from the 1930s, whose stapled bindings were falling apart.

When I returned home after hours of scanning page by page on the Xerox machine, I discovered that half of the scans had been auto-cropped, cutting off half of the page. Using the scanner would therefore have meant scanning each page individually and then combining the images into a PDF in Adobe Acrobat, which would have taken extra time to ensure every page was in order. It also seemed inefficient, since I would then have to remove pages from the PDF or rescan certain pages. Instead, I used ScannerPro to photograph the pages, adjust their color and lighting, and crop their edges. I then exported each issue as a PDF and uploaded it to our WordPress site, saving an image of the cover. Next, I typed out the table of contents so that each issue’s contents would be searchable through the major search engines. Because WordPress also supports tags and lets images be named and tagged, I expected it to optimize our searchability and make our content more readily available to other researchers.
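Scanning page by page means the page order has to be checked before the images are combined into a single PDF. A common pitfall is that plain alphabetical sorting puts `page_10` before `page_2`. The following is a minimal sketch, not the workflow we actually used: it assumes each scan’s filename contains its page number (the naming pattern is hypothetical), sorts the files numerically, and reports any missing pages. The `expected_pages` value would be the issue’s page count, roughly 122 for our pulps.

```python
import re


def page_number(filename: str) -> int:
    """Return the page number, taken as the LAST run of digits in the filename.

    Assumes a hypothetical naming scheme such as "page_7.jpg" or
    "issue03_page_112.tif".
    """
    digits = re.findall(r"\d+", filename)
    if not digits:
        raise ValueError(f"no page number in {filename!r}")
    return int(digits[-1])


def order_pages(filenames: list[str], expected_pages: int) -> tuple[list[str], list[int]]:
    """Sort scans numerically and list page numbers missing from 1..expected_pages."""
    ordered = sorted(filenames, key=page_number)
    present = {page_number(name) for name in ordered}
    missing = [n for n in range(1, expected_pages + 1) if n not in present]
    return ordered, missing


if __name__ == "__main__":
    scans = ["page_2.jpg", "page_10.jpg", "page_1.jpg"]
    ordered, missing = order_pages(scans, expected_pages=10)
    print(ordered)  # numeric order, so page_2 comes before page_10
    print(missing)  # pages that still need to be scanned
```

Running a check like this before combining the images (in Acrobat or any other PDF tool) catches gaps and misordering while the issue is still on the table, rather than after the PDF has been built.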

Still, we could provide only a simulation of the experience of browsing the pulps, not the actual thing, so I decided to take our archiving a step further. Here another problem arose. When a PDF is embedded, there is no way to tag it or to insert the editorial markup (the standard researchers use today) that would expose its contents to a database, a search engine, or even Google. That markup is written in XML, and such a page would typically be encoded following the Text Encoding Initiative (TEI) Guidelines. I tried adding the XML in the “Text” view of the WordPress editor, but once I clicked “Update,” the XML was converted to blank space. This made me realize that until the site can move to a self-hosted platform or a paid hosting plan, integrating XML will be impossible. I am not pursuing the alternative of a free XML hosting service, where I could create the XML pages externally and link them to the corresponding WordPress pages, because that would make the content available through a platform other than WordPress and could cause structural issues with the site.
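For readers unfamiliar with TEI, here is a minimal sketch of what an encoded transcription of one issue might look like, generated with Python’s standard library. It is an illustration under stated assumptions, not our project’s encoding: the title, source note, and page texts are placeholders, and a real edition would use far more of the TEI Guidelines. It does show the required header skeleton (`teiHeader` with `fileDesc`, `titleStmt`, `publicationStmt`, `sourceDesc`) and the `<pb n="..."/>` page-break element, which keeps the original page numbers attached to the text so that page citations remain checkable.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements with a default xmlns instead of an "ns0:" prefix.
# (This registers the prefix globally for the ElementTree module.)
ET.register_namespace("", TEI_NS)


def tei(tag: str) -> str:
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def build_tei(title: str, source: str, pages: dict[int, str]) -> str:
    """Build a minimal TEI document: required header plus one <pb/> per page."""
    root = ET.Element(tei("TEI"))

    header = ET.SubElement(root, tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    pub_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(pub_stmt, tei("p")).text = "Unpublished transcription (placeholder)."
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source_desc, tei("p")).text = source

    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    for number in sorted(pages):
        # <pb n="..."/> marks where each printed page begins.
        ET.SubElement(body, tei("pb"), n=str(number))
        ET.SubElement(body, tei("p")).text = pages[number]

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_tei(
        "Hypothetical pulp issue",
        "Placeholder description of the Special Collections copy.",
        {1: "Text of the first page.", 2: "Text of the second page."},
    ))
```

A file like this is exactly the kind of markup the WordPress editor stripped, which is why it would need a host that accepts raw XML.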

As a result, users benefit from seeing the pulps as PDFs but cannot search for anything not included in the typed table of contents. I decided that, for now, the best solution was to hand-type some of the poems, editor’s articles, and short stories. This method is time-consuming, but it allows for maximum searchability. We hope that, as the project develops, the site will be migrated to a hosting provider where it can be expanded further; plugins will then become available, and we will have more design options. The ultimate goal of this project is to provide resources for researchers of science fiction and steampunk and to shed light on the fantastic stories in the pulps.
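Hand-typed transcriptions make the text searchable, but the Two Babylons episode shows that a search hit is only useful for citation if it also tells the reader which printed page it came from. The following is a minimal sketch of that idea, assuming (hypothetically) that each transcription is stored page by page: a small inverted index that answers keyword searches with page numbers. It is an illustration only and does not describe how WordPress or any search engine indexes our site.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")


def build_index(pages: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercase word to the set of page numbers it appears on."""
    index: dict[str, set[int]] = defaultdict(set)
    for number, text in pages.items():
        for word in WORD.findall(text.lower()):
            index[word].add(number)
    return dict(index)


def search(index: dict[str, set[int]], query: str) -> list[int]:
    """Return the sorted page numbers containing every word of the query."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


if __name__ == "__main__":
    # Placeholder page texts, not transcriptions from our issues.
    pages = {
        12: "The Martian invasion began at dawn.",
        13: "Dawn broke over the red desert.",
    }
    index = build_index(pages)
    print(search(index, "dawn"))      # both pages mention dawn
    print(search(index, "red dawn"))  # only the page containing both words
```

Keeping the page number alongside every transcribed word is what would have let me verify a citation in the Latin source instead of scrolling through an unpaginated column.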

To Conclude

Blogging and developing an online voice is one way to show involvement in a community and mastery of a particular subject, object, or subset of society. The word “blog,” a shortening of “weblog,” emerged in the 1990s, when a handful of bloggers started a small online discussion; now more and more undergraduates, graduate students, and professors use blogs as landing pages for information, annotated bibliographies, and supplemental learning spaces (Siemens and Schreibman). We hope you will find these resources and methods helpful to your own research in science fiction.


Bibliography

Siemens, Ray, and Susan Schreibman, editors. A Companion to Digital Literary Studies. Blackwell Publishing. Online.

Hunter, Karen. “Digital Archiving.” Serials Review 26.3 (2000): 62-64.

Wernimont, Jacqueline. “Whence Feminism? Assessing Feminist Interventions in Digital Literary Archives.” Digital Humanities Quarterly.

Crane, Gregory. “Tools for Thinking: ePhilology and Cyberinfrastructure.” Online.

“NYU Workshop in Archival Practice.” Article on archival practice workshop.

“Computing in the Digital Humanities.” Syllabus. NYU, 2011.


Resources & Approaches to Archiving

Perseus Digital Library

“What is the Digital Humanities?” Oxford Digital Humanities Online.

TEI: Text Encoding Initiative

“What is TEI?” Open Edition Press. 2014.

XML Handbook.

Debates in the Digital Humanities

General Archiving File Format List. dpbestflow.org

Learn WordPress Online

4Humanities


Models

NYU Archiving Project

Catullus and Carmen V (Poem 5) Project

Speculative Fiction Database Project

Electronic Book Review. Online

 

 
