The University of Alberta TAPoR Node is proud to have worked in conjunction with exceptional researchers to produce and publish the following completed (though often still evolving) projects:
© 2004-2008 TAPoR @ UAlberta
The TAPoR Workshop facilities are being used for a variety of projects, breaking new ground in text-based humanities computing and laying the foundation for future research. Our high-powered servers and ample disk space host projects whose text-bases run to millions, even billions, of words. The projects listed here are only a small sample of what is possible through TAPoR; the only limit is your imagination.
If you are interested in proposing a use for the TAPoR workshop,
download and complete the Use and Access to the TAPoR Workshop
and TAPoR Research Protocol and Infrastructure Use forms.
Please submit them to John Newman at:
4-36A Assiniboia Hall.
Completed - Published
Terry Nadasdi, Linguistics, University of Alberta
Stéfan Sinclair, Multimedia, McMaster University
Le Patron is a grammar-checking program used by foreign-language students to identify potential problems in their written work. The program invites students to submit a written assignment and to choose the language in which it is written. Using a complex series of regular expressions (pattern matching), the program identifies constructions that are likely incorrect and directs the student to attend to these during revision.
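The pattern-matching approach described above can be sketched in a few lines of Python. This is a minimal illustration only: the three rules and the function name flag_constructions are hypothetical and are not taken from Le Patron's actual rule set. A construction such as "de le" can also be legitimate (for example before an infinitive), which is why a checker of this kind can only flag text as likely incorrect.

```python
import re

# Hypothetical rules for illustration; these are NOT Le Patron's real rules.
RULES = [
    (re.compile(r"\bde le\b", re.IGNORECASE), 'consider "du" instead of "de le"'),
    (re.compile(r"\bà le\b", re.IGNORECASE), 'consider "au" instead of "à le"'),
    (re.compile(r"\bsi j'aurais\b", re.IGNORECASE), 'after "si", consider the imperfect'),
]


def flag_constructions(text):
    """Return (offset, matched text, advice) for each suspect construction."""
    findings = []
    for pattern, advice in RULES:
        for match in pattern.finditer(text):
            findings.append((match.start(), match.group(0), advice))
    # Sort by offset so the student sees problems in reading order.
    return sorted(findings)


findings = flag_constructions("Je parle de le livre à le professeur.")
for offset, matched, advice in findings:
    print(f"{offset}: {matched!r} -> {advice}")
```

Each rule pairs a compiled pattern with a short piece of advice, so new checks can be added without changing the matching loop.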
The purpose of Le Patron is twofold. First, it is a tool to be made available to thousands of French language learners in Canada and abroad. Second, it will allow graduate students and University of Alberta linguists to conduct research on second language acquisition (SLA).
Completed - Published
Mary Mahoney-Robson, University of Alberta Press
Alethea Adair, Research Assistant, University of Alberta Press
David Laurie, Humanities Computing
Stéfan Sinclair, Multimedia, McMaster University
The University of Alberta Press, with the assistance of the Humanities Computing graduate program and other partners, is developing an electronic publication based on the research compiled in the Atlas of Alberta Railways project. With over 140 hand-drawn detailed maps and illustrations, historic photographs, and over 70 pages of information, the Atlas is a unique historic resource that documents the development and history of railways in western Canada from their origin to the 1950s.
By publishing the Atlas on the World Wide Web, the historic research and the extensive visual information contained in the maps will be made available freely over the Internet. The Atlas will be an important web-based resource for teachers in the elementary and high school systems, colleges and universities, as well as historians, geographers and the general public interested in railroads or the history of Canada. The Atlas materials have been encoded in XML, facilitating web development using Apache's Cocoon environment.
Completed - Published
Pamela Asquith, Department of Anthropology, University of Alberta
Peter Ryan, MA student, HuCo, University of Alberta, and PhD student in Communication and Culture, Ryerson University
Prashant Jois, Co-op Engineering student, University of Alberta
Bess Sadler, MA student, Library Science, University of Alberta
Natasha Nunn, MA student, HuCo, University of Alberta
Craig Soars, undergraduate student, University of Alberta
Carmen Zelada, BA, University of Alberta
Bernard Higham, MA student, HuCo, University of Alberta
Peter Ryan
Carmen Zelada
This project makes available on a website the entire collection of the personal notes and papers of Kinji Imanishi (1902-1992), one of the most important biological and anthropological scientists in 20th-century Japan. Imanishi was an entomologist, ecologist, and anthropologist, as well as a professional mountaineer and explorer. He founded primatology in Japan. The collection includes 8000 pages of fieldwork notes and diaries, letters to foreign scientists, maps, notebooks of studies of western authors, travel itineraries, budgets, photographs, and other materials dating from 1919 to 1980. It represents not only the intellectual journey of an important scientist but also the history of ideas, especially in ecology and primatology. The collection is in Japanese, English and German.
A digital image was made of every page of the original documents in 2002 and 2003 in Kyoto and in Edmonton. These images, along with the database and descriptions for each page, were uploaded to the University of Alberta's SunSite. Work is proceeding using TAPoR computers to prepare every image for the web interface, while a website for the archive is being designed by Peter Ryan and P. Asquith. The images will be available to scholars as high-resolution JPEGs and, with protected access available upon request, in TIFF format. The goal is to make this unique record available to scholars throughout the world, as the original papers remain uncatalogued in the Imanishi family home in Kyoto, Japan.
Completed
Regula Qureshi, Department of Music, University of Alberta
Michael Frishkopf, Department of Music, University of Alberta
FolkwaysAlive! is a partnership between the University of Alberta and Smithsonian Folkways Recordings in Washington D.C. This initiative, based on the Moses and Frances Asch Collection of Folkways Recordings, is a celebration of "the sounds of the people." It also encompasses a digital archive and research centre, student research scholarships, a concert series, and opportunities for community partnerships and intercultural dialogue.
The home for FolkwaysAlive! and the Canadian Centre for Ethnomusicology at the University of Alberta is 347 Arts. Folkways makes use of the computing infrastructure for research provided through TAPoR.
Completed
John Newman, Department of Linguistics, University of Alberta
The Wenzhou Spoken Corpus (WSC) has been developed by Jingxia Lin and John Newman in the Department of Linguistics, University of Alberta, with technical support from the Text Analysis Portal for Research (TAPoR) team. WSC is an online, searchable corpus of transcribed spoken Wenzhou data, consisting of six sub-corpora: Face to Face Conversation, Phonecall, Wenzhou News Commentary, Internet Chat, Interview and Wenzhou Songs. Most of the conversational data was collected in downtown Wenzhou and Yueqing city, from 2004 to the present. Spoken forms that lack a conventional representation in characters have been transcribed phonetically. The files have been marked up in XML. The current corpus (Version 1.0) consists of about 150,000 words and is continually being expanded.
Completed - Published
Natalie Kononenko, Ukrainian Folklore Centre, University of Alberta
Folklore began with the study of words, or what is also called oral literature. In the last few years we have been able to record these spoken words, but it has been difficult to present these recordings to researchers. Since 1998 I have been interviewing people in villages in Central Ukraine for my research on births, weddings, and funerals. The people have wonderful tales to tell, songs to sing, and valuable information on village life.
This web site provides access to the recordings made in these villages, searchable through a variety of specific keywords and terms.
Ongoing
Terry Butler, Arts Resource Centre, University of Alberta
A collection of public domain and project XML texts which we will deliver in a variety of user interfaces. The searching systems will be content driven, and user-customizable. This environment shows to good effect the core technologies in use at Alberta: XML, XML transforms, XML searching, and native XML databases.
Ongoing
Chris Westbury, Department of Psychology, University of Alberta
A very large (7.7 billion word) corpus of text has been gleaned from the Internet.
Using software developed by the Westbury Lab, the text is analyzed for
word co-occurrence. Models of memory for meanings use this data to
explain the human ability to process the meanings of words.
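As a rough illustration of what counting word co-occurrence involves, the sketch below tallies how often two different words appear within a fixed window of one another. It is not the Westbury Lab's software: the function name cooccurrence_counts, the window size, and the toy sentence are all assumptions made for this example, and a corpus of billions of words would need a far more memory-efficient approach.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=2):
    """Count unordered pairs of distinct words at most `window` tokens apart."""
    counts = Counter()
    for i, word in enumerate(tokens):
        # Look only forward so that each pair of positions is counted once.
        for neighbour in tokens[i + 1 : i + 1 + window]:
            if neighbour != word:
                counts[tuple(sorted((word, neighbour)))] += 1
    return counts


tokens = "the cat sat on the mat".split()
counts = cooccurrence_counts(tokens, window=2)
for pair, n in counts.most_common(3):
    print(pair, n)
```

Pairs are stored in sorted order so that ("sat", "the") and ("the", "sat") are counted as the same co-occurrence.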
The Westbury Lab has developed an intelligent USENET crawler and uses
other free web crawling software. Some processing of the massive
textbase has been offloaded to the WestGrid shared memory supercomputer.
Completed
PearlAnn Reichwein, Department of Physical Education and Recreation, University of Alberta
The history of the Banff School of Fine Arts in Canada's mountain national parks is the subject of this investigation. The School was started by the University of Alberta in 1933 and later became today's Banff Centre. Archival research at the BARD, the Whyte Museum in Banff, and elsewhere will document the school. TAPoR has provided the team with a web-based data collection system. Previously, a single-user bibliography system had been adapted to project needs.
Completed
Stan Ruecker, Humanities Computing and English, University of Alberta
Reports from the 2004-2005 series of ERW meetings are available here:
Formed in 2003, the Experimental Reading Workshop is an ongoing, open-ended research and design project collectively run by students and staff at the University of Alberta. Its members meet bi-weekly to discuss innovations in the analysis, display, and reception of text and graphics through technology. Membership is open, and new members may bring existing projects, or ideas for projects, into the Workshop for comment and support.
Completed
Stephen R. Reimer, University of Alberta
This project is preparing a hypertext edition (text plus illustrations and maps) of Lydgate's fifteenth-century poem. The illustrations to be included are manuscript pages and illuminations as well as maps and modern photographs of places mentioned in the text (the story of St. Edmund involves various locations in East Anglia, while St. Fremund's story takes place in Warwickshire, Bristol, and Wales). This project uses an oversize flatbed scanner.
Completed
Stephen R. Reimer, English Department, University of Alberta
Ann F. Howey, Department of English, Brock University
Karl Anvik, Humanities Computing, University of Alberta
This is a book-length bibliography of King Arthur in modern English culture, listing and describing works of literature, art, music, motion pictures, computer games, and other cultural artefacts involving the Arthurian legends, for the period of 1500-2000. The bibliography is to be published in book form by Boydell and Brewer (2006). The database version of the bibliography is geared towards the future research of the participants (its public availability remains subject to negotiations with the publishers).
Ongoing
Chris Fletcher, Department of Anthropology, University of Alberta
The Canadian North is experiencing an unprecedented period of economic development focusing on mineral, oil and gas extraction. The scale of industrialization presents opportunities and challenges to Northerners, a large proportion of whom are Aboriginal people, who seek to maintain cultural and ecological integrity while maximizing benefits. This project provides one source of information that may illuminate the local experience of change associated with development: the public record of citizen involvement in Environmental Impact Assessment (EIA).
Public participation is enshrined in legislation governing the EIA process in Canada. In instances where large-scale industrial projects are proposed they are often accompanied by a series of public hearings during which people may express themselves on the possible impacts of the projects. Despite the significance of the EIA hearings the public record of citizen involvement in Northern EIA is unevenly dispersed. There is no single source giving access to the transcripts of public meetings and, when transcripts can be found, they are inconsistently presented. With rapid economic development occurring in the Canadian north it is increasingly important to understand how change is experienced, understood and projected by Northerners.
The objective of this project is to identify, collect, digitize and make available through the web public statements made in scoping and hearing sessions on large projects in the north. It is specifically concerned with Aboriginal statements in those sessions. For the immediate purposes of this study the "North" includes all three Territories, territory under the James Bay and Northern Quebec Agreement, and Labrador. All large projects that come under the terms of the scoping and/or comprehensive review processes would be targeted for inclusion in the database. The transcripts will be located, digitized, and coded according to a series of themes to be determined through a consultative process. A comprehensive effort to gather this material and make it more widely and consistently available would serve as an important research tool and meet the spirit of both the legislative context and the intentions of people who contribute their time to EIA.
Completed
John Newman, Department of Linguistics, University of Alberta
The first release of the American National Corpus (ANC) contains over 10 million words of written and spoken American English, annotated for lemma and part of speech. It has been acquired for research and education purposes from the Linguistic Data Consortium. The corpus is encoded in XML using the Text Encoding Initiative (TEI) DTD. A web-based searching and display system, which we have developed here under the auspices of TAPoR, makes the corpus easily accessible to the University of Alberta community and facilitates its use for research and teaching purposes.
Completed
Ryan Dunch, History and Classics, University of Alberta
A bibliography of Christian missionary work in China (in Chinese).
Completed
Adam Morton, Department of Philosophy, University of Alberta
For pedagogical purposes, we wished to provide a search environment for students studying symbolic logic. Searching with symbolic logic language, in standard and adapted forms, the student can explore the results against a simple table of properties: for example, find all the people who are skiers, find all the Albertans who like beer, find any person who is married to someone over 6 feet tall. The table of properties is dynamic. The user interface allows full Boolean and quantificational combinations, and is intended to be easy for students unfamiliar with logical notation.
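The three example searches above can be mimicked with ordinary Python predicates. This is only a sketch of the idea: the table contents, the attribute names, and the helper find are invented for illustration and do not reflect the project's actual data or query syntax. Conjunction maps to `and`, and the existential quantifier maps to `any`.

```python
# Hypothetical table of properties; every name and value here is invented.
PEOPLE = {
    "Ann": {"skier": True, "albertan": True, "likes_beer": False,
            "height_in": 74, "spouse": "Bob"},
    "Bob": {"skier": False, "albertan": True, "likes_beer": True,
            "height_in": 68, "spouse": "Ann"},
    "Cy": {"skier": True, "albertan": False, "likes_beer": True,
           "height_in": 70, "spouse": None},
}


def find(predicate):
    """Return, in alphabetical order, every person who satisfies the predicate."""
    return sorted(name for name in PEOPLE if predicate(name))


# Skier(x)
skiers = find(lambda x: PEOPLE[x]["skier"])

# Albertan(x) AND LikesBeer(x)
albertan_beer = find(lambda x: PEOPLE[x]["albertan"] and PEOPLE[x]["likes_beer"])

# EXISTS y: Married(x, y) AND Height(y) > 72 inches (6 feet)
married_to_tall = find(
    lambda x: any(
        PEOPLE[x]["spouse"] == y and PEOPLE[y]["height_in"] > 72 for y in PEOPLE
    )
)

print(skiers, albertan_beer, married_to_tall)
```

Because the table is an ordinary dictionary, rows and properties can be changed at run time, loosely mirroring the dynamic table of properties mentioned above.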
Completed
Yoshi Ono, East Asian Studies, University of Alberta
A pilot project to digitize conversational Japanese (audio and video). TAPoR support has provided a password-protected site where the research team (some local and others located elsewhere) can store transcripts and digitized content. The digital materials can be uploaded and downloaded, and differing versions of transcripts can be easily compared.
Ongoing
John Newman, Department of Linguistics, University of Alberta
This project constitutes the Canadian contribution to the International Corpus of English (ICE). ICE is a major international research effort, begun in 1990, with the primary aim of collecting material for comparative studies of English worldwide. It is coordinated by Dr. Gerald Nelson, Department of English Language & Literature, University College London. As part of this project, fifteen research teams around the world are preparing electronic corpora of their own national or regional variety of English. Each ICE corpus will consist of one million words of spoken and written English produced after 1989. Material for the Canadian component (ICE-CANADA) was collected in the early 1990s under the direction of Nancy Belmore at Concordia University, Montreal. Sabine Bergler (Concordia University) maintained this material until John Newman (University of Alberta) took responsibility for further development of the corpus in January 2006. TAPoR is providing assistance with digitization and markup of texts, as well as scanning facilities, OCR software, disk space, and a Sun Ray thin client. The project is currently supported by grants from the University of Alberta Humanities, Fine Arts, and Social Sciences Research Fund (HFASSR), the Faculty of Arts, and the Department of Linguistics.
Ongoing
Anthony Harding, Department of Women's and Gender Studies, University of Saskatchewan
David Miall, Department of English, University of Alberta
Terry Butler, Director Research Computing in the Faculty of Arts, University of Alberta
Anthony Harding at the University of Saskatchewan, with David Miall and Terry Butler at the University of Alberta, is preparing an electronic index to the Coleridge notebooks. A Web-searchable index will allow access to the complete text of the Princeton University Press notebooks (five double volumes of text and notes) as well as Harding, Miall and Butler's added thematic, topical and other access points (including names, dates, and places mentioned in the notebooks), which were not included in the printed works.
The notebooks of Samuel Taylor Coleridge (1772-1834) are an important scholarly resource for 19th century European intellectual history, and as a record of Coleridge's wide-ranging interests in literature, politics, religion, science and the visual arts.
More information about Coleridge can be found through the Victorian Web at Brown University.
This research is funded by the Social Sciences and Humanities Research Council of Canada through its Standard Research Grants program. Additional support and assistance are being provided by the University of Saskatchewan and the University of Alberta.
Coleridge Project home page
Completed
Andriy Nahachewsky, Ukrainian Folklore Centre and MLCS, University of Alberta
The project aims to document everyday life, ethnic identity and regional variation among people of Ukrainian, French, German, and English heritage on the prairies up to 1939. The key questions for the project are: How did people from diverse backgrounds interact, adapt and "become prairie Canadians" in the first half of the twentieth century? What was the relationship between cultural inheritance and local community participation? How did they express their various identities at the local community level? What factors affected regional variation in such communities as they evolved over time? The project will generate a great deal of documentary information and primary archival resources for further research in many aspects of the Canadian prairie heritage.
Ongoing
Jonathan Hart, Comparative Literature, University of Alberta
The Canadian Review of Comparative Literature is an international and interdisciplinary scholarly journal which provides a broad forum for scholars engaged in the study of literature. This project will provide the CRCL with a template for electronic transmission and publication, enabling more efficient distribution of articles, and allowing the journal to reach a broader audience worldwide.
The project involves the design and implementation of a flexible XML schema which will eventually be used for all document delivery and transformation. The designers are using current XML manipulation software, including XMLSpy and XMetal.
Ongoing
George Peschke, Department of Mathematical and Statistical Sciences, University of Alberta
The Felynx Cougati project (FCM) maintains a continuously
evolving repository of web-based learning and teaching aids
for university-level mathematics, as well as corresponding
referencing tools. TAPoR's contribution to FCM includes the
following:
Ongoing
John Newman, Department of Linguistics, University of Alberta
The Dinka Narratives project arose out of a community outreach initiative in 2005-2006 involving the local Dinka-speaking Sudanese community in Edmonton. A number of traditional stories have been recorded and transcribed, as part of an effort to promote the use and understanding of Dinka among the youth in the community. Stories collected as part of this project are being added to a Dinka language website, along with an interactive dictionary. Audio recordings (individual words, as well as whole paragraphs), English glosses of the words, and free translations of sentences are also available online. TAPoR is providing assistance with XML markup of texts, conversion of XML texts into web pages using the PHP programming language, and disk space. A Roger S. Smith Undergraduate Student Researcher Award (Faculty of Arts, University of Alberta), won by Kristina Geeraert in 2006, has helped to support the collection, transcription, and markup of the stories.
Completed
Lai Fong Leung, Department of East Asian Studies, University of Alberta
A web-based bibliography of modern Chinese writers.
Ongoing
Daniel Aberra (supervisor: John Newman), Department of Linguistics, University of Alberta
This project will create an XML-based corpus of written and spoken Amharic, a Semitic language of Ethiopia. The corpus will be available online, with an accompanying user interface. The project is a joint venture involving TAPoR (providing assistance with XML markup of texts and conversion of XML texts into web pages) and the Department of Linguistics. So far, a representative sample of written Amharic has been collected and is being used as a testbed for the development of an appropriate tag set.
Ongoing
Sullay Mohamed Kanu (supervisor: John Newman), Department of Linguistics, University of Alberta
This project aims to develop both a written and a spoken corpus of Temne, a Niger-Congo language spoken in Sierra Leone. The corpus will consist of transcribed speech representing story-telling, narrative, and interview genres. The Temne Corpus will be available online along with search and concordancing tools. TAPoR is providing assistance with XML markup of texts, conversion of XML texts into web pages using the PHP programming language, and disk space.
Ongoing
Terry Butler, Arts Resource Centre, University of Alberta
"Dynamic Text" is an umbrella for several text display and text processing environments through which scholars can interact with literary texts in original ways. The Text Teller is a reading environment for texts which could be deployed on small, mobile devices. Dynamic Text Display is a display environment which visualizes two similar texts in relation to one another. The Editorial Difference Engine is a dynamic environment in which the user explores the difference between two texts by transforming one text into another. All three environments have been reported upon in conference papers and articles: the Dynamic Text website on TAPoR will provide a stable home for the demos and interactive software.
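The idea of exploring the difference between two texts by transforming one into the other can be illustrated with Python's standard difflib module. This is a sketch under stated assumptions, not the Editorial Difference Engine itself: the function names and the two sample readings are hypothetical, and the real environment may compute and present differences quite differently.

```python
import difflib


def edit_steps(source_words, target_words):
    """List the non-trivial steps that turn the source text into the target."""
    matcher = difflib.SequenceMatcher(a=source_words, b=target_words, autojunk=False)
    return [
        (tag, source_words[i1:i2], target_words[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def replay(source_words, target_words):
    """Rebuild the target by replaying every step against the source."""
    matcher = difflib.SequenceMatcher(a=source_words, b=target_words, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.extend(source_words[i1:i2])  # unchanged run is kept
        elif tag in ("replace", "insert"):
            result.extend(target_words[j1:j2])  # new material comes from the target
        # "delete": the source words are simply dropped
    return result


# Two invented variant readings, split into words.
text_a = "the quick brown fox jumps over the dog".split()
text_b = "the quick red fox leaps over the lazy dog".split()

for tag, old, new in edit_steps(text_a, text_b):
    print(tag, old, "->", new)
```

Working at the level of words rather than characters keeps each step readable as an editorial change; a finer-grained comparison would pass strings instead of word lists.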
Ongoing
Andriy Chernevych, Ukrainian Folklore Program, Department of Modern Languages & Cultural Studies, University of Alberta
The project documents the experiences of Ukrainian Canadians who lived in the city of Edmonton between 1930 and the early 1950s. The research focuses on three primary areas: the Ukrainian community, everyday life culture, and urban landscape. The website promotes the Ukrainian Edmonton Project and creates a research and educational resource that features updated samples of audio interviews, historical photos from private collections, a database of participants, and other materials that reflect the history of Edmonton's Ukrainian community.