Archives and Wikipedia

Image created with the use of "Wikipedia logo bronce" by User:Nohat [CC-BY-SA-3.0], via Wikimedia Commons

Archives are repositories of human cultural heritage; their mission is to preserve it and make it accessible to everyone. Wikipedia is a universal, global encyclopedia. Archives keep original documents and artifacts from the past. Wikipedia has millions of articles on all possible topics, including history. Archives and Wikipedia are thus natural candidates for partnership, drawing on the resources and strengths of both. The notes below are based on the experience of the partnership between the Pilsudski Institute and Wikipedia over the past year and a half.

Why Wikipedia?

Wikipedia is the largest encyclopedia, openly accessible to everybody. Anyone who has access to the Internet can use it. Wikipedia has about half a billion views monthly, 250 language versions and some 20 million articles. (The Polish version has over a million articles and is among the ten largest.) Users increasingly reach for Wikipedia for basic information on any topic; this applies especially to the younger generation, for whom computers and the Internet are everyday tools.

The Pilsudski Institute has some 1.7 million pages of documents on events in Europe and the U.S., from WWI through WWII to the Cold War. Virtually anyone who wants to write about the history of Poland, Germany, Austro-Hungary, Ukraine, Belarus, Soviet Russia etc. during WWI and in its aftermath comes to the Pilsudski Institute to study the sources. The Institute also has a large library of historical books covering the period. Other archives have similar treasures, and it makes sense to create partnerships with Wikipedia, which can disseminate this knowledge very effectively.

In the initial stages of the project there was some resistance to sharing resources with Wikipedia, stemming from lingering doubts about the quality of a publication that “can be edited by anybody”. A second objection concerned giving away our “valuable resources” for free. In the end, the Institute’s Board of Directors decided to enter into the partnership, and after a year it has shown clear benefits in broadening access to our resources and raising interest in the Institute.

Why Wikiproject?

Wikipedia has a reputation for being a reliable, unbiased source of knowledge. The neutral point of view policy (NPOV) is one of the pillars of Wikipedia, zealously guarded by its community. In order to maintain this reputation, Wikipedia strongly stresses the potential for conflicts of interest and discourages editing articles about yourself, your company or your cause.

“A Wikipedia conflict of interest (COI) is an incompatibility between the aim of Wikipedia, which is to produce a neutral, reliably sourced encyclopedia, and the aims of an individual editor. COI editing involves contributing to Wikipedia to promote your own interests, including your business or financial interests, or those of your external relationships, such as with family, friends or employers [...] COI editing is strongly discouraged. It risks causing public embarrassment to the individuals and groups being promoted [...]”1.

On the other hand, institutions that are sources or depositories of knowledge and expertise, such as GLAMs (galleries, libraries, archives and museums - see Do you GLAM?), universities, research institutions, NGOs etc., can contribute significantly to Wikipedia. In order to facilitate such knowledge transfer, Wikipedia has developed the concepts of the Wikiproject and the Wikipedian-in-Residence.

A Wikiproject is a portal connecting a group of people working as a team to improve Wikipedia. Creating a Wikiproject dedicated to an institution adds legitimacy to the team working on the articles related to this institution. Closely related is the position of Wikipedian-in-Residence (WiR), a volunteer or paid intern who represents Wikipedia in the organization and serves as a liaison between the institution and the Wikimedia community. The WiR model was first piloted by the GLAM initiative, but has since been adopted by other types of organizations. Wikiprojects and WiRs can minimize the impact of conflicts of interest and smooth the process of writing related articles.

In the Institute we have created a Wikiproject, and volunteers serve as WiRs: Piotr Puchalski until August 2014, and Łukasz Chelminski from September 2014. Beginner Wikipedians were trained by the Wikipedian-in-Residence of the Metropolitan New York Library Council, Dorothy Howard, who also helped us set up the Wikiproject site. The process has worked very well: the WiRs have created and updated many articles and activated a small group of volunteers who, as project participants, have contributed to Wikipedia’s coverage of events, organizations and people associated with the Institute or represented in its archival and other holdings. In the first year of the project, 54 articles were created or improved by the Wikiproject volunteers, in both the English and the Polish Wikipedia. Those 54 articles were viewed an astonishing 200,000 times in a year2.

Copyright

Copyright issues permeate Wikimedia projects at different levels. To begin with, when you write or modify an article, you automatically release your contribution under the Creative Commons CC BY-SA (previously GNU GFDL) licence. Under those licences Wikipedia content can be freely copied, modified, and redistributed, with the proviso that the copied version is made available on the same terms and that the authors of the Wikipedia article used are acknowledged.

It happens, however, that a writer, student or historian has already published a text that could serve as a good basis for a Wikipedia article. He or she is willing to grant the appropriate licence, but does not have the time or interest to become a Wikipedia editor. In such a case we can use the Open-source Ticket Request System (OTRS), which is used to request and (if appropriate) grant free licences for specific copyrighted material. The owner of the copyright is asked (usually by the article editor) to grant a licence by email, preferably using an email template. After the owner’s response is reviewed by an administrator, the material can be included in Wikipedia.

Copyright for illustrative material, most frequently photos or other multimedia, is even more complicated. Because copyright rules are complex and differ from country to country, different licences may apply to each uploaded file. You can upload your own photos (usually), those that are in the public domain (which is not always easy to determine) or those that are freely licensed by the copyright owner. Not all of your own snapshots can be openly published: a photo of a monument in Italy cannot, for example, while the same kind of photo taken in Germany can. On the other hand, if a monkey takes a picture or an elephant makes a painting, you can use it in Wikipedia. The uploading tools in Wikimedia Commons will guide you through the process of determining the copyright status of a file you are about to upload.

Within the Pilsudski Institute Wikiproject we have created a number of biographical entries on the Institute’s founders and presidents. The archives often contain their old photos, but it is usually close to impossible to determine who the author of an image was, whether he or she is still alive (and where), or whether there are any descendants - all questions relevant for determining its copyright status. In such cases one can use the Fair Use exception, which allows, among other things, the use of a low-resolution image of a deceased person for illustrative purposes3. In such a case, the upload must be to the English Wikipedia, not to Wikimedia Commons. In the Polish Wikipedia the rules are different, and the Fair Use exception, although legally valid, cannot be used to supply the material. Therefore some of the biographical entries in Polish have no pictures.

Primary and Secondary Sources

Secondary sources are books, articles or discussions of some topic: ‘digested and interpreted’ knowledge. Primary sources are the sources that originate this knowledge - letters, original documents, eyewitness records, photographs, peer-reviewed articles etc.4 There are also tertiary sources, typically encyclopedias or review articles, that summarize or review knowledge based on secondary sources.

One of the pillars of Wikipedia is the No Original Research (NOR) policy5. It is often interpreted as a prohibition against the use of primary sources in Wikipedia. In fact, no such prohibition exists in the official policy of Wikipedia, although one has to search through several documents to establish this.

“Wikipedia does not publish original thought: all material in Wikipedia must be attributable to a reliable, published source. Articles may not contain any new analysis or synthesis of published material that serves to reach or imply a conclusion not clearly stated by the sources themselves”6.

What, then, is a reliable, published source? One typically thinks of a book or a peer-reviewed article, but the Wikipedia definition is quite a bit broader:

‘Source material must have been published, the definition of which for our purposes is "made available to the public in some form".[6]’7

Following footnote [6], we finally find:

“[6] This includes material such as documents in publicly-accessible archives, inscriptions on monuments, gravestones, etc., that are available for anyone to see.”

This rule is very important to the partnership between archives and Wikipedia, because archives typically house primary documents - records, letters, artifacts created “then and there”. If the document appears reliable, one can safely use it in writing the article. In fact, official records, such as birth or death certificates, are among the best and most reliable sources of data. The editor of the Wikipedia article must take care not to add his or her own interpretation of the facts, a caution that applies to secondary sources as well.

In our work on Wikipedia articles we increasingly use archival sources from the Institute - as sources of facts (for articles and for Wikidata) and as illustrative material.

Donations and partnerships

The Institute, like many other GLAM institutions, has a treasure trove of original documents, some 100 or more years old. The archivists and volunteers who work on digitization, especially in the step that involves describing the documents, are fascinated by top secret reports, military plans, personal letters and more. The Institute has entered into a partnership with Wikimedia Commons, a project of the Wikimedia Foundation, to donate some of those document images to Wikimedia. The process uses collections that have been converted to digital form in the Institute’s digitization program. The digitized material - image files with accompanying metadata - undergoes reformatting (chiefly consolidation of single pages into pdf documents), selection based on copyright status, and finally mass uploading to Commons. Wikipedian Jarek Tuszyński is the primary mover in this partnership. So far, over 1,400 public-domain documents from the Institute archive have been added to Commons.
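
The selection and consolidation steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Institute's actual tooling: the record type PageScan, its field names, the status value "public-domain" and the function select_for_upload are all hypothetical. The sketch groups scanned pages by document, keeps only documents in which every page is marked as public domain, and lists the page images in order, ready to be consolidated into one file per document.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PageScan:
    """One digitized page; every field name here is hypothetical."""
    document_id: str       # archival document the page belongs to
    page_number: int       # position of the page within that document
    image_file: str        # file name of the scanned image
    copyright_status: str  # e.g. "public-domain" or "unknown"


def select_for_upload(pages):
    """Group page scans by document and keep only public-domain documents.

    Returns a dict mapping document_id to its image files in page order.
    A document is kept only if every one of its pages is public domain.
    """
    by_document = defaultdict(list)
    for page in pages:
        by_document[page.document_id].append(page)

    selected = {}
    for document_id, doc_pages in by_document.items():
        if all(p.copyright_status == "public-domain" for p in doc_pages):
            ordered = sorted(doc_pages, key=lambda p: p.page_number)
            selected[document_id] = [p.image_file for p in ordered]
    return selected


if __name__ == "__main__":
    scans = [
        PageScan("doc-001", 2, "doc-001_p2.jpg", "public-domain"),
        PageScan("doc-001", 1, "doc-001_p1.jpg", "public-domain"),
        PageScan("doc-002", 1, "doc-002_p1.jpg", "unknown"),
    ]
    print(select_for_upload(scans))
```

Requiring every page of a document to be cleared is a deliberately cautious choice for the sketch; producing the pdf files and uploading them to Commons are left out.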

One of the main problems in this operation is the categorization of the uploaded files. The category tree (or bush) in Wikipedia and in Commons, with its over 2.5 million categories, is difficult to work with. Archives, on the other hand, usually have their own hierarchy, created during the organization of collections - Fonds, Series, Folders, Documents etc. - in addition to subject and index terms. We are currently in the planning stage of the “Scalable Archive Project”, for which we are seeking funding, and which would help bring together the archive and Wikimedia Commons taxonomies and help other GLAMs in categorizing their donations.

Conclusion

The partnership with Wikipedia has been a very rewarding experience for the Institute. The volunteers acquired new, useful skills and the Institute gained a channel through which to disseminate knowledge about history more easily. There are opportunities (Wikiprojects, Wikimedia partnerships, WiRs) and challenges (use of resources, copyright etc.), and I believe such cooperation can be beneficial for all archival institutions.

Marek Zieliński, November 11, 2014

References

1.  Wikipedia:Conflict of interest
2.  Data from sampling a 30 day period, Oct 2-31, 2014, using stats.grok.se
3.  Wikipedia:Non-free content
4.  Primary, Secondary and Tertiary Sources, University Libraries, University of Maryland
5, 6.  Wikipedia:No original research
7.  Wikipedia:Verifiability