For this post, I am joined by a fellow student in Indiana University’s Information and Library Science Department, Sam Ott! Sam is a first year student, also working toward a dual-degree Master of Library Science and Master of Information Science, who has over three years of experience working in paraprofessional positions in multiple public libraries. Sam and I are taking the same core classes, but he is focusing his studies on public libraries instead of my own focus on academic and research libraries. With these distinct end goals in mind, we wanted to write about how the technologies we are learning in library school are helping cultivate our skills in preparation for future jobs.
On the academic library track, much of the technology training is abstract and theory-based, paired with practical application. There is a push for students to learn digital encoding practices, such as TEI/XML, and to understand how these concepts function within a digital library or archive. Website architecture and development also appear as core classes and electives that complement the theoretical coursework.
Specializations such as Digital Libraries, Information Architecture, and Data Science offer a chance to delve deeper into the theory and practice of one of these aspects. In addition to the regular courses, the student chapter of the Association for Information Science and Technology (ASIS&T) offers workshops through UITS to introduce and hone skills in UNIX, XML/XSLT, and web portfolio development.
On the public library track, the technology training is limited to two core courses (Representation and Organization, plus one chosen technology requirement) and electives. While most of the coursework for public libraries is geared toward learning how to serve multiple demographics, studying Information Architecture can provide greater exposure to relevant technologies. In practice, however, the demographic-focused coursework fills most of a student’s schedule, leaving little time for technology courses.
One reason I chose to pursue the Master of Information Science was to bridge what I saw as a gap in technology preparation for public library careers. The MIS has been extremely helpful in allowing me to learn best practices for system design and how people interact with websites and computers. However, these classes are still geared toward the skills needed by an academic librarian or industry employee, and they lack the everyday technology skills a public librarian may need, especially where no IT department is available.
We’ve considered a few options for courses and workshops that could provide a hands-on approach to daily technology use in any library. Since many academic librarians focused on digital tools still staff the reference desk and interact with patrons, this knowledge is vital for library students moving into jobs. We imagine a course or workshop series that introduces students to common issues staff and patrons face with library technologies. Topics could include rebooting and defragmenting computers, connecting and using audiovisual equipment such as projectors, and troubleshooting the dreaded printer problems.
As public and academic libraries embrace evolving digital trends, staff will need to understand how to use and troubleshoot a range of platforms, makerspaces, and digital creativity centers. Where better to learn these skills than in school?
But we aren’t quite finished. An additional aspect to the course or workshop would be allowing the students to shadow, observe, and learn from University Information Technology Services as they troubleshoot common problems across all platforms. This practical experience both observing and learning how to fix frequent and repeated issues would give students a well-rounded experiential foundation while in library school.
If you are a LITA blog reader working in a public library, which skills would you recommend students learn before taking the job? What kinds of technology-related questions are frequently asked at your institution?
This blog post is cross-posted from the Open Access Working Group blog.
Details of OpenCon 2015 have just been announced!
OpenCon 2015: Empowering the Next Generation to Advance Open Access, Open Education and Open Data will take place on November 14-16 in Brussels, Belgium, and will bring together students and early career academic professionals from across the world to learn about the issues, develop critical skills, and return home ready to catalyze action toward a more open system for sharing the world’s information — from scholarly and scientific research, to educational materials, to digital data.
Hosted by the Right to Research Coalition and SPARC, OpenCon 2015 builds on the success of the first-ever OpenCon meeting last year which convened 115 students and early career academic professionals from 39 countries in Washington, DC. More than 80% of these participants received full travel scholarships, provided by sponsorships from leading organizations, including the Max Planck Society, eLife, PLOS, and more than 20 universities.
“OpenCon 2015 will expand on a proven formula of bringing together the brightest young leaders across the Open Access, Open Education, and Open Data movements and connecting them with established leaders in each community,” said Nick Shockey, founding Director of the Right to Research Coalition. “OpenCon is equal parts conference and community. The meeting in Brussels will serve as the centerpiece of a much larger network to foster initiatives and collaboration among the next generation across OpenCon’s three issue areas.”
OpenCon 2015’s three day program will begin with two days of conference-style keynotes, panels, and interactive workshops, drawing both on the expertise of leaders in the Open Access, Open Education and Open Data movements and the experience of participants who have already led successful projects.
The third day will take advantage of the location in Brussels by providing a half-day of advocacy training followed by the opportunity for in-person meetings with relevant policymakers at the European Parliament, the European Commission, embassies, and key NGOs. Participants will leave with a deeper understanding of the conference’s three issue areas, stronger skills in organizing local and national projects, and connections with policymakers and prominent leaders across the three issue areas.
Speakers at OpenCon 2014 included the Deputy Assistant to the President of the United States for Legislative Affairs, the Chief Commons Officer of Sage Bionetworks, the Associate Director for Data Science for the U.S. National Institutes of Health, and more than 15 students and early career academic professionals leading successful initiatives. OpenCon 2015 will again feature leading experts. Patrick Brown and Michael Eisen, two of the co-founders of PLOS, are confirmed for a joint keynote at the 2015 meeting.
“For the ‘open’ movements to succeed, we must invest in capacity building for the next generation of librarians, researchers, scholars, and educators,” said Heather Joseph, Executive Director of SPARC (The Scholarly Publishing and Academic Resources Coalition). “OpenCon is dedicated to creating and empowering a global network of young leaders across these issues, and we are eager to partner with others in the community to support and catalyze these efforts.”
OpenCon seeks to convene the most effective student and early career academic professional advocates—regardless of their ability to pay for travel costs. The majority of participants will receive full travel scholarships. Because of this, attendance is by application only, though limited sponsorship opportunities are available to guarantee a fully funded place at the conference. Applications will open on June 1, 2015.
In 2014, more than 1,700 individuals from 125 countries applied to attend the inaugural OpenCon. This year, an expanded emphasis will be placed on building the community around OpenCon and on satellite events. OpenCon satellite events are independently hosted meetings that mix content from the main conference with live presenters to localize the discussion and bring the energy of an in-person OpenCon event to a larger audience. In 2014, OpenCon satellite events reached hundreds of students and early career academic professionals in nine countries across five continents. A call for partners to host satellite events has now opened and is available at opencon2015.org.
Applications for OpenCon 2015 will open on June 1st. For more information about the conference and to sign up for updates, visit opencon2015.org/updates. You can follow OpenCon on Twitter at @Open_Con or using the hashtag #opencon.
We are delighted to announce that the Digital Public Library of America (DPLA) has become the latest formal Hydra Partner. In their Letter of Intent Mark Matienzo, DPLA’s Director of Technology, writes of their “upcoming major Hydra project, generously funded by the IMLS, and in partnership with Stanford University and Duraspace, [which] focuses on developing an improved set of tools for content management, publishing, and aggregation for the network of DPLA Hubs. This, and other projects, will allow us to make contributions to other core components of the Hydra stack, including but not limited to Blacklight, ActiveTriples, and support for protocols like IIIF and ResourceSync. We are also interested in continuing to contribute our metadata expertise to the Hydra community to ensure interoperability across our communities.”
A warm welcome into the Partners for all our friends at the DPLA!
Metadata management for image collections was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by Naun Chew of Cornell and Stephen Hearn of the University of Minnesota. Focus group members manage a wide variety of image collections presenting challenges for metadata management. In some cases image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions concerning identification of works, depiction of entities, chronology, geography, provenance, genre, subjects (“of-ness” and “about-ness”) all present themselves; so do opportunities for crowdsourcing and interdisciplinary research.
Many describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency in the data.
Among the challenges discussed:
Variety of systems and schemas: Image collections created in different parts of the university such as art or anthropology departments serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets, or unstructured accompanying data. Often the metadata created by other departments requires a lot of editing. The situation is simpler when all digitization is handled through one center and the library does all of the metadata creation. Some libraries are using Dublin Core for their image collections’ metadata and others are using MODS (Metadata Object Description Schema). It was suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).
Duplicate metadata for different objects: There are cases where the metadata is identical for a set of drawings, even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty could add more details. Brown University extended authorizations to photographers to add to the metadata accompanying their photographs without any problems.
Lack of provenance: A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, a researcher took OCR’ed text retrieved from HathiTrust, ending up with millions of images. However, the researcher didn’t include the metadata of where the images came from. The challenge is to support both a specific purpose and group of people as well as large scale discovery.
Maintaining links between metadata and images: How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?
Relating multiple views and versions of the same object: Multiple versions of the same object taken over time can be very useful for forensics. For example, Stanford relied on dated photographs to determine when its “Gates of Hell” sculpture had been damaged. Brown University decided to describe a “blob” of various images of the same thing in different formats and then have descriptions of the specific versions hanging off it. The systems used within the OCLC Research Library Partnership do not yet have a good way to structure and represent relationships among images, such as components of a piece.
Integrating collections from different sources: Stanford is considering ingesting images from a local art museum, many of which are images for a single object, so that scholars can study the object over time. They are wondering how to represent them in their discovery layer. MIT is trying to integrate metadata coming from different departments so that they can contribute to different aggregators, such as the DPLA. All involved get together to make sure that there is a shared understanding. Contributing and having images live in an aggregated way present new challenges.
Yale’s largest image collection is the Kissinger papers, with about 2 million scanned images. For much of the collection, metadata is very scanty. Meetings among the collection owner, metadata specialist, and systems staff try to resolve insufficient or questionable data and to come to a shared understanding. They store two copies of each image: TIFF (for preservation and on request) and JPEG (for everything else).
Managing relationships with faculty and curators: It’s important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists and developers as all come from different perspectives.
Challenges of bringing together different images or versions of the same object in a large aggregation were explored by OCLC Research’s Europeana Innovation Pilots. The pilots came up with a method for hierarchically structuring cultural objects at different similarity levels to find semantic clusters.
About Karen Smith-Yoshimura
Karen Smith-Yoshimura, program officer, works on topics related to renovating descriptive and organizing practices with a focus on large research libraries and area studies requirements.
During the last nine months, the Digital Public Library of America has been researching educational use with generous support from the Whiting Foundation. We’ve been learning from other online education resource providers in the cultural heritage world and beyond about what they have offered and how teachers and students have responded. We also convened focus groups of instructors in both K-12 and undergraduate higher education to hear about what they use and would like to see from DPLA. Today, we are proud to release our research findings and recommendations for DPLA’s future education work.
Preliminary feedback from educators suggested that DPLA was exciting as a single place to find content but occasionally overwhelming because of the volume of accessible material. In our focus groups, we learned that educators are eager to incorporate primary sources into instructional activities and independent student research projects, but we can better help them by organizing source highlights topically and giving them robust context. We also discovered how important it was to educators and students to be able to create their own primary-source-based projects with tools supported by DPLA. From other education projects, including many supported by our Hubs, we learned that sustainable education projects require teacher involvement, deep standards research, and specific outreach strategies. Based on this research, we recommend that DPLA and its teacher advocates build curated content for student use with teacher guidance, and that DPLA use its position at the center of a diverse mix of cultural heritage institutions to continue to facilitate conversations about educational use. We see this report as the beginning of a process of working with our many partners and educators to make large digital collections like DPLA more useful and used.
Seminar Programme: Göttingen Dialog in Digital Humanities (2015)
The dialogs take place on Tuesdays at 17:00 during the summer semester (from April 21st until July 14th). The seminars will be held at the Göttingen Centre for Digital Humanities (GCDH); the exact venue is to be announced. The centre's address is: Heyne-Haus, Papendiek 16, D-37073 Göttingen.
Yuri Bizzoni, Angelo Del Grosso, Marianne Reboul (University of Pisa, Italy)
Diachronic trends in Homeric translations
Stefan Jänicke, Judith Blumenstein, Michaela Rücker, Dirk Zeckzer, Gerik Scheuermann (Universität Leipzig, Germany)
Visualizing the Results of Search Queries on Ancient Text Corpora with Tag Pies
Jochen Tiepmar (Universität Leipzig, Germany)
Release of the MySQL based implementation of the CTS protocol
Patrick Jähnichen, Patrick Oesterling, Tom Liebmann, Christoph Kurras, Gerik Scheuermann, Gerhard Heyer (Universität Leipzig, Germany)
Exploratory Search Through Visual Analysis of Topic Models
Christof Schöch (Universität Würzburg, Germany)
Topic Modeling Dramatic Genre
Peter Robinson (University of Saskatchewan, Canada)
Some principles for making of collaborative scholarly editions in digital form
Jürgen Enge, Heinz Werner Kramski, Susanne Holl (HAWK Hildesheim, Germany)
»Arme Nachlassverwalter...« Herausforderungen, Erkenntnisse und Lösungsansätze bei der Aufbereitung komplexer digitaler Datensammlungen (“Poor estate administrators...”: challenges, insights, and approaches in processing complex digital data collections)
Daniele Salvoldi (Freie Universität Berlin, Germany)
A Historical Geographic Information System (HGIS) of Nubia based on the William J. Bankes Archive (1815-1822)
Daniel Burckhardt (HU Berlin, Germany)
Comparing Disciplinary Patterns: Gender and Social Networks in the Humanities through the Lens of Scholarly Communication
Daniel Schüller, Christian Beecks, Marwan Hassani, Jennifer Hinnell, Bela Brenger, Thomas Seidl, Irene Mittelberg (RWTH Aachen University, Germany, University of Alberta, Canada)
Similarity Measuring in 3D Motion Capture Models of Co-Speech Gesture
Federico Nanni (University of Bologna, Italy)
Reconstructing a website’s lost past - Methodological issues concerning the history of www.unibo.it
Edward Larkey (University of Maryland, USA)
Comparing Television Formats: Using Digital Tools for Cross-Cultural Analysis
Francesca Frontini, Amine Boukhaled, Jean-Gabriel Ganascia (Laboratoire d’Informatique de Paris 6, Université Pierre et Marie Curie)
Mining for characterising patterns in literature using correspondence analysis: an experiment on French novels
As announced in the Call for Papers, the dialogs will take the form of a 45-minute presentation in English, followed by 45 minutes of discussion and student participation. Due to logistical and time constraints, the 2015 dialog series will not be video-recorded or live-streamed. A summary of the talks, together with photographs and, where available, slides, will be uploaded to the GCDH/eTRAP websites. For this reason, presenters are encouraged, but not obligated, to prepare slides to accompany their papers. Please also consider that the €500 award for best paper will be awarded on the basis of both the quality of the paper *and* the delivery of the presentation.
Camera-ready versions of the papers must reach Gabriele Kraft at gkraft(at)gcdh(dot)de by April 30.
The papers will not be uploaded to the GCDH/eTRAP website but, as previously announced, published as a special issue of Digital Humanities Quarterly (DHQ). For this reason, papers must be submitted in an editable format (e.g. .docx or LaTeX), not as PDF files.
A small budget for travel cost reimbursements is available.
Everybody is welcome to join in.
If anyone would like to tweet about the dialogs, the Twitter hashtag of this series is #gddh15.
For any questions, do not hesitate to contact gkraft(at)gcdh(dot)de. For further information and updates, visit http://www.gcdh.de/en/events/gottingen-dialog-digital-humanities/ or http://etrap.gcdh.de/?p=633
We look forward to seeing you in Göttingen!
The GDDH Board (in alphabetical order):
Camilla Di Biase-Dyson (Georg August University Göttingen)
Marco Büchler (Göttingen Centre for Digital Humanities)
Jens Dierkes (Göttingen eResearch Alliance)
Emily Franzini (Göttingen Centre for Digital Humanities)
Greta Franzini (Göttingen Centre for Digital Humanities)
Angelo Mario Del Grosso (ILC-CNR, Pisa, Italy)
Berenike Herrmann (Georg August University Göttingen)
Péter Király (Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen)
Gabriele Kraft (Göttingen Centre for Digital Humanities)
Bärbel Kröger (Göttingen Academy of Sciences and Humanities)
Maria Moritz (Göttingen Centre for Digital Humanities)
Sarah Bowen Savant (Aga Khan University, London, UK)
Oliver Schmitt (Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen)
Sree Ganesh Thotempudi (Göttingen Centre for Digital Humanities)
Jörg Wettlaufer (Göttingen Centre for Digital Humanities & Göttingen Academy of Sciences and Humanities)
Ulrike Wuttke (Göttingen Academy of Sciences and Humanities)
This event is financially supported by the German Ministry of Education and Research (No. 01UG1509).
The Library and Information Technology Association (LITA), a division of the American Library Association, seeks a dynamic, entrepreneurial, forward-thinking Executive Director.
This is a fulfilling and challenging job that affords national impact on library technologists. As the successful candidate, you will be not only organized, financially savvy, and responsive, but also comfortable with technological change, project management, community management, and organizational change.
Interested in applying? For a full description and requirements, visit http://bit.ly/LITA_ED
We will advertise the position in April, conduct phone interviews in early May, and hold in-person interviews with the top candidates at ALA Headquarters in Chicago in mid-to-late May.
Ideally, the candidate would start in June (perhaps just before ALA Annual Conference), and there would be a one-month overlap with current Executive Director Mary Taylor, who retires July 31.
- Mary Ghikas, ALA Senior Associate Executive Director
- Dan Hoppe, ALA Director of Human Resources
- Keri Cascio, ALCTS Executive Director
- Rachel Vacek, LITA President
- Thomas Dowling, LITA Vice-President
- Andromeda Yelton, LITA Director-at-Large
- Isabel Gonzalez-Smith, LITA Emerging Leader
Library of Congress: The Signal: Mapping Words: Lessons Learned From a Decade of Exploring the Geography of Text
The following is a guest post by Kalev Hannes Leetaru, Senior Fellow, George Washington University Center for Cyber & Homeland Security.
It is hard to imagine our world today without maps. Though Google Maps was not the first online mapping platform, its debut a decade ago profoundly reshaped the role of maps in everyday life, popularizing the concept of organizing information in space. When Flickr unveiled image geotagging in 2006, more than 1.2 million photos were geotagged in the first 24 hours. In August 2009, with the launch of geotagged tweets, Twitter announced that organizing posts according to location would usher in a new era of spatial serendipity, allowing users to “switch from reading the tweets of accounts you follow to reading tweets from anyone in your neighborhood or city–whether you follow them or not.”
As more and more of the world’s citizen-generated information becomes natively geotagged, we increasingly think of information as being created in space and referring to space, using geography to map conversation, target information, and even understand global communicative patterns. Yet, despite the immense power of geotagging, the vast majority of the world’s information does not have native geographic metadata, especially the vast historical archives of text held by libraries. It is not that libraries do not contain spatial information; it is that their rich descriptions of location are expressed in words rather than in precise, mappable latitude/longitude coordinates. A geotagged tweet can be directly placed on a map, while a textual mention of “a park in Champaign, USA” in a digitized nineteenth century book requires highly specialized “fulltext geocoding” algorithms to identify, disambiguate (determine whether the mention is of Champaign, Illinois or Champaign, Ohio and which park is referred to) and convert textual descriptions of location into mappable geographic coordinates.
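The identify-and-disambiguate step described above can be sketched in miniature. The two-entry gazetteer, population figures, and scoring heuristic below are illustrative assumptions for the Champaign example, not the actual algorithms behind the work discussed in this post; production fulltext geocoders use far richer context models.

```python
# Toy sketch of the disambiguation step in fulltext geocoding.
# Gazetteer entries and the scoring heuristic are illustrative only.
GAZETTEER = {
    "Champaign": [
        {"admin1": "Illinois", "lat": 40.1164, "lon": -88.2434, "population": 88000},
        {"admin1": "Ohio", "lat": 40.1373, "lon": -83.7697, "population": 38000},
    ],
}

def disambiguate(place, text):
    """Pick the most plausible gazetteer entry for a place mention.

    Prefer a candidate whose containing region is also mentioned in
    the surrounding text; otherwise fall back to the most populous
    candidate (a common, if crude, tie-breaking heuristic).
    """
    candidates = GAZETTEER.get(place, [])
    if not candidates:
        return None
    for candidate in candidates:
        if candidate["admin1"] in text:
            return candidate
    return max(candidates, key=lambda c: c["population"])

resolved = disambiguate("Champaign", "a park in Champaign, Illinois, USA")
print(resolved["lat"], resolved["lon"])  # coordinates of the Illinois entry
```

With no disambiguating context ("a park in Champaign, USA"), this sketch falls back to the more populous Champaign, Illinois, which mirrors the kind of default a real geocoder must make explicit.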
Building robust algorithms capable of recognizing mentions of an obscure hilltop or a small rural village anywhere on Earth requires a mixture of state-of-the-art software algorithms and artistic handling of the enormous complexities and nuances of how humans express space in writing. This is made even more difficult by assumptions of shared locality made by content like news media, the mixture of textual and visual locative cues in television, and the inherent transcription error of sources like OCR and closed captioning.
Recognizing location across languages is especially problematic. The majority of textual location mentions on Twitter are in English regardless of the language of the tweet itself. On the other hand, mapping the geography of the world’s news media across 65 languages requires multilingual geocoding that takes into account the enormous complexity of the world’s languages. For example, the extensive noun declension of Estonian means that identifying mentions of “New York” requires recognizing “New York”, “New Yorki”, “New Yorgi”, “New Yorgisse”, “New Yorgis”, “New Yorgist”, “New Yorgile”, “New Yorgil”, “New Yorgilt”, “New Yorgiks”, “New Yorgini”, “New Yorgina”, “New Yorgita”, and “New Yorgiga”. Multiply that by over 10 million recognized locations on Earth across 65 languages and one can imagine the difficulty of recognizing textual geography.
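As a rough illustration of the matching problem, the Estonian case forms listed above can be compiled into a single longest-match-first pattern. A hand-enumerated form list like this is only a sketch; real multilingual geocoders need per-language morphological handling rather than one regular expression per place name.

```python
import re

# The declined forms of "New York" quoted in the post.
FORMS = ["New York", "New Yorki", "New Yorgi", "New Yorgisse",
         "New Yorgis", "New Yorgist", "New Yorgile", "New Yorgil",
         "New Yorgilt", "New Yorgiks", "New Yorgini", "New Yorgina",
         "New Yorgita", "New Yorgiga"]

# Sort longest-first so the regex alternation prefers the longest match
# ("New Yorgisse" before "New Yorgi" before "New York").
PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(f) for f in sorted(FORMS, key=len, reverse=True))
    + r")\b"
)

def find_mentions(text):
    """Return every declined form of 'New York' found in the text."""
    return PATTERN.findall(text)

print(find_mentions("Ta sõitis New Yorgisse ja jäi New Yorki."))
# ['New Yorgisse', 'New Yorki']
```

Even this toy version shows why scale matters: one place name already needs fourteen alternatives, and the post notes there are over 10 million recognized locations across 65 languages.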
For the past decade much of my work has centered on this intersection of location and information across languages and modalities, exploring the geography of massive textual archives through the twin lenses of the locations they describe and the impact of location on the production and consumption of information. A particular emphasis of my work has been on expanding the study of textual geography to new modalities and transitioning the field from small human studies to at-scale computational explorations.
Over the past five years my studies have included the first large-scale explorations of the textual geography of news media, social media, Wikipedia, television, academic literature, and the open web, as well as the first large-scale comparisons of geotagging versus textual description of location in citizen media and the largest work on multilingual geocoding. The remainder of this blog post will share many of the lessons I have learned from these projects and the implications and promise they hold for the future of making the geography of library holdings more broadly available in spatial form.
In the early 2000s, while an undergraduate student at the National Center for Supercomputing Applications, I launched an early open cloud geocoding and GIS platform that provided a range of geospatial services through a simple web interface and cloud API. The intense interest in the platform and the incredible variety of applications that users found for the geocoding API foreshadowed the amazing creativity of the open data community in mashing up geographic APIs and data. Over the following several years I undertook numerous small-scale studies of textual geography to explore how such information could be extracted and utilized to better understand various kinds of information behavior.
Some of my early papers include a 2005 study of the geographic focus and ownership of news and websites covering climate change and carbon sequestration (PDF) that demonstrated the importance of the dual role of the geography of content and consumer. In 2006 I co-launched a service that enabled spatial search of US Government funding opportunities (PDF), including alerts of new opportunities relating to specific locations. This reinforced the importance of location in information relevance: a contract to install fire suppression sprinklers in Omaha, Nebraska is likely of little interest to a small business in Miami, Florida, yet traditional keyword search does not contemplate the concept of spatial relevance.
Similarly, in 2009 I explored the impact of a news outlet’s physical location on the Drudge Report’s sourcing behavior and in 2010 examined the impact of a university’s physical location on its national news stature. These twin studies, examining the impact of physical location on news outlets and on newsmakers, emphasized the highly complex role that geography plays in mediating information access, availability, and relevance.
In Fall 2011 I published the first of what has become a half-decade series of studies expanding the application of textual geography to ever-larger and more diverse collections of material. Published in September 2011, Culturomics 2.0 was the first large-scale study of the geography of the world’s news media, identifying all mentions of location across more than 100 million news articles stretching across half a century.
A key finding was the centrality of geography to journalism: on average a location is mentioned every 200-300 words in a typical news article and this has held relatively constant for over 60 years. Another finding was that mapping the locations most closely associated with a public figure (in this case Osama Bin Laden) offers a strong estimate of that person’s actual location, while the network structure of which locations more frequently co-occur with each other yields powerful insights into perceptions of cultural and societal boundaries.
The following Spring I collaborated with supercomputer vendor SGI to conduct the first holistic exploration of the textual geography of Wikipedia. Wikipedia allows contributors to include precise latitude/longitude coordinates in articles, but because such coordinates must be manually entered in specialized code, just 4% of English-language articles contained even a single coordinate as of 2012, totaling just 1.1 million coordinates, primarily centered in the US and Western Europe. In contrast, 59% of English-language articles had at least one textual mention of a recognized location, totaling more than 80.7 million mentions of 2.8 million distinct places on Earth.
In essence, the majority of contributors to Wikipedia appear more comfortable writing the word “London” in an article than looking up its centroid latitude/longitude and entering it in specialized code. This has significant implications for how libraries leverage volunteer citizen geocoding efforts in their collections.
To explore how such information could be used to provide spatial search for large textual collections, a prototype Google Earth visualization was built to search Wikipedia’s coverage of Libya. A user could select a specific time period and instantly access a map of every location in Libya mentioned across all of Wikipedia with respect to that time period.
Finally, a YouTube video was created that visualizes world history 1800-2012 through the eyes of Wikipedia by combining the 80 million textual location mentions in the English Wikipedia with the 40 million date references to show which locations were mentioned together in an article with respect to a given year. Links were color-coded red for connections with a negative tone (usually indicating physical conflict like war) or green for connections with a positive tone.
That Fall I collaborated with SGI once again, along with social media data vendor GNIP and the University of Illinois Geospatial Information Laboratory to produce the first detailed exploration of the geography of social media, which helped popularize the geocoding and mapping of Twitter. This project produced the first live map of a natural disaster, as well as the first live map of a presidential election.
At the time, few concrete details were available regarding Twitter’s geographic footprint and the majority of social media maps focused on the small percentage of natively geotagged tweets. Twitter offered a unique opportunity to compare textual and sensor-based geographies in that 2% of tweets are geotagged with precise GPS or cellular triangulation coordinates. Coupled with the very high correlation between electricity availability and geotagged tweets, this offers a unique ground truth of the confirmed locations of Twitter users against which to compare different approaches to geocoding the textual location cues found in the other 98% of tweets.
A key finding was that two-thirds of those 2% of tweets that are geotagged were sent by just 1% of all users, meaning that geotagged information on Twitter is extremely skewed. Another finding was that across the world location is primarily expressed in English regardless of the language that a user tweets in and that 34% of tweets have recoverable high-resolution textual locations. From a communicative standpoint, it turns out that half of tweets are about local events and half of tweets are directed at physically nearby users versus tweeting about global events or users elsewhere in the world, suggesting that geographic proximity plays only a minor role in communication patterns on broadcast media like Twitter.
A common pattern that emerges across both Wikipedia and Twitter is that even when native geotagging is available, the vast majority of location metadata resides in textual descriptions rather than precise GIS-friendly numeric coordinates. This is the case even when geotagging is made transparent and automatic through GPS tagging on mobile devices.
In Spring 2013 I launched the GDELT Project, which extends my earlier work on the geography of the news media by offering a live metadata firehose geocoding global news media on a daily basis. That Fall I collaborated with Roger Macdonald and the Internet Archive’s Television News Archive to create the first large-scale map of the geography of television news. More than 400,000 hours of closed captioning of American television news, totaling over 2.7 billion words, was geocoded to produce an animated daily map of the geographic focus of television news from 2009-2013.
Closed captioning text proved to be extremely difficult to geocode. Captioning streams are in entirely uppercase letters, riddled with errors like “in two Paris of shoes” and long sequences of gibberish characters, and in some cases have a total absence of punctuation or other boundaries.
This required extensive adaptation of the geocoding algorithms to tolerate an enormous diversity of typographical errors more pathological in nature than those found in OCR’d content – approaches that were later used in creating the first live emotion-controlled television show for NBCUniversal’s Syfy channel. Newscasts also frequently rely on visual on-screen cues such as maps or text overlays for location references, and by their nature incorporate a rapid-fire sequence of highly diverse locations mentioned just sentences apart from each other, making the disambiguation process extremely complex.
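One common way to tolerate such typographical noise is fuzzy matching of candidate tokens against the gazetteer. Below is a toy sketch using Python's standard difflib; the gazetteer entries and the 0.8 cutoff are illustrative assumptions, not the actual error-tolerance algorithm used for GDELT or the television archive:

```python
import difflib

# Illustrative mini-gazetteer of known place names.
GAZETTEER = ["paris", "london", "berlin", "madrid"]

def match_place(token, cutoff=0.8):
    """Fuzzily match a (possibly garbled, all-caps) caption token
    against gazetteer entries, tolerating small typos."""
    candidates = difflib.get_close_matches(token.lower(), GAZETTEER,
                                           n=1, cutoff=cutoff)
    return candidates[0] if candidates else None

print(match_place("PARIS"))   # exact match after lowercasing
print(match_place("PARS"))    # one dropped letter still matches
print(match_place("XQZW"))    # gibberish is rejected
```

In practice the cutoff must be tuned carefully: too low and gibberish sequences start matching real places, too high and common captioning errors slip through unmatched.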
In Fall 2014 I collaborated with the US Army to create the first large-scale map of the geography of academic literature and the open web, geocoding more than 21 billion words of academic literature spanning the entire contents of JSTOR, DTIC, CORE, CiteSeerX, and the Internet Archive’s 1.6 billion PDFs relating to Africa and the Middle East, as well as a second project creating the first large-scale map of human rights reports. A key focus of this project was the ability to infuse geographic search into academic literature, enabling searches like “find the five most-cited experts who publish on water conflicts with the Nuers in this area of South Sudan” and thematic maps such as a heatmap of the locations most closely associated with food insecurity.
As of spring 2015 the GDELT Project now maps the geography of an ever-growing cross-section of the global news media in realtime across 65 languages. Every 15 minutes it machine translates all global news coverage it monitors in 65 languages, from Afrikaans and Albanian to Urdu and Vietnamese, and applies the world’s largest multilingual geocoding system to identify all mentions of location anywhere in the world, from a capital city to a remote hilltop. Over the past several years, GDELT’s mass realtime geocoding of the world’s news media has popularized the use of large-scale automated geocoding, with disciplines from political science to journalism now experimenting with the technology. GDELT’s geocoding capabilities now lie at the heart of numerous initiatives, from cataloging disaster coverage for the United Nations to mapping global conflict with the US Institute of Peace to modeling the patterns of world history.
Most recently, a forthcoming collaboration with cloud mapping platform CartoDB will enable ordinary citizens and journalists to create live interactive maps of the ideas, topics, and narratives pulsing through the global news media using GDELT. The example map below shows the geographic focus of Spanish (green), French (red), Arabic (yellow) and Chinese (blue) news media for a one hour period from 8-9AM EST on April 1, 2015, placing a colored dot at every location mentioned in the news media of each language. Ordinarily, mapping the geography of language would be an enormous technical endeavor, but by combining GDELT’s mass multilingual geocoding with CartoDB’s interactive mapping, even a non-technical user can create a map in a matter of minutes. This is a powerful example of what will become possible as libraries increasingly expose the spatial dimension of their collections in data formats that allow them to be integrated into popular mapping platforms. Imagine an amateur historian combining digitized georeferenced historical maps and geocoded nineteenth-century newspaper articles with modern census data to explore how a region has changed over time – these kinds of mashups would be commonplace if the vast archives of libraries were made available in spatial form.
In short, as we begin to peer into the textual holdings of our world’s libraries using massive-scale data mining algorithms like fulltext geocoding, we are for the first time able to look across our collective informational heritage and see macro-level global patterns never before visible. Geography offers a fundamental new lens through which to observe and understand those patterns, and as libraries increasingly geocode their holdings and make that material available in standard open geographic data formats, they will become conveners of information and innovation, empowering a new era of access to and understanding of our world.
On 2015’s Document Freedom Day, Open Knowledge Nepal organized a seminar on Open Standards at CLASS Nepal at Maitighar, Kathmandu.
We intended to pitch openness to a new audience in Nepal and help them learn documentation skills. Since we could not hope to teach documentation and spreadsheets in less than a day, we used the session to teach small, practical bits of information and skills that participants could take home, and to gather information about their current knowledge and needs so as to help us plan future events and trainings.
The target audience was office bearers and representatives of labor unions in many private and government organizations in Nepal. We also invited some students of Computer Science and Information Technology (CSIT). A few of the students are core members of the Open Knowledge Nepal team and also represented us at Open Data Day 2015 in Kathmandu. We invited the students so they could get to know the audience they will be working with in the days to come.
It was a lazy March afternoon in Kathmandu, and participants slowly trickled in from around 2 pm. The organizers and students passed the time chatting about open culture, tech, football and other topics while waiting for enough participants to begin the event formally. Participants kept coming in ones and twos until the hall reached its limit (35+), and we started formally just after 3:00 PM (NST).
Mr. Durga of CLASS Nepal opened the event by welcoming all participants and introducing CLASS Nepal. He then invited Mr. Lekhnath Pokhrel, representative of UNI Global Union at the event, who urged all participants to take full advantage of the seminar and announced that more such events would be organized in the future. Nikesh Balami, our active member and Open Government lead, followed with his presentation on “Open Knowledge, Open Data, Open Standards, and Open Formats.” He started by gathering information about participants’ organizational backgrounds. This lightened the mood as everybody opened up to each other. Nikesh then introduced Open Knowledge Nepal and our activities to the hall (see the slides).
Kshitiz Khanal, Open Access lead at Open Knowledge Nepal went next. This session was intended to be an open discussion and skill dissemination on documentation and spreadsheet basics. We started by asking everybody to share their experience, set of skills and the skills they would like to learn in the event.
We were in for a surprise. While we had prepared to teach pivot tables, our audience was interested in learning more basic skills. Most of them were familiar with documentation packages like Microsoft Word, some used spreadsheets at work, and most had to use slides to present their work. We paired our students with members of the target audience so that one could teach the other. Based on the requests, we decided to teach basic spreadsheet actions such as sorting and filtering data and performing basic mathematical operations.
We also explained basic presentation philosophy: use pictures in place of words whenever possible; use as few words as possible, and when you do use them, make them big; and rehearse before presenting. These tips sound obvious, but they are not yet commonplace because they were never taught to our audience as part of any curriculum. This was well received. We also had an unusual request: how to attach a sound recording to an email. We decided to teach Google Drive, demonstrating how it can be used to store documents and how its links can be used to send any type of file by email.
There were a few female participants as well. This was a good turnout compared to most of our own and other tech / open events in Kathmandu, which often have no female participation at all. One of our female participants said that while she wants to learn more skills, she doesn’t have time to learn at home while taking care of her children, and at the office she mostly has her hands full with work.
Most of the work in many offices is documentation, and this day and age makes strong documentation skills almost mandatory. While having freedom in the sense of document freedom entails having access to proper tools, it also necessitates having the proper set of skills to use the tools.
We learned valuable lessons about the current skills and interests of audiences like this one, and about the level at which we need to begin when preparing modules for similar future events.
There are a few simple things you can configure in your .travis.yml to make your travis builds faster for ruby builds. They are oddly under-documented by travis in my opinion, so I’m noting them here.

NOKOGIRI_USE_SYSTEM_LIBRARIES=true
Odds are your ruby/rails app uses nokogiri. (all Rails 4.2 apps do, as nokogiri has become a rails dependency in 4.2) Some time ago (in the past year I think?) nokogiri releases switched to building libxml and libxslt from source when you install the gem.
This takes a while. On various machines I’ve seen 30 seconds, two minutes, 5 minutes. I’m not sure how long it usually takes on travis, as travis logs don’t seem to give timing for this sort of thing, but I know I’ve been looking at the travis live web console and seen it paused on “installing nokogiri” for a while.
But you can tell nokogiri to use already-installed libxml/libxslt system libraries if you know the system already has compatible versions installed — which travis seems to — with the ENV variable `NOKOGIRI_USE_SYSTEM_LIBRARIES=true`. Although I can’t seem to find that documented anywhere by nokogiri, it’s the word on the street, and seems to be so.
You can set such in your .travis.yml thusly:

env:
  global:
    - NOKOGIRI_USE_SYSTEM_LIBRARIES=true

Use the new Travis architecture
Travis introduced a new architecture on their end using Docker, which is mostly invisible to you as a travis user. But the architecture is, at the moment, opt-in, at least for existing projects.
Travis plans to eventually start moving over even existing projects to the new architecture by default. You will still be able to opt-out, which you’d do mainly if your travis VM setup needed “sudo”, which you don’t have access to in the new architecture.
But in the meantime, what we want is to opt-in to the new architecture, even on an existing project. You can do that simply by adding:

sudo: false

to your .travis.yml.
Why do we care? Well, travis suggests that the new architecture “promises short to nonexistent queue wait times, as well as offering better performance for most use cases.” But even more importantly for us, it lets you do bundler caching too…

Bundler caching
If you’re like me, a significant portion of your travis build time is just installing all those gems. On your personal dev box, you have gems you already installed, and when they’re listed in your Gemfile.lock they just get used; bundler/rubygems doesn’t need to go reinstalling them every time.
But the travis environment normally starts with a clean slate on every build, so every build it has to go reinstalling all your gems from your Gemfile.lock.
Aha, but travis has introduced a caching feature that can cache installed gems. At first this feature was only available for paid private repos, but now it’s available for free open source repos if you are using the new travis architecture (above).
For most cases, simply add this to your .travis.yml:

cache: bundler
There can be complexities in your environment which require more complex setup to get bundler caching to work; see the travis docs.

Happy travis’ing
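Putting the three tips together, a complete .travis.yml might look something like the following sketch (the language and ruby version lines are illustrative placeholders for your project’s own settings):

```yaml
language: ruby
rvm:
  - 2.2
# opt in to the new container-based architecture (no sudo available)
sudo: false
# cache installed gems between builds
cache: bundler
env:
  global:
    # use system libxml/libxslt instead of compiling from source for nokogiri
    - NOKOGIRI_USE_SYSTEM_LIBRARIES=true
```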
The existence of travis offering free CI builds to open source software, and with such a well-designed platform, has seriously helped open source software quality/reliability increase in leaps and bounds. I think it’s one of the things that has allowed the ruby community to deal with fairly quickly changing ruby versions, that you can CI on every commit, on multiple ruby versions even.
I love travis.
It’s odd to me that they don’t highlight some of these settings in their docs better. In general, I think travis docs have been having trouble keeping up with travis changes — travis docs are quite good as far as being written well, but seem to sometimes be missing key information, or including not quite complete or right information for current travis behavior. I can’t even imagine how much AWS CPU time all those libxml/libxslt compilations on every single travis build are costing them! I guess they’re working on turning on bundler caching by default, which will significantly reduce the number of times nokogiri gets built, once they do.
Filed under: General
All Web services that require user level authentication will be unavailable during the installation window, which is between 2:00 – 4:00 AM local time, Friday April 17th.
Thanks to everyone who entered the 2015 OCLC Research Collective Collections Tournament Bracket Competition! A quick re-cap of the rules: all entrants picked a conference. If no one chose the winning conference, then a random drawing would be held among all entrants to determine the winner of the prize. Well, that’s where we’re at! No one picked Atlantic 10 to prevail, so everyone gets another chance to win!
A random drawing was held this morning in the Tournament offices (well, here in OCLC Research). The winner of the 2015 OCLC Research Collective Collections Tournament Bracket Competition is …
Carol wins a $100 Visa Gift Card, along with the right to call herself Bracket Competition Champion! Congratulations! And thanks to all of our Bracket Competition participants for playing.
We hope you enjoyed the Collective Collections Tournament! Keep up to date with OCLC Research as we continue to use the concept of collective collections to explore a wide range of library topics.
More information:

Brian Lavoie
Brian Lavoie is a Research Scientist in OCLC Research. Brian's research interests include collective collections, the system-wide organization of library resources, and digital preservation.
Will you be at the American Library Association Conference in San Francisco this June? Do you have a great new technology idea that you’d like to share informally with colleagues? How about a story related to a clever tech project that you just pulled off at your institution, successfully, or less-than-successfully?
The LITA Program Planning Committee (PPC) is now accepting proposals for a round of Lightning Talks to be given at ALA.
To submit your idea please fill out this form: http://goo.gl/4NbBY2
The lightning rounds will be Saturday June 27, 10:30 – 11:30
All presenters will be given 5 minutes to speak.
Proposals are due Monday, May 4 at midnight. Questions? Please contact PPC chair, Debra Shapiro, firstname.lastname@example.org
During the past three years, Open Knowledge has been leading the community building work in the Digitised Manuscripts to Europeana (DM2E) project, a European research project in the area of Digital Humanities led by Humboldt University. Open Knowledge activities included the organisation of a series of events such as the Open Data in Cultural Heritage workshops, running two rounds of the Open Humanities Awards and the establishment of OpenGLAM as an active volunteer-led community pushing for increased openness in cultural heritage.

DM2E and the Linked Open Web
As one of its core aims, the DM2E project worked on enabling libraries and archives to easily upload their digitised material into Europeana – the online portal that provides access to millions of items from a range of Europe’s leading galleries, libraries, archives and museums. In total, over 20 million manuscript pages from libraries, archives and research institutions were added during the three years of the project. In line with the Europeana Data Exchange Agreement, all contributing institutions agreed to make their metadata openly available under the Creative Commons Public Domain Dedication license (CC-0), which allows for easier reuse.
Since different providers make their data available in different formats, the DM2E consortium developed a toolset that converted metadata from a diverse range of formats into the DM2E model, an application profile of the Europeana Data Model (EDM). The developed software also allows the contextualisation and linking of these cultural heritage data sets, which makes the material suitable for use within the Linked Open Web. An example of this is the Pundit tool, which Net7 developed to enable researchers to add annotations to a digital text and link them to related texts or other resources on the net (read more).

Open Knowledge achievements
Open Knowledge was responsible for the community building and dissemination work within DM2E, which, apart from promoting and documenting the project results for a wide audience, focused on promoting and raising awareness around the importance of open cultural data. The presentation below sums up the achievements made during the project period, including the establishment of OpenGLAM as a community, the organisation of the event series and the Open Humanities Awards, next to the extensive project documentation and dissemination through various channels.

[Presentation: DM2E community building, from Digitised Manuscripts to Europeana]

OpenGLAM
In order to realise the value of the tools developed in DM2E, as well as to truly integrate the digitised manuscripts into the Linked Data Web, there need to be enough other open resources to connect to and an active community of cultural heritage professionals and developers willing to extend and re-use the work undertaken as part of DM2E. That is why Open Knowledge set up the OpenGLAM community: a global network of people and organisations who are working to open up cultural content and data. OpenGLAM focuses on promoting and furthering free and open access to digital cultural heritage by maintaining an overview of Open Collections, providing documentation on the process and benefits of opening up cultural data, publishing regular news and blog items and organising diverse events.
Since its start in 2012, OpenGLAM has grown into a large, global, active volunteer-led community (and one of the most prominent Open Knowledge working groups to date), supported by a network of organisations such as Europeana, the Digital Public Library of America, Creative Commons and Wikimedia. Apart from the wider community taking part in the OpenGLAM discussion list, there is a focused Working Group of 17 open cultural data activists from all over the world, a high-level Advisory Board providing strategic guidance and four local groups that coordinate OpenGLAM-related activities in their specific countries. Following the end of the DM2E project, the OpenGLAM community will continue to push for openness in digital cultural heritage.

Open Humanities Awards
As part of the community building efforts, Open Knowledge set up a dedicated awards series focused on supporting innovative projects that use open data, open content or open source tools to further teaching and research in the humanities: the Open Humanities Awards. During the two competition rounds that took place between 2013 and 2014, over 70 applications were received and 5 winning projects were executed, ranging from an open source web application that allows people to annotate digitised historical maps (Maphub) to an improved search application for Wittgenstein’s digitised manuscripts (Finderapp WITTfind). Winners published their results regularly through the DM2E blog and presented their findings at conferences in the field, proving that the awards served as a great way to stimulate innovative digital humanities research using open data and content. Details on all winning projects, as well as final reports on their results, are available from this final report.

DM2E event series
Over the course of the project, Open Knowledge organised a total of 18 workshops focused on promoting best practices in the legal and technical aspects of opening up metadata and cultural heritage content, providing demonstrations of and training with the tools and platforms developed in the project, and running hackdays and coding sprints. Highlights included the Web as Literature conference at the British Library in 2013, the Open Humanities Hack series and the Open Data in Cultural Heritage workshops, as a result of which several local OpenGLAM groups were started. A full list of events and their outcomes is available from this final report.
It has been a great experience being part of the DM2E consortium. Following the project’s end, the OpenGLAM community will be sustained and built upon, so that we can realise a world in which our shared cultural heritage is open to all regardless of their background; where people are no longer passive consumers of cultural content created by an elite, but contribute, participate, create and share.

More information
New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week
Visit the LITA Job Site for more available jobs and for information on submitting a job posting.
Heather Terrell, MLIS degree candidate at San Jose State University, has been named the winner of the 2015 LITA/Ex Libris Student Writing Award, sponsored by Ex Libris Group and the Library and Information Technology Association (LITA), a division of the American Library Association (ALA).
Terrell’s paper, titled “Reference is dead, long live reference: electronic collections in the digital age,” describes the changing landscape of electronic reference sources and explores the possibilities inherent in building hybrid library collections.
“The members of the LITA/Ex Libris Student Writing Award Committee are pleased to acknowledge and honor with this award Heather Terrell’s manuscript, which addresses the benefits and challenges of electronic reference materials to libraries and library users,” said Sandra Barclay, chair of the committee.
The LITA/Ex Libris Student Writing Award recognizes outstanding writing on a topic in the area of libraries and information technology by a student or students enrolled in an ALA-accredited library and information studies graduate program. The winning manuscript will be published in Information Technology and Libraries (ITAL), and the winner will receive $1,000 and a certificate of merit.
The award will be presented at the LITA Awards Ceremony on Sunday, June 28, 2015 during the ALA Annual Conference in San Francisco.
Ex Libris is a leading provider of automation solutions for academic libraries. Offering the only comprehensive product suite for electronic, digital, and print materials, Ex Libris provides efficient, user-friendly products that serve the needs of libraries today and will facilitate their transition into the future. Ex Libris maintains an impressive customer base consisting of thousands of sites in more than 80 countries on six continents. For more information about Ex Libris Group visit www.exlibrisgroup.com.
Established in 1966, LITA is the leading organization reaching out across types of libraries to provide education and services for a broad membership of nearly 3,000 systems librarians, library technologists, library administrators, library schools, vendors and many others interested in leading edge technology and applications for librarians and information providers. For more information about LITA go to www.lita.org, or contact the LITA office by phone, 800-545-2433, ext. 4268; or e-mail: email@example.com
Questions and Comments
Library & Information Technology Association (LITA)
(800) 545-2433 ext 4267
Maybe you thought libraries were “the Netflix for books,” but in this Wired article, The ‘Netflix for Books’ Just Invaded Amazon’s Turf, it’s not libraries they’re talking about, and it’s not just Amazon’s turf being invaded. The article is about the vendor Oyster starting to sell books rather than just offering a subscription lending library; that’s what they mean by “Amazon’s turf.” Still, one might have thought that lending books was the “turf” of libraries, but they don’t even get a mention.
Before the existence of public libraries, paid subscription libraries were a thing, both as commercial entities and as private clubs, popular in the 18th and 19th centuries. Books were far more expensive then than they are now.
The United States played a key role in developing public free libraries, democratizing access to published knowledge and cultural production.
It might be instructive to compare the user workflow of actually getting books onto your device of choice between Amazon’s and Oyster’s systems (for both lending and purchase) and the vendors and solutions typically used by libraries (OverDrive, etc). I suspect the comparison wouldn’t look pretty for libraries’ offerings. The ALA has a working group trying to figure out what can be done.
In the Library, With the Lead Pipe: Randall Munroe’s What If as a Test Case for Open Access in Popular Culture
Open access to scholarly research benefits not only the academic world but also the general public. Questions have been raised about the popularity of academic materials among nonacademic readers. However, when scholarly materials are available, they are also available to popularizers who can recontextualize them in unexpected and more accessible ways. Randall Munroe’s blog/comic What If uses open access scholarly and governmental documents to answer bizarre hypothetical questions submitted by his readers. His work is engaging, informative, and reaches a large audience. While members of the public may not rush to read open access scientific journals, their availability to writers like Munroe nevertheless contributes to better science education for the general public. Popularizers outside of academia benefit significantly from open access; so do their readers.

Open Access and the Public Good
Open access (OA) is a longstanding and important discussion within librarianship. As Peter Suber explains, the “basic idea of OA is simple: Make research literature available online without price barriers and without most permission barriers.” For a good grounding in the basics of open access, I refer the interested reader to Suber’s book Open Access; for a quick overview of open access, see this blog post by Jill Cirasella.
Open access has many benefits, both to academics and to the wider public. The benefits to academics are obvious: authors get wider distribution of their work, researchers at institutions with small budgets have better access to scholarly materials, and, for librarians, it represents a partial solution to the serials crisis.
In this article, however, I will focus on the benefit of open access to the public. When scholarship is freely available on the Web, it is available not only to scholars, but to anyone with an internet connection, the research skills to locate these materials, and the proficiency to read them. Open access has the potential to support lifelong learning by making scholarship available to people without any current academic affiliation, whether they are professionals in a field that requires continuing education, or hobbyists fascinated by a particular subject, or just people who are interested in many things and want to keep learning. In The Access Principle: The Case for Open Access to Research and Scholarship, John Willinsky describes the value of scholarly information to several specific segments of the public, including medical patients, amateur astronomers, and amateur linguists.
Both Suber and Willinsky cite critics who argue that most members of the public are not interested in reading scholarly articles or books, that the public cannot understand this material, or even that the information could be harmful to them. Suber counters that the public’s demand for scholarly information cannot be determined until this information is made widely available. Willinsky objects strongly to the presumptuous attitudes of those who question the ability of the public to benefit from open access:
[P]roving that the public has sufficient interest in, or capacity to understand, the results of scholarly research is not the issue. The public’s right to access of this knowledge is not something that people have to earn. It is grounded in a basic right to know.
Willinsky’s argument for the public’s moral right to access scholarly research is both stirring and compelling. This is especially true for librarians, for whom access to information is a professional value.
Open access need not rely on any demonstration that the public has met some arbitrary threshold of interest and education. Without believing in a need for such proofs, I would nevertheless like to present one case illustrating how open access can benefit the public.
The public is, by its very nature, diverse. It includes the amateur and professional users of information cited above. The public also includes popularizers who can use open access scholarly literature in unexpected ways, not only to more widely distribute the fruits of scholarly research but also to create projects of their own. By looking at the role of one such popularizer, Randall Munroe, I will question two assumptions: first, that the public is uniformly unsophisticated, and second, that they all need to read the open access literature in order to benefit from its wide availability.

What If
What If, a weekly blog that answers hypothetical questions using science and stick figure illustrations, is the work of Randall Munroe. Munroe is a former National Aeronautics and Space Administration (NASA) roboticist but is better known as a webcomic artist. His primary project is the popular webcomic xkcd, which explains itself with a disclaimer:
Warning: this comic occasionally contains strong language (which may be unsuitable for children), unusual humor (which may be unsuitable for adults), and advanced mathematics (which may be unsuitable for liberal-arts majors).
It would be fair to describe xkcd as a nerdy joke-a-day comic with stick figure art, but I should point out immediately that Munroe has often used it to explain scientific concepts. Notable comics include “Lakes and Oceans,” which illustrates the depth of the Earth’s lakes and oceans in a way that gives a better idea of their scope, and “Up Goer Five,” which uses the simplest possible vocabulary to explain how the Saturn V rocket works. Munroe’s science education agenda is thus visible even in xkcd.
The connection to science education is clearer in What If, in which Munroe uses real, scientific information to provide serious answers to ridiculous hypothetical questions posed by his readership, such as:
What if everything was antimatter, EXCEPT Earth? (“Antimatter”)
What would happen if one person had all the world’s money? (“All the Money”)
At what speed would you have to drive for rain to shatter your windshield? (“Windshield Raindrops”)
Munroe answers these questions using math, science, humor, and art. He pitches his answers appropriately to a smart and curious, but not necessarily scientific, audience. In fact, several questions have been submitted by parents on behalf of their children.
A good example is the first of the questions listed above: “What if everything was antimatter, EXCEPT Earth?” In about 500 words, Munroe covers the proportion of matter to antimatter in the universe, the solar wind, Earth’s magnetic field, the effect of space dust on the Earth’s atmosphere, and the dangers of large asteroids. This sounds like a lot of information, but with Munroe’s straightforward style and amusing illustrations, it is easy to read and understand.
So, What If is humorous and silly, but the questions are taken seriously, and the answers provide real scientific information. Having read this particular post, we know more not only about the prevalence of matter and antimatter, but also about the Earth, asteroids, and more.
What If is extremely popular. In 2014, Munroe published a book including some of the questions he’d answered in the blog along with some others which he felt deserved fuller attention. The book, the #6 overall bestseller on Amazon as of December 10, 2014, has been successful in reaching a large audience. While bestseller status is not necessarily an indicator of the book’s value, it does suggest a high level of public awareness of this work.

Sourcing Information for Hypothetical Questions
As a guest on National Public Radio’s Science Friday, Munroe explained that What If is driven partially by his own desire to know the answers to the questions that people send him. He likes hypothetical questions, partly because “they’re fun,” but also:
A lot of the time it ends up taking you into exploring a bunch of weird questions or subjects in the real or practical science that you might not have thought about. (Munroe, “Interview”)
Through What If, Munroe uses research to explore questions and information sources alike, delving into many different types of material to find his answers.
Munroe’s sophistication as an information user manifests itself in his use of a wide variety of sources to answer many different kinds of questions. He uses Wikipedia as a starting point and YouTube as a useful source of visualizations, but he’s clearly familiar with a wide variety of ways to search the web and kinds of sources available there. He uses specialized web tools like Wolfram Alpha, a “knowledge engine” built to perform computations and provide controlled web searching. He takes advantage of online educational materials for the clarity with which they explain basic concepts and present mathematical formulae. He consults commercial catalogs to get the specifications on various products—unsurprising behavior for a former engineer! He consults blogs and enthusiast resources, such as amateur aviation and auto-repair sites, where there is a large and knowledgeable fan community. Amid this landscape, academic sources certainly have a place. They provide detailed information and a look at ongoing research, as I’ll discuss further below. Munroe’s frequent use of articles in ResearchGate and arXiv suggests that these repositories are also among his favorite sites.
Munroe’s teasing links to conspiracy sites also hint that he is well aware of the need to evaluate information for accuracy and confident in his ability to do so. He makes an effort to link to high-quality sites, although he has on one occasion (“All The Money”) admitted defeat (when trying to find the angle of repose for coins) and resorted to linking to a message board posting. Still, he carefully considers the information he uses; even when using a fairly standard resource like Google Maps, he looks carefully at the route it recommends. In “Letter to Mom,” he notes with surprise that Google Maps does not take advantage of the Buffalo Valley Rail Trail as a walking route and jokingly suggests it may be haunted. He also acknowledges other kinds of gaps in the information that’s available. His investigation into the amount of data storage available at Google (“Google’s Datacenters on Punch Cards”) works around the fact that Google does not disclose this information by looking into the cost of their data centers and the power that they consume.
In short, throughout What If, Munroe displays a high awareness of the information landscape and a strong ability to find, interpret, and appropriately deploy information, even though his information needs may be highly unorthodox.

Sources Used in What If
Since links serve as citations in the world of the web, I have gone through the entire run of the blog, which included 120 posts as of December 10, 2014, and analyzed the links.
This is an informal analysis; I examined and coded each entry but I have not done any validity tests on the categories. This chart is intended only to give an at-a-glance idea of the general types of sources Munroe consults.
Academic Sources include scholarly journal articles, books, and online course-related materials such as textbooks and slides.
News, Blogs, and Associations includes a wider variety of sources, but what they have in common is that they are written not for professionals or academics. Rather, they address either the general public or a specialized, non-professional community. Here I include news reports, blogs by experts, hobby sites, and so on.
Reference Sources comprise popular online reference sources, mostly Wikipedia but also the International Movie Database (IMDB) and similar sources.
Government and Commercial Documents often present analysis and scientific or technical information. NASA is the biggest source here, with many documents written by engineers.
Data and Images include charts, datasets, results from the online search engine/calculator Wolfram Alpha, videos, and so on.
Self Citation links lead back to other What If posts, to xckd, or to Munroe’s blog.
Other includes links to social networks, other webcomics, company front pages, and so on.
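The informal coding described above amounts to a simple tally of links by category. The Python sketch below is purely illustrative: the example links and their category assignments are invented for demonstration and are not the article’s actual data.

```python
from collections import Counter

# Hypothetical (url, category) pairs; the categories mirror those
# defined above, but the links themselves are invented examples.
coded_links = [
    ("https://en.wikipedia.org/wiki/Baryon_asymmetry", "Reference Sources"),
    ("https://arxiv.org/abs/0808.2561", "Academic Sources"),
    ("https://www.nasa.gov/centers/example-report", "Government and Commercial Documents"),
    ("https://what-if.xkcd.com/1/", "Self Citation"),
    ("https://en.wikipedia.org/wiki/Catenary", "Reference Sources"),
]

# Tally links per category and report each category's share of the total.
counts = Counter(category for _, category in coded_links)
total = sum(counts.values())
for category, n in counts.most_common():
    print(f"{category}: {n} ({n / total:.0%})")
```

Running a tally like this over all 120 posts produces the at-a-glance distribution the chart is meant to convey.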
Sophisticated Use of Popular Online Information
Not all the sources Munroe uses are scholarly in nature. Of the source types listed above, three — Academic Sources; News, Blogs, and Associations; and Government and Commercial Documents — might provide experimental or analytical information about the phenomenon of interest. This accounts for about half of Munroe’s citations. The remainder serve other purposes, such as reference, demonstration, or humor.
Munroe’s use of sources, including nonscholarly sources, demonstrates his sophistication and understanding of the internet. Reference Sources is the largest category other than the three mentioned above; this category is dominated by Wikipedia, the most commonly cited source in What If. Wikipedia is commonly reviled in academic contexts, but Munroe uses it in an appropriate and knowledgeable way.
Munroe understands that Wikipedia is a reference source, and generally points to it when introducing concepts with which his readers may not be familiar. In the antimatter example discussed above, Munroe links to the pages on Baryon asymmetry and CP symmetry when discussing the prevalence of matter and antimatter in the universe. By linking to these pages, he avoids unnecessarily introducing technical jargon into the main text of his article but still invites his readers to learn more about it. Most of his uses of Wikipedia are similarly definitional. Occasionally, they are playful, as in “Balloon Car”, where he breaks the word “catenary” into two links, one to the entry for “cat” and the other for “domestic canary.” Note that this moment communicates something about Munroe’s expectations for his audience; they are of course perfectly capable of both recognizing the joke and searching Wikipedia for the correct term (“catenary”) themselves. The links, then, are only a courtesy to readers. Notably, Wikipedia is not cited in the print book.
In fact, Munroe’s expectation that Wikipedia is a major part of his readers’ information landscape is so strong that he occasionally inserts the Wikipedia tag “[citation needed]” into his articles in an ironic, jokey way, when he takes for granted something that appears obvious. My personal favorite is in the post “One-Second Day”, in which he remarks that the Earth rotates, inserts a “[citation needed]” tag, and links to the famous conspiracy site, “Time Cube.”
His use of other popular online sources is similar; a good example is YouTube, to which he frequently links when he needs visual aids. In “Extreme Boating”, he links to several videos showing reactions with different substances through which he proposes rowing a boat.

Academic/Analytical Sources
Munroe is an information omnivore who constantly and intentionally mixes popular and scholarly, humorous and serious. Although he uses Wikipedia heavily for background information, he turns to deeper sources when more precise analysis is needed. His sources for this work include academic journal articles, and also government and commercial documents with scientific or technical content. However, the academic articles are of particular interest in a discussion of open access.
The post about antimatter is a good example. In it, Munroe’s links to Wikipedia establish the basic concepts relevant to the question. Later in the post, questions come up that scientists still disagree about; here is where books and articles begin to be cited. The antimatter question leads to a discussion of just how much antimatter is in the universe and whether, for instance, antimatter galaxies could exist; this question is addressed with one scientific article that shows this has not yet been observed and another that proposes a new telescope to further examine the question. In other posts, many other questions are examined using similar sources. “Burning Pollen” cites a chemistry paper explaining the reaction between Diet Coke and Mentos in order to explain oxidation. “Laser Umbrella” cites several scientific articles about vaporizing liquids using lasers, as this question has often been studied. In “Speed Bump,” Munroe is working on a question about the fastest speed at which one can survive going over a speed bump, so an article in a medical journal about spinal injuries from speed bumps is useful.
As noted above, academic articles are not Munroe’s only source of scientific information. Articles from government agencies, particularly NASA, often serve a similar purpose. Munroe also often links to books, either by linking to a book’s record in WorldCat or Amazon, or by using Google Books to link a specific page, often one with a diagram or graph. What If also includes a few links (specifically, twenty-five of them) to educational materials such as class sites, lecture slides, and online textbooks.
For statistics and other kinds of quantitative information, Munroe often turns to other sites. Some of the government documents provide this sort of information, as do commercial entities such as rope manufacturers, cargo transporters, and so on. What If includes citations to data safety sheets and international standards, most notably in “Lake Tea,” which needs to cite standards for several different types of tea in order to answer a question about the strength of the brew made from dumping all the tea in the world into the Great Lakes. He uses Wolfram Alpha, the “computational knowledge engine,” for calculations and conversions, and Google Maps for locations and distances.

Contributions of Amateurs and Journalists
Finally, popular sources also have a place in What If. Munroe often links to news, professional and hobby associations, and blogs, both those produced by passionate amateurs and those used by professionals to connect to a lay audience. These include the New York Times and Slate, but also the popular Bad Astronomy blog, a visualization blog known as Data Pointed, aviation history enthusiast sites, and a linguistics blog by scholars at the University of Pennsylvania. In most cases, these are used because they provide specific, current information by knowledgeable people.
Thus, academic journals do not have a monopoly on useful scientific information. However, at 13% of all links, they comprise a substantial portion of Munroe’s research.

Open Access in What If
Munroe is aware of the open access movement; he has illustrated the available amount of open access literature (“The Rise of Open Access”).
As of December 10, 2014, Munroe had referenced 100 academic articles in What If, and about 72 of them can be considered open access because their full text is freely available to the public in one way or another.
Within the Open Access movement, authors often refer to two ways to achieve open access—the “gold road” and the “green road.” Gold open access is provided by open access publishers who make their content freely available rather than using paywalls and subscriptions. Green open access is achieved when authors publicly archive their content online, with the permission of their publishers.1
For the purposes of this pie chart, anything that Munroe has linked from a repository or an author’s page is considered green open access, and anything linked from the journal’s website is considered gold open access. Because I am attempting to capture the perspective of a reader interested in a particular article rather than that of a publisher or librarian, I am ignoring some nuances important to open access advocates. In particular, I am counting all open access articles that are available through the publishers’ sites as “gold,” even including those which are available via hybrid models. The hybrid model, in which subscription journals make some articles available to the public, contingent on author fees, does not support all the goals of the open access movement. However, it does make content available to readers within the journal itself, so from a reader’s point of view, it makes sense to classify these articles as gold open access.
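The gold/green rule of thumb just described can be sketched as a function. This is a toy illustration only: the domain lists below are invented stand-ins for the manual judgment actually used in the analysis, not a real registry of repositories or journals.

```python
from urllib.parse import urlparse

# Illustrative domain lists; the real classification was done by hand.
REPOSITORY_DOMAINS = {"arxiv.org", "researchgate.net", "academia.edu"}
JOURNAL_DOMAINS = {"journals.plos.org", "nature.com"}

def classify_access(url):
    # "green" = repository or author page; "gold" = journal's own site.
    host = urlparse(url).netloc.lower()
    # Drop a leading "www." so domain matching is consistent.
    if host.startswith("www."):
        host = host[4:]
    if host in REPOSITORY_DOMAINS:
        return "green"
    if host in JOURNAL_DOMAINS:
        return "gold"
    return "unknown"

print(classify_access("https://arxiv.org/abs/0808.2561"))     # green
print(classify_access("https://journals.plos.org/plosone/"))  # gold
```

In practice the reader’s-eye-view distinction matters more than the mechanism: what counts is whether the full text is a click away.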
“Gold” and “green” open access were used about equally in What If (34% and 38%, respectively). “Gold” open access included some links to very well-known open access publications such as PLOS One, but also a wide variety of other journals and some conference proceedings. The “green” open access links were to repositories; arXiv, the open access repository of physics papers, appeared frequently, as did academic social networks like ResearchGate and Academia.edu, and of course, many university repositories and faculty websites. Munroe occasionally links to articles that are not freely accessible, including some from major publishers such as Nature, Springer, and Elsevier. For these articles, only the abstracts are available. These comprise 23% of the academic articles cited. This is a substantial proportion of all academic articles, but much smaller than the proportion of open access materials.

Why Open Access Matters to What If
Although Munroe occasionally links to an article that is not freely accessible, open access articles are preferable for obvious reasons. Munroe is a professional cartoonist, not an academic, so his profession does not automatically grant him access to subscription resources. Moreover, he cannot assume that his readers have access to any given closed-access resource. If Munroe succeeds in inspiring in his readers the kind of curiosity about the world that characterizes his own work, they will need resources that they are actually able to access. Open access is thus important to both the success and the quality of What If.
What If is an example of what can be achieved when information, and scholarly information in particular, is made readily available outside of academia. While Munroe depends on information from a variety of sources, the information he gleans from open access academic works is especially important because it connects him directly to the science.
Imagine a non-academic freelancer attempting to write a weekly column like What If in an environment in which all or most scholarly information is available only by subscription. Without academic affiliation, it is very difficult to obtain scholarly material in the quantity in which it is used in What If. To pay the access fee for each article needed would soon become prohibitive. Most current scholarly materials are not held in public libraries, many public libraries limit or charge for their interlibrary loans, and waiting for articles to arrive could affect the weekly publication schedule. Under such circumstances, it is not surprising that popularizers in the past have tended to be either academics or journalists, two professions which grant their practitioners access to information.
What If is driven by Munroe’s wide-ranging curiosity and that of his readers. What If began with the questions that xkcd readers sent to him; he found he too was curious about the answers. Because of the time he spent researching these questions, he decided to write them up and post them on his website. This is suggestive of the way that being an audience member sometimes works on the internet: Munroe’s readers felt sufficiently connected to him to send him these questions, and he felt sufficiently interested in the questions to research and respond to them. The ability to answer the questions to his satisfaction depends on the availability to both Munroe and to his readers of reliable information.

The Readership of What If
Munroe writes What If for a general audience. However, he believes that whether his audience is general or technical, “the principles of explaining things clearly are the same. The only real difference is which things you can assume they already know…” (“Not So Absurd”). Munroe expresses skepticism toward assumptions about general knowledge:
[R]eal people are complicated and busy, and don’t need me thinking of them as featureless objects and assigning them homework. Not everyone needs to know calculus, Python or how opinion polling works. Maybe more of them should, but it feels a little condescending to assume I know who those people are. I just do my best to make the stuff I’m talking about interesting; the rest is up to them. (“Not So Absurd”)
Munroe resists the idea that his audience needs to learn how to do the things that he knows how to do, like using good estimation techniques to understand the size of a number. Instead, he states that “the rest is up to them.” What If uses links according to this principle; anyone can understand the articles without reference to their sources, but the sources are nevertheless available for their reference.
The nature of What If as a question-and-answer site ensures that Munroe is always addressing at least one of his readers directly. Linking his sources, then, becomes part of the answer. Munroe does not simply dispense answers; rather, he encourages his readers to see where the information is coming from. Occasionally, he even comments on the things that he links, for example: “a positively stunning firsthand account” (“Visit Every State”), “one of the worst scanned pdfs I’ve ever seen” (“Enforced by Radar”), “a wonderful chart” (“Star Sand”), and so on. In one case (“All the Money”), he links to a book in Google Books and refers to a specific page so that a reader can find the information that he used. Like any citation, these links make it possible for a reader to consult the author’s sources. To a non-academic audience, however, citations are a meaningless gesture if they do not point to open access resources. Thus, open access resources are important not only for Munroe to access his sources, but also so that he can share them with his readers. This attitude, that readers should be able to access cited sources in a click, contrasts strongly with that of open access critics who claim there is little public interest in scholarly works. Although in most cases it is not clear how many readers click through, YouTube videos linked from What If do see increased views; many commenters on such videos indicate that they arrived via links from What If.

Why What If Matters for Open Access
Why does it matter what resources are available to the author of a silly blog with stick figure illustrations? Although What If contributes to ongoing science education, the stakes are lower than they are for some of the other things that can be accomplished with open access, such as providing education and medical information to rural, underfunded, or poor areas.
I want to be clear that the purpose of open access is not only to benefit those who are highly educated, famous, and have a large platform of their own. I must also acknowledge that, because Munroe is a white man on the internet, his path to popularizing scientific information is far smoother than that of others who do not share his privilege. However, I still think What If matters for open access, for several reasons.
First, scholarly information is sneaking into popular culture. What If shows how scholarly information can be relevant to people in their daily lives, even if they only use it to amuse themselves by thinking about unlikely scenarios. This increases the reach of scholarly research and contributes to public science education. There is interest in this information beyond a scholarly or professional context.
In the fields of most interest to Munroe (mathematics, physics, astronomy, and the earth and environmental sciences), open access has grown faster than in most other fields (Bjork et al.). Munroe relies on open access in a way that many humanities popularizers like Idea Channel’s Mike Rugnetta do not. However, as open access in the humanities increases, I hope to see projects that make use of it in interesting ways.
Second, What If has a very large audience. As of December 10, 2014, the book based on the blog was the #6 bestselling title on Amazon. Although many readers may never consult the book’s sources, they still benefit from their availability through the existence of What If. Munroe’s role here is that of a popularizer; he reads the scholarly literature that is relevant to his writing and produces something more accessible for the public. What If joins a host of science blogs in recontextualizing science for a different audience (Luzón). Popularizers have been writing since long before the beginning of the open access movement, but open access can make it much easier for popularizers to succeed, especially those who live outside of academia or journalism.
Additionally, although many readers might not click through the links in a What If entry to read the scholarly research that Munroe cites, those who are interested have the ability and the access to do so. What If mediates the information with accessibility and clarity, but because it exists as a born-digital work and because most of the links are to open access materials, readers are invited to examine Munroe’s sources.
Finally, Munroe and his readership stand as an example of a sophisticated, curious, and playful public who, although they may not be members of the scholarly community, have a strong interest in the work that is produced there.

Conclusion
To read What If requires the playfulness to put serious academic work to a silly purpose, the curiosity to learn about the universe in new and unusual contexts, and the sophistication to understand the larger information landscape from which all this proceeds. What If’s readers understand the Wikipedia jokes, are aware that both high- and low-quality information exists on the internet, and can integrate scholarly concepts into this larger landscape.
One of the most intriguing aspects of What If is its repurposing of scholarly information in ways unlikely to occur to more traditional popularizers with an explicitly educational mission. What If is not the work of an academic trying to produce more accessible information for the public; rather, it is the work of one member of the public putting academic work to use in a way that is meaningful for his audience. Munroe’s work draws on scholarly research, but it is markedly different from anything that we would expect to find in an academic context.
Given the success of What If, it is clear that there is a readership for unexpected reuses of scholarly information. Without open access, What If could not exist. As open access expands and the public finds its way to materials it did not previously have available, what other intriguing projects might we see?
Thanks to Hugh Rundle and Jill Cirasella, who pushed me to think through the things that were messy and unfinished in this article, and who asked lots of good, difficult questions. David Williams helped me with the images and Kelly Blanchat answered a copyright question for me. Thanks also to Steven Ovadia, for arranging the 2014 Grace-Ellen McCrann Lectures, at which an early version of this paper was originally presented, and to everyone who encouraged me to turn that presentation into an article.

References and Further Reading
Bjork, B. C., Laakso, M., Welling, P. & Paetau, P. “Anatomy of Green Open Access.” Journal of the American Society for Information Science and Technology 64.2 (2014): 237-250. doi: 10.1002/asi.22963. Web. Preprint available at <http://www.openaccesspublishing.org/apc8/Personal%20VersionGreenOa.pdf>
Cirasella, Jill. “Open Access to Scholarly Articles: The Very Basics.” Open Access @ CUNY [blog]. May 18, 2011. Web. <http://openaccess.commons.gc.cuny.edu/about-open-access/>
Munroe, Randall. Interview with Ira Flatow. Science Friday, NPR. 5 Sep. 2014. Web. <http://www.sciencefriday.com/segment/09/05/2014/randall-munroe-asks-what-if.html>
–. “Randall Munroe of xkcd Answers Our (Not So Absurd) Questions.” [Interview by Walt Hickey.] Five ThirtyEight, September 2, 2014. Web. <http://fivethirtyeight.com/datalab/xkcd-randall-munroe-qanda-what-if/>
–. “The Rise of Open Access.” Science 342.6154 (2013): 58-59. Web. doi: 10.1126/science.342.6154.58
–. What If. Web. <http://what-if.xkcd.com>
–. xkcd. Web. <http://xkcd.com>
Luzón, María José. “Public Communication of Science in Blogs: Recontextualizing Scientific Discourse for a Diversified Audience.” Written Communication 30.4 (2013): 428-457.
Panitch, Judith and Sarah Michalak. “The Serials Crisis: A White Paper for the UNC-Chapel Hill Scholarly Communications Convocation.” Scholarly Communications in a Digital World: A Convocation. January 27-28, 2005. Chapel Hill, North Carolina. Web. <http://www.unc.edu/scholcomdig/whitepapers/panitch-michalak.html>
Suber, Peter. Open Access. Cambridge, MA: MIT Press, 2012. Web. <http://mitpress.mit.edu/books/open-access>
Willinsky, John. The Access Principle: The Case for Open Access to Research and Scholarship. Cambridge, MA: MIT Press, 2006. Web. <http://mitpress.mit.edu/books/access-principle>
1. Many journals allow self-archiving by default, either immediately or after an embargo period. Other journals may agree to allow self-archiving after negotiation with the author. In some cases, journals allow authors to keep their copyright, so their permission is not required.