How do you feel about 40,000 square feet full of laser cutters, acetylene torches, screen presses, and sewing machines? Or community-based STEAM programming for kids? Or lightsabers?
If these sound great, you should register for the LITA “Makerspaces: Inspiration and Action” tour at Midwinter! We’ll whisk you off to Somerville for tours, nuts-and-bolts information on running makerspace programs for kids and adults, Q&A, and hands-on activities at two great makerspaces.
A workspace at Artisan’s. (“HoaT2012: Boston, July-2012” by Mitch Altman; https://www.flickr.com/photos/maltman23/7641851700/; CC BY-SA)
Artisan’s Asylum is one of the country’s premier makerspaces. In addition to the laser cutters, sewing machines, and numerous other tools, they rent workspaces to artists, offer a diverse and extensive set of public classes, and are familiar with the growing importance of makerspaces to librarians.
My kid made her fabulous Halloween costume at Parts & Crafts this year and I am definitely not at all biased. (Photo by the author.)
Parts & Crafts is a neighborhood gem: a makerspace for kids that runs camp, afterschool, weekend, and homeschooling programs. With a knowledgeable staff, a great collection of STEAM supplies, and a philosophy of supporting self-directed creativity and learning, they do work that’s instantly applicable to libraries everywhere. We’ll tour their spaces, learn the nuts and bolts of maker programming for kids and adults, and maybe even build some lightsabers.
What tools can you use? (“Parts and Crafts, kids makerspace” by Nick Normal; https://www.flickr.com/photos/nicknormal/16441241633/; CC BY-NC-ND)
Parts & Crafts is also home to the Somerville Tool Library (as seen on BoingBoing). Want to circulate bike tools or belt sanders, hedge trimmers or hand trucks? They’ll be on hand to tell you how they do it.
I’ll be there; I hope you will be, too!
Let’s all fly to Boston! (Untitled photograph by Clarence Risher; https://www.flickr.com/photos/sparr0/6871774914/in/album-72157629681164147/; CC BY-SA)
Lucidworks is happy to announce that several of our connectors for indexing content from Hadoop to Solr are now open source.
We have six of them, with support for Spark, Hive, Pig, HBase, Storm and HDFS, all available on GitHub. All of them work with Solr 5.x and include options for Kerberos-secured environments if required.
HDFS for Solr
This is a job jar for Hadoop which uses MapReduce to prepare content for indexing and push documents to Solr. It supports Solr running in standalone mode or SolrCloud mode.
It can connect to standard Hadoop HDFS or MapR’s MapR-FS.
A key feature of this connector is the ingest mapper, which converts content from various original formats to Solr-ready documents. CSV files, ZIP archives, SequenceFiles, and WARC are supported. Grok and regular expressions can also be used to parse content. If there are others you’d like to see, let us know!
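To make the idea of an ingest mapper concrete, here is a minimal Python sketch of the general pattern: turn each CSV row, or each line matched by a regular expression, into a flat field/value document of the kind a search index accepts. This is not the connector’s code or API. The function names, the `id` fallback scheme, and the log pattern are all hypothetical and chosen only for illustration.

```python
import csv
import io
import re


def csv_to_docs(text, id_field="id"):
    """Turn CSV text into a list of flat dicts, one document per row.

    Hypothetical helper: empty values are dropped, and rows without an
    id get a synthetic one based on their position in the file.
    """
    reader = csv.DictReader(io.StringIO(text))
    docs = []
    for row_number, row in enumerate(reader, start=1):
        doc = {key: value for key, value in row.items() if value not in (None, "")}
        doc.setdefault(id_field, f"row-{row_number}")
        docs.append(doc)
    return docs


# Hypothetical pattern for a simple "host status path" log line.
LOG_PATTERN = re.compile(r"(?P<host>\S+) (?P<status>\d{3}) (?P<path>\S+)")


def line_to_doc(line):
    """Parse one log line into a dict of named fields, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(csv_to_docs("id,title\n1,Hello\n,World\n"))
    print(line_to_doc("example.org 200 /index.html"))
```

In the real connector this kind of mapping runs inside a MapReduce job and the resulting documents are pushed to Solr; the sketch covers only the format-conversion step.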
Repo address: https://github.com/LucidWorks/hadoop-solr.
Hive for Solr
This is a Hive SerDe which can index content from a Hive table to Solr or read content from Solr to populate a Hive table.
Repo address: https://github.com/LucidWorks/hive-solr.
Pig for Solr
These are Pig Functions which can output the result of a Pig script to Solr (standalone or SolrCloud).
Repo address: https://github.com/LucidWorks/pig-solr.
HBase Indexer
The hbase-indexer is a service which uses the HBase replication feature to intercept content streaming to HBase and replicate it to a Solr index.
Our work is a fork of an NGDATA project, but updated for Solr 5.x and HBase 1.1. It also supports HBase 0.98 with Solr 5.x. (Note, HBase versions earlier than 0.98 have not been tested to work with our changes.)
We’re going to contribute this back, but while we get that patch together, you can use our code with Solr 5.x.
Repo address: https://github.com/LucidWorks/hbase-indexer.
Storm for Solr
My colleague Tim Potter developed this integration, and discussed it back in May 2015 in the blog post Integrating Storm and Solr. This is an SDK to develop Storm topologies that index content to Solr.
As an SDK, it includes a test framework and tools to help you prepare your topology for use in a production cluster. The README has a nice example using Twitter which can be adapted for your own use case.
Repo address: https://github.com/LucidWorks/storm-solr.
Spark for Solr
Another Tim Potter project that we released in August 2015, discussed in the blog post Solr as an Apache Spark SQL DataSource. Again, this is an SDK for developing Spark applications, including a test framework and a detailed example that uses Twitter.
Repo address: https://github.com/LucidWorks/spark-solr.
Image from book cover for Jean de Brunhoff’s “Babar and Father Christmas”.
Your Speech Is Packed With Misunderstood, Unconscious Messages, by Julie Sedivy:
Since disfluencies show that a speaker is thinking carefully about what she is about to say, they provide useful information to listeners, cueing them to focus attention on upcoming content that’s likely to be meaty. […]
Experiments with ums or uhs spliced in or out of speech show that when words are preceded by disfluencies, listeners recognize them faster and remember them more accurately. In some cases, disfluencies allow listeners to make useful predictions about what they’re about to hear. In one study, for example, listeners correctly inferred that speakers’ stumbles meant that they were describing complicated conglomerations of shapes rather than simple single shapes.
Disfluencies can also improve our comprehension of longer pieces of content. Psychologists Scott Fraundorf and Duane Watson tinkered with recordings of a speaker’s retellings of passages from Alice’s Adventures in Wonderland and compared how well listeners remembered versions that were purged of all disfluencies as opposed to ones that contained an average number of ums and uhs (about two instances out of every 100 words). They found that hearers remembered plot points better after listening to the disfluent versions, with enhanced memory apparent even for plot points that weren’t preceded by a disfluency. Stripping a speech of ums and uhs, as Toastmasters are intent on doing, appears to be doing listeners no favors.
We’re pleased to announce that registration for DPLAfest 2016, taking place on April 14-15 in Washington, DC, has officially opened. We invite all those interested from the general public, public and research libraries, cultural organizations, the educational community, state and local government, the creative community, publishers, the technology sector, and private industry to join us for conversation and community building as we celebrate our third year of bringing together our nation’s collections. Area institutions serving as co-hosts include the National Archives and Records Administration, the Library of Congress, and the Smithsonian Institution.
About
Taking place in the heart of DC, DPLAfest 2016 will bring together hundreds from DPLA’s large and growing community for interactive workshops, engaging discussions with community leaders and practitioners, hackathons and other collaborative activities, fun events, and more. DPLAfest 2016 will appeal to anyone interested in libraries, technology, ebooks, education, creative reuse of cultural materials, law, open access, genealogy/family research, and more.
Agenda
We are currently seeking session proposals for DPLAfest 2016. We will be posting a full set of activities and programming for DPLAfest 2016 in March. Click here for additional information about the schedule. To review topics and themes from previous fests, check out the agendas from DPLAfest 2015 or 2013. The deadline to submit a session proposal is Friday, January 22, 2016. Click here to review submission terms and submit a session proposal.
Logistics
For logistical information about DPLAfest, including event locations and recommended hotels in the DC area, visit the logistics page.
Contact
Should you have any questions, please do not hesitate to reach out to us at email@example.com. We look forward to seeing you in DC!
Ariadne Magazine: Review of: Kristin Briney, Data Management for Researchers. Organize, maintain and share your data for research success.
Gareth Cole, the Research Data Manager at Loughborough University Library, reviews the book Data Management for Researchers. Organize, maintain and share your data for research success by Kristin Briney.
Kristin Briney, Data Management for Researchers. Organize, maintain and share your data for research success (Exeter, UK: Pelagic Publishing, paperback edition, 2015) ISBN-13: 978-1784270117
Researchers (particularly those in the University sector) have a lot of demands placed upon them. The process of actually doing research has changed over the last few decades and one of the latest changes is an expectation that researchers actively manage the research data they use and create as part of their work.
Authors: Gareth Cole. Issue number: 75. Date published: Thu, 12/17/2015. http://www.ariadne.ac.uk/issue75/cole
The open source software Hydra is, by its name and nature, modular and complex. Using this technology gives the University of Michigan the opportunity to participate in the development of an increasingly-adopted suite of tools with the flexibility to accommodate a host of needs and engage in the spirit and philosophy of open source software development. With open source, we must concern ourselves not just with our own institution’s needs and priorities, but those of a broader community.
Lucidworks CEO Will Hayes’s latest Forbes column looks at the ways scammers take advantage of the big holes in big data to prey on all of us:
“The immense amount of data we expose about ourselves make it incredibly easy to get targeted. … These profiles make it easier than ever for up-to-no-gooders to target us — they know exactly where our personal insecurities are and they can tailor attacks in ways that are perfectly suited for their victims. Here’s a look at seven common human insecurities and how scammers attempt to take advantage.
1. Money: You’ve likely seen scams like this: “Earn $800 a week just sitting home and filling out surveys!” Scammers promise quick money for little effort, and all you have to do is pay the “low” price of $34.95 to access the survey database that probably doesn’t even exist. Another common scam is job listings that promise employment from government sources. These fraudulent postings lure people into giving away personal and financial data with the promise of getting a stable, well-paid job.”
Read the other six deadly sins of scamming…
The post Data Security and Human Insecurities: How Scammers Take Advantage appeared first on Lucidworks.com.
New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.
New This Week:
Visit the LITA Job Site for more available jobs and for information on submitting a job posting.
Late last night, Congress announced its proposed FY16 budget agreement detailing funding levels for the Federal government through September 2016, which includes increases for key library programs and allows implementation of the Federal Communications Commission Open Internet Order to go forward. House Speaker Ryan also sneaked in a cybersecurity rider that ALA strongly opposes. In an action alert, we call on ALA members to urge their Members of Congress to oppose this provision.
Congress is expected to vote on the 2,000+ page, $1.1 trillion omnibus spending measure by week’s end. The current Continuing Resolution, necessary to keep the government open past the October 1 start of the fiscal year, expires today but will likely be extended several days to allow Congressional debate and votes on the spending and tax package. Congressional negotiators included $650 billion in tax breaks in addition to the $1.1 trillion in spending provisions. Final passage of the omnibus package is expected, though strong opposition from some fiscal hawks is anticipated.
Funding for the Library Services and Technology Act (LSTA) will be increased in FY16 to $183 million, up from the FY15 level of $181 million. The President’s budget request to Congress called for a larger increase, while House and Senate committee bills recommended only minimal increases over FY15 levels. Grants to States will receive an FY16 boost to $155.8 million ($154.8 million in FY15). Funding for Native American Library Services is raised slightly to $4.1 million, up from $3.9 million. National Leadership for Libraries is raised to $13.1 million, up from $12.2 million. The Laura Bush 21st Century Librarian program will be level funded at $10 million.
Overall funding for the Institute of Museum and Library Services will be given a slight increase over FY15 levels to $230 million, up from $227.8 million. The $230 million level is a compromise between the President’s request ($237.4 million) and the House/Senate recommended levels ($227.8 million).
Funding for school libraries received an increase of $2 million, which raises the Innovative Approaches to Literacy (IAL) program from $25 million in FY15 to $27 million in FY16. IAL reserves half of its funding for school libraries.
Much of the appropriations discussion focused less on funding levels than on policy riders addressing controversial issues such as abortion, refugees, energy, and gun control and research. A threatened policy rider, opposed strongly by ALA, that would have prohibited the FCC from implementing its Open Internet Order failed to overcome strong opposition and was not included in the final spending package. Once again, funding for E-rate will not be delayed, as Congress extended the Anti-Deficiency Act exemption through 2017. ALA urged Congress to include this exemption.
The post Library funding receives boosts in compromise budget agreement appeared first on District Dispatch.
Tell us about your library job. What do you love about it?
I work at the University of Nevada, Las Vegas Lied Library Digital Collections. I am the Workflow Manager for the Nevada Digital Newspaper Project, part of the National Historic Newspaper Project, a joint effort between the Library of Congress and the National Endowment for the Humanities. I have been a part of the Digital Collections team for a couple of years. Every year I learn something new about the work I am doing. I love my job because of the people I work with. I also love that I have the freedom to observe different aspects of the digitization process for many of our collections. At times I assist with managing the metadata of the different collections. We are currently utilizing TemaTres Controlled Vocabulary server to manage, publish, and share the ontologies and taxonomies we use in our collections. I am also learning more about linked data.
Where do you see yourself going from here, career-wise?
I really enjoy being a project manager and working in academic institutions. I like the idea of making photographs and other historical items digitally accessible to students, faculty, and the community. I think it would be great working for an academic institution where I am allowed to manage and create digital collections, whether with an institutional repository or within a special collections library.
Why did you apply to be an Emerging Leader? What are you most excited about?
I applied because I’ve been very fortunate to have a fantastic mentor in my supervisor, Cory Lampert. She took me on as a volunteer intern and then helped me get hired as the Digital Projects Manager at Nevada State College with an IMLS grant-funded oral history project. Then she brought me back for the newspaper project. From this experience, I’ve learned the value of working with true collaborators. I’m excited to build on this experience on the national level as an Emerging Leader.
I am Navajo and lived in Shiprock, New Mexico, on the Navajo Nation until I was 24. Like many others, I moved away because of a lack of job opportunities. I hope that in some way my being an Emerging Leader could inspire others from a similar background.
What are your favorite things to do when you’re not working?
I like thrift store shopping, gaming, traveling, photographing abandoned buildings, and going to dinner with friends — but not cooking!
Loyal District Dispatch readers know that, literally for years, ALA and a strong coalition of groups and companies from across the political spectrum have been fighting privacy-unfriendly “cybersecurity,” aka “information” sharing, legislation most recently unveiled as the Cybersecurity Information Sharing Act (S. 754). CISA was meant merely to incentivize companies like internet service providers to share hints of “cyber-threats” with the government by shielding them from liability for doing so. The bill has been consistently and rightly criticized, however, for seriously compromising all of our personal privacy and for creating de facto new surveillance programs for the NSA and FBI.
(Already heard enough? Click here to take action.)
Until very recently, those serious defects and strong grassroots efforts by ALA and many others kept CISA and its legislative predecessors from passing. In recent weeks, however, the Chairs of the powerful House and Senate Committees on Intelligence and Homeland Security secretly negotiated a compromise version of their several “information sharing” bills that the White House signaled it could approve if passed.
Late last night, that language, now as bad as or worse than it’s ever been from a privacy perspective, was slipped as a “rider” by Speaker of the House Paul Ryan into the 2,000+ page “omnibus” spending bill that Congress must pass to avoid a government shutdown. A vote on the omnibus is slated to take place Thursday just before Congress leaves town for the holidays.
Librarians and other civil liberties organizations may lose this fight, but we needn’t and shouldn’t go quietly! Join ALA President Sari Feldman in protesting this undemocratic deal.
The odds are long and time is tight, so tweets and emails are sweet. Please, click here to send one to your Member of Congress. It’s already set to ask him or her to tell Speaker Ryan that middle-of-the-night deals that give the NSA new surveillance tools have no place in the omnibus, and that it’s not too late for the Speaker to #StopCISA.
Your voice matters. Get mad and get LOUD, right now!
The post Almost, but not too, late to tell @SpeakerRyan to #StopCISA appeared first on District Dispatch.
Hi all, please indulge my inner geek as I take a little break from the normal discussion and have some fun re-imagining Star Wars.
If you’re like me, you’ve been thinking a lot about Star Wars as the new film debuts this weekend. Perhaps you even sat down and started watching the old films in preparation for the next installment.
Did you feel Lucas and Co. phoned it in with Return of the Jedi?
While not the galactic-scale train wreck of the Prequels, ROTJ always felt like a poor way to wrap up the Skywalker family tragedy.
Here’s my take on what was wrong and how I’d fix Return of the Jedi…and I’m not talking cosmetic fixes. I think to really do justice to Episodes IV and V, a new story line with whole new reveals and twists would have been in order.
What needs fixing
- Leia should not have been Luke’s sister. We all agree: the suggestions (and actual acts) of romantic intrigue between Luke and Leia should have disqualified this plot twist from the start.
- The Ewoks were one of the least interesting and most annoying alien species of the entire story line.
- The Han Rescue Mission was overly complicated, took up too much of the film and did nothing to move the story along.
- A New Death Star was a boring setting. Seriously, they couldn’t think of anything else?
There are two cliffhangers left over from Empire that need resolving in ROTJ:
1. Han needs rescuing
2. Luke needs to verify if Vader is truly his father or not
But before we handle Han’s rescue, since this obviously needs to get resolved so we can get back to the Skywalker family story, we need to set up the finale of the film with an opening scene. In the new opening crawl, we learn that Vader has been granted his wish to pursue his son, while the Emperor focuses on a new super weapon that will spell certain peril for the Rebellion. We also learn about how the Rebellion plans to retrieve Solo so that they can get back to fighting the Empire.
The story opens with the Emperor arriving on his personal Star Destroyer, scaring the crap out of everyone on board. He announces to the captain, “We have a new weapon that will be housed aboard this ship. Your crew will be expected to follow strict protocols of secrecy. Any deviation from them will call for extreme disciplinary action, captain.”
“As you command, my Lord.” Gulp!
We then transition to Luke back on Dagobah confronting Yoda and Obi Wan over the reveal from Empire that Vader is his father. As in the original, a dying Yoda confirms that Vader is Luke’s father and uses his last breath to say: “There is another Skywalker.” Only this time, he adds: “Heed the lesson in the cave…” Also as in the original film, Obi Wan’s ghost explains the reasons for withholding the information and explains that the other Skywalker is Luke’s twin sister. But Obi Wan does not know who she is or where she is since, for security reasons, that information was not provided to him. He feels that she was likely raised on Alderaan, so she may be dead.
After Yoda passes, Luke darkly departs for the Han Rescue operation.
And now on to Han’s rescue. First off, if you’re Leia and in command of a kickass Rebel Alliance who owes much to Han Solo, you certainly aren’t going to waste time with a ragtag, risky, undersized rescue effort. You’re gonna use an army to get your man.
And, of course, the Empire is gonna know this is what you’re going to do. Ah, the plot thickens.
So, in my revision, the rescue effort quickly returns us to the Rebel vs. Empire battle but also shows off how far Luke has come in his training (and then some). The plan goes as follows: Luke, Lando and Chewbacca will lead a ninja-like raid into Jabba’s palace with massive backup led by Leia poised to support them if things go south. Luke leads the stealth incursion into the palace, past a sleeping Jabba, neutralizes Jabba’s palace guard and locates Solo (who is no longer encased in carbonite but in prison and perfectly ready to fight his way out with Luke’s help). Solo quips something incredulous like: “Kid, this ain’t no Imperial base. You think you can just waltz in and waltz out?” Luke: “I’ve learned a few new tricks.”
Luke, Lando and Chewie nearly free Han, but then at the last minute as they’re nearing the exit, out walk Vader, Boba Fett and a battalion of storm troopers. Vader is holding Jabba’s head in his hands and flings it into the room: “You forgot to bid farewell to your host.”
Outside, Leia is realizing something is wrong just as Imperial troops engage the rebel position outside the palace.
Back inside, Luke and Vader duel while they continue their conversation about that little Father thingy Vader dropped last time they met.
Meanwhile, Chewie is badly wounded by Boba Fett. Lando and Han do a good job fighting back against the storm troopers but Lando is cut off from Han and Luke when Han blasts the door blocking off the storm troopers. “Sorry, old buddy,” Han says comically. Han then goes after Boba Fett. The two tussle hand-to-hand with Boba pulling out all kinds of nasty surprises from his suit. But Han anticipates them all: “Boba, don’t you have nothin’ I didn’t teach you first?” Eventually, Han kills Boba and then checks on Chewie who is bad off but alive. They call Leia and let her know Vader is inside the palace.
The Rebels are now aware that they have an opportunity to kill Vader and move aggressively to cut off the palace.
Back in the fight, Vader then shows us the power of the Dark Side once more and uses his strangulation technique on Han with one hand, while fighting off Luke with the other. “Only through the Dark Side can you save him,” Vader insists. Luke is enraged as Chewie cries out for Han. Vader snaps Han’s neck and lets his body fall to the floor. Luke begins to let his anger take control.
Leia’s forces sweep into Jabba’s palace, but are quickly matched by the Imperial forces hiding within. It’s a pitched battle and in the end, Leia is also captured and Vader threatens to kill her too if Luke does not surrender to him.
Defeated, Luke says to Vader: “Very well, but I will never turn.”
“Perhaps you already have,” Vader asserts.
There is a tense standoff as Luke and Vader leave the planet, leaving Chewie and Leia to mourn over Han’s body.
Back among the main Rebel fleet, Leia interrogates captured Imperial officers from the fight on Tatooine and learns about some ominous new weapon the Emperor is prepping in orbit above Coruscant. They fear a new Death Star is in the works and decide that an all-out attack on the Emperor is their only hope. Leia is accused of letting her anger cloud her judgement and of perhaps putting her personal friendships before wise strategy. But in the end, she convinces the rebels that it’s all or nothing.
Aboard Vader’s Star Destroyer, Luke communes with Obi Wan and Yoda. They warn him that the Emperor should not be underestimated. Luke must remember his training and resist. Luke says: “The Dark Side seems impossible to resist.” To this Yoda says: “Your friends. Remember the strength of the bonds you have to them. You will not find such bonds on the Dark Side. Only servitude and sorrow. Remember this, you must!”
The rebel fleet prepares for their final assault. A wounded Chewbacca with a robotic leg growls angrily as Leia details the plan. The Rebels have learned that the Emperor himself is overseeing the construction of the super weapon, which their intelligence tells them is housed on a specially outfitted Star Destroyer to avoid detection. They have also learned that Luke Skywalker is being held aboard that ship.
The goal is to attack the Emperor’s palace on the Capital to divert Imperial forces from their real target: the super weapon on the Star Destroyer. Leia and a smaller force will take two large cruisers, one attached to the other piggyback fashion. The lower cruiser will collide with the Star Destroyer and begin driving it into the atmosphere. X-wing fighters will take out any escape pods that might hold the Emperor. When the Star Destroyer is hopelessly falling into the atmosphere, the rear Rebel cruiser will detach from the other and take the rebel crew to safety. During this, Leia has a secret mission with the droids, Chewbacca and Lando to get Luke. If they fail to get out in time, they will perish with the Star Destroyer.
The Emperor finally makes an appearance as Vader brings Luke to the throne room of the Star Destroyer. We are treated to much the same dialogue of the original script. But this is interrupted when it is announced that a Rebel fleet has just come out of hyperspace above the Capital.
The rebel fleet bursts just above the atmosphere and begins firing large ion cannon weapons onto the surface while also engaging Star Destroyers and (what the hell) orbital battle stations that look like miniature Death Stars intended for taking out large spacecraft.
The Emperor mocks the attack as “pitiful” (with a little spittle shooting out of his maw) and cackles in delight. “Soon, you will turn and join us in executing the final destruction of the Rebel Alliance.”
Luke replies: “The only thing I will destroy is you, your Highness!” and uses the Force to retrieve his lightsaber from the Emperor’s chair. As in the original film, Vader and Luke then go at it. I really liked this part of the film and I wouldn’t change much here. Except that as Luke grows angrier and angrier and ultimately defeats Vader, he turns on the Emperor.
The Emperor: “You foolish boy. Nothing can stop the inevitable rise of the Dark Side over the Galaxy. Your friends’ assault on the capital is misguided as is your faith in them. And yourself.” The Emperor rises from his chair menacingly. “The Empire is now in possession of the ultimate power in the Universe and I intend to use it to wipe out the Rebel Alliance in short order.” A door opens and out walks a young woman in black Sith clothes.
“Darth Tera, meet your brother.”
The two fight in a dead even match with the Emperor clearly enjoying the fight. Luke does his best to convince Tera to join him and destroy the Emperor, talking about how the Dark Side nearly seduced him too. But that it doesn’t have to be that way. “There is good in our family. I can feel it.”
Tera is clearly troubled by this, but these mixed feelings just make her go wild with rage.
Vader, at the Emperor’s feet, is clearly surprised. “You never told me there was another.”
“You have failed, Lord Vader. Now observe the true power of the Dark Side as it conquers the Light.”
Suddenly, an alarm sounds and the ship rocks as Leia’s ship strikes dead center into the Star Destroyer, its engine blasting the ship into the atmosphere.
Leia, Lando, Chewie and the droids blast their way between the ships and begin their rescue mission of Luke.
Luke senses this and begins using telepathy with Leia like he did on Cloud City in Episode V. She follows his directions, fighting their way against impossible odds. Fortunately, most of the crew are abandoning ship in life pods and shuttles which are picked off by the Rebel fighters.
Now knowing that Leia and the others are on the way, Luke’s confidence grows. “I can save you,” he says to Tera. And then to Vader: “I can save both of you. You can be free of the Emperor. You can return as Jedi.” This clearly has an effect on the Sith Skywalkers.
The ship groans as it begins hitting the atmosphere. The Emperor has had enough.
“You are both pathetic,” he scorns. Then he goes after Luke with his lightning power, lecturing all of them on the power of the Dark Side.
Vader is the first to rise up to Luke’s defense. At first Tera tries to stop her father, but Vader pushes her aside and she hesitates. The Emperor blasts Vader and they tumble together over the precipice in a more revealing struggle that is terribly violent and makes you sympathetic to Vader. We do see them crash on the floor of the tunnel, with Luke and Tera looking over the side at their bodies.
Leia and company blast their way into the room with Luke and Tera standing over the precipice. “Luke!” she screams. “There’s still time to escape! Come on!”
Luke looks at Tera. “Come with me.”
“It’s too late for me.”
“That is the Dark Side speaking. But the Light offers hope.”
The heroes all run out of the room and make their escape to the rebel ship, which detaches from the front cruiser and joins the rebel forces at their rendezvous point. There is discussion about the victory over the Emperor and hints that planets are putting their fear aside to join the Alliance. “We will build a New Republic,” Luke says.
“It is time for the Galaxy to heal,” he adds, turning to his sister, now in white robes.
In the Library, With the Lead Pipe: Say what? Exploring “The most interesting place in the city” – the comments section of online news articles
Online commenting culture can be intriguing. Do people comment on news articles about libraries? What do they say? These are the questions that led us to study public comments in response to news articles about libraries in major U.S. newspapers. Newspaper articles were selected for analysis based on their relevance to libraries and the number of comments the articles received.
We wondered: Does the public see us as a “growing organism” or a stagnant, out-of-date dinosaur? A content analysis of these comments provides a snapshot of public opinion and perception. Listening in on online conversations about libraries can provide insight into how the public views libraries and library services. People are taking the time to engage in online conversation about libraries, and librarians can learn from these discussions, either through passive or active participation.
Introduction
Despite popular warnings of “don’t read the comments!” and “don’t feed the trolls!”, we are fascinated by the culture of online commenting and find ourselves drawn to the comments sections of various online venues: news articles, YouTube videos, blog posts, and other user-generated content. We decided to pursue this interest as a multi-stage research project related to libraries. We initially focused on academic libraries via U.S. higher education periodicals (Hanson & Adams, 2014, April), then shifted to broader library topics found in U.S. newspapers (Hanson & Adams, 2014, November). Despite trolling and other bad behavior, we were optimistic we would find useful conversations in the comments section that could provide insights for libraries. The Pew Internet & American Life Project report on How Americans Value Public Libraries in Their Communities also inspired us to investigate public opinion about libraries as seen in online comments.
We started by identifying what types of unsolicited comments were being made about libraries without the motivation of surveys and incentives, as well as which topics generated the most interest among readers. The comments revealed what commenters like and dislike about libraries. We offer a sneak peek of what to expect from possible proponents and detractors through representative sample comments. Hopefully, our research can help readers prepare plans for advocacy and marketing on behalf of their libraries.
Literature Review
One of the studies that motivated our current project was the Pew Internet & American Life Project report on How Americans Value Public Libraries in Their Communities. According to this 2013 report,
Americans strongly value the role of public libraries in their communities, both for providing access to materials and resources and for promoting literacy and improving the overall quality of life. Most Americans say they have only had positive experiences at public libraries, and value a range of library resources and services. (Zickuhr, Rainie, Purcell, & Duggan, p. 1)
We were curious whether unprompted comments made by self-selecting commenters would share similar sentiments to those found in the Pew research study. The Pew Research Center followed up with a report on Libraries at the Crossroads in 2015 (Horrigan), and we use the findings from that report as a point of comparison.
Studying user comments on news articles is a compelling sphere of research. Comments can potentially reach the same readers as the articles they are associated with, and because of the flipped role from reader to content contributor, the ability to comment provides a forum for dialogue. Commenting functionality is an equalizer—anyone can comment, and one doesn’t need special resources, wealth, or power to post comments. It enables readers to submit their own viewpoints, then discuss and deliberate with other commenters using the original article as the springboard or framework (Springer, Engelmann, & Pfaffinger, 2015).
Multiple studies have found that online comments affect readers’ understanding of article content (Anderson et al., 2013; Felder, 2014; Lee, 2012). Felder details the effect comments can play in shaping perceptions of news sites’ quality, concluding that sites must moderate or limit public comments for the benefit of site traffic and discourse (2014). Exhortations against reading the comments section are grounded in sound advice. Suler (2004) explains that people express themselves online in ways that they never would in real life interactions. The extreme form of this behavior, referred to as trolling, is characterized by gleeful destruction or disruption. One study has found a correlation between trolling and the Dark Tetrad of personality: narcissism, sadism, psychopathy, and Machiavellianism (Buckels, Trapnell, & Paulhus, 2014).
In addition to their potential for disruption, online comments have been found by some researchers to be frequently irrelevant to the topic at hand (Edgerly, Vraga, Dalrymple, Macafee, & Fung, 2013; Reagle, 2015). Eliminating anonymity and moderating comments can improve both the quality and relevance of comment discourse (Reagle, 2015).
Methods
Using circulation figures from the Alliance for Audited Media, we identified U.S. newspapers that had a large digital circulation (as of March 2013). Of those top newspapers, we narrowed the list of publications for this project based on whether the online version was searchable, allowed for date range limiting, and had publicly accessible comments made by readers. These criteria trimmed our list to the top three digitally circulated newspapers, and we added a fourth to provide local relevance to our regional library community. The newspapers included in this study are: The New York Times, The Wall Street Journal, USA Today, and SFGate (the website of the San Francisco Chronicle).
In each publication selected, we searched for the terms [library OR libraries] and limited the date range to July 2013 through July 2014. Our search limits yielded a total of 129 articles, 54 of which had comments (see Appendix A: https://goo.gl/gUaHQ2). The other 75 articles from our initial search results generated zero comments: 55 did not allow comments, and 20 allowed comments but had none. For our analysis, we included opinion pieces and blog articles on newspapers’ sites as well as standard news articles because they contribute to the dialog about libraries. We use “article” as an umbrella term.
Our sample consisted of 693 comments from a subset of 51 articles (see Appendix B: https://goo.gl/NMOIxZ). Three articles generated over 200 comments (see Figure 1), which we excluded from our analysis. Because the articles represented in our sample elicited an average of 29 comments, we were concerned that including articles with a disproportionately high number of comments would skew the overall topic coverage. For example, the fourth-highest-commented-upon article garnered 158 comments. This article about Obama’s presidential library elicited many comments that focused on the preservation of history and floated that topic above others.
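The sample arithmetic described above can be checked with a short sketch. All counts come from the text; the comment totals for the three excluded articles (368, 283, and 230) are the figures reported in the Results section. The reading that the "average of 29" covers all 54 commented articles, including the three excluded ones, is an assumption made here for illustration and is not stated in the text. Variable names are illustrative.

```python
# Counts reported in the Methods section.
total_articles = 129
no_comments_allowed = 55
comments_allowed_but_none = 20
articles_with_comments = total_articles - no_comments_allowed - comments_allowed_but_none

# Comment totals for the three articles with over 200 comments
# (figures from the Results section); these were excluded from the sample.
excluded_counts = [368, 283, 230]
sample_articles = articles_with_comments - len(excluded_counts)
sample_comments = 693

# ASSUMPTION: the "average of 29" refers to all commented articles,
# including the three excluded ones. The text does not say this explicitly.
all_comments = sample_comments + sum(excluded_counts)
mean_all = all_comments / articles_with_comments
mean_sample = sample_comments / sample_articles

print(articles_with_comments, sample_articles)      # 54 51
print(round(mean_all, 1), round(mean_sample, 1))    # 29.1 13.6
```

Under that assumption, the mean is about 29 comments per article across all 54 commented articles, compared with roughly 13.6 for the 51 articles in the sample, which shows how strongly the three outliers pull the average.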
We manually copied article metadata into a spreadsheet and recorded the type of library discussed in each article. Next, we copied individual comments with their associated metadata. Using a content analysis framework, defined by Babbie (2007) as “the study of recorded human communications” (p. 320), we analyzed 693 comments. To develop a categorization system, we read the first 100 comments together to identify topic areas and establish consistency in our application of the topics. Then, we divided the remaining comments to apply topics individually, conferring with each other periodically.
We developed three questions with which to analyze the comments:
- What do people want from a library?
- What do people value about libraries?
- What library services can people do without?
The first question examined what commenters wrote about services or materials they want libraries to provide now or in the future. We coded these as “desire.” Regarding the second question, we coded comments with “value” when the commenter was aware that the library is already doing something, and they expressed appreciation. In response to the third question, when comments conveyed dissatisfaction with libraries in some way, we coded these as “doesn’t value.”
Online comments mirror the ephemeral nature of the Internet; comments are added at different times, and although most comments are made during an initial spark of interest after the publication date of an article, more may be added at a later date. Depending on the commenting policy of the publication, comments may be removed by the editors. Comments may also be removed by the commenters themselves, and unlike the articles, there will be no record of “corrections” or “errata” for comments. If a publication transitions to a new commenting platform (from Disqus to Facebook, for example), past comments could be lost. Because of these factors, our research project focuses on a snapshot in time. We examined comments made on articles during a specific time period, and those comments may or may not continue to exist in their original location in the future.
Due to the self-selecting nature of commenting culture, this project is not intended to encapsulate public opinion as a whole. We cannot claim that the comments we analyzed are representative of the general public, nor can we say that they are comprehensive. Instead, we present results based on an existing public set of metadata produced by a group of motivated readers.
Results & Discussion
Among the 54 news articles with comments, the predominant focus was on topics related to public libraries (40 articles, or 74%). A few articles also discussed other library types, such as special libraries (5 articles, or 9%), international libraries (4 articles, or 7%), academic libraries (3 articles, or 6%), and school libraries (3 articles, or 6%). Some articles mentioned more than one type of library. Public libraries are the most publicly visible of library types, as they are open to all members of a community (compared to special, academic, or school libraries, which allow entrance to specific patrons). It seems natural to us that news article coverage of libraries would focus on public libraries. Due to the nature of most libraries being tied to a physical location, articles tended to highlight regional issues pertaining to their local public libraries.
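The percentages above follow from dividing each count by the 54 articles with comments. A minimal sketch, using only the counts given in the paragraph (the dictionary and variable names are illustrative):

```python
# Articles with comments, by library type discussed (counts from the text).
articles_with_comments = 54
by_type = {
    "public": 40,
    "special": 5,
    "international": 4,
    "academic": 3,
    "school": 3,
}

# Share of the 54 commented articles that discuss each library type,
# rounded to the nearest whole percent.
shares = {name: round(100 * n / articles_with_comments) for name, n in by_type.items()}
for name, pct in shares.items():
    print(f"{name}: {pct}%")

# The counts sum to more than 54 because some articles mention
# more than one type of library.
overlap = sum(by_type.values()) - articles_with_comments
print("extra type mentions:", overlap)
```

The per-type counts add up to 55, one more than the number of articles, which is consistent with the note that some articles mentioned more than one type of library.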
Of the 54 articles that had comments, four of them garnered more than 100 comments each, indicating topics that generated high interest among readers between July 2013 and July 2014 (Figure 1).
The article that received the most comments, at a total of 368, was from SFGate in March 2014 explaining the San Francisco Public Library’s new Patron Code of Conduct (Knight, 2014). The second highest-commented article generated 283 comments in The New York Times, and detailed the decision by the New York Public Library in May 2014 to scrap its controversial plan to renovate one of their locations (Pogrebin, 2014). An article from October 2013 on SFGate received 230 comments in response to coverage of the arrest at the Glen Park library branch of the “mastermind” behind the online shopping site Silk Road (Lee, 2013). The fourth highest-commented article was published in The New York Times in February 2014 about plans for the Obama presidential library, which generated 158 comments (Rybczynski, 2014). For a list of all articles with the associated number of comments, see Appendix A: https://goo.gl/gUaHQ2. At the other end of the spectrum, 20 of the articles in our initial search results allowed comments but had none.
Although trolling is rampant in many online forums, we didn’t encounter much incivility in the comments we analyzed. We attribute this to the mediated comment platforms used by online news publications, which maintain community discussion policies. Many publications make use of third-party commenting platforms (such as Disqus, Viafoura, or Livefyre), which require users to log in with an account. Some publications employ Facebook or Google+ as commenting platforms, which not only require a login but also attach commenters’ actual identities to their shared opinions.
Many of the comments we analyzed did not directly address the coverage of the article. This is a common finding among studies which examine online comments (Edgerly, Vraga, Dalrymple, Macafee, & Fung, 2013; Reagle, 2015). Of the comments that were relevant to the article topic about libraries, most expressed positive sentiment toward libraries (see Figure 2). It is likely that some of the positive comments were written by library staff. In a few instances, commenters self-identified as librarians, library staff, library board members, retired librarians or friends and family of librarians. However, it was impossible to accurately identify the affiliation of all commenters.
Most Prevalent Comment Topics
We identified 22 topic categories discussed in the comments (see Appendix C: https://goo.gl/fsIBxJ), and the following five topics were most prevalent in the comments. Free access to information was the most discussed topic followed, in descending order, by physical collections, preservation of history, impact on community, and library as place (Figure 2).
In the following sections, we share comments that represent the most common sentiments or arguments made in each topic category. These can provide information to librarians who are making a case for a particular area of their libraries. These comments demonstrate the thoughts of the public from both sides of each topic. For the purposes of the following sections we grouped comments coded as “desire” and comments coded as “value” into a general positive category compared to negative comments that express dissatisfaction with libraries, or “doesn’t value”. We did this because “desire” or “value” comments indicated that commenters still believe in the importance of libraries. It is important to note that the number of positive comments far outweighed the number of negative comments. In the sections below, we provide at least one positive comment and one negative comment, so that our readers can get a sense of what the commenters expressed both positively and negatively, but this one-to-one ratio does not imply that the negative comments were of equal weight to the positive. In the following comment samples, we have maintained the original spelling and grammar.
Free Access To Information
As one commenter notes, “Since Carnegie, libraries are in cahoots to inform the public for free.”1 Libraries are commonly recognized as bastions of intellectual freedom and continuing education, and many comments reflected this core library mission. We categorized these comments under “free access to information.” In this category we labeled comments which addressed issues of public access to a wide variety of balanced information and protection from censorship.
We identified the following comment as representative of common sentiment related to the value of free access to information provided by libraries. The commenter defends the need for libraries as providers of quality information. This comment was made to an article in the Wall Street Journal entitled “Do People Need Libraries in the Digital Age?” (Farley, 2014).
the answer is yes we do need libraries in the digital age because libraries is like to the heart and soul of accsssing information. Information obtained from libraries is sometime more safer, legit and solid as compared to information on the internet (comment by user kendallsingh)
On the other hand, seven commenters don’t value intellectual freedom and would prefer that libraries hold only content they feel is safe for their children, as in this comment from the Wall Street Journal in response to the article “Furor Erupts as Singapore Library Pulls Children’s Books Over ‘Family Values’” (Wong, 2014). Since this article was specifically about public libraries withdrawing two titles featuring same-sex couples, the comments expressing concern home in on that topic, but librarians can expect similar push-back related to collections that reflect opinions patrons may find challenging to their own belief systems.
I bring my children to the libraries, and NLB has to ensure that the books my children are exposed to do not go against my religious beliefs and family values. I am seeing rather aggressive defending of the LGBT rights, so aggressive as to attack religion and the very definition of marriage and family. This kind of fighting makes me even more worried and want to protect my children. (comment by user Christine)
A couple of other detractors don’t value the library as a source of free access to information, because they feel the Internet provides enough information. Overall, commenters recognize providing free access to quality information as a primary function of libraries that continues today despite the proliferation of online access to information.
Physical Collections
“I want to ‘feel’ a book”2 is a familiar refrain that librarians frequently hear. The comments section of online newspaper articles is no different, and we found similar expressions in our sample. We categorized comments with the topic of “physical collections” whenever reference was made to tangible items in the library such as print books or other physical materials.
This comment indicates value for physical collections in response to an article on SFGate about a new Berkeley Public Library branch that was built for energy efficiency (Baker, 2013). It illustrates the sentiment of support for why new library branches are still important.
As to all the talk of why we need libraries: the analog experience of looking at a collection of books still trumps the incidental nature of searching via the internet, in my opinion. Not every book is available on-line, nor do people always have the money for purchasing books/magazines on line. (comment by user rktrix)
On the flip side, a comment in response to the Wall Street Journal article, “Do People Need Libraries in the Digital Age,” offers the opposite sentiment regarding the need for public libraries (Farley, 2014). This comment echoes the three negative comments in this category that basically state libraries have been superseded by technology.
The days of the public library are numbered. ebooks are easy, and efficient means of securing quality reading material. Small local public libraries are expense to run, have limited titles, and are turning into public internet cafes. To those you like that kind of thing I say fine. Join a club. My property taxes should not be financing your coffee clutch, reading time, porn surfing. (comment by user HBealeJr)
This comment reflects the misinformed opinions of a vocal few. Obviously, e-books (and their associated platforms and devices) aren’t free, but this commenter is ignorant of the necessary infrastructure required to support library services.
However, we should not be disheartened because the majority of comments expressed the high value most people place on checking out books, conducting genealogical research in library archives, and using library materials.
Preservation of History
We applied the “preservation of history” topic when commenters discussed archives or other means of preserving knowledge for future generations. While there were comments that reflected the topic of preservation of history from multiple articles, the two illustrative comments that we share below are in reaction to the same New York Times article. It was an opinion piece about President Obama’s future presidential library, and it generated 25 of the 38 comments on this topic. The opinion of the article’s author was that President Obama’s presidential library should be small rather than grandiose (Rybczynski, 2014). Many comments reveal the political perspective of the commenters, but the two sample comments below, one positive and one negative, demonstrate the commenters’ views on the library as a preserver of history.
On the positive side, the comment below agrees with 35 of the commenters in expressing the importance of libraries to maintain historical records for future research.
As time travels on, these libraries are great educators for people that come after. (comment by Midwesterner)
The following commenter clearly rejects the idea of comprehensive preservation of historical materials. Only three of the comments categorized with this topic share this commenter’s opinion.
Go small, Mr. President, indeed. Glad to learn that these monstrous libraries of his predecessors are maintained with private funds. Nixon’s tape were and still are a lot of fun but, after his resignation based on one of them, the rest are quite redundant, though. (comment by user Ladislav Nemecl)
The majority of commenters understand the crucial role libraries play in maintaining and providing access to historical and archival information, such as presidential artifacts stored in national repositories. We must note, however, that this comment topic appears here predominantly because of the snapshot of time in which our research occurred. This article touched an emotional and political nerve that resulted in a remarkably high number of comments.
Impact on Community
Whenever comments addressed how the library affects people and/or the socio-economic health within a specific geographic proximity to a library, we tagged them as “impact on community.”
Several comments convey supportive responses to the efforts libraries make in local communities to provide a “third space” for community members. For example, the following comment represents these positive sentiments in response to an article on SFGate about a new branch library with sustainable construction (Baker, 2013).
Plus, libraries are great community spaces, providing a quiet space for reading, learning, research, and stories for the little ones. (comment by user rktrix)
Although there were few commenters who explicitly express they don’t value the positive impact libraries have on local communities, these commenters feel library funding should either be focused exclusively on books or done away with altogether. The following comment was made in response to the Wall Street Journal article, “Do People Need Libraries in the Digital Age” (Farley, 2014).
Library staff sometimes acts too much like bureaucrats, looking to expand/redefine their services without buy-in from the community. If you want a place that is about social services (Teen Center, Senior Center, Community Center, etc.) that is fine – but that is not a library and that is not what the public perceives it is funding as a library. Before taking on those new roles and shifting funding from books, it is city staff responsibility to get explicit buy-in from the people, not just let things slide through as under-the-radar budget line items that keep growing and growing… (comment by user Library Realist)
The results of the Pew report Libraries at the Crossroads echo our findings in this category. According to their recent survey, “two-thirds (65%) of all of those 16 and older say that closing their local public library would have a major impact on their community” (Horrigan, 2015, p. 10). From our analysis, when commenters discuss issues related to community benefits from local libraries, they evince a strong appreciation for the “extra” programming and services that libraries provide beyond circulating book collections. These benefits are sometimes explicitly detailed by the commenters and are sometimes expressed as a positive sentiment toward the intangible ideals that libraries represent. The negative responses typically indicate a lack of familiarity with modern public libraries on the part of the commenter, and reveal that the commenter had most likely not set foot in a library recently.
Library as Place
Those of us who work in libraries obviously care about our physical spaces. We applied “library as place” to comments in which the commenter discussed library settings, whether as places to read or attend events or in terms of their architectural design. One comment in response to the Wall Street Journal article, “Do People Need Libraries in the Digital Age,” (Farley, 2014) provides a descriptive image of what one specific library means to that commenter, but a similar sentiment is echoed by many commenters.
A few times per week, I will leave the office, turn my phone off and head for our public library. An hour in library…with ancient and modern authors, is the best respite in the world. You can’t be relaxed chasing Google. (comment by user John)
On the other hand, not all commenters appreciate a physical library building. In response to an article on SFGate regarding the opening of a brand new branch library in San Francisco’s North Beach neighborhood (Lagos, 2014), one commenter writes:
Glad all that dough went to fix up libraries… Most people can Google for anything they’d find in a reference library and read books downloaded to their Kindles or iPads…Next, let’s spend millions of taxpayer dollars fixing up public telephone booths. Welcome back to the 20th Century. (comment by user SuaveDuck)
Funding for library renovations or new construction is a topic that comes up frequently in local communities. As with any issue related to money, it can be fraught with controversy. The comments represented in our analysis demonstrate support for maintaining and upgrading physical buildings, which meshes well with the findings from the Pew report Libraries at the Crossroads. The Pew report found that “nearly two-thirds (64%) of those ages 16 and over say libraries should ‘definitely’ have more comfortable spaces for reading, working and relaxing. This represents a modest increase since 2012, and it suggests that libraries still occupy a prominent spot in people’s minds as a place to go” (Horrigan, 2015, p. 5).
Takeaways
Comments sections of news articles remain wildly interesting to us, running the gamut from humor to snark to insight. Comments reflect popular opinions by self-motivated readers of online news articles and can be a source of ideas for advocacy. The majority of comments we reviewed were positive and appreciative of the services and spaces libraries offer, while the negative comments revealed a lack of awareness of the innovation taking place in libraries across the United States.
Our content analysis of comments revealed these prominent themes: free access to information, physical collections, preservation of history, impact on community, and library as place, among others. Within each of these themes, we noticed trends in the vocabulary used by commenters. We offer here a sampling of the dominant language with the goal that librarians can piggyback on these arguments in favor of supporting library services and funding. In addition, librarians can address misinformed negative opinions.
When discussing free access to information, commenters mention issues of balanced, uncensored collections which represent a wide variety of viewpoints. Commenters are interested in having libraries provide online, 24/7 access to public documents and information. Another frequent trend relates to egalitarian access to information: commenters express the desire that information not be limited to economically advantaged citizens. Many participants in the discussion point out that public libraries provide access to content to those who don’t have other means and wouldn’t be able to afford their own e-book readers and Amazon downloads.
In comments related to physical collections, discussions touch on access to unique and rare treasures not available online, and concerns about having enough space for expanding collections. Commenters note that not everything is online, and there were even a few nods to interlibrary loan! Several people also mentioned that print books can be a long-lasting “technology” which doesn’t become obsolete.
Preservation of history emerged as a category because many commenters use language that describes preserving information for future generations, including the acknowledgement of national milestones. People communicate a desire for digitization of physical materials for universal access, but commenters also are concerned about issues of data migration when storage technologies evolve (from tape to CD-ROMs, DVDs, etc.). They also point out that the physical preservation of historical artifacts and documents is essential for original research.
We noticed language trends that indicate commenters think of libraries as giving their communities an economic and moral boost that can help revive struggling communities and provide technological innovation. Many commenters see value in providing community and learning experiences in communal spaces. People describe libraries as cultural and intellectual centers of society, and a common good that benefits all segments of society.
Many commenters profess their love of libraries with descriptions of specific library spaces, calling libraries monuments and architectural showpieces with large open spaces to enjoy or with nooks and crannies for studying and reading. Additional descriptive words we enjoyed reading included: masterpiece, welcoming, destination, bright and cheerful, fabulous creation, gorgeous, grand. People mention that library spaces indicate the values of the city or community for literacy and learning. Some are concerned about “exiling books” to off-site storage or converting to e-collections, expressing a desire to get lost in the stacks. Several commenters convey support for costs associated with maintaining existing library buildings as well as new construction.
Although most of the comments indicated support for libraries, there were some comments that were ill-natured. Some detractors described libraries as sanctuaries for criminals which leads to a circus atmosphere. These comments were mostly made in response to articles on SFGate which focused on library behavior policies, and revealed commenters’ concerns and fears about the homeless population.
The following is one of three comments we encountered using the specific term “dinosaur,” indicating that the commenters don’t believe libraries are evolving effectively. This comment was made in response to an article on SFGate about a photography exhibit presenting public library buildings from throughout the contiguous United States (Whiting, 2014).
Libraries are dinosaurs. They should all be turned into community centers or sold off. (comment by user Evil_Bert)
We noticed that some commenters do not understand what today’s libraries offer and describe expectations for services that are, in fact, already in existence. The previous comment is one example, revealing a lack of awareness of library innovation and of the library’s role as a hub of community services. Many libraries offer assistance with literacy, tutoring, job searches, entrepreneurship, and myriad other offerings. The San Francisco Public Library is one example of a library system addressing the needs of homeless patrons by employing a social worker. A few commenters think that libraries can be replaced by e-books, but of course libraries are already providing access to millions of e-book titles, and offer circulating e-book readers.
In several places, other commenters stand up for the library in response to the snarky comments. The following example comes from the comment section of the Farley article (2014), which asked “Do People Need Libraries in the Digital Age?”
It’s always so obvious when someone who doesn’t use a library comments on library collections and services…Libraries provide access to not just paper books but also ebooks and especially forms of e-content and new technology. Libraries provide spaces and tools for people to not just consume information but also produce it. All of this is …. ready for this? … for free. Get with the program before you deem to assume what libraries have and provide. As long as there is information, in any form, and as long as people need to access it, there will be a place for libraries. (comment by user Anonymous)
Taking a cue from these online library champions, library professionals can use the positive sentiment to help bring those with the dissenting opinion around to see the value that libraries have for many in the community. Librarians can address the knowledge gaps of the general public, getting the word out that libraries are doing all the things. There are a variety of ways to raise awareness about library services, including participating in online comment areas, spreading the word on social media platforms, leading discussions in communities, and publicizing library innovations through more traditional marketing avenues to reach non-library users. Mining the language of library cheerleaders provides touchpoints to shape fruitful conversations with community leaders, members of the public, and administrators. Advance awareness of arguments by detractors can empower librarians to strengthen their messaging and improve external perceptions.
We would like to express our extreme gratitude to our external reviewer Amy Hofer, internal reviewer Annie Pho, publishing editor Erin Dorney, and reader Michele Van Hoeck. We have valued their insightful and constructive feedback and enjoyed the collaborative and responsive open peer review process.
Works Cited
Alliance for Audited Media. (2013). Top 25 U.S. Newspapers for March 2013. Retrieved from http://auditedmedia.com/news/research-and-data/top-25-us-newspapers-for-march-2013.aspx
Anderson, A. A., et al. (2013). The “nasty effect:” Online incivility and risk perceptions of emerging technologies. Journal of Computer-Mediated Communication, 19(3), 373-387. http://dx.doi.org/10.1111/jcc4.12009
Baker, D. R. (2013, December 29). Berkeley library branch a ‘zero net energy’ building. SF Gate. Retrieved from http://www.sfgate.com/default/article/Berkeley-library-branch-a-zero-net-energy-5100368.php
Buckels, E., Trapnell, P., & Paulhus, D. (2014). Trolls just want to have fun. Personality and Individual Differences, (Corrected Proof). http://dx.doi.org/10.1016/j.paid.2014.01.016
Edgerly, S., Vraga, E. K., Dalrymple, K. E., Macafee, T., & Fung, T. K. F. (2013). Directing the dialogue: The relationship between YouTube videos and the comments they spur. Journal of Information Technology & Politics, 10(3), 276–292.
Farley, C. J. (2014, February 12). Do people need libraries in the digital age? Wall Street Journal. Retrieved from http://blogs.wsj.com/speakeasy/2014/02/12/are-libraries-overdue-for-digital-change/?KEYWORDS=libraries
Felder, A. (2014, June 5). How comments shape perceptions of sites’ quality—and affect traffic. The Atlantic. Retrieved from http://www.theatlantic.com/technology/archive/2014/06/internet-comments-and-perceptions-of-quality/371862
Hanson, M. & Adams, A.L. (2014, April). Who do they think we are? Addressing library identity perception in the academy. In 2014 CARL Conference Proceedings. Retrieved from http://carl-conference.org/sites/carl-conference.org/files/slides/hansonadams.pdf
Hanson, M. & Adams, A.L. (2014, November). What does the public say? Analyzing online news article comments about libraries. Poster session presented at the California Library Association Conference, Oakland, CA. Retrieved from http://bit.ly/1MyoJFu
Horrigan, J. (2015, September). Libraries at the crossroads. Pew Research Center. Retrieved from http://www.pewinternet.org/2015/09/15/libraries-at-the-crossroads/
Knight, H. (2014, March 8). S.F. library proposes new code of conduct with penalties. SF Gate. Retrieved from http://www.sfgate.com/default/article/S-F-library-proposes-new-code-of-conduct-with-5300570.php
Lagos, M. (2014, May 9). North Beach library’s opening marks end of $200 million program. SF Gate. Retrieved from http://www.sfgate.com/default/article/North-Beach-library-s-opening-marks-end-of-200-5467298.php
Lee, E.-J. (2012). That’s not the way it is: How user-generated comments on the news affect perceived media bias. Journal of Computer-Mediated Communication, 18(1), 32–45. http://doi.org/10.1111/j.1083-6101.2012.01597.x
Lee, H. K. (2013, October 2). Alleged online drug kingpin arrested at SF library. SF Gate. Retrieved from http://www.sfgate.com/default/article/Alleged-online-drug-kingpin-arrested-at-SF-library-4863306.php
Pogrebin, R. (2014, May 7). Public library is abandoning disputed plan for landmark. New York Times. Retrieved from http://www.nytimes.com/2014/05/08/arts/design/public-library-abandons-plan-to-revamp-42nd-street-building.html
Reagle, J. M. (2015). Reading the comments: Likers, haters, and manipulators at the bottom of the Web. Cambridge, Massachusetts: MIT Press.
Rybczynski, W. (2014, February 18). Obama and his library: Go small. New York Times. Retrieved from http://www.nytimes.com/2014/02/19/opinion/obama-and-his-library-go-small.html
Springer, N., Engelmann, I., & Pfaffinger, C. (2015). User comments: Motives and inhibitors to write and read. Information, Communication & Society, 18(7), 798–815.
Suler, J. (2004). The online disinhibition effect. CyberPsychology & Behavior, 7(3), 321-326. http://dx.doi.org/10.1089/1094931041291295.
Whiting, S. (2014, April 16). Photographer checks out US public libraries’ function, form. SF Gate. Retrieved from http://www.sfgate.com/default/article/Photographer-checks-out-US-public-libraries-5405100.php
Wong, C. H. (2014, July 12). Furor erupts as Singapore library pulls children’s books over ‘family values’. Wall Street Journal. Retrieved from http://blogs.wsj.com/searealtime/2014/07/12/furor-erupts-as-singapore-library-pulls-childrens-books-over-family-values/?KEYWORDS=libraries
Zickuhr, K., Rainie, L., Purcell, K., & Duggan, M. (2013). How Americans value public libraries in their communities. Pew Internet. Retrieved from http://libraries.pewinternet.org/2013/12/11/libraries-in-communities
- Comment by user “Jack N Fran Farrell” to Williams, A. (2014, June 27). Got Wi-Fi? Some libraries now lending hotspots. USA Today.
- Comment by user “john fitzgerald” to Farley, C. J. (2014, February 12). Do People Need Libraries in the Digital Age? Wall Street Journal.
** This insight was written by TH Schee OK Taiwan ambassador **
Taiwan has surprisingly topped the Global Open Data Index 2015, and the result raises questions about how it was achieved. Even though Taiwan has been very active and is recognised as one of the hotspots of open data, little is known outside the island about the actual landscape. Take a look at tech president, Nieman Lab, and Science & Technology Law Institute of Taiwan for more context.
To give some background to the seemingly odd result, context is needed to better understand how the Index has shaped Taiwan’s overall effort and awareness of it since 2013, and possibly even more so in the long run.
According to “Freedom of the Press 2015”, Taiwan ranks among the top in Asia Pacific, along with Japan, Australia and New Zealand. Its extremely vigorous, diverse and free press environment has served as a catalyst for many communities: not only the journalistic world, but also the public and private sectors that belong to the broader groups “reusing” public sector information. They can engage in ways that enthusiasts in neighboring countries and economies shy away from for safety reasons. To put it in simple terms, you are free to interpret data, check its integrity, report on it, or even use it to hold your government accountable in litigation.
The country staggeringly claims the world’s highest penetration of Facebook users relative to overall population. This has contributed to a fast, and to some degree even vicious, feedback loop in public discourse about any dataset released from the dozens of data portals. It has also greatly enhanced the visibility of the #GODI15 agenda on the island.
From the government perspective, another major contributing factor has been the establishment of a formalised mechanism for public consultation, in the form of dedicated committees in all ministries. More than 30 were established in the first half of 2015, with seats rotating on nominal one- to two-year terms; the majority of members come from the government, plus a select few from civil society, academia and the private sector. This has served very well to raise awareness of Open Knowledge and the #GODI15 inside the government, and serious efforts to study the Index in detail began as early as 2013. This proves to be somewhat controversial in the final outcome, but we are seeing how the Index has formally affected the government’s perception and assessment of its own mandates and initiatives in Taiwan. The discourse around #GODI15 is public, in meeting minutes available at http://data.gov.tw.
The third contributing factor is slightly uncomfortable: the government has supported some very disputable mandates, including the possible release of personal data from the National Healthcare Insurance Program in open formats without prior agreement from the insured. This has caused major concern among several human rights groups, and civil society is still waiting for a court verdict on a class action filed against the government. The case brought transparency groups and the congress a whole new understanding of the issues that open data initiatives might raise, and it has created a much broader community base around provocative but valuable issues, something that is generally challenging to foster from top-down, technology-driven initiatives.
The upcoming presidential election is set to take place in less than 40 days, and it’s widely agreed that the agenda on open data and policies will be carried forward by the new government. The best thing so far has not been the ranking, but a true dialogue among local and even regional stakeholders. The #GODI15 has served as a fresh start for Taiwan; without it, sincere and reasoned debates would not even have surfaced.
There was homework for this week’s #critlib, a Twitter chat/community (and website) about critical librarianship that I participate in. Without going and finding the actual definition, according to the folks who started it, I’ll say that it seems to me that “critical librarianship” means librarianship (and information science) practiced through a social justice lens, including lessons from advocates and activists for feminism, racial justice, disability, [anti-] poverty, etc. (There’s a long list, when it’s being done right.)
First, why NOT participate in #critlib
I find myself somewhat dissatisfied with my definition of what #critlib is.
Part of the reason I held off on participating for as long as I did, despite the participation of quite a few people I look up to, was that it felt too scholarly and too removed from the experiences of librarians (not to mention patrons) in marginalized groups, when I first looked into it. It seemed like a lot of smart people navel-gazing about queer theory, feminist theory, etc., but it did not seem like a vehicle for action.
“Scholarly” can be a compliment, and its complimentary form absolutely can be applied to #critlib. I’m using it as a critique, above, though; I mean it to imply two things: 1) the remove, the attempt to be “objective” in a way that can feel like the issues being discussed belong to other people, not to the people participating in the discussion, and 2) an inaccessibility, a “you must be this smart and well-read to enter” kind of feeling. I know the community tries to be welcoming, and I think it is getting better at this over time; but I still hear echoes of that “I’m not smart enough”/“I haven’t read enough” feeling coming from people who would otherwise like to join in. (I do what I can to help, there. We can all do better at this, though.)
I still feel some of this. So perhaps I should modify my definition above, to say “… librarianship (and information science) examined through a social justice lens…”
On a bad day, when it all seems too theoretical, or like it is wandering close to the line of being patronizing to members of marginalized groups (too much “us” and “them” in the discourse, maybe), I admit: I quietly walk away.
Now, why I (sometimes) participate
There’s value in examination and theoretical understanding, though. Arguably, action is not worth taking without it. So while #critlib may have frustrated me, at first, in its distance from practical solutions, I recognize that it is a Good Thing™ and worth participating in, or at least supporting. Awareness is important. Discussion is important. Both are prerequisites for worthwhile collective action.
Like I said, also: there are a lot of people in the #critlib community whom I greatly admire and whose thoughts I am interested in. And, although I have endeavored to do my homework on various social justice issues, I know I have a lot to learn, even about issues that affect me. (I am, for instance, not a good disability activist, despite having lived with a disability for several years.) So spending some time listening to theory (and, one hopes, to the lived experiences of my peers) is absolutely a good use of my time.
My hopes
Over the next year I’d like to see #critlib start to grow into more of a vehicle for concerted, collective action. I’d like to see us capture [anonymized?] stories of critical librarianship being applied in real situations. I want practical applications. (That comes as no surprise to anyone who’s met me. :))
I’d really like to see us doing more to support our colleagues whose lives are directly affected by the issues #critlib discusses. And, to that end, I want #critlib to be a space where the voices of people in marginalized groups are actively invited in, welcomed, listened to, and amplified. I am confident that we can be scholarly (in the positive sense) without being exclusionary.
This is just one example, out of many things we could do, but it’s an achievable one: I’d like us to push for data transparency—perhaps follow tech’s lead (something I didn’t expect to say) and push libraries and library associations to release their demographics publicly (including a breakdown of the demographics of library leaders). I’d like to help chart the differences between our demographics and those of our applicant pools (or at least MLIS graduates?); and between our demographics and our communities’ demographics. That won’t solve any problems on its own, but it will help us to demonstrate that there are problems and to push for change. (And, like I said, it’s just one thing we could do.)
For that matter, I’d love to know #critlib’s demographics.
More broadly, I’d like us to help teach/push one another to be better activists, and I would like to see #critlib’s effects—and be able to point to them—over time.
For now, talking and theory are a good start. I just don’t want us to stop there, you know?
Image at the top of the post via nicolecat1 on deviantart.
20 years ago, Jeff Rothenberg's seminal Ensuring the Longevity of Digital Documents compared migration and emulation as strategies for digital preservation, strongly favoring emulation. Emulation was already a long-established technology; as Rothenberg wrote, Apple was using it as the basis for their transition from the Motorola 68K to the PowerPC. Despite this, the strategy of almost all digital preservation systems since has been migration. Why was this?
Preservation systems using emulation have recently been deployed for public use by the Internet Archive and the Rhizome Project, and for restricted use by the Olive Archive at Carnegie Mellon and others. What are the advantages and limitations of current emulation technology, and what are the barriers to more general adoption?
Below the fold, the text of the talk with links to the sources. The demos in the talk were crippled by the saturated hotel network; please click on the linked images below for Smarty, oldweb.today and VisiCalc to experience them for yourself. The Olive demo of TurboTax is not publicly available, but it is greatly to Olive's credit that it worked well even on a heavily-loaded network.
Once again, I need to thank Cliff Lynch for inviting me to give this talk, and for letting me use the participants in Berkeley iSchool's "Information Access Seminars" as guinea-pigs to debug it. This one is basically "what I did on my summer vacation", writing a report under contract to the Mellon Foundation entitled Emulation and Virtualization as Preservation Strategies. As usual, you don't have to take notes or ask for the slides; an expanded text with links to the sources will go up on my blog shortly. The report itself is available from the Mellon Foundation and from the LOCKSS website.
I'm old enough to know that giving talks that include live demos over the Internet is a really bad idea, so I must start by invoking the blessing of the demo gods.
History
Emulation and virtualization technologies have been a feature of the information technology landscape for a long time, going back at least to the IBM 709 in 1958, but their importance for preservation was first brought to public attention in Jeff Rothenberg's seminal 1995 Scientific American article Ensuring the Longevity of Digital Documents. As he wrote, Apple was using emulation in the transition of the Macintosh from the Motorola 68000 to the PowerPC. The experience he drew on was the rapid evolution of digital storage media such as tapes and floppy disks, and of applications such as word processors each with their own incompatible format.
His vision can be summed up as follows: documents are stored on off-line media which decay quickly, whose readers become obsolete quickly, as do the proprietary, closed formats in which they are stored. If this isn't enough, operating systems and hardware change quickly in ways that break the applications that render the documents.
Rothenberg identified two techniques by which digital documents could survive in this unstable environment, contrasting the inability of format migration to guarantee fidelity with emulation's ability to precisely mimic the behavior of obsolete hardware.
Rothenberg's advocacy notwithstanding, most digital preservation efforts since have used format migration as their preservation strategy. The isolated demonstrations of emulation's feasibility, such as the collaboration between the UK National Archives and Microsoft, had little effect. Emulation was regarded as impractical because it was thought (correctly at the time) to require more skill and knowledge to both create and invoke emulations than scholars wanting access to preserved materials would possess.
Overview
MacOS7 on Apple Watch
Nintendo 64 on Android Wear
It took Nick Lee about 4 hours to get this emulation of MacOS 7 running on his Apple Watch. Hacking Jules followed with Nintendo 64 and PSP emulators on his Android Wear. Simply getting one of the many available emulators running in a new environment isn't that hard, but that isn't enough to make them useful.
Recently, teams at the Internet Archive, Freiburg University and Carnegie Mellon University have shown frameworks that can make emulations appear as normal parts of Web pages; readers need not be aware that emulation is occurring. Some of these frameworks have attracted substantial audiences and demonstrated that they can scale to match. This talk is in four parts:
- First I will show some examples of how these frameworks make emulations of legacy digital artefacts, those from before about the turn of the century, usable for unskilled readers.
- Next I will discuss some of the issues that are hampering the use of these frameworks for legacy artefacts.
- Then I will describe the changes in digital technologies over the last two decades, and how they impact the effectiveness of emulation and migration in providing access to current digital artefacts.
- I will conclude with a look at the single biggest barrier that has hampered, and will continue to hamper, emulation as a preservation strategy.
- One or more emulators capable of executing preserved system images.
- A collection of preserved system images, together with the metadata describing which emulator configured in which way is appropriate for executing them.
- A framework that connects the user with the collection and the emulators so that the preserved system image of the user's choice is executed with the appropriately configured emulator connected to the appropriate user interface.
What Is Going On?
What happened when I clicked Smarty's Play button?
- The browser connects to a session manager in Amazon's cloud, which notices that this is a new session.
- Normally it would authenticate the user, but because this CD-ROM emulation is open access it doesn't need to.
- It assigns one of its pool of running Amazon instances to run the session's emulator. Each instance can run a limited number of emulators. If no instance is available when the request comes in it can take up to 90 seconds to start another.
- It starts the emulation on the assigned instance, supplying metadata telling the emulator what to run.
- The emulator starts. After a short delay the user sees the Mac boot sequence, and then the CD-ROM starts running.
- At intervals, the emulator sends the session manager a keep-alive signal. Emulators that haven't sent one in 30 seconds are presumed dead, and their resources are reclaimed to avoid paying the cloud provider for unused resources.
- Data I/O, connecting the emulator to data sources such as disk images, user files, an emulated network containing other emulators, and the Internet.
- Interactive Access, connecting the emulator to the user using standard HTML5 facilities.
- Control, providing a Web Services interface that bwFLA's resource management can use to control the emulator.
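To make the session lifecycle described in the steps further up more concrete (assigning a new session to an instance with spare capacity, then reclaiming emulators that miss the 30-second keep-alive), here is a minimal Python sketch. It is not the framework's actual code: the names `SessionManager`, `Instance` and `EMULATORS_PER_INSTANCE`, and the capacity of 2, are assumptions made purely for illustration. Only the 30-second timeout comes from the description.

```python
import time
from dataclasses import dataclass, field

KEEPALIVE_TIMEOUT = 30.0      # seconds, taken from the description
EMULATORS_PER_INSTANCE = 2    # assumed capacity, for illustration only


@dataclass
class Instance:
    name: str
    sessions: set = field(default_factory=set)


class SessionManager:
    """Hypothetical model: assign sessions to instances, reap silent emulators."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.last_seen = {}  # session id -> time of the last keep-alive

    def start_session(self, session_id, now=None):
        now = time.monotonic() if now is None else now
        for inst in self.instances:
            if len(inst.sessions) < EMULATORS_PER_INSTANCE:
                inst.sessions.add(session_id)
                self.last_seen[session_id] = now
                return inst
        # No capacity: a real deployment would start another instance here,
        # which is the slow path mentioned in the text.
        return None

    def keep_alive(self, session_id, now=None):
        if session_id in self.last_seen:
            self.last_seen[session_id] = time.monotonic() if now is None else now

    def reap(self, now=None):
        """Drop sessions whose last keep-alive is older than the timeout."""
        now = time.monotonic() if now is None else now
        dead = [s for s, t in self.last_seen.items() if now - t > KEEPALIVE_TIMEOUT]
        for s in dead:
            del self.last_seen[s]
            for inst in self.instances:
                inst.sessions.discard(s)
        return dead
```

Passing `now` explicitly keeps the sketch testable without waiting; in normal use the monotonic clock is read automatically.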
bwFLA's preserved system images are stored as a stack of overlays in QEMU's "qcow2" format. Each overlay on top of the base system image represents a set of writes to the underlying image. For example, the base system image might be the result of an initial install of Windows 95, and the next overlay up might be the result of installing Word Perfect into the base system. Or, as Cal Lee mentioned yesterday, the next overlay up might be the result of redaction. Each overlay contains only those disk blocks that differ from the stack of overlays below it. The stack of overlays is exposed to the emulator as if it were a normal file system via FUSE.
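The overlay idea can be illustrated with a toy model in Python. This is a sketch of the copy-on-write principle only, not of the qcow2 on-disk format or of bwFLA's code; the class name `OverlayStack` and the block contents are invented for the example.

```python
class OverlayStack:
    """Toy model of copy-on-write overlays over a base disk image."""

    def __init__(self, base_blocks):
        # layers[0] is the base image; each later dict holds only the
        # blocks that differ from everything beneath it.
        self.layers = [dict(base_blocks)]

    def push_overlay(self):
        self.layers.append({})

    def write(self, block_no, data):
        # Writes always land in the topmost overlay; lower layers stay intact.
        self.layers[-1][block_no] = data

    def read(self, block_no):
        # Search from the top overlay down to the base image.
        for layer in reversed(self.layers):
            if block_no in layer:
                return layer[block_no]
        raise KeyError(block_no)


stack = OverlayStack({0: b"boot", 1: b"win95"})
stack.push_overlay()                 # e.g. "install Word Perfect"
stack.write(1, b"win95+wp")
stack.write(2, b"wordperfect")
```

After these writes the base layer still holds its original block 1, so the same base image can sit under any number of different overlays.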
The technical metadata that encapsulates the system disk image is described in a paper presented to the iPres conference in November 2015, using the example of emulating CD-ROMs. Broadly, it falls into two parts, describing the software and hardware environments needed by the CD-ROM in XML. The XML refers to the software image components via the Handle system, providing a location-independent link to access them.
oldweb.today
BBC News via oldweb.today
Ilya Kreymer has used the same Docker (see his comment) technology to implement oldweb.today, a site through which you can view Web pages from nearly a dozen Web archives using a contemporary browser. Here, for example, is the front page of the BBC News site as of 07:53GMT on 13th October 1999 viewed with Internet Explorer 4.01 on Windows. This is a particularly nice example of the way that emulation frameworks can deliver useful services layered on archived content. Note that the URL for this page, http://oldweb.today/ie4/19991210182302/http://news.bbc.co.uk/, as with the Wayback Machine, doesn't specify the technology used.
TurboTax
TurboTax97 on Windows 3.1
Here, again from a Chromium browser on my Ubuntu 14.04 system, is 1997's TurboTax running on Windows 3.1. The pane in the browser window has top and bottom menu bars, and between them is the familiar Windows 3.1 user interface.
What Is Going On?
The top and bottom menu bars come from a program called VMNetX that is running on my system. Chromium invoked it via a MIME-type binding, and VMNetX then created a suitable environment in which it could invoke the emulator that is running Windows 3.1, and TurboTax. The menu bars include buttons to power-off the emulated system, control its settings, grab the screen, and control the assignment of the keyboard and mouse to the emulated system.
The interesting question is "where is the Windows 3.1 system disk with TurboTax installed on it?"
Olive
The answer is that the "system disk" is actually a file on a remote Apache Web server. The emulator's disk accesses are being demand-paged over the Internet using standard HTTP range queries to the file's URL.
This system is Olive, developed at Carnegie Mellon University by a team under my friend Prof. Mahadev Satyanarayanan, and released under GPLv2. VMNetX uses a sophisticated two-level caching scheme to provide good emulated performance even over slow Internet connections. A "pristine cache" contains copies of unmodified disk blocks from the "system disk". When a program writes to disk, the data is captured in a "modified cache". When the program reads a disk block, it is delivered from the modified cache, the pristine cache or the Web server, in that order. One reason this works well is that successive emulations of the same preserved system image are very similar, so pre-fetching blocks into the pristine cache is effective in producing YouTube-like performance over 4G cellular networks.
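The read order described here (modified cache, then pristine cache, then the Web server) can be sketched in a few lines of Python. This is an illustration of the lookup order and of how a block number maps to an HTTP `Range` header, not VMNetX's implementation: the 4096-byte block size, the class name `TwoLevelCache`, and the injected `fetch_remote` callable are all assumptions.

```python
BLOCK_SIZE = 4096  # assumed block size, for illustration only


def range_header(block_no, block_size=BLOCK_SIZE):
    """Build the HTTP Range header value covering one disk block."""
    start = block_no * block_size
    return f"bytes={start}-{start + block_size - 1}"


class TwoLevelCache:
    """Hypothetical model of the pristine/modified cache lookup order."""

    def __init__(self, fetch_remote):
        # fetch_remote(block_no) -> bytes stands in for an HTTP range request.
        self.fetch_remote = fetch_remote
        self.pristine = {}   # unmodified blocks copied from the remote image
        self.modified = {}   # blocks written by the emulated system

    def write(self, block_no, data):
        # Writes never touch the remote image or the pristine copy.
        self.modified[block_no] = data

    def read(self, block_no):
        if block_no in self.modified:
            return self.modified[block_no]
        if block_no not in self.pristine:
            self.pristine[block_no] = self.fetch_remote(block_no)
        return self.pristine[block_no]
```

Pre-fetching, as mentioned in the text, would amount to filling `pristine` ahead of the first read; the sketch leaves that out.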
VisiCalc
VisiCalc on Apple ][
This, from 1979, is Dan Bricklin and Bob Frankston's VisiCalc. It was the world's first spreadsheet. It is running on an emulated Apple ][ via a Chromium browser on my Ubuntu 14.04 system. Some of the key-bindings are strange to users conditioned by decades of Excel, but once you've found the original VisiCalc reference card, it is perfectly usable.
Internet Archive
This is the framework underlying the Internet Archive's software library, which currently holds nearly 36,000 items, including more than 7,300 for MS-DOS, 3,600 for Apple, 2,900 console games and 600 arcade games. Some can be downloaded, but most can only be streamed.
The oldest is an emulation of a PDP-1 with a DEC 30 display running the Space War game from 1962, more than half a century ago. As I can testify, having played this and similar games on Cambridge University’s PDP-7 with a DEC 340 display seven years later, this emulation works well.
Concerns: Emulators
All three groups share a set of concerns about emulation technology. The first is about the emulators themselves. There are a lot of different emulators out there, but the open source emulators used for preservation fall into two groups:
- QEMU is well-supported, mainstream open source software, part of most Linux distributions. It emulates or virtualizes a range of architectures including X86, X86-64, ARM, MIPS and SPARC. It is used by both bwFLA and Olive, but both groups have encountered irritating regressions in its emulations of older systems, such as Windows 95. It is hard to get the QEMU developers to prioritize fixing these, since emulating current hardware is its primary focus. The recent SOSP workshop featured a paper from the Technion and Intel describing their use of the tools Intel uses to verify chips to verify QEMU. They found and mostly fixed 117 bugs.
- Enthusiast-supported emulators for old hardware including MAME/MESS, Basilisk II, SheepShaver, and DOSBox. These generally do an excellent job of mimicking the performance of a wide range of obsolete CPU architectures, but have some issues mapping the original user interface to modern hardware. Jason Scott at the Internet Archive has done great work encouraging the retro-gaming community to fix problems with these emulators but, for long-term preservation, their support causes concerns.
- Technical metadata, describing the environment needed in order for the bits to function. Tools for extracting technical metadata for migration such as JHOVE and DROID exist, as do the databases on which they rely such as PRONOM, but they are inadequate for emulation. The DNB and bwFLA teams' iPRES 2015 paper describes an initial implementation of a tool for compiling and packaging this metadata which worked quite well for the restricted domain of CD-ROMs. But much better, broadly applicable tools and databases are needed if emulation is to be affordable.
- Bibliographic metadata, describing what the bits are so that they can be discovered by potential "readers".
- Usability metadata, describing how to use the emulated software. An example is the VisiCalc reference card, describing the key bindings of the first spreadsheet.
- Usage metadata, describing how the emulations get used by "readers", which is needed by cloud-based emulation systems for provisioning, and for "page-rank" type assistance in discovery. The Web provides high-quality tools in this area, although a balance has to be maintained with user privacy. The Internet Archive's praiseworthy policy of minimizing logging does make it hard to know how much their emulations are used.
Concerns: Fidelity
In a Turing sense all computers are equivalent, so it is possible and indeed common for an emulator to precisely mimic the behavior of a computer's CPU and memory. But physical computers are more than a CPU and memory. They have I/O devices whose behavior in the digital domain is more complex than Turing's model. Some of these devices translate between the digital and analog domains to provide the computer's user interface.
PDP1 front panel by fjarlq / Matt. Licensed under CC BY 2.0.
A user experiences an emulation via its analog behavior, and this can be sufficiently different to impair the experience. Smarty's sound glitches are an example. Consider also the emulation of Space Wars on the PDP-1. The experience of pointing and clicking at the Internet Archive's web page, pressing LEFT-CTRL and ENTER to start, watching a small patch in one window on your screen among many others, and controlling your spaceship from the keyboard is not the same as the original. That experience included loading the paper tape into the reader, entering the paper tape bootstrap from the Address and Test Word switches at the left, pressing the Start switch at the bottom left, and then each player controlling their ship with three of the six Sense switches on the right. The display was a large, round, flickering CRT.
Concerns: Loads & Scaling
Daily emulation counts
One advantage of frameworks such as the Internet Archive's and Olive's is that each additional user brings along the compute power needed to run their emulation. Frameworks in which the emulation runs remotely must add resources to support added users. The release of the Theresa Duncan CD-ROMs attracted considerable media attention, and the load on Rhizome's emulation infrastructure spiked.
Their experience led Rhizome to deploy their infrastructure on Amazon's highly scalable ElasticBeanstalk infrastructure. Klaus Rechert computes:
Amazon EC2 charges for an 8 CPU machine about €0.50 per hour. In case of [Bomb Iraq], the average session time of a user playing with the emulated machine was 15 minutes, hence, the average cost per user is about €0.02 if a machine is fully utilized.
In the peak, this would have been about €10/day, ignoring Amazon's charges for data out to the Internet. Nevertheless, automatically scaling to handle unpredictable spikes in demand always carries budget risks, and rate limits are essential for cloud deployment.
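The per-user figure quoted from Rechert can be reproduced with a few lines of arithmetic. The sketch below assumes that "fully utilized" means each of the 8 CPUs runs one emulation at a time; the source does not state this, so `EMULATORS_PER_CPU` is an assumption, as are the function names.

```python
HOURLY_RATE_EUR = 0.50    # quoted price for an 8-CPU machine
CPUS_PER_MACHINE = 8
SESSION_MINUTES = 15      # quoted average session length
EMULATORS_PER_CPU = 1     # assumption: one emulation per CPU at a time


def cost_per_session():
    """Cost of one average session on a fully utilized machine, in euros."""
    sessions_per_hour = CPUS_PER_MACHINE * EMULATORS_PER_CPU * (60 / SESSION_MINUTES)
    return HOURLY_RATE_EUR / sessions_per_hour


def sessions_for_budget(daily_budget_eur):
    """How many average sessions a given daily spend would cover."""
    return daily_budget_eur / cost_per_session()
```

Under that assumption a machine serves 32 sessions per hour, so one session costs €0.50 / 32 ≈ €0.016, which rounds to the quoted €0.02. By the same arithmetic, €10/day would cover on the order of 500 to 640 sessions, depending on whether the rounded or unrounded per-session cost is used.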
Why Mostly Games?
Using emulation for preservation was pioneered by video game enthusiasts. This reflects a significant audience demand for retro gaming which, despite the easy informal availability of free games, is estimated to be a $200M/year segment of the $100B/year video games industry. Commercial attention to the value of the game industry's back catalog is increasing; a company called Digital Eclipse aspires to become the Criterion Collection of gaming, selling high-quality re-issues of old games. Because preserving content for scholars lacks the business model and fan base of retro gaming, it is likely that it will continue to be a minority interest in the emulation community.
There are relatively few preserved system images other than games for several reasons:
- The retro gaming community has established an informal modus vivendi with the copyright owners. Most institutions require formal agreements covering preservation and access and, just as with academic journals and books, identifying and negotiating individually with every copyright owner in the software stack is extremely expensive.
- If a game is to be successful enough to be worth preserving, it must be easy for an unskilled person to install, execute and understand, and thus easy for a curator to create a preserved system image. The same is not true for artefacts such as art-works or scientific computations, and thus the cost per preserved system image is much higher.
- A large base of volunteers is interested in creating preserved game images, and there is commercial interest in doing so. Preserving other genres requires funding.
- Techniques have been developed for mass preservation of, for example, Web pages, academic journals, and e-books, but no such mass preservation technology is available for emulations. Until it is, the cost per artefact preserved will remain many orders of magnitude higher.
Before the advent of the Web, digital artefacts had easily identified boundaries. They consisted of a stack of components, starting at the base with some specified hardware, an operating system, an application program and some data. In typical discussions of digital preservation, the bottom two layers were assumed and the top two instantiated in a physical storage medium such as a CD.
The connectivity provided by the Internet and subsequently by the Web makes it difficult to determine where the boundaries of a digital object are. For example, the full functionality of what appear on the surface to be traditional digital documents such as spreadsheets or PDFs can invoke services elsewhere on the network, even if only by including links. The crawlers that collect Web content for preservation have to be carefully programmed to define the boundaries of their crawls. Doing so imposes artificial boundaries, breaking what appears to the reader as a homogeneous information space into discrete digital "objects".
Indeed, what a reader thinks of as "a web page" typically now consists of components from dozens of different Web servers, most of which do not contribute to the reader's experience of the page. They are deliberately invisible, implementing the Web's business model of universal fine-grained surveillance.
Sir Tim Berners-Lee's original Web was essentially an implementation of Vannevar Bush's Memex hypertext concept, an information space of passive, quasi-static hyper-linked documents. The content a user obtained by dereferencing a link was highly likely to be the same as that obtained by a different user, or by the same user at a different time.
The fact that the artefacts to be preserved are now active makes emulation a far better strategy than migration, but it increases the difficulty of defining their boundaries. One invocation of an object may include a different set of components from the next invocation, so how do you determine which components to preserve?
In 1995, a typical desktop 3.5" hard disk held 1-2GB of data. Today, the same form factor holds 4-10TB, say 4-5 thousand times as much. In 1995, there were estimated to be 16 million Web users. Today, there are estimated to be over 3 billion, nearly 200 times as many. At the end of 1996, the Internet Archive estimated the total size of the Web at 1.5TB, but today they ingest that much data roughly every 30 minutes.
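The growth factors quoted above follow from simple division. A quick check, assuming decimal units (1TB = 1000GB), which is an assumption about how the figures were rounded:

```python
GB_PER_TB = 1000  # decimal units, assumed

# Disk capacity of the 3.5" form factor: 1-2GB in 1995, 4-10TB today.
disk_1995_gb = (1, 2)
disk_today_tb = (4, 10)
disk_growth = tuple(
    tb * GB_PER_TB / gb for gb, tb in zip(disk_1995_gb, disk_today_tb)
)

# Web users: 16 million in 1995, over 3 billion today.
user_growth = 3_000_000_000 / 16_000_000

# The 1996 Web (1.5TB) is now ingested roughly every 30 minutes.
ingests_per_day = 24 * 60 / 30
daily_ingest_tb = 1.5 * ingests_per_day

print("disk growth (low, high):", disk_growth)
print("user growth:", user_growth)
print("1996-Webs ingested per day:", ingests_per_day)
print("implied daily ingest (TB):", daily_ingest_tb)
```

The last figure is only what the "every 30 minutes" rate implies if sustained for a full day; the text does not state a daily total.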
The technology has grown, but the world of data has grown much faster, and this has transformed the problems of preserving digital artefacts. Take an everyday artefact such as Google Maps. It is simply too big and worth too much money for any possibility of preservation by a third party such as an archive, and its owner has no interest in preserving its previous states.
Infrastructure Evolution
While the digital artefacts being created were evolving, the infrastructure they depend on was evolving too. For preservation, the key changes were:
- GPUs: As Rothenberg was writing, PC hardware was undergoing a major architectural change. The connection between early PCs and their I/O devices was the ISA bus, whose bandwidth and latency constraints made it effectively impossible to deliver multimedia applications such as movies and computer games. This was replaced by the PCI bus, with much better performance, and multimedia became an essential ingredient of computing devices. This forced a division of system architecture into a Central Processing Unit (CPU) and what became known as Graphics Processing Units (GPUs). The reason was that CPUs were essentially sequential processors, incapable of performing the highly parallel task of rendering the graphics fast enough to deliver an acceptable user experience. Now, much of the silicon in essentially every device with a user interface implements a massively parallel GPU whose connection to the display is both very high bandwidth and very low latency. Most high-end scientific computation now also depends on the massive parallelism of GPUs rather than traditional super-computer technology. Partial para-virtualization of GPUs was recently mainstreamed in Linux 4.4, but its usefulness for preservation is strictly limited.
- Smartphones: Both desktop and laptop PC sales are in free-fall, and even tablet sales are no longer growing. Smartphones are the hardware of choice. They, and tablets, amplify interconnectedness; they are designed not as autonomous computing resources but as interfaces to the Internet. The concept of a stand-alone "application" is no longer really relevant to these devices. Their "App Store" supplies custom front-ends to network services, as these are more effective at implementing the Web's business model of pervasive surveillance. Apps are notoriously difficult to collect and preserve. Emulation can help with their tight connection to their hardware platform, but not with their dependence on network services. The user interface hardware of mobile devices is much more diverse. In some cases the hardware is technically compatible with traditional PCs, but not functionally compatible. For example, mobile screens typically are both smaller and have much smaller pixels, so an image from a PC may be displayable on a mobile display but it may be either too small to be readable, or if scaled to be readable may be clipped to fit the screen. In other cases the hardware isn't even technically compatible. The physical keyboard of a laptop and the on-screen virtual keyboard of a tablet are not compatible.
- Moore's Law: Gordon Moore predicted in 1965 that the number of transistors per unit area of a state-of-the-art integrated circuit would double about every two years. For about the first four decades of Moore's Law, what CPU designers used the extra transistors for was to make the CPU faster. This was advantageous for emulation; the modern CPU that was emulating an older CPU would be much faster. The computational cost of emulating the old hardware in software would be swamped by the faster hardware being used to do it. Although Moore's Law continued into its fifth decade, each extra transistor gradually became less effective at increasing CPU speed. Further, as GPUs took over much of the intense computation, customer demand evolved from maximum performance per CPU, to processing throughput per unit power. Emulation is a sequential process, so the fact that the CPUs are no longer getting rapidly faster is disadvantageous for emulation.
- Architectural Consolidation: W. Brian Arthur's 1994 book Increasing Returns and Path Dependence in the Economy described the way the strongly increasing returns to scale in technology markets drove consolidation. Over the past two decades this has happened to system architectures. Although it is impressive that MAME/MESS emulates nearly two thousand different systems from the past, going forward emulating only two architectures (Intel and ARM) will capture the overwhelming majority of digital artefacts.
- Threats: Although the Morris Worm took down the Internet in 1988, the Internet environment two decades ago was still fairly benign. Now, Internet crime is one of the world's most profitable activities, as can be judged by the fact that the price for a single zero-day iOS exploit is $1M. Because users are so bad at keeping their systems up-to-date with patches, once a vulnerability is exploited it becomes a semi-permanent feature of the Internet. For example, the 7-year old Conficker worm was recently found infecting brand-new police body-cameras. This threat persistence is a particular concern for emulation as a preservation strategy. Familiarity Breeds Contempt by Clark et al. shows that the interval between discoveries of new vulnerabilities in released software decreases through time. Thus the older the preserved system image, the (exponentially) more vulnerabilities it will contain, and the more likely it is to be compromised as soon as its emulation starts.
Most libraries and archives are very reluctant to operate in ways whose legal foundations are less than crystal clear. There are two areas of law that affect using emulation to re-execute preserved software: copyright and, except for open-source software, the end user license agreement (EULA), a contract between the original purchaser and the vendor.
Software must be assumed to be copyright, and thus absent specific permission such as a Creative Commons or open source license, making persistent copies such as are needed to form collections of preserved system images is generally not permitted. The Digital Millennium Copyright Act (DMCA) contains a "safe harbor" provision under which sites that remove copies if copyright owners send "takedown notices" are permitted; this is the basis upon which the Internet Archive's collection operates. Further, under the DMCA it is forbidden to circumvent any form of copy protection or Digital Rights Management (DRM) technology. These constraints apply independently to every component in the software stack contained in a preserved system image, thus there may be many parties with an interest in an emulation's legality.
The Internet Archive and others have repeatedly worked through the "Section 108" process to obtain an exemption to the circumvention ban for programs and video games "distributed in formats that have become obsolete and that require the original media or hardware as a condition of access, when circumvention is accomplished for the purpose of preservation or archival reproduction of published digital works by a library or archive." This exemption appears to cover the Internet Archive's circumvention of any DRM on their preserved software, and its subsequent "archival reproduction", which presumably includes execution. It does not, however, exempt the archive from taking down preserved system images if the claimed copyright owner objects, and the Internet Archive routinely does so. Neither does the DMCA exemption cover the issue of whether the emulation violates the EULA.
Streaming media services such as Spotify, which do not result in the proliferation of copies of content, have significantly reduced although not eliminated intellectual property concerns around access to digital media. "Streaming" emulation systems should have a similar effect on access to preserved digital artefacts. The success of the Internet Archive's collections, much of which can only be streamed, and Rhizome's is encouraging in this respect. Nevertheless, it is clear that institutions will not build, and provide access even on a restricted basis to, collections of preserved system images at the scale needed to preserve our cultural heritage unless the legal basis for doing so is clarified.
Negotiating with copyright holders piecemeal is very expensive and time-consuming. Trying to negotiate a global agreement that would obviate the need for individual agreements would, in the best case, take a long time. I predict the time would be infinite rather than long. If we wait to build collections until we have permission in one of these ways, much software will be lost.
An alternative approach worth considering would separate the issues of permission to collect from the issues of permission to provide access. Software is copyright. In the paper world, many countries had copyright deposit legislation allowing their national library to acquire, preserve and provide access (generally restricted to readers physically at the library) to copyright material. Many countries, including most of the major software producing countries, have passed legislation extending their national library's rights to the digital domain.
The result is that most of the relevant national libraries already have the right to acquire and preserve digital works, although not the right to provide unrestricted access to them. Many national libraries have collected digital works in physical form. For example, the DNB's CD-ROM collection includes half a million items. Many national libraries are crawling the Web to ingest Web pages relevant to their collections.
It does not appear that national libraries are consistently exercising their right to acquire and preserve the software components needed to support future emulations, such as operating systems, libraries and databases. A simple change of policy by major national libraries could be effective immediately in ensuring that these components were archived. Each national library's collection could be accessed by emulations on-site in "reading-room" conditions, as envisaged by the DNB. No time-consuming negotiations with publishers would be needed.
If national libraries stepped up to the plate in this way, the problem of access would remain. One idea that might be worth exploring as a way to address it is lending. The Internet Archive has successfully implemented a lending system for their collection of digitized books. Readers can check a book out for a limited period; each book can be checked out to at most one reader at a time. This has not encountered much opposition from copyright holders. A similar system for emulation would be feasible; readers would check out an emulation for a limited period, and each emulation could be checked out to at most one reader at a time. One issue would be dependencies. An archive might have, say, 10,000 emulations based on Windows 3.1. If checking out one blocked access to all 10,000, that might be too restrictive to be useful.
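To make the dependency issue concrete, here is a minimal sketch of a one-reader-at-a-time lending ledger. Everything in it (the `LendingDesk` class, the `lock_components` switch, the identifiers) is hypothetical and invented for illustration; it does not describe the Internet Archive's system.

```python
from dataclasses import dataclass, field


@dataclass
class LendingDesk:
    """Hypothetical lending ledger for preserved emulations."""

    # emulation id -> set of shared component ids (e.g. an OS image)
    dependencies: dict
    # If True, a loan also locks every component the emulation depends on.
    lock_components: bool = False
    # emulation id -> reader currently holding it
    loans: dict = field(default_factory=dict)

    def _busy_components(self):
        """Components used by any emulation currently on loan."""
        busy = set()
        for emulation in self.loans:
            busy |= self.dependencies[emulation]
        return busy

    def check_out(self, emulation, reader):
        """Lend an emulation; return False if the loan is blocked."""
        if emulation in self.loans:
            return False  # at most one reader per emulation
        shared = self.dependencies[emulation] & self._busy_components()
        if self.lock_components and shared:
            return False  # a shared component is already on loan
        self.loans[emulation] = reader
        return True

    def check_in(self, emulation):
        """Return an emulation, freeing it for the next reader."""
        self.loans.pop(emulation, None)


catalogue = {"game-a": {"windows-3.1"}, "game-b": {"windows-3.1"}}

strict = LendingDesk(catalogue, lock_components=True)
print(strict.check_out("game-a", "reader-1"))  # first loan succeeds
print(strict.check_out("game-b", "reader-2"))  # blocked by the shared OS

relaxed = LendingDesk(catalogue, lock_components=False)
print(relaxed.check_out("game-a", "reader-1"))
print(relaxed.check_out("game-b", "reader-2"))  # only the emulation is locked
```

With component-level locking, one loan blocks every other emulation built on the same operating system image, which is the restrictive case described above. Locking only the emulation itself avoids that, but whether copyright holders would accept it is a legal question the sketch does not answer.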
Conclusion
I hope I have shown that the technical problems of delivering emulations of preserved software have largely been solved. Concerns remain, but most are manageable. The legal issues are intractable unless national libraries are prepared to use their copyright deposit rights to build collections of software. If they do, some way to provide off-site access will be needed, but at least the software will be around to be emulated when agreement is reached on it.
Open Knowledge Foundation: Forbes Philippines & BlogWatch win best story award as Data Journalism PH wraps up
At the end of November, Open Knowledge, School of Data and the Philippine Center for Investigative Journalism (PCIJ) wrapped up their six-month data journalism training for media organisations in the Philippines, the first of its kind.
Over 100 journalists and civil servants gathered at the Cocoon Hotel in Quezon City to see the twelve participating media teams present their work and listen to keynotes from The Guardian’s Caelainn Barr, Undersecretary Richard Moya (Open Data Task Force Philippines), Kenneth Abante (Department of Finance) and Rogier Van Den Brink (World Bank) on the interplay between government open data and public integrity journalism.
Kenneth Abante from the Department of Finance speaking at the wrap-up event of Data Journalism PH 2015
The World Bank-funded programme equipped participating newsrooms with the tools and techniques for mining the ever-increasing volumes of public data being published by Philippine government departments via their national government data portal, data.gov.ph. After an initial intensive three-day training in July 2015, the teams received regular remote training sessions on data skills from Open Knowledge and editorial support from PCIJ as they progressed with their proposed data stories. Teams worked on diverse topics, from probing who really benefits from the Philippines’ Bottom Up Budgeting initiative to following where money allocated to the reconstruction effort after Typhoon Yolanda actually went.
Five of the twelve participating teams were able to publish their stories before the event with a number of teams finalising their articles for print publications in the new year. Forbes Philippines and BlogWatch were awarded prizes for best story by PCIJ and Open Knowledge based on the originality of their stories, their approach to data collection and the strength of their narrative. Forbes Philippines collected data from the SEC on independent directors and correlated this with company performance to give a unique view on corporate accountability in the Philippines. BlogWatch persevered with a range of large publicly available datasets on aid and reconstruction. The team also took to social media to crowdsource information that was missing in order to follow the money that was plugged into various projects in the wake of the devastation caused by Typhoon Yolanda.
Winning teams BlogWatch (Jane Uymatiao & Noemi Lardizabal-Dado) and Forbes Philippines (Lala Rimando & Lorenxo Subido) with Sam Leon (Open Knowledge) and Malou Mangahas (PCIJ)
Philippine Star ran an analysis of data published by the Department for Education on how many new schools were being built that would not have access to electricity and water. Business World looked at new trends in investment amongst Filipino citizens and summarised their results in an infographic. Calbayog Post investigated how projects approved under the Bottom Up Budgeting scheme in Samar Province had performed. The Financial Times produced a visual slideshow on the Philippines’ dependence on renewables and the opportunities for hydro power using data published by the Department of Energy. You can read the published stories below with the exception of Forbes Philippines’ which will be published in their January 2016 editions. Other participating teams including Rappler, PCIJ, Interaksyon, ABS-CBN, Bloomberg, Inquirer were not able to publish in time for the deadline, but hope to publish their stories in the coming weeks.
The Philippines has made substantial progress in recent years in government transparency, launching a national government open data portal in 2014 and setting up an Open Data Task Force within the civil service to catalyse further open data releases across national and local government. The programme demonstrated the promise of open data by enabling participating journalists to shed light on issues of critical national importance to the broader public. It also put into sharp focus areas that needed more work from publishing government departments. Too many critical datasets were incomplete, not actively maintained, contained inconsistencies that made them difficult to analyse, or were not available for free.
A selection of the online tutorials, data recipes and training material has been made available for all to use on the project website, including guides to using a range of tools such as Infogr.am, CartoDB, Import.io and DocumentCloud.
Data-driven articles produced as part of the programme
- BlogWatch (best story winner)
- Forbes Philippines (best story winner)
- Philippine power generation: a tidal change in renewable energy – Financial Times
- Where Filipino’s ‘Smart Money’ is going – Business World
- Bottom’s Up? How did Samar communities make use of the BUB fund? – Calbayog Post
- Philippine Star
“The workshop allowed me to be braver in pursuing irregularities and anomalies with the help of data, but also to be careful in making conclusions. It’s a nice intro to data journalism.” – Michael Joseph Bueza, Rappler
“Data Journalism PH 2015 is a workshop every serious journalist should take. More than teaching me practical skills – how to create maps, infographics, and spreadsheets – it made me realize how important it is to use hard facts, as opposed to merely relying on statements, to create a public that is more informed and more critical.” – Patricia Aquino, Interaksyon
“A great program! PCIJ has always set the standard for investigative journalism. Open Knowledge did a great job teaching us data journalism and the different skills it requires. Looking forward to continuing to work with the whole PCIJ team. The reports of the other teams were informative as well. Thumbs up to the whole group.” – Nestor Corrales, Inquirer
“The best journalism program ever that united my writing and analysis skills.” – Rommel Rutor, Calbayog Post
“It was a great chance to know how and why consumers and processors of data (i.e. the journalists) are seeking more from the producers of data (government, private sector).” – Lala Rimando, Forbes Philippines
“It was a great opportunity to be part of this programme. I’m really interested in improving my technical and editorial skills on data mining and thanks to PCIJ and Open Knowledge, I’ve really learned a lot.” – Kia Obang, BusinessWorld Publishing
“Open data seems like an issue reserved for a select number of people, but it’s a subject that so many people need to be familiar with. Learning more about it through the programme can give you the right tools to turn open data into powerful analysis.” – Kyle Subido, Forbes Philippines
“If you want to learn how to dig into huge data, you gotta take this training.” – Jose Gerwin Babob, Calbayog Post
“As a reporter covering different beats, including survey results, the training helped me learn new skills that would make me more effective in writing investigative stories by correctly analyzing and interpreting datasets. I really appreciate the efforts of the PCIJ and the Open Knowledge in coming up with such training for journalists like us. This will be a very good addition to our resumes. :-)” – Helen Flores, Philippine Star
“Had a grand time. Learned about the existence of free online tools which could potentially take off about 25% of my previous workload.” – Dan Paurom, Inquirer
“The Data Journalism Philippines 2015 is a timely program for Filipino journalists who are interested in making sense of the huge amount of data that are readily available online. The things that we have learned during the duration of the program equipped us with the necessary skills needed to produce quality data-driven articles for our respective organizations.” – Jan Victor Mateo, Philippine Star
*** This blog post was written by Hazwany Jamaluddin from Sinar Project in Malaysia ***
There are a few countries in the Southeast Asia region – Vietnam, Myanmar, Cambodia, Thailand, Lao PDR, Singapore, East Timor and Malaysia – that are falling behind in the global open data movement, while others – Indonesia and the Philippines – are advancing as members of the Open Government Partnership. So, how are these countries in Southeast Asia doing relative to the rest of the world and to each other?
How open is Southeast Asia?
There are several notable trends that we see in the region. There are different sets of restrictions that are mostly related to access to the Internet, Freedom of Expression (FOE) policies, Freedom of Information Act (FOIA) laws and Open Government policies.
The following are 3 environmental constraints that are affected by these restrictions:
- 1. Lack of capacity of government to maintain infrastructure and websites; incomplete and inaccessible published official information.
- 2. Limited open data knowledge from the respective government officials.
- 3. Limited capacity and open data knowledge of CSOs to take advantage or to ask for open data.
So, what does this look like on the ground?
In Malaysia, freedom of expression is lacking, with very limited freedom of information, limited capacity for open data programs and unclear government policies for open data. Data displayed on the Open Government portal are incomplete, not in an open data format and not supported by open licensing. Despite there being a Freedom of Information Enactment (FOIE) in two states – Penang and Selangor – civil liberties and the public suffer from barriers to entry. Requests must be made in person by submitting a form at the counter, in addition to paying steep request fees. Moreover, requests made via email are not accepted. The quality of FOIE implementation is lacking due to inadequate training among officers and a lack of public awareness. These challenges are exacerbated because the federal government is protected within the federal territory, where the FOIE and FOIA do not exist.
In contrast to Indonesia, the emergence of Open Government policy and FOIA in the Philippines and Thailand has broadened the perspective on accessing and collecting valuable data in machine-readable format with the support of open licensing. Having said that, there are no guarantees that FOIA in Indonesia ensures the public’s right to fully access official information, due to the culture of secrecy inherited from the Soeharto era. For both the Philippines and Thailand, having an open government portal does not guarantee the compatibility and quality of the data. This also is indicative of the readiness of government to share valuable data openly that could be used for both short- and long-term planning of grassroots development across the country.
In Singapore, there is no FOIA; information disclosure is regulated in a variety of informal and formal ways. The informal culture of secrecy inherited by the dominant People’s Action Party (PAP) drew public scrutiny and repression. Thus, citizens are unable to enjoy their right of access to information.
In Cambodia, no clear FOI provisions currently exist in domestic law. Furthermore, the assertion of the Draft Law on access to information may conceal the government’s wrongdoings and limit access to information. As a result, capacity-building open data programs and advocacy are very limited. Vietnam, East Timor, Lao PDR and Myanmar share similar restrictions in their FOI laws and FOE policies. Consequently, abuses of power continue to prevail when government is not open or accountable.
When analyzing the results of the Index, two key issues arose with regard to the impact of the Global Open Data Index on disenfranchised communities.
- 1. Where do we measure the accessibility between decision makers and disenfranchised communities such as single parents, people with disabilities, elderly, youth and children?
- 2. How can we show (and share with) communities around the world the reality of relationships and engagements between decision makers and disenfranchised communities?
From budgeting to legislature to welfare, transparency and accountability are lacking. Official governmental websites have poor navigation for content, and are often not up to date. With these difficulties, it is hard for CSOs to find valuable information that could measure the state of open data availability in their country.
For countries like Indonesia, the Philippines and Thailand, which benefit from open data policies and a Freedom of Information Act at the national level, it is uncertain whether open data has reached the sub-national level. Usage, reliability and compatibility of open data are more important at the sub-national level for CSOs because this is where development planning actually takes place.
The state of “openness” is more restricted in countries like Vietnam, Myanmar, Cambodia, Lao PDR, East Timor, Singapore and Malaysia. The absence of Lao PDR and Vietnam from the Index reflects that valuable information is inaccessible or does not exist on official government websites. The low scores of Myanmar, East Timor, Cambodia and Malaysia also reflect the incomplete information on official government websites.
The Global Open Data Index has become a very useful toolkit and international platform for CSOs to come together regionally to identify data gaps and needs, and to search for common ground on how data can support their goals. However, at this point, it is uncertain, to say the least, whether there is an impact for local communities. In the future, it would be useful and important to have localized data to promote local governance accountability and transparency. Hopefully, the Global Open Data Index will also be a platform to support Sustainable Development Goals monitoring and will also inspire countries to develop their own localised versions of the Open Data Index.
There is a need for capacity building on open data: helping local communities understand the importance of open data, training and coaching people to make use of open data, and putting pressure on policy makers to implement better Freedom of Information policy, better open data policy and better open government policy.
Winchester, MA Traveling to the 20th annual conference of Museums and the Web 2016 (MW2016), April 6-9, 2016 in Los Angeles? If so, plan on joining David Wilcox, DuraSpace, and Stefano Cossu, The Art Institute of Chicago, for a half-day workshop focused on managing museum assets as linked data with Fedora 4. The workshop will be held on Wednesday, April 6, 2016 from 1:30-4:30 PM.
“Telling DSpace Stories” is a community-led initiative aimed at introducing project leaders and their ideas to one another while providing details about DSpace implementations for the community and beyond. The following interview includes personal observations that may not represent the opinions and views of Cornell University Library or the DSpace Project.