Feed aggregator

Karen Coyle: This is what sexism looks like #2

planet code4lib - Sun, 2015-01-11 13:38
Libraries, it seems, are in crisis, and many people are searching for answers. Someone I know posted a blog post pointing to community systems like Stack Overflow and Reddit as examples of how libraries could create "community." He especially pointed out the value of "gamification" - the ranking of responses by the community - as something libraries should consider. His approach was that it is "human nature" to want to gain points. "We are made this way: give us a contest and we all want to win." (The rest of the post and the comments went beyond this to the questions of what libraries should be today, etc.)

There were many (about 4 dozen, almost all men) comments on his blog (which I am not linking to, because I don't want this to be a "call out"). He emailed me asking for my opinion.

I responded only to his point about gamification, which was all I had time for, saying that in that area his post ignored an important gender issue. The competitive aspect was part of what makes those sites unfriendly to women.

I told him that there have been many studies of how children play, and they reveal some distinct differences between genders. Boys begin play by determining a set of rules that they will follow, and during play they may stop to enforce or discuss the rules. Girls begin to play with an unstructured understanding of the game, and, if problems arise during play, they work on a consensus. Boys' games usually have points and winners. Girls' games are often without winners and are "taking turns" games. Turning libraries into a "winning" game could result in something like Reddit, where few women go, or, if they do, they are reluctant to participate.

And I said: "As a woman, I avoid the win/lose situations because, based on general social status (and definitely in the online society) I am already designated a loser. My position is pre-determined by my sex, so the game is not appealing."

I didn't post this to the site, just emailed it to the owner. It's good that I did not. The response from the blog owner was:
This is very interesting. But I need to see some proof.

Some proof. This is truly amazing. Search on Google Scholar for "games children gender differences" and you are overwhelmed with studies.

But it's even more amazing because none of the men who posted their ideas to the site were asked for proof. Their ideas are taken at face value. Of course, they didn't bring up issues of gender, class, or race in their responses, as if these are outside of the discussion of what libraries should be. And to bring them up is an "inconvenience" in the conversation, because the others do not want to hear it.

He also pointed me to a site that is "friendly to women." To that I replied that women decide what is "friendly to women."

I was invited to comment on the blog post, but it is now clear that my comments will not be welcome. In fact, I'd probably only get inundated with posts like "prove it." This does seem to be the response whenever a woman or minority points out an inconvenient truth.

Welcome to my world.

District Dispatch: Big shoes to fill

planet code4lib - Sun, 2015-01-11 05:32

Those who worked with Linda know hers are big shoes to fill

E-rate Orders aside, the library community is starting the New Year with one less champion. Linda Lord, now former Maine State Librarian, is officially retired and has turned the keys over to her successor, Jaimie Ritter.

No one who knows Linda is at all reticent in talking about her dedication to her home state libraries—nor are those of us who work with her as a national spokesperson for libraries. Her work for ALA’s Office for Information Technology Policy (OITP) could be an encyclopedic list covering at least a decade of advocacy. In her most recent role as Chair of the E-rate Task Force, Linda has been invaluable to advancing library interests at the Federal Communications Commission (FCC), in Congress, and with her colleagues. At the height of the recent E-rate activity at the FCC, we joked with Linda that she should have special frequent flier miles for all the flights from Bangor (ME) to Washington D.C. That, and the fact that Linda’s email was first to pop up under the “Ls” and her phone number was always under “recents” on my phone list, are testament to our reliance on her experience, her dogged support, and her willingness to work well beyond her role as a member-leader (a volunteer).

Of course, Linda’s work is well respected in her home state, as evidenced by a number of articles and even a television interview as her retirement approached. These stories make it clear Linda builds strong, collaborative relationships with her colleagues, whether with staff at the state library, librarians across Maine, or colleagues as far away as the Senate in Washington, D.C.

“Linda has done an amazing job making information accessible through libraries and schools across Maine,” said Senator Angus King. “She has the essential leadership qualities of vision, perseverance, willingness to work on the details, and a personality that enables her to collaborate and bring out the best in people. Her leadership at the national level on the E-rate program and other issues has been a huge benefit to Maine. She will always have my profound respect and appreciation for all that she’s accomplished for Maine and for the country.”

I can testify first hand on the difference Linda’s work has made for Maine libraries from my (wonderful) summer trips to Maine. In recent years we have noticed a marked improvement in library WiFi. While my kids love to hike when we travel in rural Maine, they now also are dedicated texters and need to know the next time we will be near a library so they can update friends in between dry periods of no connectivity. While passing through a town I point out the universal library sign and one child will ask, “Is that one of Linda’s libraries? Can we stop?” (knowing that there will be plenty of WiFi to go around).

We are proud to be able to share our own remembrances of Linda’s long tenure working with ALA. While I have long considered Linda “my ALA member,” many others have similar sentiments when asked to share anecdotes about working with Linda. I have included a few here.

Emily Sheketoff, executive director of ALA’s Washington Office reminds us all of Linda’s strong leadership qualities that have won her a respected place on the national stage:

“Linda has always been a strong voice for libraries, so OITP recognized and took great advantage of that. Coming from Maine, she had a soft spot for rural libraries and she became our “go-to” person when we needed an example of the difference a well-connected library can make for small towns or rural communities. When ALA staff use a Maine library as an exemplar the response is something along the lines of “Oh we know Linda Lord” and the point is immediately legitimized. She will be missed as a voice for libraries on the national stage.

As Chair of the ALA E-rate Task Force, Linda has spent countless hours on the phone, on email, in person making sure issues get covered—often asking the hard questions of how a policy course could impact the daily life of the librarian who has to implement or live with a policy. This ability has been invaluable as a gentle (and sometimes like a hurricane) reminder that what we do in D.C. has a very real impact locally. She is quite a leader.”

Linda Schatz, an E-rate consultant who worked with Linda and ALA for many years, describes Linda’s dedication to garnering support for the E-rate program:

“As I think about the many ways in which Linda has impacted the E-rate program, perhaps the most long-lasting has been her diligence in working with Members [of Congress] and their staff. Not only did she take the time to meet with and inform Senators Snowe and Collins about the impact of the E-rate program on Maine libraries, she continued to point out the benefits to all libraries and helped with last minute negotiations through the night to prevent legislation that would have had a negative impact on the program. She didn’t stop her communications when Senator Snowe left the Senate but took the time to meet with Senator King and his staff as well to ensure that they, too, understood the importance of the program to libraries. These communications about the E-rate program as well as the general needs of libraries will long be felt by the library community.”

Linda has the respect she does across ALA staff and members who have had the privilege of seeing her in action in large part because of her warm and sincere manner. “Not many people can bring the same passion for network technology as for early childhood learning, but Linda did. Not only was she an incredibly effective advocate, but I have admired and enjoyed her generous and collaborative spirit for years,” said Larra Clark, deputy director for ALA OITP. Linda easily wins over her audience.

Kathi Peiffer, current Chair of the E-rate Task Force, and Pat Ball, member of the joint ALA Committee on Legislation and OITP Telecommunications Subcommittee, both highlight these qualities in their recollections of Linda. “She is always gracious and has a wonderful sense of humor. She is the Queen of E-rate!” (Kathi). “She is always smiling and always gracious and I am glad that I had the opportunity to meet and work with her. I salute a great librarian and lady.” (Pat)

Alan S. Inouye, director of OITP, puts it well when he says, “Saying “thank you” to Linda Lord is just so inadequate. Her contributions to national policy on E-rate are extensive and range from testifying at the U.S. Senate Commerce Committee and participating on FCC expert panels to chairing innumerable E-rate Task Force meetings (at their notorious Sunday 8:00 am times!). As Maine State Librarian, she has greatly advanced library services and visibility in her state in many ways. I hope that the library community, ALA, and OITP can find a way to continue to avail ourselves of Linda’s expertise and experience—retirement notwithstanding!”

So Alan leaves me with a little hope that I can continue to dream up ways we can call on Linda. As we often tell members who get involved with OITP, it’s very difficult to cut the ties once you join us.

And Linda was worried she might lose touch with library issues. I doubt it.

The post Big shoes to fill appeared first on District Dispatch.

Harvard Library Innovation Lab: Link roundup January 10, 2015

planet code4lib - Sat, 2015-01-10 23:35

Stay warm inside with these links.

Beachbot – YouTube

Beachbot draws art in the sand

Twilights: New Ink Paintings on Vintage Books by Ekaterina Panikanova | Colossal

Ink paintings on grids of vintage books

Library Commons Fly Thru

Drones. Inside! Love seeing the video through the stacks. Canyons = aisles.

How to build an e-book library you can touch

Device frames. I love device frames.

Roller Coaster House – Unique Homes for Sale – Popular Mechanics

This household roller coaster makes me want an in-library version

John Miedema: Lila Slipstream II: Extend reading capabilities by processing content into slips

planet code4lib - Sat, 2015-01-10 16:42

Lila is a cognitive computing system that extends writing capabilities. It also extends reading capabilities.

  1. In a previous post I outlined how an author uses existing writing software to generate “slips” of content. A slip is the unit of text for the Lila cognitive system. The slip has just a few required properties: a subject line, a bit of content, and suggestions for tags and categories (a minimal sketch of this structure appears after the list). The author generates many slips, hence a “slipstream.” In this post, I show part two of the slipstream for other kinds of content.
  2. In the writing process, an author collects and curates related content generated by dialog with other people, e.g., email and blog comments, or written by other people, e.g., articles and books. This content is usually filtered and managed by the author, but the volume piles up well beyond the author’s ability to read. (Notice the icon in the lower right of item two looks like both a book and a scanner. It is assumed that all content will be digital text.)
  3. Existing technologies such as Google Alerts allow authors to monitor the web for undiscovered but related content generated by anyone in the world. This content abides on the open web, growing daily. The volume easily exceeds an author’s ability to curate let alone read. A Lila curation process will be described later.
  4. The second part of the Lila slipstream is a process that will automatically convert the curated and undiscovered content into slips. The common slip unit format will enable Lila to generate visualizations of the content, enabling the author to read and analyze a high volume of content. The visualization tool will be described later.
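What might a slip look like as a data structure? Here is a minimal Python sketch based only on the properties listed above; the field names are assumptions for illustration, not Lila's actual schema.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Slip:
    """One unit of text in the slipstream (illustrative field names only)."""
    subject: str                                      # the slip's subject line
    content: str                                      # a bit of content
    suggested_tags: List[str] = field(default_factory=list)
    suggested_categories: List[str] = field(default_factory=list)

# A slip generated by the author and a slip converted from curated outside
# content share the same format, which is what lets Lila visualize them together.
example = Slip(
    subject="Example slip",
    content="A bit of content goes here.",
    suggested_tags=["example"],
    suggested_categories=["notes"],
)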

Patrick Hochstenbach: Je Suis Charlie cartoon for UGent

planet code4lib - Sat, 2015-01-10 14:33
Filed under: Doodles Tagged: jesuisahmed, jesuischarlie, jesuisjuif

Ted Lawless: OrgRef data as RDF

planet code4lib - Sat, 2015-01-10 05:00

This past fall, Data Salon, a UK-based data services company, released an open dataset about academic and research organizations called OrgRef. The data is available as a CSV and contains basic information about over 30,000 organizations.

OrgRef was created with publishers in mind, and so its main focus is on institutions involved with academic content: universities, colleges, schools, hospitals, government agencies and companies involved in research.

This announcement caught our attention at my place of work because we are compiling information about educational organizations in multiple systems, including a VIVO instance, and are looking for manageable ways to consume Linked Data that will enrich or augment our local systems. Since the OrgRef data has been curated and focuses on a useful subset of data that we are interested in, it seemed to be a good candidate for investigation, even though it isn't published as RDF. Due to its size, it is also easier to work with than attempting to consume and process something like VIAF or DBPedia itself.

Process

We downloaded the OrgRef CSV dataset and used the ever helpful csvkit tool to get a handle on what data elements exist.

$ csvstat --unique orgref.csv
  1. Name: 31149
  2. Country: 210
  3. State: 51
  4. Level: 3
  5. Wikipedia: 31149
  6. VIAF: 10764
  7. ISNI: 5765
  8. Website: 25910
  9. ID: 31149

The attributes are well documented by OrgRef. To highlight, though: identifiers to other systems are included - Wikipedia Page ID (pageid), ISNI, and VIAF. These identifiers will be important for matching data from other systems or finding more LOD resources later. There is also a link to official organizational home pages. We've found that organizational home pages in other sources are surprisingly inconsistent (often missing or not pointing to an official page), so this is something from OrgRef that we would be interested in using right away.

OrgRef to DBPedia

Since we are working on a project that uses RDF as the data model, we wanted to convert this OrgRef data from CSV to RDF. All the organizations in the dataset (as of the December 2014 download) have Wikipedia page IDs (pageid). DBPedia also includes the pageid, so we can look up the DBPedia URI for each and use that in our RDF representation of the data. Rather than sending 30,000 SPARQL queries to DBPedia, we downloaded the DBPedia to pageid ntriples file from DBPedia and wrote a script to output another CSV with OrgRef ID and DBPedia URI pairs, like below.

orgref-id,uri
1859,http://dbpedia.org/resource/Arizona_State_University
2025,http://dbpedia.org/resource/Crandall_University
2236,http://dbpedia.org/resource/Acadia_University
3712,http://dbpedia.org/resource/Bell_Labs
3768,http://dbpedia.org/resource/Bundestag

OrgRef as RDF

With a mapping of OrgRef IDs to DBPedia URIs we were able to create an RDF representation of each organization. For an initial pass, we decided to only use name, pageid, ISNI, VIAF, and websites from OrgRef. A script merged the original OrgRef CSV with our DBPedia URI to OrgRefID CSV and produced triples like the following for a single organization.

<http://dbpedia.org/resource/Michigan_Technological_University> a foaf:Organization ;
    rdfs:label "Michigan Technological University" ;
    dbpedia-owl:isniId "0000000106635937" ;
    dbpedia-owl:viafId "150627130" ;
    dbpedia-owl:wikiPageID "45893" ;
    schema:url "http://www.mtu.edu" .

The VIAF information is stored as both a string literal (to aid querying by the identifier later) and as an owl:sameAs relationship, since VIAF is published as Linked Data. For ISNI, we are only storing the literal because, as of January 2015, ISNI isn't available as Linked Data.
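As a rough illustration of that conversion step, here is a minimal rdflib sketch that builds the same kinds of triples from one merged CSV. The merged file name and its column labels are assumptions for illustration; this is not the actual script.

import csv
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, OWL, RDF, RDFS

DBO = Namespace("http://dbpedia.org/ontology/")
SCHEMA = Namespace("http://schema.org/")

graph = Graph()
graph.bind("dbpedia-owl", DBO)
graph.bind("schema", SCHEMA)
graph.bind("foaf", FOAF)

# merged.csv is assumed to hold the original OrgRef columns plus the matched DBPedia URI.
with open("merged.csv") as handle:
    for row in csv.DictReader(handle):
        org = URIRef(row["uri"])
        graph.add((org, RDF.type, FOAF.Organization))
        graph.add((org, RDFS.label, Literal(row["Name"])))
        graph.add((org, DBO.wikiPageID, Literal(row["Wikipedia"])))
        if row["ISNI"]:
            graph.add((org, DBO.isniId, Literal(row["ISNI"])))
        if row["VIAF"]:
            graph.add((org, DBO.viafId, Literal(row["VIAF"])))
            graph.add((org, OWL.sameAs, URIRef("http://viaf.org/viaf/" + row["VIAF"])))
        if row["Website"]:
            graph.add((org, SCHEMA.url, Literal(row["Website"])))

graph.serialize("orgref.ttl", format="turtle")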

Publishing for querying with Linked Data Fragments

With the OrgRef data modeled as RDF, we decided to use a Linked Data Fragments server to publish and query it. LDF is a specification and software for publishing Linked Data datasets in a way that minimizes server-side requirements. LDF data can be queried via SPARQL using a client developed by the team or via HTTP requests. Ruben Verborgh, one of the researchers behind LDF, has posted a one-minute video with a clear summary of the motivations behind the effort.

Following the documentation for the LDF server, we set up an instance on Heroku and loaded it with the OrgRef RDF file. You can query this data at http://ldf-vivo.herokuapp.com/orgref with an LDF client or browse it via the web interface. Due to the design of the LDF server, we are able to publish and query this using a free Heroku instance. See this paper for related, lightweight approaches.
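For a sense of what querying over plain HTTP looks like, here is a small sketch using Python's requests library. It assumes the server follows the standard Triple Pattern Fragments convention of subject/predicate/object query parameters, with empty values acting as wildcards.

import requests

# Ask the OrgRef fragments endpoint for every triple about one organization.
endpoint = "http://ldf-vivo.herokuapp.com/orgref"
params = {
    "subject": "http://dbpedia.org/resource/Michigan_Technological_University",
    "predicate": "",
    "object": "",
}
response = requests.get(endpoint, params=params, headers={"Accept": "text/turtle"})
print(response.text)  # matching triples plus the fragment's paging and hypermedia controls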

Summary

To wrap up, we found a quality, curated, and targeted dataset available as CSV that we would like to integrate into our local projects that use RDF. Using the identifiers in the CSV file, we were able to match it to Linked Data URIs from DBPedia and create an RDF representation of it. We also published the RDF via Linked Data Fragments for others to browse and query.

Our interest in the OrgRef data doesn't stop here though. We want to make use of it on our local applications, particularly a VIVO instance. I'll write more about that later.

Ranti Junus: The sound version of a Google (old) reCAPTCHA

planet code4lib - Fri, 2015-01-09 23:39

Last month, Google announced the new no-captcha reCAPTCHA that is supposedly more accurate and better at preventing spam. We’ll see how this goes.

In the meantime, plenty of websites that employ Google’s reCAPTCHA still use the old version, like this:

The problem with this reCAPTCHA is that it fundamentally doesn’t work with screen readers (among other things, like forcing you to cross your eyes trying to figure out each character in the string). Some people pointed out that reCAPTCHA offers the sound version (see that little red speaker?) that should mitigate the problem.

Here’s the link to sound version of a Google reCAPTCHA: https://dl.dropboxusercontent.com/u/9074989/google-recaptcha-audio.mp3

This example was taken from the PubMed website and happened to be set as a string of numbers.

Enjoy!

p.s. What is this about PubMed using an inaccessible reCAPTCHA? There are other ways to employ non-captcha security techniques without using that kind of solution. :-/

p.p.s. In case you’re curious, I could not decipher two out of the eleven (if I counted it correctly) numbers said in that recording.

CrossRef: CrossRef Staff at the FORCE2015 Conference

planet code4lib - Fri, 2015-01-09 21:31

Ed Pentz, Karl Ward, Geoffrey Bilder and Joe Wass will be attending the FORCE2015 Conference in Oxford, UK. They'll be available to answer any CrossRef related questions. The conference runs 12-13 January. Learn more.

Jonathan Rochkind: Control of information is power

planet code4lib - Fri, 2015-01-09 21:04

And the map is not the territory.

From the Guardian, Cracks in the digital map: what the ‘geoweb’ gets wrong about real streets

“There’s no such thing as a true map,” says Mark Graham, a senior research fellow at Oxford Internet Institute. “Every single map is a misrepresentation of the world, every single map is partial, every single map is selective. And every single map tells a particular story from a particular perspective.”

Because online maps are in constant flux, though, it’s hard to plumb the bias in the cartography. Graham has found that the language of a Google search shapes the results, producing different interpretations of Bangkok and Tel Aviv for different residents. “The biggest problem is that we don’t know,” he says. “Everything we’re getting is filtered through Google’s black box, and it’s having a huge impact not just on what we know, but where we go, and how we move through a city.”

As an example of the mapmaker’s authority, Matt Zook, a collaborator of Graham’s who teaches at the University of Kentucky, demonstrated what happens when you perform a Google search for abortion: you’re led not just to abortion clinics and services but to organisations that campaign against it. “There’s a huge power within Google Maps to just make some things visible and some things less visible,” he notes.

From Gizmodo, Why People Keep Trying To Erase The Hollywood Sign From Google Maps

But the sign is both tempting and elusive. That’s why you’ll find so many tourists taking photos on dead-end streets at the base of the Hollywood Hills. For many years, the urban design of the neighbourhood actually served as the sign’s best protection: Due to the confusingly named, corkscrewing streets, it’s actually not that easy to tell someone how to get to the Hollywood Sign.

That all changed about five years ago, thanks to our suddenly sentient devices. Phones and GPS were now able to aid the tourists immensely in their quests to access the sign, sending them confidently through the neighbourhoods, all the way up to the access gate, where they’d park and wander along the narrow residential streets. This, the neighbours complained, created gridlock, but even worse, it represented a fire hazard in the dry hills — fire trucks would not be able to squeeze by the parked cars in case of an emergency.

Even though Google Maps clearly marks the actual location of the sign, something funny happens when you request driving directions from any place in the city. The directions lead you to Griffith Observatory, a beautiful 1920s building located one mountain east from the sign, then — in something I’ve never seen before, anywhere on Google Maps — a dashed grey line arcs from Griffith Observatory, over Mt. Lee, to the sign’s site. Walking directions show the same thing.

Even though you can very clearly walk to the sign via the extensive trail network in Griffith Park, the map won’t allow you to try.

When I tried to get walking directions to the sign from the small park I suggest parking at in my article, Google Maps does an even crazier thing. It tells you to walk an hour and a half out of the way, all the way to Griffith Observatory, and look at the sign from there.

No matter how you try to get directions — Google Maps, Apple Maps, Bing — they all tell you the same thing. Go to Griffith Observatory. Gaze in the direction of the dashed grey line. Do not proceed to the sign.

Don’t get me wrong, the view of the sign from Griffith Observatory is quite nice. And that sure does make it easier to explain to tourists. But how could the private interests of a handful of Angelenos have persuaded mapping services to make it the primary route?

(h/t Nate Larson)


Filed under: General

Open Library: Open Library heads to the stars

planet code4lib - Fri, 2015-01-09 20:29

We are excited to announce that the Open Library metadata, pointing to the growing collection of content housed by Internet Archive, has been selected for inclusion in the core archive of Outernet. If you are not familiar with Outernet, they’re calling themselves Humanity’s Public Library and they want to increase access to information for people around the world. Read more here (they’ve got a funding thing happening as well). In their own words:

Currently, 2/3 of humanity lacks Internet access. Outernet wants to broadcast humanity’s best work to the entire world from space. For free. They believe that no one should be denied a basic level of information due to wealth, geography, political environment, or infrastructure. Furthermore, every person should be able to participate in the global marketplace of ideas. They are currently live on four continents with more to come. Users can build their own receiver or purchase one.

Inclusion of Open Library metadata will help Outernet users understand the breadth of content that is available. We’re happy to help get more information to more people.

OCLC Dev Network: OCLC LC Name Authority File (LCNAF) Temporarily Unavailable

planet code4lib - Fri, 2015-01-09 20:00

The LC Name Authority File (LCNAF) is temporarily unavailable due to a problem at the data source level. While users will find that this experimental service is up and running, no data is currently available. The good news for users particularly interested in this data is that you can now access it from the Library of Congress directly at http://id.loc.gov/.

Jenny Rose Halperin: Reading Highlights 2014

planet code4lib - Fri, 2015-01-09 18:06

I did this last year too, but here are some of the best books that I read this year. I tend to read a bit haphazardly and mostly fiction, but here’s the list of books that surprised or excited me most in 2014. I can honestly say that this year I only read a few duds and that most of my reading life was very rich!

Fiction: I read a lot of Angela Carter this year, including Burning Your Boats (her collection of short stories), Wise Children, which is so wildly inventive, and Nights at the Circus, which many consider to be her best. She remains my favorite author and I am glad she has such a large catalog. Each book is like a really delicious fruit.

Perhaps the most surprising book I read this year was The Name of the Rose by Umberto Eco. I picked it up in a used bookstore in London and found it thrilling. I would love to read more monastery murder mysteries.

In the British romances category, standouts include The Enchanted April, Persuasion, Emma, Sense and Sensibility, and Far From the Madding Crowd.
British romances are my comfort food, and I always turn to them when I don’t know what to read next. I find most through browsing Project Gutenberg and seeing what I haven’t read yet. I love Project Gutenberg and think that the work they’re doing is incredibly important.

I devoured Mavis Gallant’s Paris Stories collection from the NYRB back in January and was very sad when she passed.

In German, I read only one book, which was Schachnovelle by Stefan Zweig. I read it because of the Grand Budapest Hotel connection and it was as good as promised.

I finished off my year with Snow by Orhan Pamuk, which I highly recommend! It is particularly prescient now and asks important questions about Western hegemony, art, and religion.

Memoir: I had somehow missed Heartburn by Nora Ephron and have recommended it to everyone, though it’s halfway between memoir and fiction. It is so smart, so funny, and so bitchy, like the best romcom.

Because a bunch of people have asked me: I had very mixed feelings about Not that Kind of Girl by Lena Dunham. The stories in the collection weren’t novel or exciting; the narratives had appeared in her work repeatedly and seemed like a rehashing of the most boring parts of Girls or Tiny Furniture. By the time she got to the section about her food diary, I honestly wondered if anyone had even thought to edit this work. In all, I found it smug and poorly written.

My Berlin Kitchen: A Love Story by Luisa Weiss was a lovely book about remembrance, identity, and food.

Non-fiction: My team read Cultivating Communities of Practice by Etienne Wenger, and it made a massive impression on me and my work. It is a very brilliant book!

I am cheating a bit here because I just finished it this week, but Don’t Make me Think by Steve Krug was also fantastic and asked all the important questions about usability, testing, and the Web.

Reinventing Organizations by Frederic Laloux made some interesting claims and I am not quite sure what to make of it still, but definitely gave me food for thought.

If you don’t yet use it, Safari Books Online is the best tool for discovering literature in your field, both in terms of platform and content.

Historical fiction: I didn’t read so much in this category this year, but what I did was amazing. In the Garden of Beasts by Erik Larson was so well-researched and engrossing. I am officially a Larson convert!  The Orientalist by Tom Reiss was incredibly exciting as well.

Honorable Mentions: In the field of community management, Jono Bacon’s The Art of Community is a classic. I liked it very much, but found its emphasis on “meritocracy” deeply problematic.

I picked up Good Poems, an anthology by Garrison Keillor at a library sale last month and it is a delight! I leave it on my kitchen table to read while hanging around.

Nicholson Baker is such a good writer, so The Way the World Works was enjoyable, though not my favorite of his.

Feel free to share your favorites as well! Here’s to a 2015 full of even more books!

 

Eric Lease Morgan: Hands-on text analysis workshop

planet code4lib - Fri, 2015-01-09 16:42

I have all but finished writing a hands-on text analysis workshop. From the syllabus:

The purpose of this 5-week workshop is to increase the knowledge of text mining principles among participants. By the end of the workshop, students will be able to describe the range of basic text mining techniques (everything from the creation of a corpus, to the counting/tabulating of words, to classification & clustering, and visualizing the results of text analysis) and have garnered hands-on experience with all of them. All the materials for this workshop are available online. There are no prerequisites except for two things: 1) a sincere willingness to learn, and 2) a willingness to work at a computer’s command line interface. Students are really encouraged to bring their own computers to class.

The workshop is divided into the following five, 90-minute sessions, one per week:

  1. Overview of text mining and working from the command line
  2. Building a corpus
  3. Word and phrase frequencies
  4. Extracting meaning with dictionaries, parts-of-speech analysis, and named entity recognition
  5. Classification and topic modeling

For better or for worse, the workshop’s computing environment will be the Linux command line. Besides the usual command-line suspects, participants will get their hands dirty with wget, Tika, a bit of Perl, a lot of Python, WordNet, TreeTagger, Stanford’s Named Entity Recognizer, and Mallet.
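To give a flavor of the word and phrase frequency session, here is a small Python sketch of the kind of exercise involved. The corpus file name is made up for illustration and is not part of the workshop’s materials.

import re
from collections import Counter

# Tokenize a plain-text file and report its ten most frequent words --
# the sort of counting and tabulating covered in session three.
with open("corpus/walden.txt") as handle:
    words = re.findall(r"[a-z']+", handle.read().lower())

frequencies = Counter(words)
for word, count in frequencies.most_common(10):
    print(count, word, sep="\t")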

For more detail, see the syllabus, sample code, and corpus.

Eric Lease Morgan: distance.cgi – My first Python-based CGI script

planet code4lib - Fri, 2015-01-09 16:10

Yesterday I finished writing my first Python-based CGI script — distance.cgi. Given two words, it first lets the reader disambiguate between various definitions of the words, and second uses WordNet’s network to display various relationships (distances) between the resulting “synsets”. (Source code is here.)

(Screenshots in the original post illustrate the three steps: reader input, disambiguation, and the displayed result.)

The script relies on Python’s Natural Language Toolkit (NLTK) which provides an enormous amount of functionality when it comes to natural language processing. I’m impressed. On the other hand, the script is not zippy, and I am not sure how performance can be improved. Any hints?
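For readers curious about what the underlying WordNet lookups involve, here is a minimal NLTK sketch of the two steps (disambiguation, then distance). It is illustrative only, not the source of distance.cgi.

from nltk.corpus import wordnet as wn

def disambiguate(word):
    """Print each candidate synset and its gloss so a reader can pick one."""
    for synset in wn.synsets(word):
        print(synset.name(), "-", synset.definition())

def distance(synset_a, synset_b):
    """Path-based similarity between two synsets; 1.0 means identical."""
    return synset_a.path_similarity(synset_b)

disambiguate("bank")
print(distance(wn.synset("bank.n.01"), wn.synset("depository_financial_institution.n.01")))

On the performance question: if the slowness comes mainly from loading the WordNet corpus on every CGI invocation, keeping a resident process (WSGI or FastCGI, for example) is a common remedy, though whether that is the bottleneck here is an open question.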

Jonathan Rochkind: Fraud in scholarly publishing

planet code4lib - Fri, 2015-01-09 15:38

Should librarianship be a field that studies academic publishing as an endeavor, and works to educate scholars and students to take a critical perspective? Some librarians are expected or required to publish for career promotion; are investigations in this area something anyone does?

From Scientific American, For Sale: “Your Name Here” in a Prestigious Science Journal:

Klaus Kayser has been publishing electronic journals for so long he can remember mailing them to subscribers on floppy disks. His 19 years of experience have made him keenly aware of the problem of scientific fraud. In his view, he takes extraordinary measures to protect the journal he currently edits, Diagnostic Pathology. For instance, to prevent authors from trying to pass off microscope images from the Internet as their own, he requires them to send along the original glass slides.

Despite his vigilance, however, signs of possible research misconduct have crept into some articles published in Diagnostic Pathology. Six of the 14 articles in the May 2014 issue, for instance, contain suspicious repetitions of phrases and other irregularities. When Scientific American informed Kayser, he was apparently unaware of the problem. “Nobody told this to me,” he says. “I’m very grateful to you.”

[…]

The dubious papers aren’t easy to spot. Taken individually each research article seems legitimate. But in an investigation by Scientific American that analyzed the language used in more than 100 scientific articles we found evidence of some worrisome patterns—signs of what appears to be an attempt to game the peer-review system on an industrial scale.

[…]

A quick Internet search uncovers outfits that offer to arrange, for a fee, authorship of papers to be published in peer-reviewed outlets. They seem to cater to researchers looking for a quick and dirty way of getting a publication in a prestigious international scientific journal.

This particular form of the for-pay mad-libs-style research paper appears to be prominent  mainly among researchers in China. How can we talk about this without accidentally stooping to or encouraging anti-Chinese racism or xenophobia?   There are other forms of research fraud and quality issues which are prominent in the U.S. and English-speaking research world too.  If you follow this theme of scholarly quality issues, as I’ve been trying to do casually, you start to suspect the entire scholarly publishing system, really.

We know, for instance, that ghost-written scholarly pharmaceutical articles are not uncommon in the U.S. too. Perhaps in the U.S. scholarly fraud is more likely to come for ‘free’ from interested commercial entities than from researchers paying ‘paper salesmen’ for poor quality papers. To me, a paper written by a pharmaceutical company employee but published under the name of an ‘independent’ researcher is arguably a worse ethical violation, even if everyone involved can think “Well, the science is good anyway.” It also wouldn’t shock me if very similar systems to China’s paper-for-sale industry exist in the U.S., on a much smaller scale, but more adept at avoiding reuse of nonsense boilerplate, making them harder to detect. Presumably the Chinese industry will get better at avoiding detection too, or perhaps already is at a higher end of the market.

In both cases, the context is extreme career pressure to ‘publish or perish’, into a system that lacks the ability to actually ascertain research quality sufficiently, but which the scholarly community believes has that ability.

Problems with research quality don’t end here; they go on and on, and are starting to get more attention.

  • An article from the LA Times from Oct 2013,
    "Science has lost its way, at a big cost to humanity: Researchers are rewarded for splashy findings, not for double-checking accuracy. So many scientists looking for cures to diseases have been building on ideas that aren't even true." (and the HN thread on it).
  • From the Economist, also from last year, "Trouble at the lab: Scientists like to think of science as self-correcting. To an alarming degree, it is not."
  • From Nature August 2013 (was 2013 the year of discovering scientific publishing ain't what we thought?), "US behavioural research studies skew positive:
    Scientists speculate 'US effect' is a result of publish-or-perish mentality."

There are also individual research papers investigating particular issues, especially statistical methodology problems, in scientific publishing.  I’m not sure if there are any scholarly papers or monographs which take a big picture overview of the crisis in scientific publishing quality/reliability — anyone know of any?

To change the system, we need to understand the system — and start by lowering confidence in the capabilities of existing ‘gatekeeping’.  And the ‘we’ is the entire cross-disciplinary community of scholars and researchers. We need an academic discipline and community devoted to a critical examination of scholarly research and publishing as a social and scientific phenomenon, using social science and history/philosophy of science research methods; a research community (of research on research) which is also devoted to education of all scholars, scientists, and students into a critical perspective.   Librarians seem well situated to engage in this project in some ways, although in others it may be unrealistic to expect.


Filed under: General

Islandora: Announcing the Islandora GIS Interest Group

planet code4lib - Fri, 2015-01-09 14:34

A new Islandora Interest Group has been convened by James Griffin of Lafayette College Libraries. The GIS IG is looking for interested members to join in discussions about how to handle geospatial data sets in Islandora. As with our other Interest Groups, group documents and membership details are handled through GitHub.

  • The primary objective of this interest group is to release a set of functionality that provides for members of the Islandora Community the ability to ingest, browse, and discover geospatial data sets
    • Considered to be essential to this solution is the ability to visualize geospatial vector and raster data sets (features and coverages), as well as the ability to index Fedora Commons Objects within Apache Solr using key geographic metadata fields
  • In doing so, this interest group addresses the preservation, access, and discovery of geospatial data sets (these data sets not being limited to, but including, Esri Shapefiles [vector data sets] and images in the GeoTIFF format [raster data sets]).
  • It enables and structures descriptive and technical metadata for these data sets (including the usage of ISO 19139-compliant XML Documents referred to as "Federal Geographic Data Committee" (FGDC) Documents and MODS Documents)
  • It explores and defines a series of best practices for the generation of common and open standards for the serialization of entities within geospatial data sets (such as the Keyhole Markup Language Documents and GeoJSON Objects)
  • It captures and manages user stories involving the management of geospatial metadata, and to refactor these into feature or improvement requests for the solution being implemented by the Islandora Community

If you are interested in joining up, please reply on this listserv thread or contact the convenor.

Library of Congress: The Signal: Web Archive Management at NYARC: An NDSR Project Update

planet code4lib - Fri, 2015-01-09 14:21

The following is a guest post by Karl-Rainer Blumenthal, National Digital Stewardship Resident at the New York Art Resources Consortium (NYARC).

A tipping point from traditional to emergent digital technologies in the regular conduct of art historical scholarship threatens to leave unprepared institutions and their researchers alike in a “digital black hole.” NYARC–the partnership of the Frick Art Reference Library, the Museum of Modern Art Library and the Brooklyn Museum Library & Archives–seeks to institute permanent and precedent-setting collecting programs for born-digital primary source materials that make this black hole significantly more gray.

Since the 2013 grant from the Andrew W. Mellon Foundation, for instance, NYARC has archived the web presences of its partner museums and those of prominent galleries, auction houses, artists, provenance researchers and others within their traditional collecting scopes. While working to define description standards and integrating access points with those of traditional resources, NYARC has further leveraged this leadership opportunity by designing this current National Digital Stewardship Residency project, which is to concurrently prepare their nascent collections for long-term management and preservation.

Archiving MoMA’s many exhibit sites preserves them for future art historians, but only if critical elements aren’t lost in the process.

Stewarding web archives to the future generations that will learn from them requires careful planning and policymaking. Sensitive preservation description and reliable storage and backup routines will ultimately determine the accessibility of these benchmarks of our online culture for future librarians, archivists, researchers and students. Before we can plan and prepare for the long term, however, it is incumbent upon those of us with responsibility to steward especially visually rich and complex cultural artifacts to assure their integrity at the point of collection–to assure their faithful rendition of the extent, behavior and appearance of visual information transmitted over this uniquely visual medium.

Quality assurance (QA)–the process of verifying and/or making the interventions necessary to improve the accuracy and integrity of archived web-based resources at the point of their collection–was therefore the logical place to begin defining long term stewardship needs.  As I quickly discovered, though, it also happens to be one of the slipperiest issues for even experienced web archivists. Like putting together a jigsaw puzzle, its success begins with having all of the right pieces, then requires fitting those pieces together in the correct order and sequence, and ultimately hinges on the degree to which our final product’s ‘look and feel’ resembles that of our original vision.

Unless and until the technologies that we use to crawl and capture content from the live web can simply replicate every conceivable experience that any human browser may have online, we are compelled to decide which specific properties of equally sprawling and ephemeral web presences are of primary significance to our respective missions and patrons, and which therefore demand our most assiduous and resource-intensive pursuit.

Determining those priority areas and then finding the requisite time and manpower to do them justice is challenging enough for any web archiving operation. To a multi-institutional partnership sharing responsibility for aesthetically diverse but equally rich and complex web designs, it’s enough to stop you right in your tracks. To keep NYARC’s small army of graduate student QA technicians all moving in the same direction as efficiently as possible, and to sustain a model of their work beyond the end of their grant-funded terms, I’ve therefore spent the bulk of this first phase of my NDSR project building towards the following procedural reference guide. I now welcome the broader web archiving community to review, discuss and adapt this to their own use:

This living document will be updated to reflect technical and practical developments throughout and beyond the remainder of my residency. In the meantime, it will provide NYARC’s decision-makers, and others who are designing permanent web archiving programs, an executive summary of the principles and technologies that influence the potential scopes of QA work. Its procedural guidelines walk our QA technicians through their regular assessment and documentation process. Perhaps most importantly, this roadmap directs them to the areas where they may make meaningful interventions, indicates where they alternatively must rely on help from our software service providers, Archive-It, and flags where future technical development still precludes any potential for improvement. Finally, it inventories the major problem areas and improvement strategies presently known to NYARC to make or break the whole process.

This iteration of NYARC’s documentation is the product of expansive literature review, hands-on QA work, regular consultation and problem solving with interns and professional staff, and the generous advice of colleagues throughout the community. As such, it has prepared me not only for upcoming NDSR project phases focused on preservation metadata and archival storage, but also for a much longer career in digital preservation.

As any such project must, it ties the success of any rapidly acquired technical knowledge or expertise to equally effective project management, communication and open documentation–skill sets that every emergent professional must cultivate in order to have a permanent role in the stewardship of our always tumultuous digital culture. I’m sure that this small documentation effort will provide NYARC, and similar partners in the field, with the tools to improve the quality of their web archives. Also, I sincerely hope that it provides a model of practice to sustain such improvements over radical and unforeseen technological changes–that it makes the digital black hole just a little more gray.

Shelley Gullikson: Web Rewriting Sprint

planet code4lib - Fri, 2015-01-09 14:08

At the end of October, I was watching tweets coming out of a UX webinar and saw this:

I thought it sounded great, so ran it by Web Committee that same week and we scheduled a sprint for the end of term. Boom. I love it when an idea turns into a plan so quickly!

We agreed that we needed common guidelines for editing the pages. I planned to point to an existing writing guide, but decided to draft one using examples from our own site.

I put together a spreadsheet of all the pages linked directly from the home page or navigation menus, plus all the pages owned by admin or by me. Subject guides and course guides were left out. The committee decided to start with content owned by committee members, rather than asking permission to edit other staff members’ content. We prioritized the resulting list of 57 pages (well, 57 chunks of content – some of those were Drupal “books” with multiple pages).

Seven of us got together on an early December afternoon (six in the room, one online from the East Coast). Armed with snacks, we spent 90 minutes editing and got through most of our top and mid-priority pages.

It was a very positive experience. We got a second set of eyes on content that may have only ever been looked at by one person. We were able to talk to each other to get feedback on clear and concise wording. And we saw pages that were already pretty good, which was a nice feeling too.

We’ve organized another sprint for reading week in February. We’re going to look at the top priority pages again, to see if we can make them even clearer and more concise.

 


DPLA: From Book Patrol: Happy New Calendar!

planet code4lib - Fri, 2015-01-09 14:00

Now that we have our new calendar in place to help track the year ahead, let’s have a look back at some of the thousands of calendars available for your perusal at the DPLA. The word comes from the Latin kalendae, the name of the first day of every month, and there are as many varieties of calendars as there are days of the month.

From a 12th century Book of Hours to a 16th century perpetual calendar to a Native American calendar on buckskin to a handwritten calendar by Lee Harvey Oswald, there is no shortage of creative ways to track time and, in many cases, to advertise one’s business.

Enjoy!
