
Feed aggregator

Open Knowledge Foundation: How Open Data Can Change Pakistan

planet code4lib - Mon, 2015-03-09 10:47

This is a cross-post from the brand new Open Knowledge Pakistan Local Group blog. To learn more about (and get in touch with) the new community in Pakistan, go here.

Pakistan is a small country with a high population density. Within 796,096 square kilometres of territory, Pakistan has a population of over 180 million people. Such a large population places immense responsibilities on the government. The majority of Pakistan's population is uneducated and lives in rural areas, with a growing influx of rural people into urban areas. Thus we can say that the rate of urbanization in Pakistan is rising rapidly. This poses a major challenge to civic planners and the Government of Pakistan.

[Chart: Urban population (% of total)]

State Library of Denmark: Net archive indexing, round 2

planet code4lib - Mon, 2015-03-09 10:30

Using our experience from our initial net archive search setup, Thomas Egense and I have been tweaking options and adding patches to the fine webarchive-discovery from UKWA for some weeks. We will be re-starting indexing Real Soon Now. So what have we learned?

  • Stored text takes up a huge part of the index: Nearly half of the total index size. The biggest sinner is not surprisingly the content field, but we need that for highlighting and potentially text extraction from search results. As we have discovered that we can avoid storing DocValued fields, at the price of increased document retrieval time, we have turned off storing for several fields.
  • DocValue everything! Or at least a lot more than we did initially. Enabling DocValues for a field and getting low-overhead faceting turned out to be a lot disk-space-cheaper than we thought. As every other feature request from the researchers seems to be “We would also like to facet on field X”, our new strategy should make them at least half happy.
  • DocValues are required for some fields. Due to internal limits on facet.method=fc without DocValues, it is simply not possible to do faceting if the number of references gets high.
  • Faceting on outgoing links is highly valuable. Being able to facet on links makes it possible to generate real-time graphs for interconnected websites. Links with host or domain granularity are easily handled and there is no doubt that those should be enabled. Based on positive experimental results with document-granularity links faceting (see section below), we will also be enabling that. A sketch of the kind of facet query involved appears after this list.
  • The addition of performance instrumentation made it a lot easier for us to prioritize features. We simply do not have time for everything we can think of and some specific features were very heavy.
  • Face recognition (just finding the location of faces in images, not identifying the persons) was an interesting feature, but with a so-so success rate. Turning it on for all images would triple our indexing time and we have little need for sampling in this area, so we will not be doing it at all for this iteration.
  • Most prominent colour extraction was only somewhat heavy, but unfortunately the resulting colour turned out to vary a great deal depending on adjustment of extraction parameters. This might be useful if a top-X of prominent colours were extracted, but for now we have turned off this feature.
  • Language detection is valuable, but processing time is non-trivial and rises linearly with the number of languages to check. We lowered the number of detected languages from 20 to 10, pruning the more obscure (relative to Danish) languages.
  • Meta-data about harvesting turned out to be important for the researchers. We will be indexing the ID of the harvest-job used for collecting the data, the institution responsible and some specific sub-job-ID.
  • Disabling of image-analysis features and optimization of part of the code-base means faster indexing. Our previous speed was 7-8 days/shard, while the new one is 3-4 days/shard. As we have also doubled our indexing hardware capacity, we expect to do a full re-build of the existing index in 2 months and to catch up to the present within 6 months.
  • Our overall indexing workflow, with dedicated builders creating independent shards of a fixed size, worked very well for us. Besides some minor tweaks, we will not be changing this.
  • We have been happy with Solr 4.8. Solr 5 is just out, but as re-indexing is very costly for us, we do not feel comfortable with a switch at this time. We will do the conservative thing and stick to the old Solr 4-series, which currently means Solr 4.10.4.
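
To make the faceting discussion above concrete, here is a minimal sketch (not our production code) of issuing a faceted query against a Solr 4.x index over HTTP in Python. The core name and the links_domains field are assumptions for the example; such a field would need DocValues (or be indexed) for facet.method=fc to behave as described.

import requests

SOLR_SELECT = "http://localhost:8983/solr/netarchive/select"  # assumed core name

params = {
    "q": "*:*",
    "rows": 0,                       # facets only, no stored documents returned
    "facet": "true",
    "facet.field": "links_domains",  # assumed field name for outgoing link domains
    "facet.method": "fc",
    "facet.limit": 20,
    "wt": "json",
}

response = requests.get(SOLR_SELECT, params=params)
facets = response.json()["facet_counts"]["facet_fields"]["links_domains"]

# Solr returns each facet field as a flat [value, count, value, count, ...] list.
for value, count in zip(facets[0::2], facets[1::2]):
    print(value, count)
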
Document-level links faceting

The biggest new feature will be document links. This is basically all links present on all web pages at full detail. For a single test shard with 217M documents / 906GB, there were 7 billion references to 640M unique links, the most popular link being used 2.4M times. Doing a full faceted search on *:* was understandably heavy at around 4 minutes, while ad hoc testing of “standard” searches resulted in response times varying from 50 ms to 3500 ms. Scaling up to 25 shards/machine, it will be 175 billion references to 16 billion values. It will be interesting to see the accumulated response time.

We expect this feature to be used to generate visual graphs of interconnected resources, which can be navigated in real-time. Or at least you-have-to-run-to-get-coffee-time. For the curious, here is the histogram for links in the test-shard:

References   #terms
1            425,799,733
2            85,835,129
4            52,695,663
8            33,153,759
16           18,864,935
32           10,245,205
64           5,691,412
128          3,223,077
256          1,981,279
512          1,240,879
1,024        714,595
2,048        429,129
4,096        225,416
8,192        114,271
16,384       45,521
32,768       12,966
65,536       4,005
131,072      1,764
262,144      805
524,288      789
1,048,576    123
2,097,152    77
4,194,304    1

 


Chris Beer: LDPath in 3 examples

planet code4lib - Sun, 2015-03-08 00:00

At Code4Lib 2015, I gave a quick lightning talk on LDPath, a declarative domain-specific language for flattening linked data resources to a hash (e.g. for indexing to Solr).

LDPath can traverse the Linked Data Cloud as easily as working with local resources and can cache remote resources for future access. The LDPath language is also (generally) implementation independent, with Java and Ruby implementations, and relatively easy to implement. The language also lends itself to integration within development environments (e.g. ldpath-angular-demo-app, with context-aware autocompletion and real-time responses). For me, working with the LDPath language and implementation was the first time that linked data moved from being a good idea to being a practical solution to some problems.

Here is a selection from the VIAF record [1]:

<> void:inDataset <../data> ;
   a genont:InformationResource, foaf:Document ;
   foaf:primaryTopic <../65687612> .

<../65687612> schema:alternateName "Bittman, Mark" ;
   schema:birthDate "1950-02-17" ;
   schema:familyName "Bittman" ;
   schema:givenName "Mark" ;
   schema:name "Bittman, Mark" ;
   schema:sameAs <http://d-nb.info/gnd/1058912836>, <http://dbpedia.org/resource/Mark_Bittman> ;
   a schema:Person ;
   rdfs:seeAlso <../182434519>, <../310263569>, <../314261350>, <../314497377>, <../314513297>, <../314718264> ;
   foaf:isPrimaryTopicOf <http://en.wikipedia.org/wiki/Mark_Bittman> .

We can use LDPath to extract the person’s name:
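
A minimal LDPath sketch for this step, assuming the foaf, schema and xsd prefixes used above are declared, might look like:

name = foaf:primaryTopic / schema:name :: xsd:string ;
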

So far, this is not so different from traditional approaches. But, if we look deeper in the response, we can see other resources, including books by the author.

<../310263569> schema:creator <../65687612> ;
   schema:name "How to Cook Everything : Simple Recipes for Great Food" ;
   a schema:CreativeWork .

We can traverse the links to include the titles in our record:
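
One way to sketch that traversal in LDPath, following the rdfs:seeAlso links in the record above and keeping only the creative works (prefixes assumed as before):

books = foaf:primaryTopic / rdfs:seeAlso[rdf:type is schema:CreativeWork] / schema:name :: xsd:string ;
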

LDPath also gives us the ability to write this query using a reverse property selector, e.g.:

books = foaf:primaryTopic / ^schema:creator[rdf:type is schema:CreativeWork] / schema:name :: xsd:string ;

The resource links out to some external resources, including a link to DBpedia. Here is a selection from the record in DBpedia:

<http://dbpedia.org/resource/Mark_Bittman>
   dbpedia-owl:abstract "Mark Bittman (born c. 1950) is an American food journalist, author, and columnist for The New York Times."@en,
      "Mark Bittman est un auteur et chroniqueur culinaire américain. Il a tenu une chronique hebdomadaire pour le The New York Times, appelée The Minimalist (« le minimaliste »), parue entre le 17 septembre 1997 et le 26 janvier 2011. Bittman continue d'écrire pour le New York Times Magazine, et participe à la section Opinion du journal. Il tient également un blog."@fr ;
   dbpedia-owl:birthDate "1950+02:00"^^<http://www.w3.org/2001/XMLSchema#gYear> ;
   dbpprop:name "Bittman, Mark"@en ;
   dbpprop:shortDescription "American journalist, food writer"@en ;
   dc:description "American journalist, food writer", "American journalist, food writer"@en ;
   dcterms:subject <http://dbpedia.org/resource/Category:1950s_births>,
      <http://dbpedia.org/resource/Category:American_food_writers>,
      <http://dbpedia.org/resource/Category:American_journalists>,
      <http://dbpedia.org/resource/Category:American_television_chefs>,
      <http://dbpedia.org/resource/Category:Clark_University_alumni>,
      <http://dbpedia.org/resource/Category:Living_people>,
      <http://dbpedia.org/resource/Category:The_New_York_Times_writers> ;

LDPath allows us to transparently traverse that link, so we can extract the DBpedia subjects for the VIAF record:
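
A sketch of that traversal in LDPath, following schema:sameAs out to DBpedia and then pulling subject labels (the final rdfs:label step assumes the category resources carry labels; the exact path is illustrative):

subjects = foaf:primaryTopic / schema:sameAs / dcterms:subject / rdfs:label :: xsd:string ;
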

[1] If you’re playing along at home, note that, as of this writing, VIAF.org fails to correctly implement content negotiation and returns HTML if text/html appears anywhere in the Accept header, e.g.:

curl -H "Accept: application/rdf+xml, text/html; q=0.1" -v http://viaf.org/viaf/152427175/

will return a text/html response. This may cause trouble for your linked data clients.

Code4Lib: Code4Lib 2016 will be in Philadelphia

planet code4lib - Sat, 2015-03-07 23:40
Topic: code4lib2016

Code4Lib 2016 will be in Philadelphia, PA. The conference hosting proposal gives an idea of what it will be like. All necessary information will be available here as planning develops, and in the Code4Lib2016 category on the wiki.

District Dispatch: Call for Nominations: Robert L. Oakley Memorial Scholarship

planet code4lib - Fri, 2015-03-06 19:02

[Photo: Bob Oakley]

Librarians interested in intellectual property, public policy and copyright have until June 1, 2015, to apply for the Robert L. Oakley Memorial Scholarship. The annual $1,000 scholarship, which was developed by the American Library Association and the Library Copyright Alliance, supports research and advanced study for librarians in their early-to-mid-careers.

Applicants should provide a statement of intent for use of the scholarship funds. Such a statement should include the applicant’s interest and background in intellectual property, public policy, and/or copyright and their impacts on libraries and the ways libraries serve their communities.

Additionally, statements should include information about how the applicant and the library community will benefit from the applicant’s receipt of the scholarship. Statements should be no longer than three pages (1,000 words). The applicant’s resume or curriculum vitae should be included in the application.

Applications must be submitted via e-mail to Carrie Russell, crussell@alawash.org. Awardees may receive the Robert L. Oakley Memorial Scholarship up to two times in a lifetime. Funds may be used for equipment, expendable supplies, travel necessary to conduct research or attend conferences, release from library duties, or other reasonable and appropriate research expenses.

The award honors the life accomplishments and contributions of Robert L. Oakley. Professor and law librarian Robert Oakley was an expert on copyright law who wrote and lectured widely on the subject. He served on the Library Copyright Alliance representing the American Association of Law Libraries and played a leading role in advocating for U.S. libraries and the public they serve at many international forums, including those of the World Intellectual Property Organization and the United Nations Educational, Scientific and Cultural Organization.

Oakley served as the United States delegate to the International Federation of Library Associations Standing Committee on Copyright and Related Rights from 1997-2003. Mr. Oakley testified before Congress on copyright, open access, library appropriations and free access to government documents and was a member of the Library of Congress’ Section 108 Study Group. A valued colleague and mentor for numerous librarians, Oakley was a recognized leader in law librarianship and library management who also maintained a profound commitment to public policy and the rights of library users.

The post Call for Nominations: Robert L. Oakley Memorial Scholarship appeared first on District Dispatch.

LITA: Librarians, Take the Struggle Out of Statistics

planet code4lib - Fri, 2015-03-06 18:50

Check out the brand new LITA web course:
Taking the Struggle Out of Statistics 

Instructor: Jackie Bronicki, Collections and Online Resources Coordinator, University of Houston.

Offered: April 6 – May 3, 2015
A Moodle based web course with asynchronous weekly lectures, tutorials, assignments, and group discussion.

Register Online, page arranged by session date (login required)

Recently, librarians of all types have been asked to take a more evidence-based look at their practices. Statistics is a powerful tool that can be used to uncover trends in library-related areas such as collections, user studies, usability testing, and patron satisfaction studies. Knowledge of basic statistical principles will greatly help librarians achieve these new expectations.

This course will be a blend of learning basic statistical concepts and techniques along with practical application of common statistical analyses to library data. The course will include online learning modules for basic statistical concepts, examples from completed and ongoing library research projects, and also exercises accompanied by practice datasets to apply techniques learned during the course.

Got assessment in your title or duties? This brand new web course is for you!

Here’s the Course Page

Jackie Bronicki’s background is in research methodology, data collection and project management for large research projects including international dialysis research and large-scale digitization quality assessment. Her focus is on collection assessment and evaluation and she works closely with subject liaisons, web services, and access services librarians at the University of Houston to facilitate various research projects.

Date:
April 6, 2015 – May 3, 2015

Costs:

  • LITA Member: $135
  • ALA Member: $195
  • Non-member: $260

Technical Requirements

Moodle login info will be sent to registrants the week prior to the start date. The Moodle-developed course site will include weekly asynchronous lectures and is composed of self-paced modules with facilitated interaction led by the instructor. Students regularly use the forum and chat room functions to facilitate their class participation. The course web site will be open for 1 week prior to the start date for students to have access to Moodle instructions and set their browser correctly. The course site will remain open for 90 days after the end date for students to refer back to course material.

Registration Information

Register Online page arranged by session date (login required)
OR
Mail or fax form to ALA Registration
OR
Call 1-800-545-2433 and press 5
OR
email registration@ala.org

Questions or Comments?

For all other questions or comments related to the course, contact LITA at (312) 280-4269 or Mark Beatty, mbeatty@ala.org.

Harvard Library Innovation Lab: Link roundup March 6, 2015

planet code4lib - Fri, 2015-03-06 17:08

Disney, tanks, Pantone, Bingo and the paperback book.

Raul Lemesoff’s Driveable Library | Mental Floss

Tank bookmobile weapon of mass instruction

Libraries are more popular than Disneyland?

Library visits vs. major tourist attractions

humanæ

Portraits with the exact Pantone color of the skin tone set as the background

Even Composting Comes With Sticker Shock – NYTimes.com

Composting company has customers collect troublesome fruit stickers on a Bingo card to receive free compost.

A Tribute to the Printer Aldus Manutius, and the Roots of the Paperback

The roots of the paperback. Pop into the Grolier Club for a fascinating exhibit.

District Dispatch: Archived webinar on 3D printing available

planet code4lib - Fri, 2015-03-06 17:00

[Image from the British Library Sound Archive]

Wondering about the legal issues involved with 3D printing and how the library can protect itself from liability when patrons use these technologies in library spaces? Check out our latest archived webinar, “3D printing: policy and intellectual property law”.

The webinar was presented by Charlie Wapner, Policy Analyst (OITP) and Professor Tom Lipinski, Director of the University of Wisconsin-Milwaukee’s I-School.

The post Archived webinar on 3D printing available appeared first on District Dispatch.

CrossRef: New CrossRef Members

planet code4lib - Fri, 2015-03-06 16:12

Updated March 2, 2015

Voting Members

Asian Scientific Publishers
Global Business Publications
Institute of Polish Language
Journal of Case Reports
Journal Sovremennye Tehnologii v Medicine
Penza Psychological Newsletter
QUASAR, LLC
Science and Education, Ltd.
The International Child Neurology Association (ICNA)
Universidad de Antioquia

Represented Members
Balkan Journal of Electrical & Computer Engineering (BAJECE)
EIA Energy in Agriculture
Faculdade de Enfermagem Nova Esperanca
Faculdade de Medicina de Sao Jose do Rio Preto - FAMERP
Gumushane University Journal of Science and Technology Institute
Innovative Medical Technologies Development Foundation
Laboratorio de Anatomia Comparada dos Vertebrados
Nucleo para o Desenvolvimento de Tecnologia e Ambientes Educacionais (NPT)
The Journal of International Social Research
The Korean Society for the Study of Moral Education
Turkish Online Journal of Distance Education
Uni-FACEF Centro Universitario de Franca
Yunus Arastirma Bulteni

Last update February 23, 2015

Voting Members
Asia Pacific Association for Gambling Studies
Associacao Portuguesa de Psicologia
Czestochowa University of Technology
Faculty of Administration, University of Ljubljana
Hipatia Press
Indonesian Journal of International Law
International Society for Horticultural Science (ISHS)
Journal of Zankoy Sulaimani - Part A
Methodos.revista de ciencias sociales
Paediatrician Publishers LLC
Physician Assistant Education Association
Pushpa Publishing House
ScienceScript, LLC
Smith and Frankling Academic Publishing Corporation, Ltd, UK
Sociedade Brasileira de Psicologia Organizacional e do Trabalho
Tambov State Technical University
Universidad de Jaen
University of Sarajevo Faculty of Health Sciences

Represented Members
Bitlis Eren University Journal of Science and Technology
Erciyes Iletisim Dergisi
Florence Nightingale Journal of Nursing
IFHAN
Inonu University Journal of the Faculty of Education
International Journal of Informatics Technologies
P2M Invest
Saglik Bilimleri ve Meslekleri Dergisi
Samara State University of Architecture and Civil Engineering
Ufa State Academy of Arts

CrossRef: CrossRef Indicators

planet code4lib - Fri, 2015-03-06 15:05

Updated March 2, 2015

Total no. participating publishers & societies 5877
Total no. voting members 3164
% of non-profit publishers 57%
Total no. participating libraries 1931
No. journals covered 38,086
No. DOIs registered to date 72,500,322
No. DOIs deposited in previous month 469,198
No. DOIs retrieved (matched references) in previous month 39,460,869
DOI resolutions (end-user clicks) in previous month 131,824,772

Open Knowledge Foundation: Walkthrough: My experience building Australia’s Regional Open Data Census

planet code4lib - Fri, 2015-03-06 12:47

On International Open Data Day (21 Feb 2015) Australia’s Regional Open Data Census launched. This is the story of the trials and tribulations in launching the census.

Getting Started

As many open data initiatives come to realise after filling up a portal with lots of open data, there is a need for quality as well as quantity. I decided to tackle improving the quality of Australia’s open data as part of my Christmas holiday project.

I decided to request a local open data census on 23 Dec (I’d finished my Christmas shopping a day early). While I was waiting for a reply, I read the documentation – it was well written and configuring a web site using Google Sheets seemed easy enough.

The Open Knowledge Local Groups team contacted me early in the new year and introduced me to Pia Waugh and the team at Open Knowledge Australia. Pia helped propose the idea of the census to the leaders of Australia’s state and territory government open data initiatives. I was invited to pitch the census to them at a meeting on 19 Feb – two days before International Open Data Day.

A plan was hatched

On 29 Jan I was informed by Open Knowledge that the census was ready to be configured. Could I be ready to launch in 25 days’ time?

Configuring the census was easy. Fill in the blanks, a list of places, some words on the homepage, look at other censuses and re-use some FAQs, add a logo and some custom CSS. However, deciding on what data to assess brought me to a screaming halt.

Deciding on data

The Global census uses data based on the G8 key datasets definition. The Local census template datasets are focused on local government responsibilities. There was no guidance for countries with three levels of government. How could I get agreement on the datasets and launch in time for Open Data Day?

I decided to make a Google Sheet with tabs for datasets required by the G8, Global Census, Local Census, Open Data Barometer, and Australia’s Foundation Spatial Data Framework. Based on these references I proposed 10 datasets to assess. An email was sent to the open data leaders asking them to collaborate on selecting the datasets.

GitHub is full of friends

When I encountered issues configuring the census, I turned to GitHub. Paul Walsh, one of the team on the OpenDataCensus repository on GitHub, was my guardian on GitHub – steering my issues to the right place, fixing Google Sheet security bugs, deleting a place I created called “Try it out” that I used for testing, and encouraging me to post user stories for new features. If you’re thinking about building your own census, get on GitHub and read what the team has planned and are busy fixing.

The meeting

I presented to the leaders of Australia’s state and territory open data initiatives on 19 Feb and they requested more time to add extra datasets to the census. We agreed to put a Beta label on the census and launch on Open Data Day.

Ready for lift off

The following day CIO Magazine emailed asking for, “a quick comment on International Open Data Day, how you see open data movement in Australia, and the importance of open data in helping the community”. I told them and they wrote about it.

The Open Data Institute Queensland and Open Knowledge blogged and tweeted encouraging volunteers to add to the census on Open Data Day.

I set up Gmail and Twitter accounts for the census and requested the census to be added to the big list of censuses.

Open Data Day

No support requests were received from volunteers submitting entries to the census (it is pretty easy). The Open Data Day projects included:

  • drafting a Contributor Guide.
  • creating a Google Sheet to allow people to collect census entries prior to entering them online.
  • adding Google Analytics to the site.
What next?

We are looking forward to a few improvements including adding the map visualisation from the Global Open Data Index to our regional census. That’s why our Twitter account is @AuOpenDataIndex.

If you’re thinking about creating your own Open Data Census then I can highly recommend the experience, and there is a great team ready to support you.

Get in touch if you’d like to help with Australia’s Open Data Census.

Stephen Gates lives in Brisbane, Queensland, Australia. He has written Open Data strategies and driven their implementation. He is actively involved with the Open Data Institute Queensland contributing to their response to Queensland’s proposed open data law and helping coordinate the localisation of ODI Open Data Certificates. Stephen is also helping organise GovHack 2015 in Brisbane. Australia’s Regional Open Data Census is his first project working with Open Knowledge.

Open Knowledge Foundation: India Open Data Summit 2015

planet code4lib - Fri, 2015-03-06 09:54

This blog post is cross-posted from the Open Knowledge India blog and the Open Steps blog. It is written by Open Knowledge Ambassador Subhajit Ganguly, who is a physicist and an active member of various open data, open science and Open Access movements.

Open Knowledge India, with support from the National Council of Education Bengal and the Open Knowledge micro grants, organised the India Open Data Summit on February 28. It was the first ever Data Summit of this kind held in India and was attended by Open Data enthusiasts from all over India. The event was held at Indumati Sabhagriha, Jadavpur University. Talks and workshops were held throughout the day. The event succeeded in living up to its promise of being a melting pot of ideas.

The attendee list included people from all walks of life. Students, teachers, educationists, environmentalists, scientists, government officials, people’s representatives, lawyers, people from tinseltown — everyone was welcomed with open arms to the event. The Chief Guests included the young and talented movie director Bidula Bhattacharjee, a prominent lawyer from the Kolkata High Court, Aninda Chatterjee, educationist Bijan Sarkar and an important political activist, Rajib Ghoshal. Each one of them added value to the event, making it into a free flow of ideas. The major speakers from the side of Open Knowledge India included Subhajit Ganguly, Priyanka Sen and Supriya Sen. Praloy Halder, who has been working for the restoration of the Sunderbans Delta, also attended the event. Environment data is a key aspect of the conservation movement in the Sunderbans and it requires special attention.

The talks revolved around Open Science, Open Education, Open Data and Open GLAM. Thinking local and going global was the theme from which the discourse flowed. Everything was discussed from an Indian perspective, as many of the challenges faced by India are unique to this part of the world. There were discussions on how the Open Education Project, run by Open Knowledge India, can complement the government’s efforts to bring the light of education to everyone. The push was to build up a platform that would offer the Power of Choice to the children in matters of educational content. More and more use of Open Data platforms like CKAN was also discussed. Open governance not only at the national level, but even at the level of local governments, was something that was discussed with seriousness. Everyone agreed that in order to reduce corruption, open governance is the way to go. Encouraging the common man to participate in the process of open governance is another key point that was stressed upon. India is the largest democracy in the world, and this democracy is very complex too. Greater use of the power of the crowd in matters of governance can help the democracy a long way by uprooting corruption from the very core.

Opening up research data of all kinds was another point that was discussed. India has recently passed legislation ensuring that all government-funded research results will be in the open. A workshop was held to educate researchers about the existing ways of disseminating research results. Further enquiries were made into finding newer and better ways of doing this. Every researcher who had gathered resolved to enrich the spirit of Open Science and Open Research. Overall, the India Open Data Summit 2015 was a grand success in bringing like-minded individuals together and in giving them a shared platform where they can join hands to empower themselves. The first major Open Data Summit in India ended with the promise of keeping the ball rolling. Hopefully, in the near future we will see many more such events all over India.

LITA: In Praise of Anaconda

planet code4lib - Fri, 2015-03-06 09:00

Do you want to learn to code?  Of course you do, why wouldn’t you?  Programming is fun, like solving a puzzle.  It helps you think in a computational and pragmatic way about certain problems, allowing you to automate those problems away with a few lines of code.  Choosing to learn programming is the first step on your path, and the second is choosing a language.  These days there are many great languages to choose from, each with their own strengths and weaknesses.  The right language for you depends heavily on what you want to do (as well as what language your coworkers are using).

If you don’t have any coder colleagues and can’t decide on a language, I would suggest taking a look at Python. It’s mature, battle-tested, and useful for just about anything. I work across many different domains (often in the same day) and Python is a powerful tool that helps me take care of business whether I’m processing XML, analyzing data or batch renaming and moving files between systems. Python was created to be easy to read and aims to have one obvious “right” way to do any given task. These language design decisions not only make Python an easy language to learn, but an easy language to remember as well.

One of the potential problems with Python is that it might not already be on your computer.  Even if it is on your computer, it’s most likely an older version (the difference between Python v2 and v3 is kind of a big deal). This isn’t necessarily a problem with Python though; you would probably have to install a new interpreter (the program that reads and executes your code) no matter what language you choose. The good news is that there is a very simple (and free!) tool for getting the latest version of Python on your computer regardless of whether you are using Windows, Mac or Linux.  It’s called Anaconda.

Anaconda is a Python distribution, which means that it is Python, just packaged in a special way. This special packaging turns out to make all the difference.  Installing an interpreter is usually not a trivial task; it often requires an administrator password to install (which you probably won’t have on any system other than your personal computer) and it could cause conflicts if an earlier version already exists on the system.  Luckily Anaconda bypasses most of this pain with a unique installer that puts a shiny new Python in your user account (this means you can install it on any system you can log in to, though others on the system wouldn’t be able to use it), completely separate from any pre-existing version of Python.  Learning to take advantage of this installer was a game-changer for me since I can now write and run Python code on any system where I have a user account.  Anaconda allows Python to be my programming Swiss Army knife; versatile, handy and always available.

Another important thing to understand about Anaconda’s packaging is that it comes with a lot of goodies. Python is famous for having an incredible amount of high-quality tools built in to the language, but Anaconda extends this even further. It comes with Spyder, a graphical text editor that makes writing Python code easier, as well as many packages that extend the language’s capabilities. Python’s convenience and raw number crunching power has made it a popular language in the scientific programming community, and a large number of powerful data processing and analysis libraries have been developed by these scientists as a result. You don’t have to be a scientist to take advantage of these libraries, though; the simplicity of Python makes these libraries accessible to anyone with the courage to dive in and try them out. Anaconda includes the best of these scientific libraries: IPython, NumPy, SciPy, pandas, matplotlib, NLTK, scikit-learn, and many others (I use IPython and pandas pretty frequently, and I’m in the process of learning matplotlib and NLTK). Some of these libraries are a bit tricky to install and configure with the standard Python interpreter, but Anaconda is set up and ready to use them from the start. All you have to do is use them.
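
To give a sense of how little ceremony these libraries require, here is a minimal sketch using pandas from a stock Anaconda install. The circulation numbers are made up for the example.

import pandas as pd

# Hypothetical example data: monthly circulation counts for two library branches.
data = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar"],
    "main": [1200, 1350, 1100],
    "branch": [400, 380, 450],
})

# Add a total column and print a simple statistical summary of the numeric columns.
data["total"] = data["main"] + data["branch"]
print(data.describe())
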

While we’re on the subject of tricky installations, there are many more packages that Anaconda doesn’t  come with that can be a pain to install as well. Luckily Anaconda comes with its own package manager, conda, which is handy for not only grabbing new packages and installing them effortlessly, but also for upgrading the packages you have to the latest version. Conda even works on the Python interpreter itself, so when a new version of Python comes out you don’t have to reinstall anything.  Just to test it out, I upgraded to the latest version of Python, 3.4.2, while writing this article. I typed in ‘conda update python‘ and had the newest version running in less than 30 seconds.

In summary, Anaconda makes Python even more simple, convenient and powerful.  If you are looking for an easy way to take Python for a test drive, look no further than Anaconda to get Python on your system as fast as possible. Even seasoned Python pros can appreciate the reduced complexity Anaconda offers for installing and maintaining some of Python’s more advanced packages, or putting a Python on systems where you need it but lack security privileges. As an avid Python user who could install Python and all its packages from scratch, I choose to use Anaconda because it streamlines the process to an incredible degree.  If you would like to try it out, just download Anaconda and follow the guide.

District Dispatch: Free webinar: Bringing fresh groceries to your library

planet code4lib - Fri, 2015-03-06 06:09

On March 25, 2015, the American Library Association’s Washington Office and the University of Maryland’s iPAC will host the free webinar “Baltimore’s Virtual Supermarket: Grocery Delivery to Your Library or Community Site.” During the webinar, library leaders will discuss Baltimore’s Virtual Supermarket Program, an innovative partnership between the Enoch Pratt Free Library, the Baltimore City Health Department and ShopRite. Through the Virtual Supermarket Program, customers can place grocery orders online at select libraries, senior apartment buildings, or public housing communities and have them delivered to that site at no added cost. In this webinar, you will learn about the past, present, and future of the Virtual Supermarket Program, as well as the necessary elements to replicate the program in your own community.

Webinar speakers
  • Laura Flamm is the Baltimarket and Food Access Coordinator at the Baltimore City Health Department. In this role, Laura coordinates a suite of community-based food access programs that include the Virtual Supermarket Program, the Neighborhood Food Advocates Initiative, and the Healthy Stores Program. Laura holds a Master’s of Science in Public Health from the Johns Hopkins Bloomberg School of Public Health in Health, Behavior, and Society and a certificate in Community-Based Public Health. She believes that eating healthy should not be a mystery or a privilege.
  • Eunice Anderson is Chief of Neighborhood Library Services for the Enoch Pratt Free Library. A Baltimore native, she has worked 36 years at Pratt Library, coming up through the ranks from support staff to library professional. In the various positions she’s held, providing quality, enriching library services by assisting customers, supporting and leading staff, and doing community outreach has kept her battery charged.

Webinar title: Baltimore’s Virtual Supermarket: Grocery Delivery to Your Library or Community Site
Date: March 25, 2015
Time: 2:00-3:00 p.m. EST
Register now

The post Free webinar: Bringing fresh groceries to your library appeared first on District Dispatch.

District Dispatch: School librarians: Send us your successful IAL story

planet code4lib - Thu, 2015-03-05 23:49

[Photo by sekihan via Flickr]

This week, I joined my colleague Kevin Maher, assistant director of the American Library Association’s (ALA) Office of Government Relations, in meeting with staff from Reach Out and Read, Save the Children and Reading Is Fundamental to lobby congressional appropriators’ staff for level funding for Innovative Approaches to Literacy (IAL), a grant program in which at least half of the funding goes to school libraries.

In the U.S. Senate and U.S. House, Republicans and Democrats alike talked about how tight the budget will be and how little money is available… but also how much they all want to have an appropriation. In the Senate, however, they are not optimistic that they can get a Labor, Health and Human Services, and Education appropriations bill onto the Senate floor for a vote (one that hopefully passes).

Many congressional staff members advised us to make sure Members of Congress know about the IAL funding program and how it benefits school libraries. For the first time, we need to submit electronic appropriations forms (like folks used to have to do for earmarks in the past) for all programs, and it will be a stronger submission with a “hometown” local connection.

We are asking every school that has received an IAL grant to support the ALA’s advocacy efforts. Email Kevin Maher kmaher[at]alawash[dot]org with a good story as soon as possible. These forms are due March 12, 2015, so we do not have much time.

The post School librarians: Send us your successful IAL story appeared first on District Dispatch.

Jonathan Rochkind: Factors to prioritize (IT?) projects in an academic library

planet code4lib - Thu, 2015-03-05 22:59
  • Most important: Impact vs. Cost
    • Impact is how many (what portion) of your patrons will be affected, and how profound the benefit may be to their research, teaching, and learning.
    • Cost may include hardware or software costs, but for most projects we do the primary cost is staff time.
    • You are looking for the projects with the greatest impact at the lowest cost.
    • If you want to try and quantify, it may be useful to simply estimate three qualities:
      • Portion of userbase impacted (1-10 for 10% to 100% of userbase impacted)
      • Profundity of impact (estimate on a simple scale, say 1 to 3 with 3 being the highest)
      • “Cost” in terms of time. Estimate with only rough granularity knowing estimates are not accurate. 2 weeks, 2 months, 6 months, 1 year. Maybe assign those on a scale from 1-4.
      • You could then simply compute (portion * profundity) / cost, and look for the largest values. Or you could plot on a graph with (benefit = portion * profundity) on the x-axis, and cost on the y-axis. You are looking for projects near the lower right of the graph — high benefit, low cost. A small worked example of this scoring appears after this list.
  • Demographics impacted. Will the impact be evenly distributed, or will it be greater for certain demographics? Discipline/school/department? Researcher vs grad student vs undergrad?
    • Are there particular demographics which should be prioritized, because they are currently under-served or because focusing on them aligns with strategic priorities?
  • Types of services or materials addressed.  Print items vs digital items? Books vs journal articles? Other categories?  Again, are there service areas that have been neglected and need to be brought to par? Or service areas that are strategic priorities, and others that will be intentionally neglected?
  • Strategic plans. Are there existing library or university strategic plans? Will some projects address specific identified strategic focuses? These can also be used to determine prioritized demographics or service areas from above.
    • Ideally all of this is informed by strategic vision, where the library organization wants to be in X years, and what steps will get you there. And ideally that vision is already captured in a strategic plan. Few libraries may have this luxury of a clear strategic vision, however.
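
A minimal sketch of the scoring idea above, in Python with made-up example projects (the names and numbers are purely illustrative):

# portion: 1-10 (10% to 100% of the userbase), profundity: 1-3, cost: 1-4 (about 2 weeks to 1 year)
projects = [
    {"name": "Fix broken search facets", "portion": 8, "profundity": 2, "cost": 1},
    {"name": "New discovery layer",      "portion": 9, "profundity": 3, "cost": 4},
    {"name": "Staff-only reports tool",  "portion": 1, "profundity": 2, "cost": 2},
]

# Score each project: higher means more benefit per unit of cost.
for p in projects:
    p["score"] = (p["portion"] * p["profundity"]) / p["cost"]

# Print the projects from the strongest to the weakest candidate.
for p in sorted(projects, key=lambda p: p["score"], reverse=True):
    print("{}: {:.1f}".format(p["name"], p["score"]))
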

Filed under: General

DPLA: DPLA MAP, version 4.0

planet code4lib - Thu, 2015-03-05 20:30

Hot on the heels of last week’s announcement of KriKri and Heidrun, we here at DPLA HQ are excited to release the newest revision of the DPLA Metadata Application Profile, version 4.0 (DPLA MAP v4.0).

What is an “application profile”? It’s a defined set of metadata properties that combines selected elements from multiple schemas, often along with locally defined ones. An application profile, therefore, allows us to take the parts of other metadata schemes best suited to our needs to build a profile that works for us. We’ve taken full advantage of this model to combine properties from DCMI, EDM, Open Annotation, and more to create the DPLA MAP v4.0. Because the majority of the elements come from standard schemas (indicated by a namespace prefix, such as “dc:date” for Dublin Core’s date element), we remain aligned with the Europeana Data Model (EDM), while having enough flexibility for our local needs.
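
To make the idea concrete, here is an illustrative sketch of a record whose properties are drawn from more than one namespace. The values are invented and the property names are examples of the pattern, not DPLA’s exact serialization.

# A hypothetical application-profile record mixing namespaced properties (Python dict).
record = {
    "dc:title": "Letters from the field, 1918",                   # Dublin Core
    "dc:date": "1918",                                            # Dublin Core
    "edm:dataProvider": "Example Public Library",                 # Europeana Data Model
    "dpla:originalRecord": "http://example.org/oai/record/123",   # locally defined
}
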

Our new version of the DPLA MAP has lots of properties tailor-made for storing Universal Resource Identifiers (or URIs) from Linked Open Data (LOD) sources. These are other data sets and vocabularies that publish URIs tied to specific terms and concepts. We can use those URIs to point to the external LOD source and enrich our own data with theirs. In particular, we now have the ability to gather LOD about people or organizations (in the new class we’ve created for “Agents”), places (in the revision of our existing “Place” class) and concepts, topics, or subject headings (in the new “Concept” class).

At the moment DPLA’s plans for LOD include associating URIs that are already present in the records we get from our partners, as well as looking up and populating URIs for place names when we can. In the future, we plan to incorporate more linked data vocabularies such as the Library of Congress Subject Headings and Authorities. After that we can begin to consider other kinds of LOD possibilities like topic analysis or disambiguation of terms, transliteration, enrichment of existing records with more metadata from other sources (a la Wikipedia, for example), and other exciting possibilities.

Every journey begins with a first step, and our journey began with the upgrades announced in recent weeks (as described in our recent Code4Lib presentation, blog posts, and software releases). Along with these upgrades, MAP v4.0 has become our official internal metadata application profile. As of today, documentation for the new version of DPLA MAP v4.0 is available here as well as a new Introduction to the DPLA Metadata Model.

Nicole Engard: Bookmarks for March 5, 2015

planet code4lib - Thu, 2015-03-05 20:30

Today I found the following resources and bookmarked them:

  • Sphinx: Sphinx is a tool that makes it easy to create intelligent and beautiful documentation, written by Georg Brandl and licensed under the BSD license.

Digest powered by RSS Digest

The post Bookmarks for March 5, 2015 appeared first on What I Learned Today....


OCLC Dev Network: WMS Web Services Maintenance March 8

planet code4lib - Thu, 2015-03-05 20:30

All web services that require user-level authentication will be unavailable during the installation window, which is between 2:00 and 8:00 AM Eastern US time on Sunday, March 8th.

Tara Robertson: BC open textbook Accessibility Toolkit: generosity as a process

planet code4lib - Thu, 2015-03-05 17:03

Last week we published The BC Open Textbook Accessibility Toolkit. I’m really excited and proud of the work that we did and am moved by how generous people have been with us.

Since last fall I’ve been working with Amanda Coolidge (BCcampus) and Sue Doner (Camosun College) to figure out how to make the open textbooks produced in BC accessible from the start.  This toolkit was published using Pressbooks, a publishing plugin for WordPress. It is licensed with the same Creative Commons license as the rest of the open textbooks (CC-BY). This whole project has been a fantastic learning experience and it’s been a complete joy to experience so much generosity from other colleagues.

We worked with students with print disabilities to user test some existing open textbooks for accessibility. I rarely get to work face-to-face with students. It was such a pleasure to work with this group of well-prepared, generous and hardworking students.

Initially we were stumped about how to get faculty, who would be writing open textbooks, to care about print disabled students who may be using their books. Serendipitously I came across this awesome excerpt from Sarah Horton and Whitney Quesenbery’s book A Web For Everyone. User personas seemed like the way to explain some of the different types of user groups. A blind student is likely using different software, and possibly different hardware, than a student with a learning disability. Personas seemed like a useful tool to create empathy and explain why faculty should write alt text descriptions for their images.

Instead of rethinking these personas from the beginning, Amanda suggested contacting Sarah and Whitney to see if their work was licensed under a Creative Commons license that would allow us to reuse and remix it. They emailed me back in 5 minutes and gave their permission for us to reuse and repurpose their work. They also gave us permission to use the illustrations that Tom Biby did for their book. These illustrations are up on Flickr and clearly licensed with a CC-BY license.

While I’ve worked on open source software projects this is the first time I worked on an open content project. It is deeply satisfying for me when people share their work and encourage others to build upon it. Not only did this save us time but their generosity and enthusiasm gave us a boost. We were complete novices: none of us had done any user testing before. Sarah and Whitney’s quick responses were really encouraging.

This is the first version and we intend to improve it. We already know that we’d like to add some screenshots of ZoomText and we need to provide better information on how to make formulas and equations accessible. It’s difficult for me to put work out that’s not 100% perfect and complete but other people’s generosity have helped me to relax.

I let our alternate format partners across Canada know about this toolkit. Within 24 hours of publishing it, our partner organization in Ontario offered to translate it into French. They had also started working on a similar project and loved our approach. So instead of writing their own toolkit they will use or adapt ours. As it’s licensed under a CC-BY license they didn’t even need to ask us to use it or translate it.

Thank you to Mary Burgess at BCcampus who identified accessibility as a priority for the BC open textbook project.

Thank you to Bob Minnery at AERO for the offer of a French translation.

Thank you to Sarah Horton and Whitney Quesenbery for your generosity and enthusiasm. I really feel like we got to stand on the shoulders of giants.

Thank you to the students who we worked with. This was an awesome collaboration.

Thank you to Amanda Coolidge and Sue Doner for being such amazing collaborators. I love how we get stuff done together.
