Guest Post by Johnna Percell, Google Policy Fellow
On Thursday, July 16, ALA Washington Office welcomed a few of my colleagues in the Google Policy Fellowship to a lunch discussion of the office’s portfolio. The Google Policy Fellowship provides an opportunity for undergraduates, graduates, and law students who are interested in Internet and technology policy to spend the summer working at public interest groups here in DC as well as in Ottawa, San Francisco, and other cities around the world.
Fellows in attendance Thursday included Sasha Moss, who is working with R Street Institute; Miranda Bogen, a fellow with the Internet Education Foundation; Ian Dunham from the Future of Music Coalition; Maria Paz Canales, who is at the Center for Democracy and Technology; and of course me. The discussion brought together a wealth of experience and insight into many interesting facets of technology policy.
ALA staffers Alan Inouye, Carrie Russell, and Stephen Mayeaux were kind enough to take time out of their day to share with us why libraries are invested in policy and how ALA’s Washington Office is working to further those interests of concern to libraries everywhere. The discussion covered ALA’s work on open access, the Freedom of Information Act (FOIA), copyright, orphan works, mass digitization and the changing roles of cultural institutions.
This prompted a rousing discussion of how emerging technologies are re-writing (or have already re-written) the rules of creation, ownership, and access to information. The Internet is breaking down barriers between the role of a consumer and that of a creator. To fully participate in society, you must have access to the technology necessary to engage in this exchange. Libraries occupy an important position at the intersection of disruptive technology, innovation, and access, which allows them to facilitate greater participation by everyone.
At the end of our discussion, we all affirmed the urgency of updating laws and policies to address the changing nature of information. It’s invigorating to have the opportunity to spend my summer engaging these ideas with my fellow Google Policy Fellows – a crop of future information professionals who, thanks to ALA, are now more aware of the essential role of libraries in information technology policy.
The post ALA Washington Office hosts Google Policy Fellow lunch appeared first on District Dispatch.
If you are interested in Open Access and Open Data and haven’t heard about ContentMine yet, then you are missing out! Graham Steel, ContentMine Community Manager, has written a post for us introducing this exciting new tool.
ContentMine aims to liberate 100,000,000 facts from the scientific literature.
We believe that “The Right to Read is the Right to Mine”: anyone who has lawful access to read the literature with their eyes should be able to do so with a machine.
We want to make this right a reality and enable everyone to perform research using humanity’s accumulated scientific knowledge. The extracted facts are CC0.
Research that relies on aggregating large amounts of dynamic information to benefit society is particularly key to our work: we want to see the right information getting to the right people at the right time, and we work with professionals such as clinical trials specialists and conservationists. ContentMine tools, resources, services and content are fully Open and can be re-used by anybody for any legal purpose.
ContentMine is inspired by the community successes of Wikimedia, Open StreetMap, Open Knowledge, and others and encourages the growth of subcommunities which design, implement and pursue their particular aims. We are funded by the Shuttleworth Foundation, a philanthropic organisation who are unafraid to re-imagine the world and fund people who’ll change it.
This post has been reposted from the Open Access Working Group blog.
This update makes four significant changes to three specific, high-use algorithms, so I wanted to give folks a heads up.
1) Merge Records — I’ve updated the process in two ways.
a) Users can now change the data in the dropdown box to a user-defined field/subfield combination. At present, you have five predefined options: 001, 020, 022, 035, marc21. You will now be able to specify another field/subfield combination (it must be the full combination) for matching. So say you exported your data from your ILS and your bibliographic number is in a 907$b: you could change the value from 001 to 907$b, and the tool will now utilize that data, in a control-number context, to facilitate matching.
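To make the idea of a user-defined match point concrete, here is a minimal Python sketch. This is an illustrative model only, not MarcEdit's actual implementation (MarcEdit is a .NET application); the record structure and function names are my own assumptions:

```python
def parse_match_spec(spec):
    """Split a spec like '907$b' into (field, subfield); '001' has no subfield."""
    if "$" in spec:
        field, subfield = spec.split("$", 1)
        return field, subfield
    return spec, None

def match_key(record, spec):
    """Pull the control value used for matching from a record.

    A record is modeled here as a dict mapping a MARC tag to a list of
    occurrences; control fields (00X) hold plain strings, while data
    fields hold dicts of subfield code -> value.
    """
    field, subfield = parse_match_spec(spec)
    for occ in record.get(field, []):
        if subfield is None:
            return occ.strip()       # control field: the value is the string itself
        if subfield in occ:
            return occ[subfield].strip()
    return None                      # no match point present in this record

# A record whose ILS bibliographic number lives in 907$b:
rec = {"001": ["ocm12345"], "907": [{"a": ".b1004576x", "b": "1004576"}]}
match_key(rec, "001")    # -> "ocm12345"
match_key(rec, "907$b")  # -> "1004576"
```

The point of the sketch is simply that once the match key is abstracted behind a field/subfield spec, the rest of the matching machinery doesn't care whether the key came from an 001 or a 907$b.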
b) This meant making a secondary change. When I shifted to using the MARC21 method, I removed the ability for the algorithm to collapse multiple records of the same type from the merge file into the source. For example, after the change to the marc21 algorithm, the following scenario would play out as described below:
source 1 — record 1
merge 1 — matches record 1
merge 2 — matches record 1
merge 3 — matches record 1
The data moved into source 1 would be the data from merge 1; merge 2 and merge 3 wouldn’t be seen. In the previous version, prior to utilizing just the MARC21 option, users could collapse records when using the control number index match. I’ve updated the merge algorithm so that the default is now to assume that all source data could have multiple merge matches. This has the practical effect of allowing users to take a merge file with multiple duplicates and merge all of that data into a single corresponding source record. But this does represent a significant behavior change, so users need to be aware.
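The new one-to-many default can be sketched as follows. Again, this is a hypothetical model of the behavior described above, not MarcEdit's actual code; the record representation is an assumption:

```python
from collections import defaultdict

def merge_all(source_records, merge_records, key="001"):
    """Fold every matching merge record into its source record.

    Records are modeled as dicts: the match key maps to a string, every
    other field maps to a list of values. All merge records that share a
    source's key contribute their data (the new one-to-many default),
    instead of only the first match being seen.
    """
    index = defaultdict(list)
    for rec in merge_records:
        index[rec[key]].append(rec)

    merged = []
    for src in source_records:
        # copy the source so the input is left untouched
        out = {k: (v if k == key else list(v)) for k, v in src.items()}
        for match in index.get(src[key], []):   # zero, one, or many matches
            for field, values in match.items():
                if field == key:
                    continue
                out.setdefault(field, [])
                out[field].extend(v for v in values if v not in out[field])
        merged.append(out)
    return merged

# source 1 plus three merge records that all match it:
source = [{"001": "rec1", "245": ["Title one"]}]
merges = [
    {"001": "rec1", "650": ["Dogs"]},
    {"001": "rec1", "650": ["Cats"]},
    {"001": "rec1", "650": ["Birds"]},
]
merge_all(source, merges)[0]["650"]  # all three subjects survive the merge
```

Under the old behavior only the first merge record's data would have landed in the source; here every duplicate in the merge file is folded in.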
2) RDA Helper —
a) I’ve updated the error processing to ensure that the tool can fail a bit more gracefully.
b) I’ve updated the abbreviation expansion because the expression I was using could miss values on occasion. The new version will catch more content, and it should also be a bit faster.
3) Linked Data tools — I included the ability to link to OCLC works ids. There were problems when the JSON output was too deeply nested; this has been corrected.
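Deeply nested JSON is easy to mishandle with fixed-depth lookups; a recursive walk sidesteps the problem by finding a key at any depth. A hypothetical sketch (the key name and document shape below are assumptions for illustration, not the actual OCLC response format):

```python
import json

def find_values(node, key):
    """Yield every value stored under `key`, at any depth, in a nested
    JSON structure of dicts and lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_values(v, key)   # keep descending past a hit
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)

# Two work links at different nesting depths in the same document:
doc = json.loads("""
{
  "@graph": [
    {"about": {"exampleOfWork": {"@id": "http://worldcat.org/entity/work/id/12345"}}},
    {"exampleOfWork": {"@id": "http://worldcat.org/entity/work/id/67890"}}
  ]
}
""")
ids = [v["@id"] for v in find_values(doc, "exampleOfWork")]
```

Because the walk is structural rather than positional, a response that nests the work link one level deeper no longer breaks extraction.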
4) Bibframe tool — I’ve updated the mapping used to the current LC flavor.
Updates can be found on the downloads page (Windows/Linux) or via the automated update tool.
- 32-bit: http://marcedit.reeset.net/software/MarcEdit_Setup.msi
- 64-bit: http://marcedit.reeset.net/software/MarcEdit_Setup64.msi
Omeka is a content management system (CMS) that facilitates the creation of online exhibits. Traditionally, exhibit creators needed web design skills to create a webpage. Using Omeka, the process for creating exhibit websites is simpler, which allows exhibit creators to easily extend the presence of their physical objects online.
DuraSpace News: Call for Submissions for the Third Annual New England National Digital Stewardship Alliance Meeting
North Dartmouth, MA: The University of Massachusetts Dartmouth and Brown University will host the Third Annual New England National Digital Stewardship Alliance Meeting on September 25 in the Grand Reading Room at the Claire T. Carney Library, University of Massachusetts Dartmouth.
Submission topics include, but are not limited to:
Yesterday (July 21, 2015), Patrick Murray-John had a great suggestion on Twitter: a DPLA event celebrating Ada Lovelace Day.
In case you don’t know much about Ada Lovelace, an English mathematician in the early 19th century and the namesake of the awesome Ada Initiative, or you haven’t heard of Ada Lovelace Day, check out this site: http://findingada.com/.
I think a DPLA event for Ada Lovelace Day is a fantastic idea, and I put forward the thought that we have a DPLA Ada Lovelace Event that occurs in different cities and communities on that same day, October 13. Each event would introduce people to Ada Lovelace as well as the DPLA collections and API, then support making apps, syllabi, exhibitions, or other creations focused on highlighting women in mathematics, science, and technology - like Ada herself. These events could communicate virtually to share ideas and excitement, through Twitter (the proposed hashtag currently is #dplada, but that could change), via a website for the day (http://www.dpladalovelace.us, set up by Patrick MJ), and through community report-backs via the DPLA blog or another such site.
A group of interested folks is now looking into making a possible DPLA Ada Lovelace Event ‘template’, with guidelines and such for all DPLA community representatives, or perhaps just folks interested in seeing this happen in their community. The http://www.dpladalovelace.us site will also give more information about this template, Ada Lovelace Day, the DPLA, and contacts for the various events folks are trying to pull together across the map. Anyone and everyone is welcome to contribute to the existing DPLA Ada Lovelace Event template here. If you are interested in trying to organize such an event, that is the place to mention it currently (until we get a bit further with the planning stages).

On a related and personal note…
This part of the post just goes into my own views on the importance of bringing people like Ada Lovelace to the forefront. Feel free to skip, as most of the DPLA Ada Lovelace Day info so far is up top.
One of the reasons I find this event so important, besides that the DPLA and surrounding library tech communities just do amazing work, is my own tension with my Mathematics academic background. The surge in interest in and visibility of important historical figures like Ada Lovelace brings to light the struggles women and minorities had, and continue to have, in STEM fields (science, technology, engineering, mathematics), as well as our battles with imposter syndrome and a lack of support and allies.
This half of a blogpost is not about that topic broadly, which is well-covered by others who are better placed to discuss it, but about my own disappointment in myself that I didn’t make more of my Mathematics background, and about trying to understand what happened. Upon starting college the first time around, I really focused on the sciences. Engineering, Physics, Meteorology, Computer Science: I considered, even started, majors in them all. But beyond my own inability to settle on one topic, the support for women in these academic departments was lacking at best, despite my being at a small, liberal arts college. I don’t have the horror stories I’ve heard from other women studying in STEM fields, but I didn’t feel like I should stay in these fields either. Having to continually explain that an engineering study group meeting was not a ‘date’ was wearing. When I had a new solution for a computer science problem, the ‘techie guys’ on my dorm floor showed so much astonishment that I created it that it became embarrassing. When I got a 98 on a Physics exam, my professor was convinced I cheated; when disproven, he thought perhaps his tests were too easy (despite the fact that I got grief for breaking the curve in that class for everyone else). There are other such stories; we all have them, and it is a testament to how continued ‘soft rejections’ can seem benign when isolated but together become a destructive force.
However, I ended up in the Mathematics department, which was welcoming. There was more than one token woman professor as well (although the gender balance in terms of who was tenured versus adjunct was what you’d expect, sadly, and the inclusion of minorities was entirely lacking). Most of the female students in the department, though by no means all, were studying to be Math teachers in secondary schools; this was a very popular track for people from the area, and served a great purpose. But I never got into that. When it came time for graduation, however, I had a lot of personal turmoil (hey, college, don’t we all have that) and moved to NYC. I considered pursuing Mathematics at the graduate level, but it just didn’t manifest, despite my being accepted to a few programs. I had massive amounts of what I know now to be imposter syndrome, among other things to deal with. I instead ended up teaching middle school Mathematics in Washington Heights (and turned out to be an absolutely horrible middle school teacher, but hey, you don’t know until you try). How I got this position remains a bit of a mystery to me looking back, since I had done absolutely no education courses, nor any sort of tutoring work. But a woman in mathematics? Of course you’re going to be a math teacher…? (This was the view of my family, friends and random strangers on the street.) Who knows what effect hearing that so often had. There was a lot more involved in this outcome for me than societal pressures, I freely admit, but I wonder how big something that seemed/seems so trivial could be.
Regardless, I went through a lot of careers, jobs, etc. in NYC. I’m glad I moved there, despite my hard introduction via Dominican culture, the NYC Public Schools System and Mathematics education. But looking at where I am now, and getting excited about events like this Ada Lovelace Day and the growing support for women and minorities in STEM, I can’t help but feel disappointment in myself for how little I did with what was, at one time, a great love and a great deal of time invested in learning Mathematics, in particular abstract algebra and topology. Like what happens when you stop speaking a language daily, it fades away. Eventually the assumption became, whether to do with my gender or because I accidentally fell into a career in libraries or both or neither, that I’ve (only) got a humanities background. Generally folks with that assumption mean well, and it’s unintentional/unknown on their part, but for me it hits a nerve every time (one that I used to be able to hide pretty well, but have just given up trying to hide as of late). Add weird community ideas on cataloging/metadata work and gender to this mix, and it gets even more complicated. Trying to turn this minefield into a village is rough but needed work.
Yet, the library tech community has generally been a stellar space for me to work out these issues and responses, because of the growth and support within it for projects like the Ada Initiative and now the DPLA Ada Lovelace Day possible event(s). So I’m excited to see this happen, if only for my own personal work. But I also realize these events cannot happen without us getting involved, imposter syndrome (or something I have with stepping in to help organize events, “I’m being too bossy” syndrome, which I’d argue is a subclass of imposter syndrome) or no. So, hey, you, get involved in helping make this happen. Because we all have stuff to work through, and working together on events like this can help.
Again, this is where the organizing for the DPLA Ada Lovelace Day event is currently happening, and I hope to see you involved: here. All are welcome to contribute and collaborate.
And, to hell with societal pressures, I’m planning to get my next degree in Computer Science, if only as a belated thank you to that Computer Science professor I had in college who told me, ‘you should consider a compsci degree, I’d be happy to support you in this department’. I only realized much later what she was trying to say.
Kara Sowles and Francesca Krihely gave an afternoon workshop at OSCON on how to plan and run tech events. I’m hoping to be doing more and more of this at work in the near future so I was sure to take copious notes.
A successful event is one that feels like magic: everyone and everything is running smoothly and you never notice the organizers. To create a successful event you just need the passion to do it.
So let’s get started! First things first, you need to create a document: a shared document for you and your collaborators where you have a ‘single source of truth’. This document has to be kept up to date; it should not just be a series of emails. We started with a sample document provided by Kara and Francesca.

Purpose and Goals
Purpose: to meet and connect w/ users of software
Goal: 25 qualified leads – users to reach out to after
Purpose: Teach girls to code
Goal: Have 100 girls attend and 50% of them finish the projects
While this is all great, you also have to think about your constraints: things you can’t do. For example, if you want to host at your office you might not be able to have 200 attendees; you’re constrained by the space. Your budget might also be a constraint you have to work within. The amount of time you have to plan needs to be defined and will constrain how you accomplish your mission.

Structure
- Networking event: bring people together just to chat with each other. This can be a launching point for a bigger event.
- User groups: pretty informal and easier to run than some events. These continue over time, though.
- Office hours: a simple way to get people together to communicate at specific times
- Online events: work as a great resource for communities that are worldwide. These can be collaborative or one person talking to everyone.
- Hack-a-thon: a great way for people to understand new ideas, interact with new technologies and meet new people. These can be collaborative or competitive – so you need to decide what you want.
The key is to think about your community and what kind of event they need.

Work with a Team
Once you have the structure in place, how do you get help and work as a team on the event? When building your team you want to give people the opportunity to get involved (sometimes people don’t know you want help), establish clear roles in order to empower them, and make sure there is something in it for the volunteers (WIIFM – What’s In It For Me).
Once your team is in place you need to use them and work with them. You need regular check-ins, refer often to your document, and always remember to delegate and be a good leader.

Content
You’re now ready to talk about the content to be covered at your event. This might be speakers, collaboration, workshops and/or activities.
Hack-a-thon content is a little different from other events: usually content is chosen ahead of time, but at hack-a-thons the content is created on site and is the entire point of the event. Content produced at these events can be extremely valuable to your project. These are also opportunities for sponsors because people see their products being used in a real way. You can also have people share their work on sites like Hacker League to make the event more open.
For other events, the main source of content is usually the speakers. The first way to get speakers is to invite them. When you invite people you should let the speakers know what you’re going to provide them: honorarium, expense coverage, free registration, etc. The other way to get speakers is to issue a call for speakers. The nice thing with this is that you get a ton of submissions, but that’s also a downside, as you’ll have a lot of content to wade through. Make sure that your call page is clear about what you’re looking for, along with examples.
In the end when choosing your speakers you need to go back to your document and review your purpose for the event and hold yourself to that purpose.
After doing all the work to get speakers on your agenda, you then need to ‘handle’ them. Speakers are often very busy and you need to give them all relevant information in one document/email. It’s most likely that they’re not going to want to wade through a bunch of emails to find all the information about their talk. You also want to be sure that your speakers actually accept [personal story time – I was once on a conference program and had no idea I was on the program because my confirmation email got lost in the mail].
In addition to speakers you are going to need great facilitators/MCs – this is the person who shares announcements with the attendees throughout the program.
Think about whether or not you want to record the talks at your conference; recording sessions can be very costly. If you do record videos, make sure you share your content widely: on your website, blog, GitHub, or other sites.

Venue
When you start searching for a venue you want to go back to your constraints to see how many tracks you want, what your budget is, and how many people will be attending. Some places to look:
- 20-80 person event
  - Tech company offices
- 80-150 person event
  - Tech offices (less likely)
  - Co-working spaces
- 150-250 person event
  - Universities are still an option
  - Fun local spaces (movie theaters, art galleries)
  - Professional venues (conference centers)
When looking at venues remember to think about amenities. If you’re doing a hack-a-thon you need tables and power and internet.
If you want to look for spaces you can go to a site called Cvent; don’t just trust their site though, do your own research. When looking at hotels, remember that while they’re easy to find, they can be really expensive (food and AV, for example, have to come from in house).
When talking to offices you really want to make sure you send them all your details and requirements (number of attendees, date and time, entrance fee, length of event, food, etc.). These places are usually doing you a favor and are not prepared, as hotels are, for hosting events.
When it comes to AV, make a wishlist, but also know what you really need versus what you merely want. Some places won’t let you bring your own equipment in, so keep that in mind.

How much does it cost?
This section was a series of awesome slides with sample costs. It all boils down to deciding what you want to spend. You can see my full set of images in my OSCON album on Flickr.

Sponsorships
Sponsors are a positive part of your event. It can be a lot of work to find these sponsors and make sure they follow through on their promises. You have to decide how large a part of your event you want them to be. For some audiences, over-exposure to sponsors can be annoying.
Here are some reasons sponsors might want to contribute to your event:
- recruiting and hiring
- lead generation
- community building
- they believe in your mission
To get sponsors you should have a sponsorship prospectus that you can send out with all the info needed to convince sponsors. In your prospectus you want to have:
- attendee demographics
- who is the target audience
- purpose/mission of event
- format of the event
- what sponsors get
You need to decide ahead of time what sponsors can ‘buy’ and can’t. Can they have a talk slot? can they have the branding control of your talk? etc etc.
Sponsors love booths! These might not always be the best thing for your event, though: if there isn’t a ton of attendee movement, booths might not work well.

Social spaces
With your purpose and your attendees in mind, plan your space. Remember that people may need to recharge their batteries, both their electronics and their brains. Think about your activities with your audience in mind.
Also remember to ‘map your flow’ – see where people will be walking and make sure their pathways allow for easy movement.
Part of that social space is the food you’re offering. How messy is it? How accessible is it?

Diversity
You want your event to be accessible to all. First it’s key to set your expectations right on the events page:
- Code of Conduct
- Accessibility Info
From that list, you must have a Code of Conduct. You need to train your staff on handling CoC violation reports, list it clearly and publicly for all to see, and make sure it’s included in the speaker confirmation emails.
Think through whether you want your event to be family-friendly and/or parent-friendly. You might want to provide child care or events for children.

Promoting your event
Go where your attendees go to announce your event, post on newsletters, post on local websites and of course social media!
Use incentives to get people to come to your event.

Logistics
Most of your event is the planning process; only a small part happens onsite at the event. Keep that in mind.
Goals + Constraints = Logistics
So, for example, if you have a goal to host 200 attendees you want to be sure you have enough food and a room big enough to fit them.
To tackle logistics you want to stay organized from start to finish with your document, use calendars, and remember to delegate! You can also pretend that you’re launching a product: your event is the product. You need project managers, milestones, deadlines and tasks, just like you do with a product release.
Francesca gave us another handy doc we can use for this.
Remember that not all attendees will have smartphones so a printed out schedule is a great thing to give out. OSCON does a daily schedule they print out and hand out on the day of. You can also get standing banners for $150-$200 online and it feels much more professional. Signage is really important as well to get people where they want to go.
If you’re looking for cheap swag, take a look at Sticker Mule for stickers! Remember that swag should help people remember the event, and it can double as thank-yous for your speakers or staff.
Once you’re onsite you want to have a few ‘event kits’: for example, a bag with batteries, tape, pens, rubber bands, etc., so you have supplies on hand at all times.

Day of
It’s now the day of the event!! Remember what we’ve said a bunch already: DELEGATE! Make sure everyone has everyone else’s phone number so you can reach whomever you need to reach. And no amount of prep will prevent something from going wrong; just stay calm and be ready to tackle the problem.

Post-Event
Finally the event is over… and it’s all a blur of adrenaline and fear. You’re not done yet! Go back to your document and update it. Check whether you met all of your goals. You also want to give yourself a break: either a work-from-home day or a day off completely.
You’ll want to think about a post-event survey. You won’t get everyone to respond, so make sure you have both multiple-choice and free-text fields where you can get genuine responses. Using the survey you can measure whether you reached your intended audience. If you want more people to fill out the survey, you can attach a prize to it.
Finally, you do need to do a post-mortem with your team.
If you want to see the whole talk you can find the slides online.
The post An Introduction to Planning and Running Tech Events appeared first on What I Learned Today....
We are very pleased to announce our 2015-2016 Education Advisory Committee. From an extremely qualified pool of over 300 candidates who responded to our Call for Educators, including educators in many fields and institutions across the U.S., we have selected ten outstanding participants.
The Education Advisory Committee will help DPLA staff build and review curated content sets for education users and plan future education projects. This effort is part of our Whiting Foundation-funded education work. To learn more about our plans, read about our educational use research findings.
If you’d like to follow along with our education work, please email firstname.lastname@example.org.

Education Advisory Committee
Adena Barnette is a thirteen-year educator at Ripley High School in West Virginia. She serves on her school’s literacy and curriculum teams and as the social studies department chairman. After being selected as a James Madison Fellow in 2011, she earned her Master’s degree in American History and Government from Ashland University.
Kerry Dunne is the Director of History & Social Studies for Boston Public Schools. Prior to BPS, she served as the K-12 Social Studies Director for the Arlington (Mass.) Public Schools for 7 years, taught history and served as the history department head for 9 years at Framingham High School. Kerry teaches the Pedagogy of Teaching History class at Brandeis University and is appointed to the board of the Massachusetts Council for the Social Studies (MCSS).
Ella Howard is an Associate Professor of History at Armstrong State University in Savannah, Georgia, where she teaches urban history, digital history, material culture, and popular culture. She has previously worked on a Teaching American History grant with local K-8 educators. Her book Homeless: Poverty and Place in Urban America was published by the University of Pennsylvania Press in 2013.
Melissa Jacobs is a Coordinator for Library Services in the New York City Department of Education. She is the founder and former chair of the American Association of School Librarians (AASL) Best Apps for Teaching and Learning, a member of AASL’s Executive Board as Member-at-Large, and the current President of the School Library Systems Association of New York State. This year she was honored with Library Journal‘s Mover and Shakers Award and was named Queens College Graduate School of Library and Information Studies Alumna of the Year.
Susan Ketcham has been teaching English since 2000. She graduated from Purdue University with a BA in English Education and has recently added School Library to her teaching license. This year will be her 14th at East Central High School in St. Leon, Indiana. While she has taught every grade level from 6th-12th, this year she will teach English 9, Honors English 11, and Genres of Literature.
Jamie Lathan is a 14-year social studies teacher at a residential high school (North Carolina School of Science and Mathematics) in Durham, North Carolina. He received his BA in History and MAT in Social Studies teaching from the University of Virginia and his Ph.D. in Curriculum, Culture, and Change from the University of North Carolina at Chapel Hill. He also serves as dean of distance education and extended programs at his high school.
Lakisha Odlum is currently a secondary English Language Arts teacher in New York City. She has been a teacher for 11 years of elementary school through college classes, and will be teaching a graduate course in the fall for student teachers in the English Education department at Teachers College, Columbia University. Over the past three years she has participated in programs through the New York Public Library, National Endowment for the Humanities, and the Gilder Lehrman Institute of American History.
Albert Robertson is a middle school social studies teacher at Meadow Glen Middle School in Lexington, SC. He has taught world history from ancient times all the way up to the present to 6th and 7th graders for the past nine years. This year he was honored as his district’s teacher of the year and as a top five finalist for South Carolina. He currently works as the district lead teacher for middle level social studies and also as an adjunct professor of Historical Literacy and Middle Level Social Studies Methods at the College of Charleston and Newberry College, respectively.
Melissa Strong is an associate professor of English at Northeastern State University in Tahlequah, Oklahoma, where she teaches American literature, gender studies, and writing in traditional, blended, and online formats. She recently published an article on The Long Day, a 1905 best seller exposing the difficult living and working conditions of women in unskilled jobs Nickel and Dimed style, and her essay on teaching with images is forthcoming in MLA Options for Teaching the Literatures of the American Civil War. She is an AP reader for English Literature.
James Walsh is the social studies department chair at Scott County High School, Kentucky’s largest public high school, near Lexington, Kentucky. He has also had the opportunity to work with the C3 teachers project and just started a doctoral program at the University of Kentucky.
My first session at OSCON this year was hosted by Jono Bacon on Community Management.
We’ve seen a remarkable growth in community all over the world: people are getting together to make things, do things, hack, etc. Just this simple idea of people getting together to make communities makes Jono excited (me too). If you take away the screens, computers, internet, etc., we’re all just people. We all have a basic set of concerns, opportunities, and insecurities. We all want a feeling of self-worth, and to get that we need to contribute to communities (family, friends, etc.). One key to this is the growth of internet connectivity. People in countries who were never connected before are getting connected, and we also have the growth of smartphone use. This means that we as human beings can get together and connect to create communities and contribute to making, sharing, creating and more.
Open source is powered by communities! Wikipedia is powered by communities sharing knowledge and making it open! There are sustainable farming groups all over the world. We have the maker revolution. We also notice a lot more political activism because people can get together in easier ways.
Despite all of that, we’re inefficient as people: these communities were all mostly accidents. We learn about communities by watching others; the renaissance comes when people swap from watching to writing it down and replicating that information.
Jono shared with us his written-down, packaged thoughts on community management in this half-day workshop.
If we want to build strong communities we have to start with a mission. We have to have a point and a focus. In order to assess the type of mission we want we have to look at the world we’re in. First off we’re in the post-Snowden land of privacy, the land of 3D printing and the maker revolution and a world where everyone is getting connected to the internet.
If building a community within or for your business seems like a marketing ploy it will fail. The day was broken up as follows:
- We need a vision – this is the ‘fluffy’ part
- We need requirements – Communities are chaotic, and that makes them fun, but we do need to have some sort of requirements
- We need to make a plan – there are many communities that have naturally sprung up (the ice bucket challenge) but the very best communities have a plan behind them
- We need an infrastructure
- We then need to figure out how to get people involved
- Once we have people join we need to measure the value of the community (especially if you’re at a company)
- The key thing is refinement. We will screw some stuff up – and this is a good thing. Failure is an opportunity to be better
Want to learn more? Sign up at http://communityleadershipforum.com
Community leadership is about taking all the talents you’re surrounded by and bringing them together. Contributions come in many shapes and sizes. Not all contributions are code and documentation; some of it is just ideas!

Strategy (Vision + Mission + Plans):
Vision – what are we going out there to do? The elevator pitch that will get people excited: take a global community of connected people and make them as efficient as possible. Jono breaks communities into two types: read and write. Read communities are user groups, people who need a place to talk and share. Write communities want to get together to change things; open source projects are write communities and the focus of today.
The first thing we need to accept is that people are irrational. We need to use a bit of social engineering or behavioral economics to manage our communities.
Jono brought up the SCARF model (read the full PDF) – this is the core foundation for creating a successful community:
- Status – Clarity in relative importance
- Certainty – Creating a sense of security and predictability
- Autonomy – Building in choice in your environment (even if those choices all lead to the same result; orders don’t work, letting people pick is the key)
- Relatedness – Defining clear social groupings and systems (build strong teams and help them work together)
- Fairness – Reducing unfair opportunity and rewards
Every community is different, but every community that is great is great because of great leadership. Some of the most impactful leaders though can be at the bottom of the food chain.
What is great leadership? It’s broken into two areas:
- Helping people to succeed in their goals
- Helping people to be the best that they can be
The goal with strategy is that we want to build predictable yet surprising results. Instead of trying to convince people who are skeptical – go out there and do it and surprise them. You also have to be honest – you cannot promise success when starting a new community – some things are going to work and some are not.
There are three steps to starting your community within a company or as an extension of your company.

Observe:
- look at your environment
- define requirements
- define expectations
- identify key players – this is really important – you need to find the people you want to influence and that you want to influence you
- assess risks/threats to you and others – when you join a company there are going to be people who are gunning for you: some will bemoan the work that you’re doing and others will actively try to derail it. These are the people you want to make friends with
- explore short/long term changes – see how quickly people are joining and leaving a company
- create a mission statement – this isn’t something you create once and never look at again – it’s something people should think about every single day ‘why are we doing this?’
- create a set of values – from the mission statement you can pull out a set of values
- create a longer term roadmap – “in 2 years we want to be here”
- create a staff engagement plan – if you work for a company, how are you going to get out there and engage with people?
- create a community engagement plan – find a way to make visiting the community a habit
- create a budget – “pick a budget and don’t spend all of it”
- a strategic plan (for the execs)
- an elevator pitch (for the staff) – max 5 min – better if under 3
- an execution plan (for you)
- relationships (for the teams)
In the end you have 4 core documents: mission statement, elevator pitch, strategic plan, and implementation plan. Through all of this you want to communicate your strategy, keep people included, and make them feel like they’re part of the process.

Planning:
Collaborative planning is really, really hard! We want to build a culture in which people can plan together, but not everyone in your community should play a role in how you plan. Some people might be loud but lack the skills to assist in planning. You need to find the best people to contribute to the plan because they have earned it.
There are two types of people in open source communities: hackers and maintainers. Hackers want to create things! Maintainers want to build stable software, fix bugs, and do QA.
For the hackers you want to build a culture of chaos so people can join in easily. This is like an on-ramp to the project. You also need project plans in place for the maintainers.
5 areas to consider when planning:
- opinionated – it’s okay to say no to people! If you say yes to everyone, the best you can be is average
Objectives and Key Results (OKR) – a process used at Google:
- First, plan your next 3-month period: create some measurable objectives (no more than 5).
- Next, define key results: set these to be deliberately ambitious (on the edge of impossible) but measurable outcomes (no more than 3 for each objective).
- Document the previous two steps and share them with everyone. When you share ambitious goals with the public, you don’t want to look like an idiot by not achieving them, so provide updates regularly and stress that these are ambitious goals that you might not meet. We shouldn’t just seek great results; regularly exercising and stretching ourselves makes us better.
- After the 3-month period, grade yourself from 1 to 10: 1 being that you didn’t do a thing, 10 being you finished everything. You should be getting about a 6 or 7; if you’re getting a 10, you’re not stretching yourself enough.
- Finally, revise and improve your goals for the next period. Because you’re assessing yourself, you get to improve yourself; it’s not designed to be a tool for your boss to grade you.
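The self-grading cycle described above can be sketched in a few lines. This is a hypothetical illustration of the idea, not an official OKR tool; the function name, data shape, and advice thresholds are my own choices.

```python
def grade_period(objectives):
    """Average self-assigned grades (1-10) across objectives and
    flag whether the goals were ambitious enough."""
    grades = [obj["grade"] for obj in objectives]
    average = sum(grades) / len(grades)
    if average >= 9:
        advice = "Goals were not ambitious enough; stretch more next period."
    elif average >= 6:
        advice = "Healthy stretch; revise and carry lessons into the next period."
    else:
        advice = "Goals may have been too ambitious or blocked; revisit the plan."
    return average, advice

# Illustrative objectives with self-assigned grades after one period
okrs = [
    {"objective": "Grow forum participation", "grade": 7},
    {"objective": "Ship contributor on-ramp docs", "grade": 6},
    {"objective": "Run three community events", "grade": 5},
]
avg, advice = grade_period(okrs)
print(round(avg, 1), advice)  # average lands in the healthy 6-7 band
```

The point of the thresholds is the talk's own: a perfect score means the goals were too easy, not that the period went perfectly.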
The next thing we need to do is connect to the hearts and minds of people. A plan that doesn’t have people on board is just words. We want people to be really excited about the work we do; building communities is how we make the world a better place.

Infrastructure:
To build a community is a collaborative effort.
New people will join your community and won’t know what it is or how they can contribute. They want to see that this is a community that is eager to include them; this is the marketing part of things. Next they’re on the ‘on-ramp’ into your community. To get people on the on-ramp, make it clear that people are critical to what we’re doing and that we want them to participate.

The next step is to get those community members to develop skills. This is more than providing tools to help people learn; it includes instructions on how to participate. People don’t want to read reams of information. We live in the time of Twitter and Facebook, so we need to provide efficient instructions: quick bullet points.

Once our new members have learned how to contribute, you want them to ‘do something’. To help with this, create a list of bite-sized bugs: easy bugs that new members are encouraged to fix. Then, once they contribute, be sure to provide feedback; people want to feel validated.
For your open source project you’re going to need a basic set of facilities:
- communication channels
- collaborative editing / knowledge base (wiki)
- code hosting
- issue tracking
- news delivery (blog)
- social media
Jono shared his list of recommendations for these different tools:
The one tool missing on the slide was issue tracking – Jono says Bugzilla is popular and so is Launchpad.

Growth
Growth is about engagement. We want people to become ‘sticky’ – we want them to stick around. Jono’s goal is 66 days. 66 days is how long it takes to develop a habit. So we want to encourage conversation, creation, communications and conduct to get our communities to grow in a healthy way.

Measuring Impact
“If you’re not measuring it, it didn’t happen”
Aggregate measurements tell a fuller story than KPIs (a single number that tells how well something is working). A KPI is something like “there are 1,000 people on the forum,” but an aggregate measurement is something like levels, where to reach level 1 you have to spend X amount of time on the site, participate in X topics, and so on. So when you say you have 500 level-1 members on your site, you know what that means.
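The level idea can be sketched as a set of thresholds a member must meet across several activity dimensions at once. The metric names and numbers below are made up for illustration; the point is that a level aggregates multiple measurements rather than reporting one raw count.

```python
# Hypothetical level criteria: each level requires meeting every
# threshold in its dict. Ordered from highest level to lowest.
LEVELS = [
    (3, {"hours": 40, "topics": 25, "replies": 100}),
    (2, {"hours": 15, "topics": 10, "replies": 30}),
    (1, {"hours": 2,  "topics": 1,  "replies": 5}),
]

def member_level(activity):
    """Return the highest level whose thresholds the member meets."""
    for level, needs in LEVELS:
        if all(activity.get(k, 0) >= v for k, v in needs.items()):
            return level
    return 0  # not yet an established member

member = {"hours": 18, "topics": 12, "replies": 31}
print(member_level(member))  # meets every level-2 threshold
```

Saying “500 level-2 members” then carries all three dimensions with it, which a single KPI number cannot.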
What you’re looking for are the stories, the patterns, and the trends. To identify a great community member, look at the whole of their contribution: not just how much code they contribute, but how they participate in discussions as well. Come up with a scale for your community.
Quality is way more important than quantity. Having lots of data is not more important than providing quality data. The data is there to show outcomes and outcomes are about patterns and trends not numbers. You want to illustrate the practical ways that you have succeeded in your community.
Our measurements might show that we failed – and that’s okay. You need to fail and learn from it and improve upon things. Don’t let the fear of failure stop you from measuring the impact of your community. Seeing “failure” in your data lets you realign your plans and community to figure out how to succeed at your goals.

Reading recommendations:
Abundance: The Future Is Better Than You Think – by Peter H. Diamandis and Steven Kotler
Art of Community by Jono Bacon (of course)
The 7 Habits of Highly Effective People: Powerful Lessons in Personal Change by Stephen R. Covey
Making Things Happen: Mastering Project Management by Scott Berkun
The Starfish and the Spider: The Unstoppable Power of Leaderless Organizations by Ori Brafman and Rod A. Beckstrom
- Keynote: Licensing Models and Building an Open Source Community
- How to not do support
- Being a woman in an open source community
The following is a guest post by Barrie Howard, IT Project Manager at the Library of Congress.
This post is part of a series about digital preservation training informed by the Library’s Digital Preservation Outreach & Education (DPOE) Program. Today I’ll focus on an exceptional individual, Danielle Spalenka, Project Director for the Digital POWRR Project. Prior to managing Digital POWRR, she was the Curator of Manuscripts for the Regional History Center and University Archives at Northern Illinois University.
Barrie: Danielle, first I’d like to applaud the POWRR Project for all its efforts to provide practical digital preservation solutions for low-resourced institutions. For those that aren’t familiar, can you provide a brief overview and recount some of the highlights of the project?
Danielle: Thank you very much Barrie! We are really proud of all that has been accomplished. The Digital POWRR project really began because of a failed attempt to apply for a major digitization grant. The two co-PIs of the project, Lynne Thomas and Drew VandeCreek, wanted to digitize a collection of dime novels in the Rare Books and Special Collections department at Northern Illinois University. They applied for an Institute of Museum and Library Services (IMLS) grant to digitize the novels, only to be rejected because they did not have a digital preservation plan built into their proposal. They realized that they probably weren’t the only medium-sized or under-funded institution with this same problem. What resulted was a National Leadership Grant to investigate the problem of, and potential solutions for, digital preservation at institutions with restricted resources. And that’s how POWRR (Preserving digital Objects with Restricted Resources) was born.
From 2012 to 2014, five institutions in Illinois – Northern Illinois University (serving as the lead), Illinois State University, Western Illinois University, Chicago State University, and Illinois Wesleyan University – participated in the study. We investigated, evaluated, and recommended scalable, sustainable digital preservation solutions for libraries with smaller amounts of data and/or fewer resources. During the course of the study, Digital POWRR Project team members realized that many information professionals were aware of the risk of digital object loss but often failed to move forward because they felt overwhelmed by the scope of the problem.
Team members prepared a workshop curriculum based on the findings of the study and presented it to several groups of information professionals as part of the project’s dissemination phase. Demand for the workshops was high – registration filled up quickly and created a long waiting list of eager professionals trying to get into the workshops. Towards the end of the project, organizations of information professionals were still reaching out to team members to bring the workshop to their area. We applied for a grant from the National Endowment for the Humanities Division of Preservation and Access to continue giving the workshops, and in January 2015 received funding from the NEH to extend the reach of the Digital POWRR workshops. That is when I came on board as the project director, replacing Jaime Schumacher, who is now a Co-PI on the project with Drew and Lynne.
In addition to the workshop, another highlight from the project has been the publication of a white paper that has been widely read. The white paper recently won the Preservation Publication Award from the Society of American Archivists, which we are really excited about. Our project team traveled across the country and around the world to present the findings from the study. We look forward to continuing to travel across the country to provide the workshop (for free!) through the end of 2016, thanks to funding from the NEH.
Barrie: It must be very exciting to be moving POWRR into a new phase. What’s been accomplished to date?
Danielle: Since we received funding from the NEH in January 2015, we have made a few changes to the workshop based on evaluations from previous participants. We have worked with several regional organizations of information professionals who provided letters of support in our grant application to schedule and promote individual workshops. We’ve done workshops in two locations so far, and were able to provide some travel scholarships that allowed institutions with very limited funds to send a representative to the workshop!
Barrie: I know you just wrapped up back-to-back workshops in Portland, OR. What other cities are hosting POWRR “From Theory to Action” workshops?
Danielle: Our next workshop will be another back-to-back workshop in Albany, NY in October, and we’ll be traveling to Deadwood, SD in November. In 2016, so far we have workshops scheduled in Little Rock, AR (April 2016), St. Paul, MN (June 2016) and San Antonio, TX in July 2016. I’m working to confirm dates in a few other locations, including Atlanta. Depending on our budget for 2016, we hope to go to more locations. I’ve had requests to come to Philadelphia, Montana, New York City, California, and even Alaska! As I continue scheduling workshops, I encourage anyone interested in attending to visit our website for updates.
I would like to thank several organizations that have helped us make sure the workshop remains free: the Black Metropolis Research Consortium; Northwest Archivists, Inc.; the Sustainable Heritage Network; Mid-Atlantic Region Conference of Archivists; the East New York Chapter of ACRL; the Midwest Archives Conference; the Digital Curation Interest Group of ACRL; the Oberlin Group; and the American Association for State and Local History.
Barrie: I understand that your team looked at the DPOE Train-the-Trainer Workshop training materials in developing your own curriculum, as well as other digital preservation training offerings. Can you share some of your observations?
Danielle: When we first started developing the workshop, we did look at what was currently being offered. We wanted the workshop to follow best practices and standards presented by digital preservation instruction currently available. Many of our project team members attended workshops and training sessions, including the DPOE Train-the-Trainer and offerings from the Society of American Archivists (SAA). We also talked to several digital preservation instructors, including Chris Prom and Jackie Esposito – who teach some of the Digital Archives Specialist courses offered through SAA – and Liz Bishoff.
Our review of the landscape of digital preservation instruction was that it is largely aimed at an audience beginning to come to grips with the idea that digital objects are subject to loss if we don’t actively care for them. There are lots of offerings discussing the theory of digital preservation – the “why” of the problem – and we found that there were limited opportunities to learn the “how” of digital preservation, both on the advocacy and technical sides. We also found that other great offerings, like the Digital Preservation Management Workshop Series based at MIT, had a tuition fee that was unaffordable for many prospective attendees, especially from under-funded institutions. Our goal in this phase is to make the workshops free to attend.
A major goal of the workshop is to discuss specific tools and provide a hands-on portion so that participants could try a tool that they could apply directly at their own institutions. We found that hands-on instruction for a specific, basic digital preservation tool, and critical overviews of other available tools that we tested, are largely absent from some course offerings. In the case of DPOE’s Train-the-Trainer Workshop, we liked how it focuses on understanding digital preservation conceptually by describing its individual steps, and also clarifying the difference between preservation and access. Our workshop diverges from the DPOE curriculum by directly training front-line practitioners and providing a critical overview of how digital preservation services and tools actually relate to the steps, their effective use in a workflow, and how to advocate for implementation.
Barrie: Another fantastic outcome of your project is the POWRR Tool Grid, which I read covers over 60 tools. Let’s say I’m just getting up to speed and found the grid a little overwhelming, so, what would be a good place to start?
Danielle: I’m glad you mention the tool grid, because a lot of work went into its creation. I want to mention that the POWRR Project is no longer maintaining the tool grid. When the first phase of the project ended in 2014, so did our ability to maintain it. Instead, we have thrown our support behind COPTR (Community Owned digital Preservation Tool Registry). They have produced the POWRR Tool Grid v2, which combines the form and function of our original tool grid with the sustainability provided by the COPTR data feed.
For those just starting out, I recommend first considering what type of tool you might be interested in. Are you looking for a tool that can help process your digital materials? Are you looking for storage options? How about a tool or service that can do everything? Looking at the specific function of a tool might be a good place to understand the wide variety of tools better.
While we don’t endorse or recommend any specific tool or service, I do encourage people to take a look at the tools we cover in-depth in the workshop. The reason being that we are more familiar with these tools from our testing phase of the project. For help with front-end processing, I suggest looking at Data Accessioner and Archivematica. I have heard good things coming from BitCurator, which might be of particular interest for those interested in digital forensics. For those more interested in storage, services like MetaArchive, DuraCloud, and Internet Archive would be good to investigate. There are very few services that pretty much do it all (at least in the price range for our target audience), but Preservica and the new ArchivesDIRECT are two we have investigated and discuss in our workshops as potential options for institutions with restricted resources.
Barrie: Any other advice on developing a skillset for managing digital content you’d like to share with the readers?
Danielle: A number of tools and services offer free webinars and information sessions to learn more about a specific tool. Some also offer free trial versions that allow you to gain hands-on experience to see if it might work at your institution. You can download and play with the many open-source tools out there to gain some hands-on experience.
Remember that digital preservation is an incremental process, and there are small steps you can take now to start digital preservation activities at your own institution. You don’t have to feel like an expert to begin! And finally, remember you are not alone! One thing we’ve learned through the study and by traveling to the various workshops is that there are many practitioners who recognize the need for digital preservation but have yet to engage in these activities. An easy way to get started is to see what others are doing and talking about. You can ask a question on the Digital Preservation Q&A forum. You can also learn about the latest in digital preservation activities through blogs like The Signal and the blog Digital Preservation Matters. And finally, you can attend a free POWRR workshop!
11:00 - Fedora 4
1:00 - GIS /Documentation (two rooms)
2:00 - Dev Ops
3:00 - UI
4:00 - Metadata

These will take place at the Robertson Library in the Language Learning Lab (don't worry, there will be lots of signs!). Unlike the Hackfest, registration is not required.

Workshops

A sign-up page for the workshops is available here. Your choices will not be set in stone, but we would like to get some general numbers to plan for, so your participation is much appreciated. We also have some homework for Wednesday and Thursday, in general and for a few of the specific workshops:

General

Install an Islandora Virtual Machine on your laptop, and bring it with you. You will get far more out of the workshops if you can play along on your own laptop. We recommend a bare minimum of 4GB of RAM to run the standard Islandora VM. Installing it in advance and making sure you can run it is highly recommended. The Islandora 7.x-1.5 VM is available for download here. We can provide help before the workshop to get the VM up and running if you are having difficulties. An informal "installfest" will take place at 4:30 on Tuesday, August 4 for anyone who wants a little guidance or troubleshooting. If you are not able to bring a laptop that can run the VM, please let me know. I can set you up with an online sandbox that will suffice for most of the Admin and Intermediate level workshops.

Specific

If you are planning to attend Islandora Development 101: This workshop (and likely many of the other Developer Track workshops) will work with HEAD, so the 7.x-1.5 VM will not match up. Please install the Islandora Vagrant.

If you are planning to attend Fedora 4: The workshop will include a hands-on section using a Fedora 4 virtual machine image, so please follow these instructions to get the VM up and running on your laptop before the workshop. NOTE: The VM uses 2GB of RAM, so you will need a laptop with at least 4GB of RAM to run it. Depending on your laptop manufacturer, you may also need to enable virtualization in the BIOS.
- Download and install VirtualBox: https://www.virtualbox.org/wiki/Downloads
- Download and install Vagrant: http://www.vagrantup.com/downloads.html
- Download the 4.2.0 release of the Fedora 4 VM: https://github.com/fcrepo4-labs/fcrepo4-vagrant/releases/tag/fcrepo4-vagrant-4.2.0
- Note that you can either clone the repository to your desktop using git or just download the ZIP [https://github.com/fcrepo4-labs/fcrepo4-vagrant/archive/fcrepo4-vagrant-4.2.0.zip] and unzip it
- Using a Command Line Interface, navigate to the VM folder from step 3 and run the command: vagrant up
- Note that this step will take a while as the VM downloads and installs a full virtual environment
- Test the VM by opening your web browser and navigating to: http://localhost:8080/fcrepo
In my last post, I talked about the sprint review meeting; this month we look into planning a sprint. As I said last time, this meeting should be separate from the review, both to differentiate the two and to avoid meeting fatigue.
Sprint planning takes into account the overall project plan and the results of the previous sprint (as presented in the sprint review) and sets out a plan for the next sprint: a discrete development time period.
The timing of the sprint planning meeting is the subject of much discussion, and different teams adopt different conventions based on what they feel is the best fit for their particular process. Personally, I prefer to hold the planning meeting on the same day as the review. While this puts pressure on the Product Owner to quickly adjust planning materials based on the outcome of the review, it has several important advantages:
- The knowledge acquired during the review meeting is fresh on everyone’s mind. Given that sprints typically end on a Friday, waiting until after the weekend to plan the next iteration can lead to loss of organizational memory.
- During the time between the review and planning meeting, in theory, no work can be performed (because development priorities have not been set), so minimizing that time is crucial to improved productivity.
- Given that Agile philosophy aims to decrease overhead, having all the necessary meetings in one day helps to contain that part of the development process and focus the team on actual development work.
My ideal sprint boundary process is as follows: have the sprint review in the morning, then take a break (the sprint retrospective can happen here). After lunch, reconvene and hold the planning meeting.
The planning meeting should be less open than the review, as it is more concerned with internal team activities rather than disseminating information to as wide an audience as possible. Only team members and the Product Owner should be present, and the Product Owner may be dismissed after requirements have been presented.
Before the meeting begins, the Product Owner should spend some time rearranging the Product Backlog to reflect the current state of the project. This should take into account the results of the review meeting, so if both happen on the same day the PO will need to be quick on her feet (maybe a kind developer can drop by with some takeout for lunch?).
The planning meeting itself can be divided into two major parts. First, the team will move as many user stories from the backlog into the sprint as it thinks it can handle. Initially this will take some guessing in terms of the team’s development velocity, but as sprints come and go the team should acquire a strong sense for how much work it can accomplish in a given time period. Because the PO has updated the backlog priorities, the team should be able to simply take items off the top until capacity is reached. As each item is moved, the team should ask the PO as many questions as necessary to truly understand the scope of the story.
Once the sprint bucket is full, the team will move on to the second part of the exercise, which involves taking each item and breaking it down into tasks. The PO should not be needed for this part, as the team should have collected all the information it needs in the first part of the meeting. When an item has been fully dissected, individual team members should take responsibility for each of the resulting tasks, and dependencies should be identified and documented.
It’s important to remember that sprint planning is not driven by how much work is left in the backlog, but by how much the team can realistically accomplish. If you have 3 sprints left and there are 45 user stories left in the backlog, but the team’s velocity is 10 stories per sprint, you can’t just put 15 stories in the sprint; at that point the team needs to renegotiate scope and priorities, or rethink deadlines. Pushing a team beyond its comfort zone will result in decreased software quality; a better approach is to question scope and differentiate key features from nice-to-haves.
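The capacity rule above can be made concrete with a toy sketch: a prioritized backlog is consumed from the top until the team's velocity is reached, no matter how much work remains. The story representation and point values are hypothetical.

```python
def fill_sprint(backlog, velocity):
    """Take stories off the top of the prioritized backlog until the
    team's velocity (in story points) would be exceeded."""
    sprint, points = [], 0
    for story in backlog:
        if points + story["points"] > velocity:
            break
        sprint.append(story)
        points += story["points"]
    return sprint

# Mirrors the example in the text: 45 one-point stories left,
# but a velocity of 10 means only 10 make it into the sprint.
backlog = [{"name": f"story-{i}", "points": 1} for i in range(45)]
print(len(fill_sprint(backlog, 10)))
```

The remaining 35 stories are the input to the scope/deadline renegotiation, not to the sprint.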
If you want to learn more about sprint planning meetings, you can check out the following resources:
- Vikrama Dhiman’s slideshare presentation.
- Derek Huether’s simple cheat sheet.
- A look at how Atlassian, the company that makes JIRA, does their own sprint planning.
I’ll be back next month to discuss the sprint retrospective.
What are your thoughts on how your organization implements sprint planning? How do you handle the timing of the review/retrospective/planning meeting cycle? What mechanisms do you have in place to handle the tension between what needs to be done and what the team can accomplish?
“BIS-Sprint-Final-24-06-13-05” image By Birkenkrahe (Own work) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons.
I recently decided I need to spend some time after each work day writing up my thoughts on work stuff. If nothing else, these can serve as an outlet and time for improved pondering. So please forgive the winding, rambling nature these posts will take. From these, however, I hope to find little nuggets of ideas that I can actually take and make into decent presentations, workshops, articles, whatever. As ever, I welcome your feedback and questions on these thoughts.
Recently, I made an OpenRefine Reconciliation Service for Geonames – in particular, for reconciling a metadata set that uses Library of Congress authority terms (think LCSH and LCNAF, accessible via id.loc.gov) against Geonames to pull back Geonames identifiers (URIs, rather, as they appear in that vocabulary) and coordinates following the ISO 6709 standard.
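For anyone unfamiliar with ISO 6709, here is a minimal sketch of its decimal-degree string form (signed latitude, then signed longitude, then a terminating solidus). The function name and the four-decimal precision are my own choices, not anything mandated by the standard or by Geonames.

```python
def iso6709(lat, lon):
    """Format decimal-degree coordinates as an ISO 6709 string:
    +-DD.DDDD for latitude, +-DDD.DDDD for longitude, trailing '/'."""
    return f"{lat:+08.4f}{lon:+09.4f}/"

# Coordinates roughly for Charleston, SC (illustrative values)
print(iso6709(32.7765, -79.9311))  # "+32.7765-079.9311/"
```

Storing the string this way keeps latitude and longitude unambiguous in a single metadata value, which matters once it lands in a MODS element.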
A number of interesting (perhaps only to me) points came up while working on this project and using this reconciliation service (which I do quite often, as we are migrating legacy DC/XML from a variety of platforms to MODS/XML, and part of this involves batch data enhancements). The questions and points below are what in particular I hope to address in this post:
- Can we standardize, for lack of a better word, the process by which someone creates an OpenRefine reconciliation service based off of a REST API on top of any vocabulary? Also, API keys are the devil.
- More specific to geographic terms/metadata, why do I feel the need to use Geonames? Why not just use LC Authorities, considering they’ve ‘pulled in’ Geonames information, matching URIs, in batch?
- Do we really want to store coordinates and a label AND a URI (and whatever else) for a geographic term within a descriptive metadata record element? Does it even matter what we want when we have to consider what we need and what our systems can currently handle?
- As a follow-up, where the heck would we even put all those datapoints within anything other than MODS? What are some of the RDF metadata models doing, and how can folks still working with XML (or even MARC) prepare for conversion to RDF? Some ideas on the best practices I’m seeing put about, as well as a few proposals for our own work.
And various other points that come up while I'm writing this.

Let's All Make OpenRefine Reconciliation Services!
Professionally, I’m in some weird space between cataloging, general data stuff, and systems. So don’t take my word on anything, as usually I’m just translating what already exists in one subdomain to a different subdomain (despite the fact that library domains just assume they can already talk to each other, often).
I start with this though to say I'm not a developer of any sort, yet I was able to pull together the Geonames OpenRefine Reconciliation Service via trial and error, knowledge of how the Geonames REST API works (in particular, how queries are structured and data returned), and also by building off all the great community-sourced work that exists. In particular, Ted Lawless wrote a great FAST OpenRefine Reconciliation Service that I used to create something for Geonames. There are some OpenRefine Reconciliation Service templates for others to build off of - in particular, a very simple one in python, some other examples written in php - and an OpenRefine Reconciliation Service API wiki document that you should take with a grain of salt, as it needs serious revisions, updates, and expansions (which, er, maybe I should help with). This is just scratching the surface of OpenRefine reconciliation examples, templates, and documentation.
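For the curious, the recon service wire format boils down to two JSON payloads: service metadata that OpenRefine fetches once when you register the service, and per-query result lists. Here is a framework-free sketch of that shape - the service name, toy lookup table, and id are illustrative stand-ins of my own, not code from any of the services linked above:

```python
import json

# Service metadata: what OpenRefine receives when it first pings the endpoint.
# The "name" and "view" values here are illustrative, not from a real service.
SERVICE_METADATA = {
    "name": "Example Geonames Reconciliation Service",
    "defaultTypes": [{"id": "/geonames/place", "name": "Place"}],
    "view": {"url": "http://www.geonames.org/{{id}}"},
}

# A toy lookup table standing in for real Geonames API calls.
PLACES = {
    "knoxville tennessee": {"id": "4634946", "name": "Knoxville", "score": 100},
}

def reconcile(queries_json):
    """Handle the 'queries' parameter OpenRefine POSTs: a JSON object like
    {"q0": {"query": "..."}, "q1": ...}. Return one result list per key."""
    queries = json.loads(queries_json)
    results = {}
    for key, spec in queries.items():
        hit = PLACES.get(spec["query"].lower())
        matches = []
        if hit:
            matches.append({
                "id": hit["id"],
                "name": hit["name"],
                "type": [{"id": "/geonames/place", "name": "Place"}],
                "score": hit["score"],
                "match": hit["score"] == 100,  # OpenRefine auto-matches these
            })
        results[key] = {"result": matches}
    return results
```

A real service wraps those two payloads in a tiny web app (flask, in the python templates) and swaps the toy table for live API calls, but the request/response shape is the whole contract.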
However, once you get into building or modifying an existing reconciliation service (recon service from this point on, for the sake of my typing hands), you might run into some of the same roadblocks and questions I did. For example, with the Geonames recon service, I wanted in particular to return coordinates for a matched name. However, I did not want to do this by matching on a name, pulling the entire record for that name (serialized as json, xml, doesn't matter) into a new column, then parsing that record column to find the particular datapoints I wanted to add for each row. This method of 'reconciliation' in OpenRefine - seen when someone adds a new column by retrieving URLs generated from the originating column values - takes far longer than using a recon service, is not scalable unless your metadatasets are amazingly consistent, and offers more chances for errors, as you have to parse the records in batch for each datapoint you want to pull out (otherwise you're spending so much time on each value that you might as well have faceted the column and searched manually in your authority of choice for the datapoints you're hoping to retrieve). Yet the recon service templates and the OpenRefine recon service metadata (explained somewhat in the wiki page above) did not offer me a place to store and return the coordinates within the recon service metadata (without a hack I didn't want to make).
As I’m writing this, I realize that a post detailing all the ways one can use OpenRefine to do ‘reconciliation’ work would be helpful, so we know we are comparing apples to apples when discussing. For example, another way that reconciliation can happen in OpenRefine - using the now unsupported but still viable and immensely useful DERI RDF Extension - is yet another approach that has its merits, but could possibly muddle someone’s understanding of what I’m discussing here: the Reconciliation Service API script/app, in my case built in python and working with a REST API.
For what it’s worth, I’d really like to have an OpenRefine in LODLAM call on the different reconciliation services, examples, and how to build them. If you’re interested in this, let me know. I’m happy to talk about part of this from my own experiences, but I’d like to have at least one other person talk.
Regardless, back to building the Geonames recon service: I could get a basic recon service running by plugging the Geonames API information in place of the FAST API information in Lawless’ FAST recon service code, with minor modifications for changes in how Geonames returns data, and the inclusion of an API key. The requirement of an API key made this work that much harder, because it means folks need to go in and add their own (or use the sample one provided and hit the daily API call limit rather quickly) in the core flask app code. I’m sure there are ways to have the user submit their own key via the CLI before firing up the service, or in other ways, but I kept it as simple as possible since this annoyance wasn’t my main concern.
My main concern with this API was getting good matches for metadata using terms taken from the Library of Congress Authorities, in particular LCSH and LCNAF, and returning top matches along with coordinates (and the term and term URI from Geonames, luckily built into the recon service metadata by default). The term matching uses the fuzzywuzzy library, as seen in most python OpenRefine recon apps. The coordinates for a match were simply appended to the matched term with a vertical bar, something easy to split values off of in OpenRefine (or to remove via the substring function if you happen to not want the coordinates).
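That score-then-append approach is simple enough to sketch. Here I stand in for fuzzywuzzy with Python's stdlib difflib so the snippet is self-contained (the real service uses fuzzywuzzy's scorers), and the candidate tuples are made up for illustration:

```python
from difflib import SequenceMatcher

def score(a, b):
    """0-100 similarity score; a stdlib stand-in for fuzzywuzzy's fuzz.ratio."""
    return int(SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100)

def best_match_with_coords(query, candidates):
    """candidates: (name, lat, lng) tuples, e.g. from a Geonames search.
    Returns the top-scoring name with its coordinates appended after a
    vertical bar, ready to split back off later in OpenRefine."""
    ranked = sorted(candidates, key=lambda c: score(query, c[0]), reverse=True)
    name, lat, lng = ranked[0]
    return "%s|%s, %s" % (name, lat, lng)
```

In OpenRefine, value.split("|")[1] then peels the coordinates off into their own column, or value.split("|")[0] discards them.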
But the first tests of this service returned really poor results (fewer than 10 direct or above-90% matches for ~100-record test metadatasets), even though the test metadatasets were already reconciled, meaning the subject_geographic terms I was reconciling were consistent and in LCNAF or LCSH (as applicable) form. This is when I took a few and searched Geonames manually. I invite you to try this yourself: search Knoxville (Tenn.) at http://www.geonames.org. You get no matches to records from the Geonames database, and instead (as is the Geonames default) get results from Wikipedia. This is because Geonames doesn’t like that abbreviation - and my sample metadatasets, all taken from actual metadatasets here at work, are all United States-centric, particularly as regards subject_geographic terms. Now search http://www.geonames.org for Knoxville (Tennessee), or Knoxville Tennessee, or Tennessee Knoxville - the first result will be exactly what you’re searching for.
What to do, at least in the context of OpenRefine recon services? Well, write a simple python script that replaces those LC abbreviations for states with the full name of the state, then searches Geonames for matches. See that simple, embarrassingly simple solution here: http://github.com/cmh2166/geonames-reconcile/blob/master/lc_parse.py. Yep, it’s very basic, but all of a sudden the reconciliation service was returning much, much better results (for my tests, around 80% direct matches). I invite others to try using this recon service and report your results, as well as other odd Library of Congress-to-Geonames matching roadblocks for more international geographic metadata sets.
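The heart of that fix is just a lookup table plus a regex over the parenthetical qualifier. A stripped-down stand-in (only a few states shown here for illustration; the lc_parse.py linked above is the real thing):

```python
import re

# A tiny excerpt of an LC state-abbreviation map; values here are
# illustrative, and the full script covers far more qualifiers.
LC_STATE_ABBREVS = {
    "Tenn.": "Tennessee",
    "Va.": "Virginia",
    "Calif.": "California",
}

def lc_to_geonames_query(heading):
    """Turn an LC-style heading like 'Knoxville (Tenn.)' into a query
    Geonames will actually match, e.g. 'Knoxville Tennessee'."""
    m = re.match(r"^(.*?)\s*\((.+?)\)\s*$", heading)
    if not m:
        return heading  # no parenthetical qualifier: pass through unchanged
    place, qualifier = m.groups()
    qualifier = LC_STATE_ABBREVS.get(qualifier, qualifier)
    return "%s %s" % (place, qualifier)
```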
There are other things I wish to improve in the Geonames recon service - some recon services offer the ability, if the top three results returned from reconciliation are not what you wanted at all, to then search the vocabulary further without leaving OpenRefine. I played around a bit with adding this, but had little luck getting it to work. I also want to see if I can expand the OpenRefine recon service metadata to avoid the silly Coordinates hack. I’d love to show folks how to host this somewhere on the web so you do not need to run the Geonames recon service via the simple python server before being able to use it in OpenRefine - however, the API key requirement nips this in the bud.
More to the point, though, I want to figure out how to better improve Geonames matching for other, ‘standard’ library authority sources. It seems to me like something is fundamentally off with library data work when the authority services are, from an external data reconciliation viewpoint, so siloed. Not at all what we want if we’re going towards a library-data-on-the-web, RDF-modeled world. It seems to me.

Geonames versus Library of Congress Authorities
So this brings me to two questions, both of which I got from various people hearing me talk about this work: why not just reconcile with the Library of Congress Authorities (which have been matched with Geonames terms via some recent batch enhancements, and should have coordinates information now, as it is a requirement for geographic name authorities in RDA)? And, alternatively, why not just match with Geonames and use their URI, leaving out LCSH for subject_geographic or other geographic metadata (and using it instead for personal/corporate names that aren’t geographic entities, or topical terms, etc.)?
I think this shows better than anything I could say a fundamental divide in how different parts of library tech world see “Authority” data work.
Here is why I decided to not use the Library of Congress Authorities entirely for geographic reconciliation:
- The reconciliation with Geonames within LCNAF/LCSH is present, but it is a second level of work that undermines my wanting to make a helpful, fast, error-averse OpenRefine recon service. This is not to say that I think linked authorities data shouldn’t have these cross-file links; of course they should, but also read my bit below on descriptive versus authority record contents.
- The hierarchies in LCNAF/LCSH are…lacking. I’d like to know that, for example, Richmond (Va.) is in Virginia (yes, I know it says Va. in that original heading, but where is the link to the id.loc.gov record for Virginia? It’s not there), which is in United States, etc. etc. Geonames has this information captured.
- When there are coordinates, even if matched with Geonames, they are often stored in a mads:citation-note, without machine-readable data on how the coordinates are encoded. I know I want to pull ISO 6709, but I don’t want to have to manually check the coordinates for each record to get the information from the right statement and verify the encoding.
Note: I’d really love to pull the Library of Congress Name Authority File linked open dataset from id.loc.gov and test what my limited experience has led me to believe about LCNAF lacking consistent Geonames matching, coordinates, and hierarchies - particularly for international geographic names, as my own work often involves just geographic names from the United States.
Note: That I don’t think the Library of Congress Authorities are currently the best for geographic metadata DOES NOT MEAN I do not use them all the time, appreciate the work it took to build them, or think they should be deprecated or disappear. What I’d like to see is more open ways for the Library of Congress Authorities to share data and data modeling improvements with 1. the library tech community already working on this stuff and 2. other, non-traditional ‘authorities’ like Geonames that have a lot to offer. Batch reconciliation work that pulls in limited parts of existing, non-traditional ‘authorities’ without a mind to how we can pull that data out in machine-actionable reconciliation processes hasn’t really helped boost their implementation in the new world of library data work.
Yet I am really, really appreciative of all the work the Library of Congress folks do, I wish they weren’t so understaffed, and hell, I’d give my left arm to work there, except I’m not smart enough.
Alright, moving on…why not just use the Geonames URIs and labels alone, if I feel this way about the Library of Congress Authorities and geographic terms? The simple reason is: facets. Most subject terms are being reconciled with, if they weren’t already created using, the LCNAF and LCSH vocabularies. LCSH and LCNAF make perfect sense as the remaining top choice for topical subjects and names (although there are other players in the non-traditional names authorities field, which I’ll discuss in some other post, maybe). And our digital platform discovery interface, as well as our primary library catalog/all-data-sources discovery interface, are not currently built to facet geographic subjects separately from topical subjects (or from names as subjects, etc. etc.). So for the sake of good sorting, intelligible grouping, and the rest, the LC Authorities terms/labels remain the de facto choice.
Additionally, I’m not sold that the Library of Congress won’t catch on to the need to open up more to cross-referencing and using non-traditional data sources in more granular and machine-actionable ways. They seem to be at work on it, so I’d prefer to keep their URIs or labels there so reconciliation can happen with their authorities. One must mention, too, that Geonames does not store references to the same concepts in the Library of Congress Authorities, so keeping just the Geonames term and URI would make later reconciliation with the Library of Congress Authorities a pain (not to mention: search ‘Knoxville Tennessee’, the preference for Geonames queries, in id.loc.gov, and see all the results you get that aren’t ‘Knoxville (Tenn.)’. Argh.)
What to do, what to do… well, build a Geonames recon service that takes Library of Congress Authorities headings and returns Geonames additional information, for now.

Descriptive Metadata & Authority ‘Metadata’
Let me start this section by saying that Authorities are within the realm of ‘descriptive metadata’, sure. However, when we say ‘descriptive metadata’, we normally think of what is known in cataloging parlance (for better or for worse) as bibliographic metadata. Item-specific metadata. This digital or physical resource, present in the catalog/digital collection, that we are describing so you can discover, identify, and access it (okay, okay, access metadata isn’t descriptive metadata).
What about authority data? We see a lot of authority files/vocabularies are becoming available as Linked Open Data, but how do we see these interacting with our descriptive metadata beyond generation of ‘authorized’ access point URIs and perhaps some reconciliation, inference, tricks of the discovery interface via the linking and modeling? The Linked Open Data world is quickly blurring the demarcation between authority and non-authority, in my opinion - and I find this really exciting.
So, returning to geospatial metadata, it is not my preference to store coordinates, label, URI(s) - maybe even multiple URIs if I really want to make sure I capture both Geonames and LCNAF - in the descriptive record access point. That’s to say, I’m not terribly excited that, in MODS/XML, this is how I handle the geospatial metadata involved presently:

<mods:mods>
  <mods:subject>
    <mods:geographic authority="naf" valueURI="http://id.loc.gov/authorities/names/n79109786">Knoxville (Tenn.)</mods:geographic>
    <mods:cartographics>
      <mods:coordinates>35.96064, -83.92074</mods:coordinates>
    </mods:cartographics>
  </mods:subject>
</mods:mods>
Sure, that works, but it can be hell to parse for advanced uses in the discovery layer, as well as to reconcile/create in the descriptive metadata records in batch, and to update as information changes. Also, where is the Geonames authority/URI? Can, and more importantly, should we repeat the authority and valueURI attributes? Break MODS validation and apply perhaps an authority attribute to the coordinates element, stating from where we retrieved that data? Where is the attribute on either cartographics or coordinates stating what standard the coordinates follow, so the machine parsing this can know?
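To make the parsing pain concrete: even pulling those few datapoints back out of the subject element takes namespace-aware traversal, and nothing in the record tells you which coordinate standard you got. A minimal sketch using Python's standard library (the sample record string and function name are mine, not from any production code; the record uses a default namespace rather than the mods: prefix, which is equivalent XML):

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

# A hypothetical single-subject MODS record mirroring the snippet above.
record = """<mods xmlns="http://www.loc.gov/mods/v3">
  <subject>
    <geographic authority="naf"
        valueURI="http://id.loc.gov/authorities/names/n79109786">Knoxville (Tenn.)</geographic>
    <cartographics>
      <coordinates>35.96064, -83.92074</coordinates>
    </cartographics>
  </subject>
</mods>"""

def extract_geo(mods_xml):
    """Pull (label, valueURI, coordinates) triples from subject elements.
    Coordinates may be absent, and nothing in the record says which
    standard they follow - exactly the headache described above."""
    root = ET.fromstring(mods_xml)
    ns = {"m": MODS_NS}
    out = []
    for subj in root.findall("m:subject", ns):
        geo = subj.find("m:geographic", ns)
        coords = subj.find("m:cartographics/m:coordinates", ns)
        if geo is not None:
            out.append((geo.text,
                        geo.get("valueURI"),
                        coords.text if coords is not None else None))
    return out
```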
Also, more fundamentally, how much of this should be statements in an Authority record? Wouldn’t you rather have (particularly if you’re soon going to MODS/RDF or perhaps another model in RDF that is actually working at present) something that just gives the locally-preferred label and a valueURI to 1 authority source that can then link to other authority sources in a reliable and efficient manner? Perhaps link to the URI for the Geonames record, then use the Library of Congress Authorities Geonames batch matching to pull the appropriate, same as Library of Congress Authority record that way.
So this is something I’ve been thinking about and working on a lot lately: creating an intermediate, local datastore that handles authority data negotiation. Instead of waiting for the LC Authorities to add missing terms to their database (like Cherokee town names, or Southern Appalachia cross-references for certain regional plants, or whatever), or parseable coordinates from Geonames, or for Geonames to add LC Authorities preferred terms or URIs, or whatever other authority you’d like to work with but that has XYZ issues, have a local datastore based on an ontology built to interact with the chosen authorities you want to expand upon, one that links to their records but adds your local authority information too. It is a bit of a pipe dream at the moment, but I’ve had some small luck building such a thing using Skosmos, LCNAF, LCSH, Geonames, and Great Smoky Mountain Regional Project vocabularies. We’ll see if this goes anywhere.
Basically, returning to the point of this post: I want the authority data to store information related to the access point mentioned in the descriptive record, not the descriptive record storing all the information. Otherwise there are the data consistency issues mentioned above, as well as the need for discovery interfaces built for ever more complex data models (speaking of XML).
However, for the time being, the systems I work with are not great at this Authority reconciliation, so I put it as consistently as I can all in that MODS element(s).
I should note, as a final note I think for this post, that I do not add these URIs or other identifiers as ‘preparation for RDF’. Sure, it’ll help, but I’m adding these URIs and identifiers because text matching has many flaws, especially when it comes to names.

Things to follow-up on:
- Getting an OpenRefine Recon Service call together
- Discussing some of the geographic data models out there, as well as what a person working with something other than MODS can do with geographic or other complex authorized access point data
- A million other things under the sun.
The Senate made great strides this week to ensure needed reform to the Elementary and Secondary Education Act (ESEA). After much debate and across the aisle discussion, yesterday the Senate overwhelmingly passed S. 1177, the Every Child Achieves Act, by a vote of 81-17.
As we discussed in a previous post, the inclusion in the bill of the bipartisan Reed-Cochran amendment makes S. 1177 a monumental step forward for schools, their libraries, and the millions of students they serve. Most fundamentally and importantly, the amendment (approved 98-0) makes explicit that ESEA funds may be used to support school libraries and “effective school library programs” in multiple ways.
As detailed in ALA’s recent press statement, “The Every Child Achieves Act of 2015 contains several provisions in support of libraries, including state and local planning requirements related to developing effective school library programs and digital literacy skills; professional development activities for school librarians; partnership opportunities for libraries; and competitive grants for developing and enhancing effective school library programs.”
Now that both the House (H.R. 5) and the Senate have completed their bills, the next step will be the appointment of members from both chambers to a conference committee to reconcile differences between the two pieces of legislation. That new bill then must be approved again by both the House and Senate.
Although we do not anticipate this happening before the fall, please do stay tuned and watch for legislative alerts! Your voices will be needed at that time to remind your Members of Congress about the importance of school libraries and how essential it is that the provisions supporting school libraries remain in the final bill.
The post Victory for school libraries as Senate passes education bill appeared first on District Dispatch.
To buy, or not to buy–that is the question:
Whether ’tis nobler in the end to suffer
The slings and arrows of outrageous vendors
Or to take up coding against a sea of problems
And by opposing end them. To code, to commit–
No, more–and by a bug fix to say we end
The heartache, and the thousand natural shocks
That our code is heir to. ‘Tis a consummation
Devoutly to be wished. — With deepest apologies to William Shakespeare
I bet you didn’t know that Hamlet was a librarian. A librarian who was just as pinned on the horns of a software sourcing dilemma as many of us are today. This was one of the takeaways from the OCLC Research Library Partnership June meeting in San Francisco. Previous posts have summarized other aspects of the discussion.
The classic “build vs. buy” debate is certainly not unique to libraries or the institutions of which they are a part, but we all must wrestle that monster. When are vendor solutions good enough? If there is no good solution, how soon might there be one? What happens if we make a major investment in developing our own software solutions and they are eclipsed by the market in 3-5 years? Do we really want to “build tomorrow’s legacy systems today?” (quoted by attendee David Seaman, who is leaving Dartmouth to become the Dean of Libraries and University Librarian at Syracuse University). What if no vendor understands our problems and no commercial solution will ever be good enough?
It was clear from the presentations that this decision employs a unique calculus that takes a wide variety of factors into account. Here are just a few:
- Local ability (staff with coding skill).
- Available development resources.
- Having a problem worth solving that remains unsolved by any commercial option.
- The potential, or lack thereof, of the commercial market solving your problem.
- Whether keeping control over one’s data is important.
- Whether providing leadership in a new area is important to your institution.
For UCLA, Ginny Steele reported, their answers to questions like those above led them to develop their own academic information and profile system. Dubbed Opus, it is “the information system of record for academic appointees at UCLA” and is just now being rolled out, with additional features to come. It will be interesting to see how it works, and how it compares to any similar commercial systems.
A “middle ground” solution that a number of libraries are adopting, including the Dartmouth Library as reported by David Seaman, is to come together with other institutions to collaborate on software development. One of the most robust of such efforts in the library space is the Hydra Project which Dartmouth chose to join. Such collective efforts have many of the benefits of writing your own solutions while minimizing some of the drawbacks.
Whichever path you choose to take, it was also clear from a number of speakers that identifiers are an important part of a modern technological infrastructure. Identifiers can serve as a kind of “glue” that enables disambiguation and linkages to other data sources, among other benefits. Wouter Gerritsma, Manager of Digital Services & Innovation at VU University Amsterdam, probably made these points as well as any of the speakers that day, although the need for identifiers wound like Ariadne’s thread through the meeting.

About Roy Tennant
Roy Tennant works on projects related to improving the technological infrastructure of libraries, museums, and archives.
The primary conclusion was that the models of FRBR and BIBFRAME, with their separation of bibliographic information into distinct entities, are too inflexible for general use. There are simply too many situations in which either the nature of the materials or the available metadata simply does not fit into the entity boundaries defined in those models. This is not news -- since the publication of FRBR in 1998 there have been numerous articles pointing out the need for modifications of FRBR for different materials (music, archival materials, serials, and others). The report of the audio-visual community to BIBFRAME said the same. Similar criticisms have been aimed at recent generations of cataloging rules, whose goal is to provide uniformity in bibliographic description across all media types. The differences in treatment that are needed by the various communities are not mutually compatible, which means that a single model is not going to work over the vast landscape that is "cultural heritage materials."
At the same time, folks in this week's informal discussion were able to readily cite use cases in which they would want to identify a group of metadata statements that would define a particular aspect of the data, such as a work or an item. The trick, therefore, is to find a sweet spot between the need for useful semantics and the need for flexibility within the heterogeneous cultural heritage collections that could benefit from sharing and linking their data amongst them.
One immediate thought is: let's define a core! (OK, it's been done, but maybe that's a different core.) The problem with this idea is that there are NO descriptive elements that will be useful for all materials. Title? (seems obvious) -- but there are many materials in museums and archives that have no title, from untitled art works, to museum pieces ("Greek vase") to materials in archives ("Letter from John to Mary"). Although these are often given names of a sort, none have titles that function to identify them in any meaningful way. Creators? From anonymous writings to those Greek vases, not to mention the dinosaur bones and geodes in a science museum, many things don't have identifiable creators. Subjects? Well, if you mean this to be "topic" then again, not everything has a topic; think "abstract art" and again those geodes. Most things have a genre or a type, but standardizing on those alone would hardly reap great benefits in data sharing.
The upshot, at least the conclusion that I reach, is that there are no universals. At best there is some overlap between (A & B) and then between (B & C), etc. What the informal group that met this week concluded is that there is some value in standardizing among like data types, simply to make the job of developers easier. The main requirement overall, though, is to have a standard way to share one's metadata choices, not unlike an XML schema, but for the RDF world. Something that others can refer to or, even better, use directly in processing data you provide.
Note that none of the above means throwing over FRBR, BIBFRAME, or RDA entirely. Each has defined some data elements that will be useful, and it is always better to re-use than to re-invent. But the attempts to use these vocabularies to fix a single view of bibliographic data are simply not going to work in a world as varied as the one we live in. We limit ourselves greatly if we reject data that does not conform to a single definition rather than making use of connections between close but not identical data communities.
There's no solution being offered at this time, but identifying the target is a good first step.
The Digital Public Library of America is pleased to announce the appointment of Niko Pfund to its distinguished Board of Directors. Pfund is the Global Academic Publisher of Oxford University Press, and President of Oxford University Press, USA.
“The principles of education, dissemination, and access that lie at the core of DPLA’s mission are unimpeachable,” Pfund said. “I’m very pleased to be involved in such a valuable, even noble, enterprise.”
Pfund, a graduate of Amherst College, began his career at Oxford in 1987 as an editorial assistant in law and social science before moving to New York University Press in 1990. At NYU Press, he was an editor and then editor in chief before becoming director in 1996. He returned to Oxford in 2000 in the role of Academic Publisher and is responsible for oversight of the Press’s scholarly and research publishing across the humanities, social sciences, science, law, and medicine, spanning the Press’s offices in Oxford, New York, and Delhi. A frequent speaker on publishing, scholarship, and media, he has given talks at the Library of Congress, the World Bank, and many colleges, universities, libraries, scholarly conferences, and publishing institutes. In 2012, Pfund was named a Notable Person of the Year by Publishers Weekly.
“Niko’s broad knowledge of publishing and the media will contribute significantly to advancing DPLA’s commitment to providing content of interest to people of all ages and to everyone from scholars to school age children,” said Amy Ryan, President of the Board of Directors.
“We couldn’t be happier that Niko Pfund has agreed to join the DPLA board,” said Dan Cohen, DPLA’s Executive Director. “To have a publisher of his great experience and range will be a tremendous asset to our organization.”
Working closely with Cohen, the Board seeks to fulfill DPLA’s broad commitment to openness, inclusiveness, and accessibility, and it endeavors towards those ends in the best interest of its stakeholders, employees, future users, and other affected parties. The Board supports the DPLA’s goal of creating and maintaining a free, open, and sustainable national digital library resource.
Full biographies of the entire DPLA Board of Directors can be found at http://dp.la/info/about/board-committees/.
The MARC data structure, and the AACR2 rules that usually accompany it, are strange beasts. Every once in a while I’m asked why I get so frustrated with them, and I explain that there are things — strange things — that I have to deal with by writing lots of code when I could be spending my time trying to improve relevancy ranking or extending the reporting tools my librarians use to make decisions that affect patrons and their access.
This is one of those tales.
I’m a systems librarian, which in my case means that I deal with MARC metadata pretty much all day, every day. Coming from outside the library world, it took me a while to appreciate the MARC format and how we store data in it, where appreciate can be read as hate hate hate hate hate.
I find it frustrating to deal with data typed into free-text fields all willy-nilly with never a thought for machine readability, where a question like what is the title is considered a complicated trap, and where the word unique, when applied to identifiers, has to have air quotes squeezing it so hard that the sarcasm drips out of the bottom of the ‘q’ in a sad little stream of liquid defeat.
One of the most frustrating things, though, is when a cataloger has clearly worked hard to determine useful information about a work and then has nowhere to put those data. To wit: date of publication.
Many programmers have to deal with timestamps, with all the vagaries of time zones, leap years, leap seconds, etc. In contrast, you’d think that the year in which something was published wouldn’t be fraught with ambiguity and intrigue, but you’d be wrong. Dates are spread out over MARC records in several places, often in unparsable free-text minefields (I’m looking at you, enumeration/chronology) and occasionally in different calendars.
The most “reliable” dates (see? there are those air-quotes again!) live in the 008 fixed field. Of course, they mean different things depending on format determination and so on, but generally you get four bytes to put down four ASCII characters representing the year. When you don’t know all the digits of the year exactly, you substitute a u for the unknown numbers.
- 1982 — published in 1982
- 198u — published sometime in the 1980s
- 19uu — published between 1900 and 1999
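Mechanically, each u just widens the window by a power of ten: substitute 0 for the earliest possible year and 9 for the latest. A small sketch of that decoding (the function name is mine):

```python
def decode_008_year(field):
    """Expand a four-byte MARC 008 date like '198u' into the (earliest,
    latest) pair of years it can denote. Each 'u' is an unknown digit,
    so it ranges over 0..9."""
    if len(field) != 4:
        raise ValueError("008 dates are exactly four characters")
    earliest = int(field.replace("u", "0"))
    latest = int(field.replace("u", "9"))
    return earliest, latest
```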
So, that’s fine. Except that it isn’t. It’s dumb. It made sense to someone at the time to only allow four bytes, because bytes were expensive. But those days have been gone for decades, and we still encode dates like this, despite the fact that having actual start and endpoints for a known range would be better in every way.
Look at what we lose!
- 1982 or 1983 — 198u (ten years vs. two)
- Between 1978 and 1982 — 19uu (one-hundred years vs. five)
- Between the Civil War and WWI — 1uuu (one-thousand years vs about fifty)
The other day, in fact, I came across this date:
Yup. The work was published sometime between 2000 and 2099. My guess is that it was narrowed down to, say, 2009-2011 and this is what we were stuck with. I’d bet big money that its date of publication isn’t, say, after 2016, unless time travel gets invented in the next few years.
But the MARC format works against us, and once again we throw data away because we don’t have a good place to store it, and I’m spending my time trying to figure out a reasonable maximum based on the current date or the date of cataloging or whatnot when it could have just been entered at the time.
As much as we’d like to pretend otherwise, no one is ever going to go back and re-catalog everything. I can almost stomach the idea that we did this thirty years ago. It drives me crazy that we’re still doing it today.
How about it, library-nerd-types? What do you spend your time dealing with that should have been dealt with at another place in the workflow? [Image: Calendary Calculator from Nuremberg, 1588; Germanic National Museum in Nuremberg. By Anagoria (Own work) [GFDL or CC BY 3.0], via Wikimedia Commons.]