A couple hours ago, I saw reports from Library Journal and The Digital Reader that Adobe has released version 4.0.1 of Adobe Digital Editions. This was something I had been waiting for, given the revelation that ADE 4.0 had been sending ebook reading data in the clear.
ADE 4.0.1 comes with a special addendum to Adobe’s privacy statement that makes the following assertions:
- It enumerates the types of information that it is collecting.
- It states that information is sent via HTTPS, which means that it is encrypted.
- It states that no information is sent to Adobe on ebooks that do not have DRM applied to them.
- It may collect and send information about ebooks that do have DRM.
It’s good to test such claims, so I upgraded to ADE 4.0.1 on my Windows 7 machine and my OS X laptop.
First, I did a quick check of strings in the ADE program itself — and found that it contained an instance of “https://adelogs.adobe.com/” rather than “http://adelogs.adobe.com/”. That was a good indication that ADE 4.0.1 was in fact going to use HTTPS to send ebook reading data to that server.
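The string check described above can be approximated with a few lines of Python. This is only a minimal sketch, not the tool actually used for the check: the helper names `find_urls` and `uses_https_only` are made up for illustration, the sample bytes are a stand-in for the contents of the program file, and it only finds URLs stored as plain ASCII (strings stored as UTF-16 would be missed).

```python
import re

# Matches http:// or https:// followed by printable, non-space ASCII bytes.
URL_PATTERN = re.compile(rb"https?://[\x21-\x7e]+")


def find_urls(data, host):
    """Return the distinct URLs in `data` that mention `host`, in order of first appearance."""
    seen = []
    for match in URL_PATTERN.finditer(data):
        url = match.group().decode("ascii")
        if host in url and url not in seen:
            seen.append(url)
    return seen


def uses_https_only(urls):
    """True when at least one URL was found and none of them is plain http://."""
    return bool(urls) and all(u.startswith("https://") for u in urls)


# Hypothetical stand-in for the bytes read from the program file,
# e.g. open(path_to_binary, "rb").read()
sample = b"\x00\x01junk\x00https://adelogs.adobe.com/\x00more\xff"
urls = find_urls(sample, "adelogs.adobe.com")
print(urls, uses_https_only(urls))
```

A match on the `https://` form is only a hint about what the program will do. The network capture is still what confirms the behavior.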
Next, I fired up Wireshark and started ADE. Each time it started, it contacted a server called adeactivate.adobe.com, presumably to verify that the DRM authorization was in good shape. I then opened and flipped through several ebooks that were already present in the ADE library, including one DRM ebook I had checked out from my local library.
So far, it didn’t send anything to adelogs.adobe.com. I then checked out another DRM ebook from the library (in this case, Seattle Public Library and its OverDrive subscription) and flipped through it. As it happens, it still didn’t send anything to Adobe’s logging server.
Finally, I used ADE to fulfill a DRM ePub download from Kobo. This time, after flipping through the book, it did send data to the logging server. I can confirm that it was sent using HTTPS, meaning that the contents of the message were encrypted.
To sum up, ADE 4.0.1’s behavior is consistent with Adobe’s claims – the data is no longer sent in the clear and a message was sent to the logging server only when I opened a new commercial DRM ePub. However, without decrypting the contents of that message, I cannot verify that it only contains information about that ebook from Kobo.
But even then… why should Adobe be logging that information about the Kobo book? I’m not aware that Kobo is doing anything fancy that requires knowledge of how many pages I read from a book I purchased from them but did not open in the Kobo native app. Have they actually asked Adobe to collect that information for them?
Another open question: why did opening the library ebook in ADE not trigger a message to the logging server? Is it because the fulfillmentType specified in the .acsm file was “loan” rather than “buy”? More clarity on exactly when ADE sends reading progress to its logging server would be good.
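For anyone who wants to compare the fulfillment type across their own downloads, here is a rough sketch in Python. It assumes only that the .acsm file is XML and that fulfillmentType shows up either as an attribute or as an element somewhere in it. The sample document, its namespace, and the helper names are invented for illustration and are not copied from a real .acsm file.

```python
import xml.etree.ElementTree as ET


def local_name(name):
    """Strip a '{namespace}' prefix from an element or attribute name."""
    return name.rsplit("}", 1)[-1]


def fulfillment_types(acsm_xml):
    """Collect every fulfillmentType value, whether stored as an attribute or an element."""
    root = ET.fromstring(acsm_xml)
    found = []
    for element in root.iter():
        if local_name(element.tag) == "fulfillmentType" and element.text:
            found.append(element.text.strip())
        for name, value in element.attrib.items():
            if local_name(name) == "fulfillmentType":
                found.append(value)
    return found


# Hypothetical sample: the structure and namespace here are assumptions.
SAMPLE = (
    '<fulfillmentToken xmlns="http://example.org/hypothetical" fulfillmentType="loan">'
    "<resource>example</resource>"
    "</fulfillmentToken>"
)
print(fulfillment_types(SAMPLE))
```

Running this over a library loan and a purchased title would show whether the two really differ in that field.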
Finally, if we take the privacy statement at its word, ADE is not implementing a page synchronization feature as some, including myself, have speculated – at least not yet. Instead, Adobe is gathering this data to “share anonymous aggregated information with eBook providers to enable billing under the applicable pricing model”. However, another sentence in the statement is… interesting:
While some publishers and distributors may charge libraries and resellers for 30 days from the date of the download, others may follow a metered pricing model and charge them for the actual time you read the eBook.
In other words, if any libraries are using an ebook lending service that does have such a metered pricing model, and if ADE is sending reading progress information to an Adobe server for such ebooks, that seems like a violation of reader privacy. Even though the data is now encrypted, if an Adobe ID is used to authorize ADE, Adobe itself has personally identifying information about the library patron and what they’re reading.
Adobe appears to have closed a hole – but there are still important questions left open. Librarians need to continue pushing on this.
DuraSpace News: Evolving Role of VIVO in Research and Scholarly Networks Presented at the Thomson Reuters CONVERIS™ Global User Group Meeting
Winchester, MA – Thomson Reuters hosted a CONVERIS Global User Group Meeting for current and prospective users in Hatton Garden, London, on October 1-2, 2014. About 40 attendees from the UK, Sweden, the Netherlands, European institutions from other countries, and the University of Botswana met to discuss issues pertaining to Research Information Management Systems, the CONVERIS Roadmap, research analytics, and new features and functions being provided by CONVERIS (http://converis5.com).
HangingTogether: Notes from the DC-2014 Pre-conference workshop “Fonds & Bonds: Archival Metadata, Tools, and Identity Management”
Earlier this month I had the good fortune to attend the “Fonds & Bonds” one-day workshop, just ahead of the DC-2014 meeting in Austin, TX. The workshop was held at the Harry Ransom Center of the University of Texas, Austin, which was just the right venue. Eric Childress from OCLC Research and Ryan Hildebrand from the Harry Ransom Center did much of the logistical work, while my OCLC Research colleague Jen Schaffner worked with Daniel Pitti of the Institute for Advanced Technology in the Humanities, University of Virginia and Julianna Barrera-Gomez of the University of Texas at San Antonio to organize the workshop agenda and presentations.
Here are some brief notes on a few of the presentations that made a particular impression on me.
The introduction by Gavan McCarthy (Director of the eScholarship Research Centre (eSRC), University of Melbourne) and Daniel Pitti to the Expert Group on Archival Description (EGAD) included a brief tour of standards development, explained how this led to the formation of EGAD, and noted EGAD’s efforts to develop the conceptual model for Records in Context (RIC). Daniel very ably set this work within its standards-development context, which was a great way to help focus the discussion on the specific goals of EGAD.
Valentine Charles (of Europeana) and Kerstin Arnold (from the Archives Portal Europe APEx project) provided a very good tandem presentation on “Archival Hierarchy and the Europeana Data Model”, with Kerstin highlighting the work of Archives Portal Europe and the APEx project. It was both reaffirming and challenging to hear that it’s difficult to get developers to understand an unexpected data model when they confront it through a SPARQL endpoint or through APIs. We’ve experienced that in our work as well, and continue to spend considerable effort in attempting to meet the challenge.
Tim Thompson (Princeton University Library) and Mairelys Lemus-Rojas (University of Miami Libraries) gave an overview of the Remixing Archival Metadata Project (RAMP), which was also presented in an OCLC webinar earlier this year. RAMP is “a lightweight web-based editing tool that is intended to let users do two things: (1) generate enhanced authority records for creators of archival collections and (2) publish the content of those records as Wikipedia pages.” RAMP utilizes both VIAF and OCLC Research’s WorldCat Identities as it reconciles and enhances names for people and organizations.
Ethan Gruber (American Numismatic Society) gave an overview of the xEAC project (Ethan pronounces xEAC as “zeek”), which he also presented in the same OCLC webinar in which Tim presented RAMP. xEAC is an open-source XForms-based application for creating and managing EAC-CPF collections. Ethan is terrific at delving deeply into the possibilities of the technology at hand, and making the complex appear straightforward.
Gavan McCarthy gave a quite moving presentation on the Find & Connect project, where we were able to see some of the previously-discussed descriptive standards and technologies resulting in something with real impact on real lives. Find & Connect is a resource for Forgotten Australians, former child migrants and others interested in the history of child welfare in Australia.
And Daniel Pitti gave a detailed presentation on the SNAC project. OCLC Research has supported this project from its early stages, providing access to NACO and VIAF authority data, and supplying the project with over 2M WorldCat records representing items and collections held by archival institutions … essentially the same data that supports most of OCLC Research’s ArchiveGrid project. The aspirations for the SNAC project are changing, moving from an experimental first phase where data from various sources was ingested, converted, and enriched to produce EAC-CPF records (with a prototype discovery layer on top of those), to the planning for a Cooperative Program which would transform that infrastructure into a sustainable international cooperative hosted by the U.S. National Archives and Records Administration. This is an ambitious and important effort that everyone in the community should be following.
The workshop was very well attended and richly informative. It provided a great way to quickly catch up on key developments and trends in the field. And the opportunity to easily network with colleagues in a congenial setting, including an hour to see a variety of systems demonstrated live, was also clearly appreciated.
Charlie Reisinger from the Penn Manor School District talked to us next about open source at his school. This was an expanded version of his lightning talk from the other night.
Penn Manor has 9 IT team members – which is a very lean staff for 4500 devices. They also do a lot of their technology in house.
Before talking about open source we took a tangent into the nature of education today. School districts are stuck on the model they’re using and have used for centuries. But today kids can learn anything they would like with a simple connection to the Internet. You can be connected to the most brilliant minds that you’d like. Teachers are no longer the fountains of all knowledge. The classroom hasn’t been transformed by technology – if you walked in to a classroom 60 years ago it would look pretty much like a classroom today.
In schools that do allow students to have laptops they lock them down. This is a terrible model for student inquiry. The reason most of us are here today is because we had a system growing up that we could get in to and try to break/fix/hack.
This came to them partially out of fiscal necessity. When Apple discontinued the white MacBook, the school was stuck in a situation where they needed to replace these laptops with some sort of affordable device. Using data they collected from the students’ laptops, they found that students spent most of their time in the browser or in a word processor, so they decided to install Linux on laptops. Ubuntu was the choice because the state-level testing would work on that operating system.
This worked in elementary, but they needed to scale it up to the high schools which was much harder because each course needed different/specific software. They needed to decide if they could provide a laptop for every student.
The real guiding force in deciding to provide one laptop per student was the English department. They said that they needed the best writing device that could be given to them. This knocked out the possibility of giving tablets to all students – instead a laptop allows for this need. Not only did they give all students laptops with Linux installed – they gave them all root access. This required trust! They created policies and told the students they trusted them to use the laptops as responsible learners. How’s that working out? Charlie has had 0 discipline issues associated with that. Now, if they get in to a jam where they screwed up the computer – maybe this isn’t such a bad thing because now they have to learn to fix their mistake.
They started this as a pilot program for 90 of their online students before deploying to all 1700 students. These computers include not just productivity software, but Steam! That got the kids’ attention. When they deployed to everyone, though, Steam came off the computers, but the kids knew it was possible, so it forced them to figure out how to install it on Linux, which is not always self-explanatory. This prodded the kids in to learning.
Charlie mentioned that he probably couldn’t have done this 5 years ago because the apps that are available today are so dense and so rich.
There was also the issue of training the staff, not only on the change in software but also on having all the kids with laptops. This included some training of the parents as well.
Along with the program they created a help desk program as a 4 credit honors level course as independent study for the high school students. They spent the whole time supporting the one to one program (one laptop per student). These students helped with the unpacking, inventorying, and the imaging (github.com/pennmanor/FLDT built by one of the students) of the laptops over 2 days. The key to the program is that the students were treated as equals. This program was picked up and talked about on Linux.com.
Charlie’s favorite moment of the whole program was watching his students train their peers on how to use these laptops.
- ATO2014: Open Source Schools: More Soup, Less Nuts
- ATO2014: Women in Open Source
- ATO2014: The first FOSS Minor at RIT
Today I found the following resources and bookmarked them:
- Material Design Icons: Material Design Icons are the official open-source icons featured in the Google Material Design specification.
- SmartThings: Control and monitor your home from one simple app
Too many people ask what is the future of libraries and not what “should the future be”. A book that we must read is “Expect More: Demanding Better Libraries For Today’s Complex World“. If we don’t expect more of libraries we’re not going to see libraries change. We have to change the frame of mind that libraries belong to the directors – they actually belong to the people and they should be serving the people.
Phil asks how we get the community to participate in managing libraries. Start looking at your library’s collection and see if there is at least 1% of the collection in the STEM arena. Should that percent be more? 5%, 10%, more? There is no real answer here, but maybe we need to make a suggestion to our libraries. Maybe instead our funds should go to empower the community more in the technology arena. Maybe we should have co-working space in our library – this can be fee based even – could be something like $30/mo. That would be a way for libraries to help the unemployed and the community as a whole.
Libraries are about so much more than books. People head to the library because they’re wondering about something – so having people who have practical skills on your staff is invaluable. Instead of pointing people to the books on the topic, having someone for them to talk to is a value added service. What are our competitors going to be doing while we’re waiting for the transition from analog to digital to happen in libraries? We need to set some milestones for all libraries. Right now it’s only the wealthy libraries that seem to be moving in this way.
I’ve seen some of the bigger libraries in the US doing a lot of what Phil suggested, like hosting TED Talks, offering digital issues lectures, etc. You could also invite kids in there to talk about what they know/have learned.
Phil’s quote: “The library fulfills its promise when people of different ages, races, and cultures come together to pool their talents in creating new creative content.” One thing to think about is whether this change from analog to digital can happen in libraries without changing their names. Instead we could call them the digital commons [I'm not sure this is necessary - I see Phil's point - but I think we need to just rebrand libraries and market them properly and keep their name.]
Some awesome libraries include Chattanooga Public Library which has their 4th floor makerspace. In Colorado there are the Anythink Libraries. The Delaware Department of Libraries is creating a new makerspace.
Books are just one of the tools toward helping libraries enhance human dignity – there are so many other ways we can do this.
Phil showed us a video of his:
You can bend the universe by asking questions – so call your library and ask questions about open source or about new technologies so that we plant the seeds of change.
Further reading from Phil: http://sites.google.com/site/librarywritings.
- ATO2014: Open source, marketing and using the press
- ATO2014: How ‘Open’ Changes Products
- ATO2014: Open Source – The Key Component of Modern Applications
Next up at All Things Open was Karen Borchert talking about How ‘Open’ Changes Products.
We started by talking about the open product conundrum. There is a thing that happens when we think about creating products in an open world. In order to understand this we must first understand what a product is. A product is a good, idea, method, information or service that we want to distribute. In open source we think differently about this. We think more about tools and toolkits instead of packaged products because these things are more conducive to contribution and extension. ‘Open’ products work a bit more like Ikea – you have all the right pieces and instructions but you have to make something out of it – a table or chair or whatever. Ikea products are toolkits to make things. When we’re talking about software most buyers are thinking about what they get out of the box, so a toolkit is not a product to our consumers.
Open Atrium is a product that Phase2 produces and people say a lot about it like “It’s an intranet in a box” – but in reality it’s a toolkit. People use it a lot of different ways – some do what you’d expect them to do, others make it completely different. This is the great thing about open source – this causes a problem for us though in open source – because in Karen’s example a table != a bike. “The very thing that makes open source awesome is what makes our product hard to define.”
Defining a product in the open arena is simple – “Making an open source product is about doing what’s needed to start solving a customer problem on day 1.” Why are we even going down this road? Why are we creating products? Making something that is useable out of the box is what people are demanding. They also provide a different opportunity for revenue and profit.
This comes down to three things:
- Understanding the value
- Understanding the market
- Understanding your business model
Adding value to open source is having something that someone who knows better than me put together. If you have an apple you have all you need to grow your own apples, but you’re not going to bother to do that. You’d rather (or most people would rather) leave that to the expert – the farmer. Just because anyone can take the toolkit and build whatever they want with it doesn’t mean that they will.
Markets are hard for us in open source because we have two markets – one that gives the product credibility and one that makes money – and often these aren’t the same market. Most of the time the community isn’t paying you for the product – they are usually other developers or people using it to sell to their clients. You need this market because you do benefit from it even if it’s not financially. You also need to worry about the people who will pay you for the product and services. You have to invest in both markets to help your product succeed.
Business models include the ability to have two licenses – two versions of the product. There is a model around paid plugins or themes to enhance a product. And sometimes you see services built around the product. These are not all of the business models, but they are a few of the options. People buy many things in open products: themes, hosting, training, content, etc.
What about services? Services can be really important in any business model. You don’t have to deliver a completely custom set of services every time you deliver. It’s not less of a product because it’s centered around services.
Questions people ask?
Is it going to be expensive to deal with an open source product? Not necessarily but it’s not going to be free. We need to plan and budget properly and invest properly.
Am I going to make money on my product this year? Maybe – but you shouldn’t count on it. Don’t bet the farm on your product business until you’ve tested the market.
Everyone charges $10/mo for this so I’m just going to charge that – is that cool? Nope! You need to charge what the product is worth and what people will pay for it and what you can afford to sell it for. Think about your ROI.
I’m not sure we want to be a products company. It’s very hard to be a product company without buy in. A lot of service companies ask this. Consider instead a pilot program and set a budget to test out this new model. Write a business plan.
- ATO2014: Using Bootstrap to create a common UI across products
- ATO2014: Open source, marketing and using the press
- ATO2014: Saving the world: Open source and open science
Over lunch today we had a panel of 6 women in open source talk to us.
The first question was about their earlier days – what made them interested in open source or computer science or all of it.
Intros
Megan started in humanities and then just stumbled in to computer programming. Once she got in to it she really enjoyed it though. Elizabeth got involved with Linux through a boyfriend early on. She really fell in love with Linux because she was able to do anything she wanted with it. She joined the local Linux users group and they were really supportive and never really made a big deal about the fact that she was a woman. Her first task in the open source world was writing documentation (which was really hard) but from there her career grew. Erica has been involved in technology all her life (which she blames her brother for). When she went to school, she wanted to be creative and study arts, but her father gave her the real life speech and she realized that computer programming let her be creative and practical at the same time. Estelle started by studying architecture which was more sexist than her computer science program – toward the end of her college career she found that she was teaching people to use their computers. Karen was always the geekiest person she knew growing up – and her father really encouraged her. She went to engineering school and it wasn’t until she set up her Unix account at the college computer center. She got passionate about open source because of the pacemaker she needs to live – she realized that the entire system is completely proprietary and started thinking about the implications of that.
The career path
Estelle has noticed in the open source world that the men she knows on her level work for big corporations whereas the women are working for themselves. This was because there aren’t as many options to move up the ladder. Now as for why she picked the career she picked it was because her parents were sexist and she wanted to piss them off! Elizabeth noticed that a lot of women get involved in open source because they’re recruited in to a volunteer organization. She also notices that more women are being paid to work on open source whereas men are doing it for fun more. Megan had never been interviewed by or worked for a woman until she joined academia. Erica noticed that the career path of women she has met is more convoluted than that of the men she has met. The men take computer science classes and then go in to the field, women however didn’t always know that these opportunities were available to them originally. Karen sees that women who are junior have to work a lot harder – they have to justify their work more often [this is something I totally had to deal with in the past]. Women in these fields get so tired because it’s so much work – so they move on to do something else. Erica says this is partially why she has gone to work for herself because she gets to push forward her own ideas. Megan says that there are a lot of factors that are involved in this problem – it’s not just one thing.
Is diversity important in technology?
Erica feels that if you’re building software for people you need ‘people’ not just one type of person working on the project. Megan says that a variety of perspectives is necessary. Estelle says that because women often follow a different path to technology it adds even more diversity than just gender [I for example got in to the field because of my literature degree and the fact that I could write content for the website]. It’s also important to note that diversity isn’t just about gender – but so much more. Karen pointed out that even at 20 months old we’re teaching girls and boys differently – we start teaching boys math and problem solving earlier and we help the girls for longer. This reinforces the gender roles we see today. Elizabeth feels that diversity is needed to engage more talent in general.
What can we do to change the tide?
Megan likes to provide a variety in the types of problems she provides in her classes, with a variety of approaches so that it hits a variety of students instead of alienating those who don’t learn the way she’s teaching. Karen wants us to help keep women from being overlooked. When a woman makes a suggestion acknowledge it – also stop people from interrupting women (because we are interrupted more). Don’t just repeat what the woman says but amplify it. Estelle brings up an example from SurveyMonkey – they have a mentorship program and also let you take time off when you need to (very good for parents). Erica tries to get to youth before the preconceptions form that technology is for boys. One of the things she noticed was that language matters as well – telling girls you’re going to teach them to code turns them off, but saying we’re going to create apps gets them excited. Elizabeth echoed the language issue – a lot of the job ads are geared toward men as well. Editing your job ads will actually attract more women.
What have you done in your career that you’re most proud of?
Estelle’s example is not related to technology – it was an organization called POWER that was meant to help students who were very likely to have a child before graduation – graduate before becoming a parent. It didn’t matter what field they went in to – just that they finished high school. Erica is proud that she has a background that lets her mentor so many people. Elizabeth wrote a book! It was on her bucket list and now she has a second book in the works. It was something she never thought she could do and she did. She also said that it feels great to be a mentor to other women. Megan is just super proud of her students and watching them grow up and get jobs and be successful. Karen is mostly proud of the fact that she was able to turn something that was so scary (her heart condition) in to a way to articulate that free software is so important. She loves hearing others tell her story to other people to explain why freedom in software is so important.
- ATO2014: Women in Open Source
- ATO2014: Open Source – The Key Component of Modern Applications
- ATO2014: Building a premier storytelling platform on open source
This post is part of our Open Access Week blog series to highlight great work in Open Access communities around the world. It is written by Alma Swan, Director of Key Perspectives Ltd, Director of Advocacy for SPARC Europe, and Convenor for Enabling Open Scholarship.
Whither the humanities in a world moving inexorably to open values in research? There has been much discussion and debate on this issue of late. It has tended to focus on two matters – the sustainability of humanities journals and the problem(s) of the monograph. Neither of these things is a novel topic for consideration or discussion, but nor have solutions been found that are satisfactory to all the key stakeholders, so the debate goes on.
While it does, some significant developments have been happening, not behind the scenes as such but in a quiet way nevertheless. New publishers are emerging in the humanities that are offering different ways of doing things and demonstrating that Open Access and the humanities are not mutually exclusive.
These publishers are scholar-led or are academy-based (university presses or similar). Their mission is to offer dissemination channels that are Open, viable and sustainable. They don’t frighten the horses in terms of trying to change too much, too fast: they have left the traditional models of peer review practice and the traditional shape and form of outputs in place. But they are quietly and competently providing Open Access to humanities research. What’s more, they understand the concerns, fears and some bewilderment of humanities scholars trying to sort out what the imperative for Open Access means to them and how to go about playing their part. They understand because they are of and from the humanities community themselves.
The debate about OA within this community has been particularly vociferous in the UK in the wake of the contentious Finch Report and the policy of the UK’s Research Councils. Fortuitously, the UK is blessed with some great innovators in the humanities, and many of the new publishing operations are also UK-based. This offers a great opportunity to show off some of these new initiatives and help to reassure UK humanities authors at the same time. So SPARC Europe, with funding support from the Open Society Foundations, is now endeavouring to bring these new publishers together with members of the UK’s humanities community.
We are hosting a Roadshow comprising six separate events in different cities round England and Scotland. At each event there are short presentations by representatives of the new publishers and from a humanities scholar who can give the research practitioner perspective on Open Access. After the presentations, the publishers are available in a small exhibition area to display their publications and talk about their publishing programmes, their business models and their plans for the future.
The publishers taking part in the Roadshow are Open Book Publishers, Open Library of the Humanities, Open Humanities Press and Ubiquity Press. In addition, the two innovative initiatives OAPEN and Knowledge Unlatched are also participating. The stories from these organisations are interesting and compelling, and present a new vision of the future of publishing in the humanities.
Humanities scholars from all higher education institutions in the locality of each event are warmly invited to come along to the local Roadshow session. The cities we are visiting are Leeds, Manchester, London, Coventry, Glasgow and St Andrews. The full programme is available here.
We will assess the impact of these events and may send the Roadshow out again to new venues next year if they prove to be successful. If you cannot attend but would like further information on the publishing programmes described here, or would like to suggest other venues the Roadshow might visit, please contact me at firstname.lastname@example.org
Journal of Web Librarianship: Designing a User-Centric Web Site for Handheld Devices: Incorporating Data-Driven Decision-Making Techniques with Surveys and Usability Testing
The following is a guest post from Abbie Grotke, Web Archiving Team Lead, Library of Congress and Co-Chair of the NDSA Content Working Group.
The National Digital Stewardship Alliance is pleased to release a report of a 2013 survey of Web Archiving institutions (PDF) in the United States.
A bit of background: from October through November of 2013, a team of National Digital Stewardship Alliance members, led by the Content Working Group, conducted a survey of institutions in the United States that are actively involved in, or planning to start, programs to archive content from the web. This survey built upon a similar survey undertaken by the NDSA in late 2011 and published online in June of 2012. Results from the 2011-2012 NDSA Web Archiving Survey were first detailed on May 2, 2012 in “Web Archiving Arrives: Results from the NDSA Web Archiving Survey” on The Signal, and the full report (PDF) was released in July 2012.
The goal of the survey was to better understand the landscape of web archiving activities in the U.S. by investigating the organizations involved, the history and scope of their web archiving programs, the types of web content being preserved, the tools and services being used, access and discovery services being provided and overall policies related to web archiving programs. While this survey documents the current state of U.S. web archiving initiatives, comparison with the results of the 2011-2012 survey enables an analysis of emerging trends. The report therefore describes the current state of the field, tracks the evolution of the field over the last few years, and forecasts future activities and developments.
The survey consisted of twenty-seven questions (PDF) organized around five distinct topic areas: background information about the respondent’s organization; details regarding the current state of their web archiving program; tools and services used by their program; access and discovery systems and approaches; and program policies involving capture, availability and types of web content. The survey was started 109 times and completed 92 times for an 84% completion rate. The 92 completed responses represented an increase of 19% in the number of respondents compared with the 77 completed responses for the 2011 survey.
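The two percentages quoted above can be checked with a few lines of arithmetic. This is only a sketch: the function names are illustrative and not part of the NDSA report, and the inputs are the counts given in the paragraph above.

```python
def completion_rate(started: int, completed: int) -> float:
    """Percentage of started surveys that were completed."""
    return 100.0 * completed / started


def percent_increase(old: int, new: int) -> float:
    """Percentage growth from an old count to a new count."""
    return 100.0 * (new - old) / old


# Counts taken from the survey summary above.
rate = completion_rate(started=109, completed=92)
growth = percent_increase(old=77, new=92)

print(f"completion rate: {rate:.1f}%")
print(f"increase in completed responses: {growth:.1f}%")
```

Rounded to whole numbers, both values agree with the 84% and 19% figures reported for the survey.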
Overall, the survey results suggest that web archiving programs nationally are both maturing and converging on common sets of practices. The results highlight challenges and opportunities that are, or could be, important areas of focus for the web archiving community, such as opportunities for more collaborative web archiving projects. We learned that respondents are highly focused on the data volume associated with their web archiving activity and its implications for cost and the usage of their web archives.
Based on the results of the survey, cost modeling, more efficient data capture, storage de-duplication, and anything that promotes web archive usage and/or measurement would be worthwhile investments by the community. Unsurprisingly, respondents continue to be most concerned about their ability to archive social media, databases and video. The research, development and technical experimentation necessary to advance the archiving tools on these fronts will not come from the majority of web archiving organizations with their fractional staff time commitments; this seems like a key area of investment for external service providers.
We hope you find the full report interesting and useful, whether you are just starting out developing a web archiving program, have been active in this area for years, or are just interested in learning more about the state of web archiving in the United States.
Steven Vaughan-Nichols was up to talk to us about open source, marketing and using the press.
Before Steven was a journalist he was a techie. This makes him unusual as a journalist who actually gets technology. Steven is here to tell us that marketing is a big part of your job if you want a successful open source company. He has heard a lot of people saying that marketing isn’t necessary anymore. It is necessary because writing great code is not enough – if no one else knows about it, it doesn’t matter. You need to talk with people about the project to make it a success.
We like to talk about open source being a meritocracy – that’s not 100% true – the meritocracy is the ideal or a convenient fiction. The meritocracy is only part of the story – it’s not just about your programming it’s about getting the right words to the right people so that they know about your project. You need marketing for this reason.
Any successful project needs two things. The first, which you already know, is that it solves a problem that needs a solution. The second is that it must be able to convince a significant number of people that it is the solution to their problem. One problem in open source is that people confuse open source with the community – they are not the same thing. Marketing is getting info about your project to the world. The community is used for defining what the project really is.
Peter Drucker says “The aim of marketing is to know and understand the customer so well the product or service fits him and sells itself.” Knowing the customer better than they know themselves is not an easy job – but it’s necessary to market/sell your product/service. If your project doesn’t fit the needs of your audience then it won’t go anywhere.
David Packard: “Marketing is too important to be left to the marketing department” – and it really is. There is a tendency to see marketing as a separate thing. Marketing should not be a separate thing – it should be honest about what you do and it should be the process of getting that message to the world. Each person who works on the project (or for the company) is a representative of your product – we are always presenting our product to the world (you might not like it – but it’s true). If your name is attached to a project/company then people are going to be watching you. You need to avoid zinging competing products and portray a positive image of yourself and your product. Even if you’re not thinking about what you’re saying as marketing, it is.
Branding is another thing that open source projects don’t always think through enough – they think it is trivial. Branding actually does matter! The images, words and name you use to describe your product matter. These will become the shorthand that people see your project as. For example, if you see the Apple logo you know what it’s about. In our world of open source there is the Red Hat shadow man – whenever you see that image you know that means Red Hat and all the associations you have with that. You can use that association in your marketing. People might not know what Firefox is (yes there are people who don’t know) but they do recognize the cute little logo.
You can no longer talk just on IRC or online, you have to get out there. You need to go to conferences and make speeches and get the word out to people. And always remember to invite people to participate because this is open source. You have to make an active network and get away from the keyboard and talk to people to get the word out there. At this point you need to start thinking about talking to people from the press.
One thing to say to people, to the press, is a statement that will catch on – a catch phrase that will reach the audience you want to reach. The press are the people who talk to the world at large. Talking to people at opensource.com and other tech sites is great – but if you want to make the next leap you need to get to these types of people. Don’t assume that the press you’re talking to don’t know what you’re talking about – but just because they happen to like open source or what you’re talking about, it does not mean that they will write only positive things. The press are critics – they’re not really on your side – even if they like you they won’t just talk your products up. You need to understand that going in.
Having said all that – you do need to talk to the press at some point. And when you do, you need to be aware of a few things. Never ever call the press – they are always on perpetual deadline – you can’t go wrong with email though. When you do send an email be sure to remember to cover a few important things: tell them what you’re doing, tell them what’s new (they don’t care that you have a new employee – they might care if a bigwig quits or is fired), get your message straight (if you don’t know what you’re doing then the press can’t figure it out), and hit it fast (tell them in the first line what you’re doing, who your audience is and why the world should care). Be sure to give the name of someone they can call and email for more info – this can’t be emphasized enough – so often Steven has gotten press releases without contact info on them. Put the info on your website – make sure that there is always a contact in your company for the press. If your project is pretty, remember to send screenshots – this will save the press a lot of time in installing and getting the right images. Steven says “You need to spoon feed us”.
You also want to be sure to know what the press person you’re contacting writes about – do your homework – don’t contact them with your press release if it’s not something they write about. Also be sure to speak in a language that the person you’re talking to will understand [I know I always shy away from OPAC and ILS when talking to the press]. Not everyone you’re talking to has experience in technology. Don’t talk down to the press, just be sure to talk to the person in words they understand. Very carefully craft your message – be sure to give people context and tell them why they should care – if you can’t tell them that, then they can’t tell anyone else your story.
Final points – remember to be sweet and charming when talking to the press. When they say something that bothers you, don’t insult the press. If you alienate the press they will remember. In the end the press has more ink/pixels than you do – their words will have a longer reach than yours. If the press completely misrepresents you be sure to send a polite note to the person explaining what was wrong – without using the word ‘wrong’. Be firm, but be polite.
The post ATO2014: Open source, marketing and using the press appeared first on What I Learned Today....
- ATO2014: Open Source at Facebook
- ATO2014: Open Source – The Key Component of Modern Applications
- ATO2014: Women in Open Source
A BLOB is a Binary Large OBject. Each type of BLOB contains a single type of immutable binary content, such as photos, videos, documents, etc. Section 3 of the paper is a detailed discussion of the behavior of BLOBs of different kinds in Facebook's storage system.
Figure 3 shows that the rate of I/O requests to BLOBs drops rapidly through time. The rates for different types of BLOB drop differently, but all 9 types have dropped by 2 orders of magnitude within 8 months, and all but 1 (profile photos) have dropped by an order of magnitude within the first week.
The vast majority of Facebook's BLOBs are warm, as shown in Figure 5 - notice the scale goes from 80-100%. Thus the vast majority of the BLOBs generate I/O rates at least 2 orders of magnitude less than recently generated BLOBs.
In my talk to the 2012 Library of Congress Storage Architecture meeting I noted the start of an interesting evolution:
a good deal of previous meetings was a dialog of the deaf. People doing preservation said "what I care about is the cost of storing data for the long term". Vendors said "look at how fast my shiny new hardware can access your data". ... The interesting thing at this meeting is that even vendors are talking about the cost.
This year's meeting was much more cost-focused. The Facebook data make two really strong cases in this direction:
- That significant kinds of data should be moved from expensive, high-performance hot storage to cheaper warm and then cold storage as rapidly as feasible.
- That the I/O rate that warm storage should be designed to sustain is so different from that of hot storage, at least 2 and often many more orders of magnitude, that attempting to re-use hot storage technology for warm and even worse for cold storage is futile.
Haystack uses RAID-6 and replicates data across three data centers, using 3.6 times as much storage as the raw data. f4 uses two fault-tolerance techniques:
- Within a data center it uses erasure coding with 10 data blocks and 4 parity blocks. Careful layout of the blocks ensures that the data is resilient to drive, host and rack failures at an effective replication factor of 1.4.
- Between data centers it uses XOR coding. Each block is paired with a different block in another data center, and the XOR of the two blocks stored in a third. If any one of the three data centers fails, both paired blocks can be restored from the other two.
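The XOR scheme and the storage factors above can be illustrated with a small sketch. It is not Facebook's implementation: the block contents and the function name `xor_blocks` are made up for illustration, and the combined f4 factor is derived here from the figures in this post (1.4 within a data center, three stored blocks for every two data blocks across data centers) rather than quoted from it.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Two hypothetical 16-byte blocks held in two different data centers.
block_a = b"photo-bytes-0001"
block_b = b"video-bytes-0002"

# The XOR of the pair is stored in a third data center.
parity = xor_blocks(block_a, block_b)

# If the data center holding block_a is lost, XOR the two survivors.
recovered_a = xor_blocks(parity, block_b)
# Likewise, if the data center holding block_b is lost.
recovered_b = xor_blocks(parity, block_a)
print(recovered_a == block_a, recovered_b == block_b)

# Storage factors, using only the numbers given in the text above.
local_factor = (10 + 4) / 10            # erasure coding inside one data center
xor_factor = 3 / 2                      # two data blocks plus one XOR block
f4_factor = local_factor * xor_factor   # combined factor across data centers
haystack_per_dc = 3.6 / 3               # per-data-center factor implied by 3.6x over three copies

print(f"f4: {f4_factor:.1f}x, Haystack: 3.6x ({haystack_per_dc:.1f}x per data center)")
```

Recovery works because XOR is its own inverse: combining the parity block with either member of the pair yields the other. On these assumptions the combined f4 factor comes out well below Haystack's 3.6.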
Another point the paper makes that is worth noting concerns heterogeneity as a way of avoiding correlated failures:
We recently learned about the importance of heterogeneity in the underlying hardware for f4 when a crop of disks started failing at a higher rate than normal. In addition, one of our regions experienced higher than average temperatures that exacerbated the failure rate of the bad disks. This combination of bad disks and high temperatures resulted in an increase from the normal ~1% AFR to an AFR over 60% for a period of weeks. Fortunately, the high-failure-rate disks were constrained to a single cell and there was no data loss because the buddy and XOR blocks were in other cells with lower temperatures that were unaffected.
DeLisa Alexander from Red Hat was up next to talk to us about women in open source.
How many of you knew that the first computer – the ENIAC was programmed by women mathematicians? DeLisa is here to share with us a passion for open source and transparency – and something similarly important – diversity.
Why does diversity matter? Throughout history we have been able to innovate our way out of all kinds of problems. In the future we’re going to have to do this faster than ever before. Diversity of thoughts, theories and views is critical to this process. It’s not just “good” to think about diversity, it’s important to innovation and for solving problems more quickly.
Why are we having so much trouble finding talent? 47% of the workforce is made up of women but only 12% are getting computer and information science degrees – and only 1-5% of open source contributors are women. How much faster could we solve the world’s big problems if the other 1/2 of the population were participating? We need to be part of this process.
When you meet a woman who is successful in technology – there is usually one person who mentored her (man or woman) to feel positive about her path – we could be that voice for a girl or woman that we know. Another thing that we can do is help our kids understand what is going on and what opportunities there are. Kids today don’t think about the fact that the games they’re playing were developed by a human – they just think that computers magically have software on them. They have no clue that someone had to design the hardware and program the software [I actually had someone ask me once what 'software' was - the hardest question I've ever had to answer!].
The challenge for us is to decide on one person that we’re going to try to influence to stay in the field, join the field, or nominate for an award. If each of us does this one thing, next year this room could be filled with 50% women.
- ATO2014: Building a premier storytelling platform on open source
- ATO2014: Easing into open source
- ATO2014: Open Source Schools: More Soup, Less Nuts
For a long time the only free implementation of web archive replay software (I'm unaware of any commercial ones) has been the Wayback Machine (now OpenWayback). It is stable and mature software, with a strong community behind it.
To use it you need to be comfortable deploying a Java web application; this is not too difficult, and the documentation is exhaustive.
But there is a new player in the game: pywb, developed by Ilya Kreymer, a former Internet Archive developer.
It is built in Python, is relatively simpler than Wayback, and is now used in a professional archiving project at Rhizome.
Winchester, MA DSpaceDirect (http://dspacedirect.org) is a hosted repository solution for low-cost discovery, access, archiving, and preservation.