Aaron Swartz Memorial

The Internet community has been reeling for the past week as it grapples with the suicide of Aaron Swartz, a prominent digital rights activist who left a remarkable legacy for a 26-year-old. Swartz’s contributions are used by millions of people every day: he played a key role in developing the RSS specification (which makes it easy to syndicate online content), Creative Commons licences (which make it easy to make creative works freely available), and the popular website Reddit.

While much of the immediate focus has centered on mental health issues, draconian computer crime laws, and the bewildering prosecution of Swartz for downloading millions of academic articles – a U.S. prosecutor was seeking as much as 35 years in jail despite the fact that Swartz did not benefit from the downloads and the source of the articles did not want to pursue legal action – the more notable legacy was his effort to make information more openly and freely available.

Swartz aggressively pursued initiatives to increase the availability of information, particularly scientific and legal documents. His efforts were controversial, yet they point to mounting expectations that public information (or information funded by the public) be made easily accessible.

In recent months, there have been some important developments in Canada in furthering Swartz’s vision.  The Canadian Institutes of Health Research, the federal government’s health research funding agency, recently launched a new open access policy that requires funded researchers to make their peer-reviewed publications freely available within 12 months of publication. Given the millions of tax dollars invested in CIHR research annually, the mandatory open access policy should ensure that the public has access to the cutting-edge health research it has helped fund.

Open availability of legal materials is also fast becoming the standard in Canada. The Canadian Legal Information Institute (CanLII), which is Canada’s leading source of free legal materials, now houses more than one million court judgments as well as tens of thousands of legislative documents from all Canadian provincial, territorial and federal jurisdictions. Canadian lawyers, who pay an annual fee to maintain the site as part of their dues, provide the financial support to ensure that CanLII is sustainable (I am a CanLII board member).

After years of limited progress, digitization efforts in Canada are also beginning to bear fruit.  The Internet Archive Canada, which has teamed with academic libraries across Canada, recently announced that it has digitized over 400,000 texts in Canada. Although relatively unknown, the Internet Archive Canada now boasts the largest online collection of Canadian public domain materials since virtually all copyright-expired books in the University of Toronto library have been digitized and are freely accessible and downloadable.

The question facing many Canadian institutions is what comes next.  For academic research, the CIHR open access policy should be emulated by the other federally funded granting institutions so that all taxpayer-funded research features a requirement that the resulting publications be made openly available to the public within months of publication.

Access to legal materials has been a major success story, yet much more can be done. Legal publishers are beginning to make some of their texts freely available and law schools may soon rely on free, online cases as their primary source for legal casebooks. As those materials gravitate online, the pressure is likely to mount to ensure all Canadian cases and statutes are freely available, thereby granting the public full access to the law.

Yet perhaps the biggest step may come as part of efforts to move digitization beyond public domain works toward the creation of a national digital library featuring millions of Canadian titles. Such an initiative would undoubtedly face implementation challenges with respect to copyright (it would likely rely on the newly expanded fair dealing laws), but the vision of universal open access in Canada seems increasingly possible and is consistent with the vision to which Aaron Swartz dedicated his life.

The words are from respected Canadian law professor Michael Geist.

But they underscore the legacy of Aaron Swartz, a young man who died long before his time because he wasn’t able to live in a world where laws meant to protect the people condemn them.

Below is an item from the Internet Archive blogs:

Bulk Downloading, Aaron Swartz, and Terms of Service

[Aaron Swartz worked for and with the Internet Archive for years.]

Aaron was threatened with 35 years in prison for being accused of something my library actively encourages: bulk downloading of library collections. Some are calling it “hacking”, which is a problematic distortion of the term in the first place.(1) It might be time to break down some of what is currently going on in scholarly research as it relates to datamining, bulk downloading, and terms of service. It makes me very sad and mad because this confusion may have led a library (JSTOR) to track down a user, led MIT to call the police (and not try to call them off later), and led a US Attorney to mistake this for a crime, all of which combined to help lead to the death of a rising star in the Internet community.

Libraries: All libraries, including JSTOR and the Internet Archive, contain materials from lots of different people and places, some copyrighted and some not. Jim Gray called libraries “Engines of Research.” Research, by definition, is searching: searching for new patterns and new ideas. Libraries provide raw materials for researchers. Fortunately, in the digital world, bulk access to materials does not hurt our preservation function as rifling through pages in the past might have.

Academic publishing is changing: Traditionally, academic publications mostly came from non-profit scholarly associations and university presses. Some organizations, such as Elsevier, Wiley-Blackwell, and JSTOR, started to acquire and aggregate many journals into databases. These databases were funded by academic institutions and only available to those subscribing institutions. Beyond this, academic publishing is becoming more “open.” New publishers, such as the Public Library of Science, are being created to explicitly allow open access and bulk access. Their open access journals end up being cited more often, and this openness explicitly allows research using “datamining” techniques. Many universities have made all future professors’ articles open access, except when specifically requested not to.

Datamining academic research as academic research: Datamining academic publications is popular now because modern computers make it easy and the results are novel and publishable. This involves collecting masses of journal articles so that they can be analyzed by computer programs to find statistical patterns. This is different from individuals reading one paper at a time. Biology and medicine are especially helped by this, but it is now going on in humanities and law research as well. Larry Lessig wrote:

While at Stanford, Swartz had worked with a law student to download all the law review articles in the Westlaw database, to map funders of research with research conclusions. The result of that research was published in the Stanford Law Review, and showed a troubling connection between funders and their conclusions. At the time of Aaron’s alleged “crime,” he was a fellow at my Center at Harvard. The work of the Center? Studying the corruption of academic research (among other institutions) caused by money.
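To make the datamining described above a little more concrete, here is a minimal Python sketch of one of the simplest statistical patterns a program can pull out of a bulk collection: for each term, how many documents mention it. The three “abstracts”, the stopword list, and the function names are hypothetical stand-ins invented for illustration; this is not the method used in the Stanford Law Review study or by Swartz.

```python
import re
from collections import Counter

# Hypothetical mini-corpus standing in for a bulk download of article abstracts.
ABSTRACTS = [
    "Open access increases citation rates for journal articles.",
    "We study citation patterns in law review articles.",
    "Funding sources correlate with study conclusions.",
]

# Illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "and", "for", "in", "of", "the", "we", "with"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text, split it into alphabetic words, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def document_frequency(docs: list[str]) -> Counter:
    """Count how many documents each term appears in (at most once per document)."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return df


if __name__ == "__main__":
    df = document_frequency(ABSTRACTS)
    # Terms shared by at least two documents hint at a common theme in the corpus.
    shared = sorted(term for term, count in df.items() if count >= 2)
    print(shared)  # ['articles', 'citation', 'study']
```

Real projects run this kind of counting over thousands or millions of documents, which is why they depend on bulk access rather than on reading one paper at a time.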

Bulk downloading or “crawling”: Bulk downloading is now done for various reasons, and libraries with large collections take various positions on it and express these positions in Terms of Service and robot exclusions. “Robots” or “crawlers” in this context are computer programs that do repetitive actions like downloading many documents from a website. Some such users are search engines, some are backing up materials, some are doing new research such as visualizing data, some are building different interfaces to the full dataset (like Freebase’s reuse of Wikipedia), and some even enable others to download in bulk more easily. Most datasets have some sort of license involved, so providers are somewhat nervous about explicitly allowing all bulk downloading (for instance of Amazon.com’s book catalog data, which is licensed from many players), but in general people are becoming more comfortable with the re-purposing of their data as it becomes more common.

The Internet Archive is regularly crawled. We try to make our systems strong enough to serve these loads, and sometimes try to get robots to slow down. We get hit with spam all the time, and with occasional denial-of-service attacks. But we haven’t called the police; we deal with it. As a library we try to serve as many users as we can, and some of those users are robots.

Open Data is a rising trend supported by government agencies and libraries. Open Data is bulk data that is specifically licensed for datamining, graphing, and linking to other open data. It still makes up a minority of databases, but it is growing in importance. I bring this up because it shows a trend towards openness and datamining.

Terms of Service and Robots.txt files: These mostly invisible “agreements” are often defensive documents meant to protect the organization from users and suppliers. They are regularly trodden on, sometimes resulting in the providers instituting technological measures to slow down mass downloaders. I think of most Terms of Service as like an old joke about the Soviet Union: everything is illegal except when it is not. It is important to note that the specifics of many Terms of Service and robot exclusion files are regularly ignored by millions of people, and millions of organizations do not enforce them. Enforcement is often very selectively applied.
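For readers unfamiliar with robot exclusion files, the sketch below shows how a well-behaved crawler might consult one before bulk downloading, using the `urllib.robotparser` module from Python’s standard library. The robots.txt text, the user-agent names, the paths, and the `fetch` callback are all hypothetical; the rules are parsed from a string rather than fetched from a real site, and robots.txt says nothing about a site’s Terms of Service, which a real crawler would still have to consider separately.

```python
import time
from typing import Callable
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary digital library (not any real site's policy):
# generic robots must wait 10 s between requests and stay out of /private/,
# while a named research crawler may fetch everything with a 2 s delay.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/

User-agent: ExampleResearchBot
Crawl-delay: 2
Disallow:
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over the network."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def crawl_plan(parser: RobotFileParser, agent: str, paths: list[str],
               default_delay: float = 1.0) -> tuple[list[str], float]:
    """Return the paths `agent` may fetch and the pause (seconds) to leave between requests."""
    allowed = [p for p in paths if parser.can_fetch(agent, p)]
    delay = parser.crawl_delay(agent)
    return allowed, (delay if delay is not None else default_delay)


def polite_crawl(parser: RobotFileParser, agent: str, paths: list[str],
                 fetch: Callable[[str], bytes],
                 sleep: Callable[[float], None] = time.sleep) -> dict[str, bytes]:
    """Fetch only permitted paths, sleeping for the crawl delay between requests."""
    allowed, delay = crawl_plan(parser, agent, paths)
    results: dict[str, bytes] = {}
    for i, path in enumerate(allowed):
        if i > 0:
            sleep(delay)  # slow down so the server is not overloaded
        results[path] = fetch(path)
    return results
```

With these rules, `crawl_plan(build_parser(ROBOTS_TXT), "GenericBot", ["/texts/a.txt", "/private/b.pdf"])` would keep only `/texts/a.txt` and ask for a 10-second pause. Passing `fetch` and `sleep` in as parameters keeps the sketch testable offline; in practice `fetch` might wrap `urllib.request.urlopen`.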

Bulk Downloading, Aaron Swartz, and Terms of Service: Putting this all together, mass downloading is often not discouraged as long as it is done slowly enough; what most concerns providers, in our experience, is what is done with the materials after they are downloaded. Terms of Service documents are generally “CYA” documents in which it is difficult to communicate nuance, but we should recognize that violating them, while it may not be “right”, is common practice. Opening up library databases, including but not limited to public domain materials, to new types of research is important, especially in academia. Most organizations are adapting to these new types of computational research opportunities, but some will try to stop them. All in all, we do not yet have a good way to draw lines around what is acceptable practice; it is all evolving. What I know of Aaron’s downloading of old journal articles for later use is not outside of what many people do. What is unusual is how JSTOR, MIT, and the US prosecutors reacted.

What I am suggesting is that we need a bit more slack in the system. We need to be able to talk things through before we turn to police and courts. We need to leave room for a new generation of people and ideas that may alter how our institutions work. No, more than that, we should welcome and encourage people and ideas that will alter how our institutions work.

Aaron helped many of us adapt our institutions’ services to the digital opportunities; let’s continue this important work.

Aaron Swartz Memorial at the Internet Archive, Part 1 (January 24, 2013)

Aaron Swartz

November 8, 1986 – January 11, 2013

Memorial program

Danny O’Brien
Taren Stinebrickner-Kauffman
Lisa Rein
Seth Schoen
Peter Eckersley
Tim O’Reilly
Molly Shaffer Van Houweling
Alex Stamos
Cindy Cohn
Brewster Kahle
Carl Malamud


Jon Newton — myblogdammit

Follow me on identi.ca

RSS FEED —  http://www.myblogdammit.net/?feed=rss2

Contact me @ myblogdammit (at) shaw.dot ca