The Internet Archive’s Jason Scott Talks Apple II Software Preservation

The archivist and filmmaker discusses his efforts to save the past

Paleotronic had a chat with archivist and documentary film-maker Jason Scott (Get Lamp) about his efforts to preserve Apple II and other vintage computer software…

Firstly, could you give a little background on how the Internet Archive came to be a host for vintage software?

The Archive has been doing software archiving for quite some time, easily back into the early 2000s. At the time, the goal was simply to mirror TUCOWS, the Canadian software archive that has since become quite the powerhouse ISP and internet presence. We mirrored it, and that was… it. There was a general set of “data” we allowed people to upload, but we left it at that – you were very much on your own to understand, download, and process. It was not great.

I joined around 2011, and at that point, I was asked to give the whole place a boost, and I really, really did – absorbing everything “vintage” I could find from my own archives and archives around the net. We jumped far up in terms of actual “old” data being online, although a whole lot was locked away in .ZIP files or CD-ROM images and so on. But I think we’re the largest all-around software archive in the world, where you can download everything in our collection, at any time, with no restriction.

Secondly, why did you choose to start aggressively archiving Apple II software?

Totally my own bias. I love Apple II software, and there have been a number of good archives, including the Asimov Archive, that have spent many years collecting Apple material together. It was a big part of my childhood, so that helps. The fact that it comes in units of very small (140k) disk images also makes it easier to work with, and the resulting software has a simple and crafted beauty I enjoy. Connecting with the KansasFest community made things even more enjoyable, since there was a vibrant, caring and smart group of people involved with the software as well.

Again, thanks to the efforts of people long before myself, there were thousands of Apple II disk images floating around, so putting them on the Archive was a no-brainer.

The Disk II allowed for a large variety of copy protection schemes. How has this been a hindrance to archiving Apple II software?

Well, 4am discusses the role of copy protection in preservation better than I could ever hope to. The main issue is that in the beginning of the Apple II software industry, people could write diabolically involved protection schemes, and pirates then had to return fire using hardware and software trickery of an even higher order.

The result is that unless you were prepared to truly sink into that knowledge, a piece of software that wasn’t inherently sexy would blockade itself from all sides, right into ensured oblivion. Looking at the huge amount of technical effort expended to create Passport, and seeing the wide range of software titles coming to light in the wake of this effort, it’s obvious that a lot of Apple II history was hanging in the balance between remembered and forgotten.

Great examples are educational titles, which simply lacked the cachet to be traded, and games, which got cracked quick and dirty and turn out, in many cases, to have never been truly preserved because people would see a title “out there”, watch it boot up a title screen (or just the game itself) and go “Well, that’s forever.”

Close levels of scrutiny have been finding dozens, maybe hundreds of these oversights, and watching them backfill over the past 5 years has really hinted at how much more might be down there.

Although you would obviously prefer “clean cracks” of software (which haven’t been altered by the person who de-protected them) is there a place for “cracked” disks? Do they have their own historical significance?

I like having both, frankly. Or all three – I would include ways to have completely untouched disk images online as well. There’s lots of room on the Archive for all approaches, and they each have their character, uniqueness, and ubiquity. The piracy-oriented cracks of the 1980s were chances to crow about technical superiority in a very tiny but very interesting battle, while the clean cracks allow easier transfer of the software in the modern era. The full protected images will be of a different but important use to historians as well.

You realised at some point that having in-browser emulation would encourage IA visitors to try out the software. What were the challenges in getting emulation to work?

Brewster Kahle, the founder of Internet Archive, hired me and set me on the “software is locked up” problem almost immediately. He wanted the ability to “play” software like you can “play” movies, music, books and so on at the Archive. With a lot of media hosted by the organization, you can simply click on a button and begin reading, listening and so on. He wanted that for software and kind of left it up to me how to do it, with some thoughts on how it might be possible.

His suggested idea was what other organizations have done, which is set up servers that are playing really good emulations, and then let users access these servers via their browsers running what could be described as “remote access” software. This gets things going very quickly, and it works nicely, but it doesn’t easily scale, especially when you are running things close to the bone, as the Internet Archive prefers to.

I proposed working with some volunteers to port prominent emulators to run inside browsers using JavaScript, so that every single user would be providing the emulator for the software on their own machine. This suggestion, which Brewster thought was insane, took a couple of years to get going but has paid enormous dividends. We now emulate tens of thousands of old software packages, reasonably well, and people can make use of these old pieces of software basically instantly through the internet.

The challenges were mostly around getting a program named Emscripten (a compiler toolchain which converts C and C++ code to JavaScript) to deal with the unique and massive requirements of emulators, and then to clean up the mess and iterate. When all was said and done, we caused code changes to nearly every major browser, several emulators, and Emscripten itself.

So engineering-wise, it was a bear. But the results have been really notable – emulation in the browser is just part of the landscape, with some advantages and disadvantages, but providing a way to play old software very efficiently and letting the modern world incorporate the lessons of the past with little trouble. It’s a triumph, and many, many good people played a part in making it happen.

How has the Apple II community been helpful in the quest to archive every piece of Apple II software?

The overwhelming majority have been helpful with memories, ideas, rummaging through their old collections, and offering up everything they can find and letting our hash-checkers and comparison utilities find unique items that in many cases might not have been thought of as unique. They’ve gotten the word out to friends, old places of work, and communities I would never have any reach in.
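As a rough illustration of the hash-checking idea mentioned above, here is a minimal Python sketch. It is not the Archive's actual tooling: the function names (`sha256_of`, `group_by_hash`, `find_new_images`) and the choice of SHA-256 are assumptions made for the example. It fingerprints each disk image by its contents and reports which donated images are not already in a known set.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_hash(paths):
    """Map each content hash to the list of files that share it."""
    groups = defaultdict(list)
    for path in paths:
        groups[sha256_of(path)].append(Path(path))
    return dict(groups)


def find_new_images(donated, known_hashes):
    """Return donated images whose content is not already known.

    `known_hashes` is a hypothetical set of digests for images that
    are already in the collection.
    """
    seen = set(known_hashes)
    new_images = []
    for path in donated:
        digest = sha256_of(path)
        if digest not in seen:
            seen.add(digest)  # also skip duplicates within the donation
            new_images.append(Path(path))
    return new_images
```

A byte-for-byte hash like this only catches exact duplicates. Two different cracks of the same title, or the same disk with a saved high score, will hash differently, which is presumably where the comparison utilities come in.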

Once people find out those Apple II disks are worth taking a walk through, they’ve generously donated or lent disks to be imaged, and we’ve found thousands upon thousands of new images as a result. I wouldn’t go so far as to say we’re hitting completeness, but there are many relevant titles and accomplishments that are bootable in a browser or downloadable that most folks would have never seen again.

The number of people who have been down on this effort has been so scant that it’s definitely a “man bites dog” situation; I could focus on them, but the arguments tend to be in the realm of “we’ve done enough” and “it will be confusing to have so much software available” and, well, life’s short.

The Apple II community is a vast and varied world of tinkerers, geniuses and good people who love that platform and the folks in it a whole lot. I’ve enjoyed doing some small part to preserve it.

Thanks for all your work to save the past, Jason!

Based in San Francisco, The Internet Archive is a “digital library” that provides free public access to collections of digitised materials including software, music, movies and nearly three million public-domain books. It also takes billions of periodic snapshots of webpages.

The Internet Archive hosts an extensive collection of vintage computer software for a number of different platforms, including the Apple II, Sinclair ZX Spectrum and the Atari 8-bit computer line. It also hosts archives of console games for systems such as the Sega Master System and the ColecoVision, and much of its available software can be used through emulators embedded in their associated entries.

The Internet Archive also hosts a large collection of vintage computer and video-game magazines.

It can be reached on the Internet at archive.org.
