Some Research

Published October 29, 2017 at 2:20 PM EDT

Storing data on cassette tape is not a new idea. After Googling around for a bit, I found several older formats for storing data on cassette:

Kansas City Standard (KCS) - 2-tone AFSK, rate of 300 baud
Computer Users’ Tape Standard (CUTS) - modified Manchester encoding that resembles AFSK, rate of 300 to 1200 baud
Hobbyists’ Interchange Tape System (HITS) - uses variable-length tone bursts to create a self-clocking signal, rate up to 2500 baud

I also ran into a few non-standard formats that hobbyists played with. This article has some useful information about CUTS as well as non-standard formats and their performance.

These systems have some drawbacks:

Asynchronous transfer (KCS) means tape and receiver must both use the same data rate. Minor speed changes due to wow and flutter, or a small difference in motor speed, could cause bit slip and loss of data.
Use of tones (KCS, CUTS, HITS) increases the bandwidth needed and reduces storage efficiency, since multiple periods of a waveform are required to store each bit.
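
To put a rough number on the tone overhead, here is a small worked example. It assumes the commonly quoted KCS parameters (1200 Hz for a 0 bit and 2400 Hz for a 1 bit at 300 baud); those tone frequencies are not stated above, and the names `TONE_HZ` and `cycles_per_bit` are just for illustration.

```python
# Assumed KCS parameters (not taken from the text above):
# 300 baud, 1200 Hz tone for a 0 bit, 2400 Hz tone for a 1 bit.
BAUD = 300
TONE_HZ = {0: 1200, 1: 2400}


def cycles_per_bit(bit, baud=BAUD):
    """Number of full tone cycles spent storing a single bit."""
    return TONE_HZ[bit] // baud


for bit in (0, 1):
    print(f"bit {bit}: {cycles_per_bit(bit)} cycles of {TONE_HZ[bit]} Hz")
```

Under these assumptions every bit costs several full waveform periods, which is the storage inefficiency described above.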

While working on the Tape Artchive project, I realized that cassette players can be painful mechanical beasts. Much of the aging technology is hard to get working, and not very reliable once it is working. I also realized that consumer-level equipment wasn’t designed to give a perfect playback speed, or even a steady one.

I would like to develop a storage format that accounts for the issues encountered on tape. I like some features of the formats I looked at, especially HITS. HITS is entirely self-clocked, meaning there is no fixed data rate, so small variations in speed should be well tolerated. I have access to more modern equipment that should (testing needed) be more friendly to square pulses than the equipment targeted by HITS, so using single pulses might be possible.

I also noticed all of the systems treat tape as a 2-state medium. I see tape as a ternary medium:

Sketch of ternary representation on tape

Given that tape has three states, a bipolar return-to-zero (RZ) line code can be used. Each bit is encoded as a + or - pulse that returns to zero before the next bit. This type of signal is entirely self-clocked, and uses a single pulse to store a bit.
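
A minimal sketch of the idea in Python, using idealized samples rather than real tape audio. The mapping (+ pulse for 1, - pulse for 0), the function names, and the `pulse_len`, `gap_len`, and `threshold` parameters are all assumptions made for illustration, not a settled format.

```python
def encode_rz(bits, pulse_len=4, gap_len=4):
    """Encode bits as ternary samples.

    Assumed mapping: a 1 is a +1 pulse, a 0 is a -1 pulse, and every
    pulse is followed by a run of zeros (the "return to zero").
    """
    samples = []
    for bit in bits:
        level = 1 if bit else -1
        samples.extend([level] * pulse_len)
        samples.extend([0] * gap_len)
    return samples


def decode_rz(samples, threshold=0.5):
    """Recover bits by detecting each departure from zero.

    No fixed bit period is assumed: a bit is emitted whenever the signal
    leaves the zero band, and the next bit can only start after the
    signal has returned to it.
    """
    bits = []
    in_pulse = False
    for s in samples:
        if not in_pulse and abs(s) > threshold:
            bits.append(1 if s > 0 else 0)
            in_pulse = True
        elif in_pulse and abs(s) <= threshold:
            in_pulse = False
    return bits


data = [1, 0, 1, 1, 0, 0, 1]
assert decode_rz(encode_rz(data)) == data

# Simulate a speed change mid-stream: short pulses, then long ones.
mixed = encode_rz([1, 0], pulse_len=3, gap_len=3) + encode_rz([1, 1, 0], pulse_len=7, gap_len=6)
assert decode_rz(mixed) == [1, 0, 1, 1, 0]
```

Because the decoder keys off each pulse rather than a clock, the second check passes even though the pulse width changes partway through. Real playback will round off and distort the pulses, so a fixed threshold like this is only a starting point.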

This type of coding has a disadvantage: not all tape equipment will have the same phase (polarity). Recording and playback equipment may not have a matching phase, causing the bit stream to be received inverted. This can be fixed by using a differential encoding, or by framing the data. I’ll worry more about this later; for now I’ll be testing different waveforms and seeing how my tape players distort them.
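
As a rough sketch of the differential option, the data can be carried by whether the pulse polarity changes instead of by the polarity itself. The convention below (a 1 flips polarity, a 0 keeps it, with one leading reference pulse) and the names `diff_encode` and `diff_decode` are assumptions chosen for illustration.

```python
def diff_encode(bits, start=1):
    """Map bits to pulse polarities (+1 or -1).

    Assumed convention: a 1 flips the polarity of the previous pulse,
    a 0 repeats it. The first polarity is a reference pulse that
    carries no data.
    """
    polarities = [start]
    for bit in bits:
        polarities.append(-polarities[-1] if bit else polarities[-1])
    return polarities


def diff_decode(polarities):
    """Recover bits by comparing each pulse with the one before it."""
    return [1 if cur != prev else 0
            for prev, cur in zip(polarities, polarities[1:])]


data = [1, 0, 0, 1, 1, 0]
sent = diff_encode(data)
inverted = [-p for p in sent]  # what a polarity-inverting player would deliver

assert diff_decode(sent) == data
assert diff_decode(inverted) == data
```

Both checks pass because inverting every pulse leaves the changes between neighbouring pulses untouched. The cost is one extra reference pulse, and a single misread pulse corrupts two decoded bits instead of one.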