Blaine's World

Minor Improvements


Shortly after marking this project “complete”, I had some ideas for improvements. I put these changes on the back burner for a couple of months, but kept coming back to hack at them.

The first change was a new command-line argument, -l/--length-seconds, that sets the length of the generated audio in seconds. Previously, the length of the output file was specified indirectly through the number of samples to generate per horizontal pixel, which was cumbersome to work out.
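To show the arithmetic behind that argument, here is a minimal sketch of converting a length in seconds back into samples per horizontal pixel. The 44100 Hz sample rate, the 5.0 second default, the 800 pixel width, and the function names are my assumptions for illustration, not values taken from the script.

```python
import argparse

SAMPLE_RATE = 44100  # assumed sample rate in Hz; the post does not state one


def samples_per_pixel(length_seconds: float, image_width: int,
                      sample_rate: int = SAMPLE_RATE) -> int:
    """Return how many audio samples each horizontal pixel column covers."""
    if image_width <= 0:
        raise ValueError("image_width must be positive")
    total_samples = length_seconds * sample_rate
    # Round to the nearest whole sample, but never drop below one per column.
    return max(1, round(total_samples / image_width))


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser holding only the new length argument."""
    parser = argparse.ArgumentParser(description="image to audio (sketch)")
    parser.add_argument("-l", "--length-seconds", type=float, default=5.0,
                        help="length of audio to generate, in seconds")
    return parser


# Parse an explicit argument list so the sketch runs without a real CLI call.
args = build_parser().parse_args(["-l", "5.2"])
# 800 is a made-up image width used only for this example.
print(samples_per_pixel(args.length_seconds, image_width=800))
```

The point is only that the user now supplies seconds and the script can derive the per-pixel sample count itself.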

The next change was making the -v/--volume argument accept more conventional gain values (0.0 = 0%, 1.0 = 100%). After a lot of experimentation, I found a way to calculate the gain of each sine wave generator so that the sum of all generators is loud, but unlikely to clip. I ended up with a formula that uses the mean brightness of all pixels and the count of sine wave generators to calculate a gain that is applied to each generator. I arrived at it by guessing and checking with bright and dark variations of the spectrogram image. This is in no way scientific and will likely break given the right input image.
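The actual formula is not reproduced here, so the following is only a hypothetical stand-in that uses the same two inputs (mean brightness and generator count). It is deliberately conservative, so it will likely sound quieter than the tuned version: it assumes every generator plays at the mean brightness and that all peaks line up at once. The function name and the 0.01 brightness floor are my own inventions.

```python
def per_generator_gain(volume: float, mean_brightness: float,
                       generator_count: int) -> float:
    """Hypothetical gain heuristic; NOT the formula used by the script.

    volume: requested gain, 0.0 (silent) to 1.0 (full scale).
    mean_brightness: mean pixel brightness, normalized to 0.0..1.0.
    generator_count: number of sine wave generators being summed.
    """
    if generator_count <= 0:
        raise ValueError("generator_count must be positive")
    # Keep the requested volume inside the 0.0..1.0 range.
    volume = min(max(volume, 0.0), 1.0)
    # Floor the brightness so a nearly black image cannot blow up the gain.
    brightness = max(mean_brightness, 0.01)
    # If every generator played at the mean brightness and all peaks lined
    # up, the sum would reach generator_count * brightness * gain == volume.
    return volume / (generator_count * brightness)


# A bright image gets a smaller per-generator gain than a dark one.
print(per_generator_gain(1.0, 0.8, 100))
print(per_generator_gain(1.0, 0.2, 100))
```

Like the real formula, a heuristic of this kind is easy to defeat: an image with a few very bright rows and a dark background has a low mean brightness but can still clip.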

During testing I got frustrated waiting on each run to finish, so I took a little time to try to speed the script up. One of the main bottlenecks was pixel access using Image.getpixel. I removed the calls to getpixel and replaced them with lookups into a list of pixel values loaded with Image.getdata. I did a very basic comparison between both versions of the program using the same input image, which spanned 5.2 seconds of audio. The old version of the script took just over 5 minutes to generate the audio, while the new version took just over 2 minutes. That is an improvement, but generation still takes a long time. I also added a decimal place to the percentage indicator so the progress display changes more often.
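The pixel access change boils down to indexing a flat list instead of calling a method per pixel. Image.getdata returns the pixels flattened row by row, so pixel (x, y) sits at index y * width + x. Here is a small sketch of that lookup; the pixel_at helper and the tiny hand-written image are stand-ins of mine so the example runs without Pillow installed.

```python
def pixel_at(pixels, width, x, y):
    """Look up pixel (x, y) in a flat, row-major pixel sequence."""
    return pixels[y * width + x]


# With Pillow, the flat sequence would come from something like:
#     pixels = list(image.getdata())
#     width, height = image.size
# A tiny hand-written 3x2 grayscale image stands in for it here.
width, height = 3, 2
pixels = [0, 64, 128,
          255, 32, 16]

# Walk the image column by column, as a spectrogram player would.
columns = [[pixel_at(pixels, width, x, y) for y in range(height)]
           for x in range(width)]
print(columns)

# Progress with one decimal place, so the number moves more often.
for x in range(width):
    print(f"{100 * (x + 1) / width:.1f}%")
```

Loading the list once up front trades a little memory for skipping the per-call overhead of getpixel inside the inner loop.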

Other changes were minor. The count of sine wave generators was changed from a fixed number to a variable set to the number of integer frequencies between the low and high frequencies. The minimum frequency was also bumped from 1 Hz to 10 Hz after I realized the random fuzzing of generators was sometimes producing sub-hertz frequencies.
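As a rough illustration of those two changes, the sketch below builds one generator frequency per integer between the low and high bounds and applies a small random offset. The inclusive range, the 0.5 Hz fuzz amount, and the function name are assumptions; the post only states that the floor is now 10 Hz.

```python
import random

MIN_FREQUENCY_HZ = 10  # floor mentioned in the post


def generator_frequencies(low_hz, high_hz, fuzz_hz=0.5, rng=None):
    """Return one randomly fuzzed frequency per integer in low_hz..high_hz."""
    if rng is None:
        rng = random.Random()
    # Enforce the floor so fuzzing cannot push a generator below 1 Hz.
    low_hz = max(low_hz, MIN_FREQUENCY_HZ)
    frequencies = []
    for base in range(low_hz, high_hz + 1):  # inclusive range is assumed
        # fuzz_hz is a hypothetical amount; the script's value is not given.
        frequencies.append(base + rng.uniform(-fuzz_hz, fuzz_hz))
    return frequencies


# A seeded generator keeps the example repeatable.
freqs = generator_frequencies(1, 20, rng=random.Random(0))
print(len(freqs), min(freqs))
```

With a 1 Hz floor, the same fuzz could land anywhere from 0.5 Hz to 1.5 Hz, which is how sub-hertz generators can sneak in.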

I ran the final version of the script several times and compared the outputs to the previous version, but I couldn’t tell a difference by ear. Instructions and example usage have been uploaded to the code repo linked on the project page.