Vocal Isolating and Amplifying Headphones

The goal of this project is to create a headphone system that isolates the vocals in a media stream and amplifies them to increase intelligibility.

Background
This project aims to create a device that helps people who are hard of hearing pick out human vocals in media such as television and film, separating them from background noise and other sounds that reduce speech intelligibility. We are exploring two approaches. The first is a simple filter that lowers the perceived volume of frequencies outside the range of human speech while leaving the band in which the human voice commonly falls (1-5 kHz) untouched. Its drawback is that it does not attenuate non-human sounds that occupy the same frequency range. The second approach uses a blind source separation (BSS) algorithm to analyse the mixed audio signal and separate it into its individual components. This lets us extract the human speech, amplify it separately from the source signal, and remix it at the output of the device. The drawback of this method is the processing time that BSS requires, which could delay the audio enough for its offset from the video to be noticeable.
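To make the delay concern concrete, the sketch below estimates a lower bound on the added audio delay for chunk-based processing: the time needed to fill one chunk plus the time spent processing it. The function name `added_delay_ms` and its parameters are hypothetical, and the estimate ignores sound card and driver buffering, so the real delay would be larger.

```python
import time

def added_delay_ms(process, chunk, frames_per_chunk, rate):
    """Estimate the minimum delay added by processing audio in chunks.

    Returns (total_ms, keeps_up), where keeps_up is True when one chunk
    is processed in less time than it takes to record the next one.
    """
    # A chunk cannot be processed until all of its frames have arrived.
    buffer_ms = 1000.0 * frames_per_chunk / rate

    # Time a single call of the (hypothetical) processing function.
    start = time.perf_counter()
    process(chunk)
    processing_ms = 1000.0 * (time.perf_counter() - start)

    # If processing takes longer than the chunk duration, a backlog builds up.
    keeps_up = processing_ms <= buffer_ms
    return buffer_ms + processing_ms, keeps_up
```

For example, a chunk of 1024 frames at 44.1 kHz contributes about 23 ms of buffering on its own, before any processing time is counted. Timing the simple filter and the BSS step with the same chunk size gives a direct comparison of the two approaches.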

Deliverables

 * Raspberry Pi with sound card and auto running code
 * Peripherals such as headphones, RCA cables, and power cables
 * Documented code for future iterations

Design
We are developing two possible solutions to the problem statement, both running on the Raspberry Pi. This two-pronged approach will allow us to deliver the best possible product.



Raspberry Pi and Sound Card
Our design uses a Raspberry Pi and a sound card to interface with the external media. The Raspberry Pi serves as an I/O unit and as the signal processor for either the Real Time Equalizer or the Blind Source Separation.

PyAudio and Real Time Equalizer
Using the Python package PyAudio, the Pi samples the RCA input and processes it in chunks. Each chunk is separated into channels, which are processed independently. Each channel is run through a bandpass equalizer that aims to retain the human voice (1-5 kHz). The filtered signal is then multiplied by a scalar and added back to the original mix, and the result is sent to the output RCA jacks for the user to enjoy.
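The per-chunk processing described above can be sketched as follows. This is a minimal illustration, not the project's actual filter: it assumes 16-bit interleaved stereo samples at 44.1 kHz, uses a simple FFT mask as a stand-in for the bandpass equalizer, and the names `boost_voice_band` and `VOICE_GAIN` are hypothetical. Masking each chunk independently can produce audible artifacts at chunk boundaries, so a real implementation would more likely use a filter that carries state between chunks.

```python
import numpy as np

RATE = 44100                    # assumed sample rate in Hz
CHANNELS = 2                    # assumed stereo RCA input
VOICE_BAND = (1000.0, 5000.0)   # voice band from the text, in Hz
VOICE_GAIN = 1.5                # hypothetical scalar multiplier

def boost_voice_band(chunk_bytes, rate=RATE, channels=CHANNELS,
                     band=VOICE_BAND, gain=VOICE_GAIN):
    """Add a band-passed copy of each channel back onto the original chunk."""
    # De-interleave the raw int16 bytes: one column per channel.
    samples = np.frombuffer(chunk_bytes, dtype=np.int16)
    frames = samples.reshape(-1, channels).astype(np.float64)

    # Band-pass every channel independently by zeroing FFT bins
    # that lie outside the voice band.
    spectrum = np.fft.rfft(frames, axis=0)
    freqs = np.fft.rfftfreq(frames.shape[0], d=1.0 / rate)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    spectrum[~in_band, :] = 0.0
    voice = np.fft.irfft(spectrum, n=frames.shape[0], axis=0)

    # Mix the scaled voice band back into the original signal and
    # clip to the int16 range before re-interleaving.
    mixed = np.clip(np.round(frames + gain * voice), -32768, 32767)
    return mixed.astype(np.int16).tobytes()
```

In a PyAudio loop, the bytes returned by `stream.read` on the input stream would be passed to this function and the result handed to `stream.write` on the output stream, assuming both streams are opened with the `paInt16` format.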

Blind Source Separation
Blind source separation is an unsolved problem in signal processing, but various algorithms give good approximations for separating different types of mixed signals. One that we are trying to get working for our problem is DUET (Degenerate Unmixing Estimation Technique), which is available as a Python package and excels at separating the human voice from audio. We plan to run the audio stream through DUET and then mix the human voice back into the original mix with a scalar multiplier.
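For intuition about what DUET works with, the sketch below computes the two features the algorithm estimates for each frequency bin of a stereo frame: the symmetric attenuation and the relative delay between the channels. It is not the API of the package mentioned above, and the function name `duet_features` is hypothetical. A full DUET implementation would go on to build a histogram of these features over many frames, pick one peak per source, and mask the bins belonging to the voice.

```python
import numpy as np

def duet_features(left, right, eps=1e-12):
    """Per-bin symmetric attenuation and relative delay for one stereo frame."""
    n = len(left)
    X1 = np.fft.rfft(left)
    X2 = np.fft.rfft(right)

    # Skip DC, where the delay is undefined, and the top bin.
    k = np.arange(1, n // 2)
    omega = 2.0 * np.pi * k / n            # radians per sample

    # Ratio of the two channels in each frequency bin.
    ratio = X2[k] / (X1[k] + eps)
    a = np.maximum(np.abs(ratio), eps)

    alpha = a - 1.0 / a                    # symmetric attenuation
    delta = -np.angle(ratio) / omega       # relative delay in samples
    return alpha, delta
```

Bins dominated by the same source share similar `alpha` and `delta` values, which is what lets the algorithm group them. The delay estimate is only unambiguous while the true delay is small enough that the phase does not wrap.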

Verification
Verification of this product is largely subjective and depends on the user. We received feedback from local testers to help improve our product. You can try it for yourself in our DUET example below!

 * Section 1: Original track
 * Section 2: DUET isolated vocals
 * Section 3: A mix of sections 1 and 2
 * Section 4: Original track for reference



Document Archive