
Automatic Transcription and Accident Detection from Radio Chatter

02 Oct 2024

Demo: Fredericton Police Scanner

Last weekend, I hacked together a fully automated police scanner. It listens for fire, EMS, and police radio traffic, and uses a speech-to-text (STT) model to transcribe the audio into text. From there, an LLM goes through the transcribed text, picking out accidents. All of this information is made available on the website linked above.

Automatic Radio Recording

The automated radio recording was the most turn-key part of this project. In my area, radio traffic is trunked. What this means is that there are essentially a few "control" frequencies that transmit metadata. This metadata tells you where to find the voice transmission, as well as who the voice data belongs to. It's kind of like a pointer. For example, the control frequency might say "Channel 13579 is currently mapped to 123.456MHz". This would mean tuning to 123.456MHz would give you (digital) voice data, and you'd know it belongs to channel 13579. This is pretty awesome because it's easy to automate, and you get a bunch of extra metadata for free.
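To make that concrete, here's a toy sketch of what acting on a channel grant boils down to. This isn't code from the project, just an illustration of the mapping the control channel gives you:

    # Toy illustration of a trunking channel grant: the control channel announces
    # which voice frequency a channel is using, and the receiver retunes accordingly.
    current_voice_freqs: dict[int, float] = {}

    def handle_channel_grant(channel: int, freq_hz: float) -> None:
        """Record that `channel` is currently being carried on `freq_hz`."""
        current_voice_freqs[channel] = freq_hz
        print(f"Channel {channel} -> {freq_hz / 1e6:.6f} MHz")

    # e.g. "Channel 13579 currently mapped to 123.456MHz"
    handle_channel_grant(13579, 123.456e6)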

The first part of recording radio data is knowing where to look. For this, I highly recommend RadioReference. It's a big database of frequencies, mapped to geographic areas. For my city, there are a bunch of frequencies mapped to the "Maritime Public Safety Radio Network (MPSRN)", which is what I'm interested in. The MPSRN has its own page on RadioReference, which is a big help. Basically, it's a radio network that spans Atlantic Canada and is shared by police, EMS, and fire services, among others. Exactly what I need! So I wrote down all the control frequencies for the roughly half-dozen sites I could hear from my house.

To do the actual recording, I used an awesome piece of software called Trunk Recorder. It does exactly what it says on the tin - give it a list of control frequencies and it'll automatically record everything it hears, including metadata. It can actually do a lot more than that, but this is the part I care about. The output of Trunk Recorder is a bunch of WAV files, each with a matching JSON file of metadata. The metadata includes the frequency, start/end time of the transmission, channel, and so on - lots of generally useful data!
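For reference, here's roughly how the per-call metadata can be pulled out of those JSON files. This is a simplified sketch; field names like freq, start_time, stop_time, and talkgroup come from my recordings and may differ slightly between Trunk Recorder versions:

    import json
    from pathlib import Path

    def load_call_metadata(json_path: Path) -> dict:
        """Read the metadata Trunk Recorder writes next to each WAV file."""
        meta = json.loads(json_path.read_text())
        return {
            "wav_path": str(json_path.with_suffix(".wav")),
            "freq": meta["freq"],              # voice frequency, in Hz
            "start_time": meta["start_time"],  # Unix timestamp
            "stop_time": meta["stop_time"],    # Unix timestamp
            "talkgroup": meta["talkgroup"],    # the "channel" announced on the control frequency
        }

    for path in Path("/home/stephen/recordings").rglob("*.json"):
        print(load_call_metadata(path))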

Here's the entirety of my Trunk Recorder config:

  {
      "ver": 2,
      "captureDir": "/home/stephen/recordings/",
      "controlRetuneLimit": 4,
      "sources": [{
          "center": 772000000.0,
          "rate": 8160000.0,
          "error": 0,
          "gain": 0,
          "antenna": "TX/RX",
          "digitalRecorders": 8,
          "driver": "osmosdr",
          "device": "hackrf=0",
          "ifGain": 32,
          "bbGain": 32
      }],
      "systems": [{
          "control_channels": [770656250,
                               770906250,
                               773031250,
                               773281250,
                               772056250,
                               772306250,
                               770206250,
                               770456250,
                               772118750,
                               772368750],
          "type": "p25",
          "squelch": -50,
          "modulation": "qpsk"
      }]
  }

In terms of hardware, I'm using a cheap HackRF I picked up used, connected to a discone antenna outside my house. A set of RTL-SDR dongles would work fine too; this just happened to be what I had available. In fact, purchasing the HackRF at a flea market the previous weekend is what inspired the project!

Audio Transcription

For audio transcription, I'm using the Whisper family of models. I started with large-v3, but it was too slow on my CPU, so I downgraded to medium.en. Part-way through the project, large-v3-turbo came out, so I ended up switching to that. It's honestly incredible how well it works, especially given the quality of the radio audio, but it still makes mistakes sometimes. I've noticed this especially with street names - in an ideal world it would be nice to fine-tune the model on the street names of my city. The other thing is that even the turbo model is fairly slow. It can usually keep up with traffic, but occasionally there'll be a big burst of activity and it'll take a little while before the site is able to transcribe it all. Eventually, I plan to move this to my GPU server, which should increase performance dramatically.
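The transcription step itself is only a few lines. Here's a minimal sketch assuming the openai-whisper Python package (faster-whisper would look similar); the queueing logic that works through incoming recordings isn't shown:

    import whisper  # the openai-whisper package; faster-whisper would look similar

    # "large-v3-turbo" should also be available under the alias "turbo" in
    # recent releases of the package.
    model = whisper.load_model("large-v3-turbo")

    def transcribe(wav_path: str) -> str:
        """Transcribe a single recording. Radio audio is rough, so expect mistakes."""
        result = model.transcribe(wav_path, language="en")
        return result["text"].strip()

    print(transcribe("/home/stephen/recordings/example.wav"))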

I have a Postgres DB set up to keep track of audio metadata, which is imported automatically from the JSON files. The DB also holds the path to each audio file, and keeps track of what has been transcribed and what hasn't.
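The schema is nothing fancy. Here's a stripped-down sketch of the idea using psycopg2 - the table and column names are illustrative rather than my exact ones:

    import psycopg2

    conn = psycopg2.connect("dbname=scanner")  # hypothetical DB name

    # A simplified version of the tracking table.
    with conn, conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS recordings (
                id          SERIAL PRIMARY KEY,
                wav_path    TEXT NOT NULL,
                talkgroup   INTEGER,
                freq        BIGINT,
                start_time  TIMESTAMPTZ,
                stop_time   TIMESTAMPTZ,
                transcript  TEXT              -- NULL until Whisper has processed it
            )
        """)

    # The transcription worker grabs whatever hasn't been transcribed yet.
    with conn, conn.cursor() as cur:
        cur.execute("SELECT id, wav_path FROM recordings WHERE transcript IS NULL")
        pending = cur.fetchall()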

Accident Detection

Accident detection is a little more complex. I can't process each transcribed recording in a vacuum, because there might be valuable context in other messages. Instead, every 20 minutes, I look at the last 3 hours of transcriptions and feed them into an LLM (llama3.1). All I really do is ask it to pull out the accidents. Each accident is labelled with a type, as well as roughly when it occurred. I think there are probably better ways to do this, but it was relatively simple to implement, and it works reasonably well without leaving context on the table. However, it's quite inefficient (reprocessing many of the same messages every 20 minutes), and it struggles a bit with long transcripts. Specifically, it can sometimes completely miss accidents. I also noticed it stopped respecting the prompt as much: I specifically asked it to avoid street names, as they're often transcribed wrong, but with longer transcripts it would start ignoring this.
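The extraction step boils down to a single prompt over the assembled transcripts. Here's a rough sketch assuming llama3.1 is being served through Ollama; the prompt is paraphrased, not my exact wording:

    import ollama  # the Ollama Python client, talking to a local llama3.1

    EXTRACT_PROMPT = """Below are transcripts of police, fire, and EMS radio traffic
    from the last 3 hours. List any accidents mentioned. For each one, give a short
    summary, a type (e.g. vehicle collision, fire, medical), and roughly when it
    occurred. Avoid street names, since they are often transcribed incorrectly.

    Transcripts:
    {transcripts}"""

    def extract_accidents(transcripts: str) -> str:
        response = ollama.chat(
            model="llama3.1",
            messages=[{"role": "user", "content": EXTRACT_PROMPT.format(transcripts=transcripts)}],
        )
        return response["message"]["content"]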

The main disadvantage to doing it the way that I am is that it generates a ton of duplicate accidents, and every 20 minutes it creates even more. This is a problem that must be solved; otherwise, the list of accidents isn't very useful.

Accident Deduplication (Attempt 1)

Once again, I decided to reach out to llama3.1 to deduplicate accidents for me. I had a few different ideas for how this would work, but the one I settled on was relatively simple. Essentially, I'd give llama3.1 a list of the last X hours of accidents, and tell it to deduplicate them. Tweaking the prompt was a bit of work - sometimes it's not obvious if two accidents are the same or not, as they may not have the same summary. I partially counteracted this by turning the temperature down to 0. This feels a bit like a hacky workaround, but it did improve things.
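In code, this first attempt looked roughly like the following (again via Ollama, with a paraphrased prompt):

    import ollama

    DEDUP_PROMPT = """Here is a list of accidents detected from radio traffic over the
    last few hours. Some entries describe the same incident. Return the list with
    duplicates removed.

    Accidents:
    {accidents}"""

    def dedupe_all_at_once(accident_list: str) -> str:
        response = ollama.chat(
            model="llama3.1",
            messages=[{"role": "user", "content": DEDUP_PROMPT.format(accidents=accident_list)}],
            options={"temperature": 0},  # make the output as deterministic as possible
        )
        return response["message"]["content"]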

Where the system really fell apart was when there were many accidents. As the context length grew, processing time increased as well. I also noticed the LLM getting sloppy - it might incorrectly duplicate an accident, or miss some entirely. This was pretty bad, as every 20 minutes the accident list on the page might change significantly.

Accident Deduplication (Attempt 2)

The following day, I came up with a better algorithm. Instead of giving the LLM all of the accidents at once and having it spit them back out, I'd feed them to it one at a time. After each one, the LLM would simply output a boolean representing whether it's a duplicate or not. This improved processing time - I was outputting way fewer tokens, after all. It also meant that the context length wasn't growing as much, which seemed to make the LLM more accurate. The accident list on the site no longer changes massively every time the deduplication logic runs.
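Here's a sketch of the one-at-a-time approach; the prompt and helper names are illustrative rather than my exact implementation:

    import ollama

    IS_DUP_PROMPT = """Accidents already in the list:
    {known}

    New accident:
    {candidate}

    Is the new accident a duplicate of one already in the list?
    Answer with a single word: true or false."""

    def is_duplicate(known: list[str], candidate: str) -> bool:
        response = ollama.chat(
            model="llama3.1",
            messages=[{
                "role": "user",
                "content": IS_DUP_PROMPT.format(known="\n".join(known), candidate=candidate),
            }],
            options={"temperature": 0},
        )
        return "true" in response["message"]["content"].lower()

    # Illustrative input; in practice these come from the extraction step.
    new_accidents = [
        "Vehicle collision, two cars, around 2:20 PM",
        "Two-vehicle collision reported mid-afternoon",
    ]

    deduped: list[str] = []
    for accident in new_accidents:
        if not deduped or not is_duplicate(deduped, accident):
            deduped.append(accident)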

Future Improvements

I think the biggest improvements could come from the logic around pulling accidents out of the transcripts. There's a ton of text there and in most cases, processing a specific transcript more than once is a waste of time.

As an initial improvement, I'd like to have an LLM classify each transcript as "maybe relevant" or "definitely not relevant". Then, when assembling the batch of transcripts that gets summarized, I could ignore all of the "definitely not relevant" ones. Looking through the site, I estimate this could lower processing time by about 80%, or possibly more.
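Since I haven't built this yet, the following is purely hypothetical, but the relevance filter might look something like this:

    import ollama

    RELEVANCE_PROMPT = """Radio transcript:
    {transcript}

    Could this transcript be describing an accident or other emergency incident?
    Answer with a single word: yes or no."""

    def maybe_relevant(transcript: str) -> bool:
        response = ollama.chat(
            model="llama3.1",
            messages=[{"role": "user", "content": RELEVANCE_PROMPT.format(transcript=transcript)}],
            options={"temperature": 0},
        )
        return response["message"]["content"].strip().lower().startswith("yes")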

Beyond that, looking back at 3 hours of transcripts is probably overkill. Instead, I could look at just the new transcripts since the last run, plus an extra 10 minutes to be safe. The overlap would prevent an accident from getting cut off if the accident detector happened to run right in the middle of it. We're deduplicating accidents anyway, so this is probably a pretty safe change to make.
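The query for that would be simple enough - something like this sketch, reusing the hypothetical recordings table from earlier:

    from datetime import datetime, timedelta, timezone

    import psycopg2

    conn = psycopg2.connect("dbname=scanner")  # same hypothetical DB as above

    def transcripts_since(last_run: datetime) -> list[str]:
        """Fetch transcripts newer than the last run, with a 10-minute overlap."""
        window_start = last_run - timedelta(minutes=10)
        with conn.cursor() as cur:
            cur.execute(
                "SELECT transcript FROM recordings "
                "WHERE stop_time >= %s AND transcript IS NOT NULL "
                "ORDER BY stop_time",
                (window_start,),
            )
            return [row[0] for row in cur.fetchall()]

    # e.g. the accident detector last ran 20 minutes ago
    recent = transcripts_since(datetime.now(timezone.utc) - timedelta(minutes=20))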

Conclusion

While I don't consider this website to be that useful, it seems like a lot of people think it's neat at least. More importantly, I learned a lot while building it - especially around using LLMs more effectively. I'd like to continue using them to build cool things in the future.

Overall, it's very impressive what locally-running LLMs can do, even on cheap-ish hardware. Llama3.1 is the largest model I'm using, and it has only 8 billion parameters. With the 70B models, I imagine performance could be even better.