Nerd-Dictation Installed - Not sure how to parse a sound file through it for translation

zPuppy

Hiya guys, I have installed nerd-dictation (speech to text software) and it is quite unbelievable. I am using it now in fact for this sentence.
https://github.com/ideasman42/nerd-dictation (Okay, I didn't speak the URL :)

So, I had to install 'parec', which is the command that records the audio. That audio comes from my USB headset as I dictate. However, I would like to pass a sound file through nerd-dictation instead. I guess this can be done with a simple terminal command, but I cannot find any information on GitHub or anywhere else.
The command I use to start the dictation is:

./nerd-dictation begin --vosk-model-dir=./model &

As an aside, I have to cd into the nerd-dictation folder for this command to work (both ./nerd-dictation and ./model are relative paths), and it borks if I don't, so that's my newbie workaround.
If you haven't tried nerd-dictation, give it a whirl; the 40 MB offline speech model works extremely well.
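
On the cd workaround: a rough sketch of a launcher that should work from any directory, because it hands nerd-dictation absolute paths. It assumes the script and the model folder both live in one install folder, as in the command above. The folder location and the function names are made up for illustration.

```python
import subprocess
from pathlib import Path


def begin_command(nerd_dir: Path) -> list[str]:
    """Build the 'begin' command from absolute paths, so the current directory does not matter."""
    nerd_dir = nerd_dir.expanduser().resolve()
    return [
        str(nerd_dir / "nerd-dictation"),
        "begin",
        f"--vosk-model-dir={nerd_dir / 'model'}",
    ]


def start_dictation(nerd_dir: Path) -> subprocess.Popen:
    """Start dictation in the background, like the trailing '&' in the shell version."""
    return subprocess.Popen(begin_command(nerd_dir))
```

Calling start_dictation(Path.home() / "nerd-dictation") would then start it from anywhere, assuming that is where the folder actually lives.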

Cheers
zPuppy
 


I have attached the image below, which shows my audio routing in qpwgraph. I installed qpwgraph to help untangle the confusion after I installed PipeWire (feel free to make up your own mind, looking at the image, whether it did that, haha!!!).

In any event, the top green channel leaving the output of my 'USB headset' block only connects to the 'pw parec' block after I start nerd-dictation with the start command in terminal (./nerd-dictation begin --vosk-model-dir=./model &).
So there must be a way of using the terminal to direct the output of an MP3 file (or the software playing it) to the parec block as well, and this is what I need to do.

I thought that this would be pretty straightforward for Linux techies, and so I have posted this inquiry in the Newbie, Getting Started section of the forum; however, if this is more suitable to be moved to another area of the forum, that would make sense.

Thanks in advance guys. zPuppy
Nerd-DictationScreenshot.png
 
G'day @zPuppy - I can't help with the solution, but it looks really interesting.

if this is more suitable to be moved to another area of the forum, that would make sense.

I'm moving it to Linux Audio / Video, so it won't trail off as quickly as it would in Getting Started.

Also, perhaps tell us your Linux Distro - name, version and desktop environment.

Good luck.

Wizard
 
@zPuppy - I’d never heard of nerd dictation before. But I’ve taken a long look at the documentation AND the Python source code for nerd-dictation on GitHub and it doesn’t seem like it supports dictation from a sound-file.

It looks like it’s only using parec, or sox to record/stream live, real time audio in .wav format, as input to the speech to text API he’s using.

Unless I’m mistaken, I don’t think there’s a way to pass-in sound files yet. It looks like it only deals with live dictation.

Your best bet might be to contact the original developer of nerd-dictation, to see if it’s possible for them to add that ability at some point.

And even if I did miss something and it is already possible, at the very least, you’d still need to convert your MP3s to .wav format, as it deals purely with .wav data as input.
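
For the conversion step on its own, here is a minimal sketch that shells out to ffmpeg, assuming ffmpeg is installed. Mono 16-bit PCM at 16 kHz is a common choice for the small Vosk models, but the right rate for your model is an assumption to check. The function names are only illustrative.

```python
import subprocess
from pathlib import Path


def ffmpeg_wav_command(src: Path, dst: Path, rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that writes mono, 16-bit PCM WAV at the given sample rate."""
    return [
        "ffmpeg", "-y",        # overwrite the output file if it already exists
        "-i", str(src),        # input file (MP3 or anything else ffmpeg can read)
        "-ac", "1",            # one audio channel (mono)
        "-ar", str(rate),      # sample rate in Hz
        "-c:a", "pcm_s16le",   # 16-bit little-endian PCM
        str(dst),
    ]


def convert_to_wav(src: Path, rate: int = 16000) -> Path:
    """Convert src to a .wav file next to it and return the new path."""
    dst = src.with_suffix(".wav")
    subprocess.run(ffmpeg_wav_command(src, dst, rate), check=True)
    return dst
```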

Edit:
Oh, wait - re-reading your post, you’re wanting to route audio from another application to the input of parec or sox?
I don’t have my Linux laptop in front of me right now. But at some point in the next few days, I’ll try to look into that.
It might be possible to send the output from VLC to the input of parec/sox. That would require you to play the audio files in real time to perform dictation.

Saying that, parec is the PulseAudio record command. I don’t think you can route an audio file through parec. I think that records from a hw device…… Hmmmm…. IDK… I’ll do some more digging and have a play over the next few days.
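
One idea that might be worth testing, strictly as an untested sketch: PulseAudio (and PipeWire's PulseAudio layer) can create a virtual "null sink", and anything played into it shows up on a matching ".monitor" source that can be recorded from. The sink name below is made up. The --pulse-device-name option is an assumption about your nerd-dictation version, so check "./nerd-dictation begin --help" before relying on it. paplay wants a WAV-style file rather than an MP3.

```python
import subprocess
import time

SINK = "dictation_sink"  # hypothetical name, anything unique will do


def create_sink_command(sink: str = SINK) -> list[str]:
    # A null sink is a virtual output; whatever is played to it appears on "<sink>.monitor".
    return ["pactl", "load-module", "module-null-sink", f"sink_name={sink}"]


def play_command(wav_path: str, sink: str = SINK) -> list[str]:
    # Play the file into the virtual sink instead of the speakers.
    return ["paplay", f"--device={sink}", wav_path]


def dictation_command(sink: str = SINK, model_dir: str = "./model") -> list[str]:
    # ASSUMPTION: --pulse-device-name is available in your nerd-dictation version.
    return [
        "./nerd-dictation", "begin",
        f"--vosk-model-dir={model_dir}",
        f"--pulse-device-name={sink}.monitor",
    ]


def dictate_file(wav_path: str, startup_delay: float = 5.0) -> None:
    """Run from the nerd-dictation folder. Text is typed into whichever window has focus."""
    loaded = subprocess.run(create_sink_command(), check=True, capture_output=True, text=True)
    module_id = loaded.stdout.strip()  # pactl prints the index of the module it loaded
    try:
        dictation = subprocess.Popen(dictation_command())
        try:
            time.sleep(startup_delay)  # give the model time to load before audio starts
            subprocess.run(play_command(wav_path), check=True)
        finally:
            subprocess.run(["./nerd-dictation", "end"], check=False)
            dictation.wait()
    finally:
        # Remove the virtual sink again so the audio graph goes back to normal.
        subprocess.run(["pactl", "unload-module", module_id], check=False)
```

The file still plays in real time, so a ten-minute recording takes ten minutes to dictate.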
 
Thanks so much for the responses, Wizard and Jas. Here are the details of my O/S etc...
Operating System: Zorin OS 16.2
Kernel: Linux 5.15.0-58-generic
Architecture: x86-64

As for sessions running:
gnome-session-custom-session
pipewire-media-session

If it helps, I have an HP EliteBook 8570w, and the command prompt correctly advised me that the chassis is a laptop, which is possibly a little over-helpful.

With regards to Nerd-Dictation, Jas, it is unbelievable. The install is step-by-step and for me it worked out of the box.
I did however have an issue after relocating the Vosk model to the Nerd folder, so I reverted to the previous step and now issue the start/end commands after cd-ing to the nerd folder. I have used the 40 MB model, and have not bothered to look at the GB model, and I am Engleeesh, with a Brumy accent (think, Ozzy Osbourne 'Oh FFS Sharon' although not as pronounced thankfully) and still the accuracy is spot-on.

I do have some bodge workarounds, where I will attempt to play a sound file with my USB headphones next to the speaker (yeah, a bodge) and I will try and pull some of the spaghetti strings on my QPWGraph and see if it handles the abuse. If I am successful I will report back.

Cheers
zPuppy
 
and I am Engleeesh, with a Brumy accent (think, Ozzy Osbourne 'Oh FFS Sharon' although not as pronounced thankfully) and still the accuracy is spot-on.
Ah, I'm from deepest, darkest Scumerset, the land of Cider and I sound like one of the Wurzels, ha ha!
 
Nerd-Dictation updates, good news and totally confused news.

So, I did carry out the tests using the spaghetti monster, (new image attached).

With regards to 'parec' details of man page stuff I guess is here...
It uses libsndfile so can handle many different types of audio files.

So, I started qpwgraph, played my sound file (which can be seen as the Videos block, top right), then started Nerd-Dictation from terminal, and the parec block appeared (top center). I then paused the sound file, and dragged the connections as per the white arrows I have edited onto the pic....

Text output from Nerd below;

"hello this is loosely the morning lately receive request funny obama or me owner of the management yeah he our current how the guy and not a good actually i’m incredibly employees or should be prepared as faan a mile deep yard sale"

Yep, not exactly what was said. And then my computer went into an uncontrollable meltdown as I was attempting to de-activate nerd by stopping the video sound file, removing the spaghetti, entering 'end' in terminal etc. It took a few minutes, and even after I had stopped the typing popping up in everything I went near, the keys on my keyboard were borked, so I experienced some unusual hangover effects.

Bizarre experience, however there is scope to get this to work. It does need some refinements, and ideally a quick way of being able to de-activate nerd, so I hope this helps.

(As for the nerd output I will resist the temptation to upload to 4chan for an explanation as I have no doubt it would include advice to take my laptop for an exorcism!!!)




ParecMapping.png
 
More bizarreness: 28 minutes after my last upload (just now) I had a text file open and was watching a YouTube clip on another PC, and typing started appearing in the text file. I guess this was from the USB headset picking up sound and sending it through the parec block, which I have now de-activated (using qpwgraph). What is odd is that I had stopped Nerd via the terminal command, so some crazy conflict must have occurred, possibly caused by me playing with the spaghetti strings???
 
@zPuppy
Hi mate, having only just done exactly what you're trying to do, I had to register this account purely to point you in the right direction. It's not every day I get to know the right answer. ;)

I used these two (excellent) guides. Took me a bit to find them but once I did they had me up and running in only a couple of minutes. It's dead easy....once you know how. ;)

singerlinks.com/2021/07/how-to-convert-speech-to-text-using-python-and-vosk/ (there are other excellent guides on here too)

medium.com/analytics-vidhya/offline-speech-recognition-made-easy-with-vosk-c61f7b720215
(While this is written for a Windows environment, it also applies verbatim to Linux, except that instead of doing a "pip install pyaudio" you do a "pip install ffmpeg" and "pip install pydub" in whatever Python environment you're using for Vosk, e.g. the venv from the Singerlinks guide. Otherwise just use pip to install ffmpeg and pydub into your local/global environment. Note that pydub also needs the ffmpeg program itself on your PATH, which normally comes from your distro's package manager rather than from pip.)
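
To give a flavour of what those guides boil down to, here is a minimal sketch of transcribing a WAV file with the vosk Python package. It is not the guides' exact code. It assumes "pip install vosk" has been done, that the unpacked model sits in a folder called "model", and that the file is already mono 16-bit PCM WAV. The function names are my own.

```python
import json
import wave


def check_wav(path: str) -> int:
    """Return the sample rate if the file is mono 16-bit PCM, otherwise raise ValueError."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
            raise ValueError("audio must be mono, 16-bit PCM WAV")
        return wf.getframerate()


def transcribe(path: str, model_dir: str = "model") -> str:
    """Feed the WAV file to Vosk in chunks and return the recognised text."""
    from vosk import KaldiRecognizer, Model  # imported here so check_wav works without vosk

    rate = check_wav(path)
    recognizer = KaldiRecognizer(Model(model_dir), rate)
    pieces = []
    with wave.open(path, "rb") as wf:
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True when Vosk has finished an utterance.
            if recognizer.AcceptWaveform(data):
                pieces.append(json.loads(recognizer.Result()).get("text", ""))
    # Collect whatever is still buffered at the end of the file.
    pieces.append(json.loads(recognizer.FinalResult()).get("text", ""))
    return " ".join(p for p in pieces if p)
```

Something like print(transcribe("talk.wav")) would then print the text for a file of that name. Unlike live dictation, this runs as fast as your machine allows instead of in real time.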

As an additional aside, while the above solutions don't use Nerd-dictation, they just use Vosk, when using Nerd-dictation for on-the-fly speech to text I highly recommend Elograf as a GUI front-end. It sits a clickable start/stop icon in your system tray and makes configuration dead simple e.g. tweaking timeout/idle settings, quickly switching language models, etc. Output is to whatever application you have active at that moment meaning you can switch back and forth on the fly to wherever you want your text to appear. While I haven't found the need yet, making a key-binding to the start/stop function would also be trivial if you wanted. Like Nerd-dictation itself it worked perfectly for me straight out of the box. github.com/papoteur-mga/elograf

Anyhoo, hope this helps you (and others) as much as it helped me. Let us know how you go.

Cheers mate.
 