how to edit a scanned document?

marbles

hi-

i scanned a document and from what i've read, in order to edit it, i need to convert the ocr to text

can someone please tell me how to do this via a GUI program? i don't know how to do terminal window stuff (and i don't have enough data to learn it; i'm sure i'd have some questions and i just don't have that much data to ask them)

the only editing i need to do in the scanned pdf is just use the eraser and maybe add a text box

i know there's some smart people on here, so if you want a challenge :), here you go:

please let me know if there's a GUI scanning software program that will
1. scan a document and create the ocr AND
2. convert the ocr to text AND
3. allow me to edit it (use an eraser)

so i don't have to use 3 different programs

thx for helping
 


forester

I have not used it but Master PDF Editor may have that capability. It gets good words from people I trust, anyway.

In a pinch, I have used GIMP for non-extensive editing.
 

sphen

I don't know which applications to recommend for scanning and OCR on Linux. I use a Mac. I sent a PM to @marbles to help with terminology and how OCR works. I changed my mind and decided to post it here for others to find if they search from the internet:

"I can't help with the answer, but I can help with terminology.

"You use scanner software to scan the paper document to a digital 'bitmap'. The bitmap is whatever is on the scanned area on the paper, one dot / blank spot at a time. The scanner software may keep the bitmap in memory or it could write the bitmap to a graphics file like .JPG, .PNG, .TIFF, .RAW, or another file format. Those graphics file formats know nothing about the characters (letters) on the paper. They only know the bitmap (dots) image.

" 'OCR' stands for 'optical character recognition'. It is the process that converts a bitmap into a text file. The OCR software looks at the dots in the bitmap and figures out which are characters and what they are. The OCR software can save the text to a text file or a document file for you.

"NOTE: OCR software is not perfect. It makes errors and requires proof reading. I type fast and accurately. For me, it isn't always worth it.

"The scanner software may have OCR capabilities built-in. You may need two programs - a scanner program to create the image file, and a separate OCR program to convert the image file to a text or document file. That's the part where others can help. I use Mac software for scanning and OCR. I don't know which Linux software would be appropriate for your needs."

-> I hope someone here who knows Linux applications can recommend appropriate scanner software and/or OCR software.
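To make the bitmap-versus-text distinction above concrete, here is a toy Python sketch. It is only an illustration under simplifying assumptions (made-up 5x3 letter shapes, fixed-width characters, a hypothetical three-letter "alphabet"); real OCR engines are far more sophisticated. It shows that a scan is just a grid of dots, that OCR has to guess which letter each group of dots resembles, and why a faded scan can produce a wrong letter that needs proofreading.

```python
# Toy illustration only: the letter shapes and the matching rule are
# assumptions made for this sketch, not how any real OCR program works.
# '#' is an inked dot, '.' is blank paper. Each letter is 5 rows x 3 columns.
TEMPLATES = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}


def mismatches(a, b):
    """Count the dots that differ between two same-sized bitmaps."""
    return sum(
        dot_a != dot_b
        for row_a, row_b in zip(a, b)
        for dot_a, dot_b in zip(row_a, row_b)
    )


def recognize(glyph):
    """Return the template letter whose dots differ least from the glyph."""
    return min(TEMPLATES, key=lambda letter: mismatches(glyph, TEMPLATES[letter]))


def ocr(page, width=3, gap=1):
    """Cut a one-line page bitmap into fixed-width glyphs and recognize each."""
    letters = []
    for start in range(0, len(page[0]), width + gap):
        glyph = tuple(row[start:start + width] for row in page)
        letters.append(recognize(glyph))
    return "".join(letters)


# A clean "scan": rows of dots. Nothing in this data says "letter".
page = (
    "###.#...###",
    ".#..#....#.",
    ".#..#....#.",
    ".#..#....#.",
    ".#..###.###",
)

# The same page after two dots of the last letter faded during scanning.
smudged = page[:4] + (".#..###..#.",)

print(ocr(page))     # TLI
print(ocr(smudged))  # TLT  (the faded "I" is misread as "T")
```

An image file (.PNG, .TIFF and so on) stores only something like `page`. The text "TLI" exists only after the recognition step has run, and the second result shows how a small scanning defect turns into a wrong character.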
 

osprey

marbles wrote:
please let me know if there's a GUI scanning software program that will
1. scan a document and create the ocr AND
2. convert the ocr to text AND
3. allow me to edit it (use an eraser)
so i don't have to use 3 different programs

Without great expertise in PDF editing, I cannot advise on a single program to achieve your aims. However, the way I've been able to do these things is to scan a doc to PDF format, then open it in the Xournal program and fiddle about. One can erase anything by painting over it in white (or any other colour) and add text anywhere, including over the erasure. Erasing can be fiddly because it means painting over a section of the PDF. In my case it hasn't mattered whether the PDF is of a picture or of text: erasing and filling in text has been adequate, though I can't say I've achieved professional-looking docs. If the PDF is merely a form with empty spaces to fill in, a professional-looking outcome is possible. My expertise with Xournal, though sufficient for my needs, is not high order, and I've never needed to work with OCR. YMMV.
 

Lord Boltar

There is Tesseract, which is a command line utility that does OCR.

Have a look here - https://lindevs.com/install-tesseract-ocr-on-ubuntu/ and here - https://www.howtogeek.com/682389/how-to-do-ocr-from-the-linux-command-line-using-tesseract/
There is also a GUI for Tesseract called YAGF, which is in the Debian repos. I do not know if anyone else has it or not. In fact, both are in the Debian repos.
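For readers who are comfortable with a little scripting (the original poster asked for a GUI, so this is optional), here is a minimal Python sketch of driving Tesseract. It assumes the `tesseract` program is installed and uses its basic form `tesseract IMAGE OUTPUTBASE -l LANG`. The function names and the file name `scan.png` are hypothetical, chosen for this example only.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, language="eng"):
    """Build the argument list for: tesseract IMAGE OUTPUTBASE -l LANG."""
    return ["tesseract", str(image_path), str(output_base), "-l", language]


def ocr_to_text(image_path, output_base="ocr_output", language="eng"):
    """Run Tesseract on an image file and return the recognized text.

    Returns None if the tesseract program is not found on this system.
    """
    if shutil.which("tesseract") is None:
        return None
    command = build_tesseract_command(image_path, output_base, language)
    # check=True raises an error if Tesseract reports a failure.
    subprocess.run(command, check=True)
    # By default Tesseract writes its result to OUTPUTBASE with ".txt" added.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling `ocr_to_text("scan.png")` on a scanned image would return the recognized text as a string, which still needs proofreading before use.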

Also GOCR here - https://jocr.sourceforge.net/

Of the two, the only downfall to GOCR is that it does not do multiline layouts very well. Also, as far as I know, GOCR has not been updated for a couple of years and might be dead, so there is that.
 

KGIII

As mentioned before, whenever I need to do OCR these days, I just use a search engine to find an online OCR service (there are perfectly free services). I think I still have a dedicated scanner that does OCR, but I haven't bothered with that in ages.
 

Terminal Velocity

Google Docs has an OCR function. I once needed OCR, and it was the best I found.
 