Is there an application that can translate text from the screen automatically?

rabcor · April 15, 2024, 5:13am

I’ve sometimes used Google Lens on my phone to translate text on physical objects. I was wondering if there is a comparable application on Linux that can translate any text on the screen, say in a game or a video.

Kresimir · April 15, 2024, 5:35am

Yes, it’s called tesseract, and it’s in the extra repo. Most of these phone programs that do OCR are powered by tesseract under the hood.

You may wish to use some kind of frontend for it, but the CLI usage is very straightforward:

tesseract image.png output

will create a file called output.txt containing the text it read from the image (the .txt extension is appended automatically by default, which is quite stupid).
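If the extra file gets in the way, tesseract also accepts `stdout` as the output name and prints the recognized text instead of writing output.txt. A minimal sketch of a wrapper around that; `ocr_stdout` is a hypothetical helper name, and the default language `eng` is an assumption you can override:

```shell
#!/bin/sh
# Hypothetical helper: OCR an image and print the text to standard output.
# Passing "stdout" as the output base name stops tesseract from creating
# a <name>.txt file.
ocr_stdout() (
    image=$1
    lang=${2:-eng}   # tesseract language code, e.g. eng or jpn
    tesseract "$image" stdout -l "$lang"
)
```

Usage: `ocr_stdout screenshot.png jpn > screenshot.txt` (or pipe it straight into another command).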

rabcor · April 15, 2024, 7:16am

It doesn’t seem to work. If I take a screenshot of https://archive.ph/TbkP1, specifically the red text above the rope, it just spits out this gibberish:

VAEBEERID =V RERIEOLERFERE. &
DDA TERNMLS E BN BN TV o LA

Do I need something more than tesseract-data-jpn?

Edit: Found it, I have to specify -l jpn. But this only captures the text; it does not translate it, and it certainly wouldn’t work for live-translating text from a game. The idea here is to have an overlay next to non-English text on the screen that shows its English translation.
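Without `-l`, tesseract falls back to English, which is why Japanese text comes out as gibberish like the above. A small check that the wanted model is actually installed can save some head-scratching. This is a sketch: `has_lang` is a hypothetical helper, and it assumes `tesseract --list-langs` prints a header line followed by one language code per line, as recent versions do. For vertical Japanese (common in manga and some games), a separate `jpn_vert` model also exists and may be worth trying.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given tesseract language code
# (e.g. jpn, jpn_vert) is installed.
has_lang() (
    # --list-langs prints a header line, then one code per line.
    # 2>&1 covers older versions that printed the list to stderr.
    tesseract --list-langs 2>&1 | grep -qxF "$1"
)
```

Usage: `for l in jpn jpn_vert; do has_lang "$l" || echo "missing: $l"; done`. On Arch the data packages follow the `tesseract-data-<code>` naming, like the tesseract-data-jpn mentioned above.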

Edit 2: I managed to get roughly what I want by combining tesseract and translate-shell:

#!/bin/bash
# One-liner version:
# grim -g "$(slurp)" - | tesseract -l jpn - - | notify-send "Translation:" "$(trans -b -)"

lang=jpn      # OCR language; list installed ones with: tesseract --list-langs (find more with: yay tesseract-data)
engine=google # translation engine; list available ones with: trans -S
scale=100     # scale the image by this percentage; can help with very small text, but slows results down considerably

# Grab a screenshot of a selected area
grim -g "$(slurp)" - |
# Post-process the screenshot (there's probably a better way)
# (ImageMagick 7 prefers "magick" over the deprecated "convert" name)
#convert - -grayscale rec709luma -normalize -contrast-stretch 0 -lat 15x15+2% -sharpen 0x3.0 - |
convert - -resize "${scale}%" -colorspace gray -type grayscale -contrast-stretch 0 -lat 15x15+1% -contrast-stretch 0 -normalize -sharpen 0x3.0 -opaque none -alpha off png:- |
# Read text from the screenshot
tesseract -l "$lang" - - |
# Translate and display the translation
notify-send "Translation:" "$(trans -e "$engine" :eng -b -)"
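Since the OCR step is usually what goes wrong, it can help to see the recognized text next to the translation, so a bad result can be traced to tesseract (garbled source) or to trans (bad translation). A rough variant of the same pipeline as a POSIX sh function; `ocr_translate` is a made-up name, and it assumes `trans` reads stdin when given `-`, as the script above relies on:

```shell
#!/bin/sh
# Hypothetical helper: read a PNG on stdin, OCR it, and print the raw OCR
# text followed by its English translation, separated by "---".
ocr_translate() (
    lang=${1:-jpn}      # tesseract language code
    engine=${2:-google} # translate-shell engine (list with: trans -S)

    # OCR: image from stdin, text to stdout; stop if tesseract fails.
    text=$(tesseract -l "$lang" - - 2>/dev/null) || exit 1

    # Stop early if tesseract found no non-whitespace text.
    if [ -z "$(printf '%s' "$text" | tr -d '[:space:]')" ]; then
        echo "(no text found)"
        exit 1
    fi

    # Translate to English in brief mode (-b prints only the translation).
    translation=$(printf '%s\n' "$text" | trans -e "$engine" :eng -b -) || exit 1

    printf '%s\n---\n%s\n' "$text" "$translation"
)
```

Usage: `notify-send "Translation:" "$(grim -g "$(slurp)" - | ocr_translate jpn google)"`. The ImageMagick step can be slotted in before `ocr_translate` exactly as in the script.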

The problem, though, is that tesseract is kinda shit. This works, but only sometimes, and the point of failure is always tesseract.

You can see I experimented a lot with making the image data more palatable to tesseract, but honestly, it barely helps.
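Besides filtering the image, tesseract’s page segmentation mode (`--psm`) can matter a lot for small screen crops: the default (3) expects a full page, while 5 assumes a single block of vertically aligned text, 6 a single uniform block of text, 7 a single line, and 11 sparse text. A sketch that runs one saved crop through several modes so they can be compared by eye; `compare_psm` is a hypothetical helper, and which mode works best will depend on the screenshot:

```shell
#!/bin/sh
# Hypothetical helper: OCR the same image with several page segmentation
# modes and print each result under a header, for side-by-side comparison.
compare_psm() (
    image=$1
    lang=${2:-jpn}
    for psm in 3 5 6 7 11; do
        printf '== psm %s ==\n' "$psm"
        tesseract "$image" stdout -l "$lang" --psm "$psm" 2>/dev/null
    done
)
```

Usage: `grim -g "$(slurp)" /tmp/crop.png && compare_psm /tmp/crop.png jpn`. The input has to be a file here rather than stdin, because each run needs to read the image again.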

Google Lens is orders of magnitude better at finding text (it rarely fails to) and at reading it correctly (it rarely gets that wrong either). Compare that to tesseract, which can’t even read perfectly solid text from a screenshot… Yeah, it’s shit.