In the end, there's no way to get rid of recognition mistakes completely, but there is a way to reduce them.
You'll need:
- gImageReader (Click on the green box)
- Tesseract (Select "Windows Installer" and "Japanese language data" for 3.02)
Part 0
Before we start, I want you to know that this method works for many languages, not only Japanese (you can find the list by clicking on the "Tesseract" link). A good alternative is ABBYY FineReader, but as far as I know it doesn't support Asian languages.
Part 1
- Install both programs
- Launch gImageReader from your "Start" menu
- Enter the path of the directory where you installed Tesseract (it's usually either C:\Program Files\Tesseract-OCR or C:\Program Files (x86)\Tesseract-OCR)
- Then, in the "Directory, containing Tesseract languages" box, enter the same path with \tessdata added at the end.
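If you are unsure what the second box should contain, the two paths from the steps above can be worked out in a few lines of Python. This is only a sketch: the helper name `tessdata_dir` is made up for illustration, and it assumes Tesseract sits in one of the two usual locations mentioned above.

```python
from pathlib import PureWindowsPath

# The two usual install locations mentioned above (assumption: default installer paths).
USUAL_INSTALL_DIRS = [
    r"C:\Program Files\Tesseract-OCR",
    r"C:\Program Files (x86)\Tesseract-OCR",
]


def tessdata_dir(install_dir):
    """Return the language-data directory: the install directory with 'tessdata' appended."""
    # PureWindowsPath joins with backslashes regardless of the OS running this script.
    return str(PureWindowsPath(install_dir) / "tessdata")


if __name__ == "__main__":
    for install_dir in USUAL_INSTALL_DIRS:
        print(install_dir, "->", tessdata_dir(install_dir))
```

The first value of each pair goes into the Tesseract directory box, the second into the languages directory box.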
Part 2
A test
- Click "Open" and select a file
- Now change the language from English to Japanese/日本語 and select ja_JP
- Click the "Recognize all" button, or select just the area you need and click "Recognize selection"
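gImageReader is a front end for Tesseract, so the same recognition can also be run without the GUI. The sketch below builds the command line for the `tesseract` program and runs it only when `recognize` is called. It rests on a few assumptions: `tesseract` is on your PATH, the Japanese data is installed under the language code `jpn`, and the function names and file names are hypothetical.

```python
import subprocess


def build_command(image_path, output_base, lang="jpn"):
    """Build the argument list: tesseract <image> <output base> -l <language>."""
    return ["tesseract", image_path, output_base, "-l", lang]


def recognize(image_path, output_base, lang="jpn"):
    """Run Tesseract and return the text it wrote to <output_base>.txt."""
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(build_command(image_path, output_base, lang), check=True)
    with open(output_base + ".txt", encoding="utf-8") as result_file:
        return result_file.read()
```

For example, `recognize("page.png", "page")` would run Tesseract on a hypothetical `page.png` and read the result back from `page.txt`. This is the "Recognize all" case; there is no equivalent of selecting an area here, so you would need to crop the image first.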
Example
It seems to have detected all the selected characters correctly, except this one:
And that's OK. Just zoom in on the image first and then select that character manually. I have actually never seen software that can handle furigana.
Then click "Save as", and that's it.
Have a good day :)