Forum Discussion

Community Hero

8 years ago

More OCR questions?

I'm continuing to mess about with this. What I'm trying to determine, is the best "common" values to use in the settings that will give me the best percentage of accurate results. The ima...

Colin_McCrae

Community Hero

Thanks for the suggestions. :smileyhappy:

I'm already cutting out small areas of the image. It's a touchscreen panel built up of multiple buttons arranged in a grid, so I can extract only the button I want.

So I'm already using a small area. Running it on the entire image was WAY too slow!
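For anyone wanting to do the same, the crop rectangle for one button in a regular grid is just arithmetic. This is a rough sketch only: the function name, the cell size, and the gap values are all made up for illustration and would need to match the real panel layout.

```python
def button_box(row, col, origin=(0, 0), cell_size=(120, 80), gap=(4, 4)):
    """Return (left, top, right, bottom) for the button at zero-based row/col.

    origin, cell_size and gap are hypothetical values in pixels; measure
    them on the actual touch panel image.
    """
    x0, y0 = origin
    width, height = cell_size
    gap_x, gap_y = gap
    # Each step along the grid moves by one cell plus one gap.
    left = x0 + col * (width + gap_x)
    top = y0 + row * (height + gap_y)
    return (left, top, left + width, top + height)
```

The result is in left/top/right/bottom order, which is the form many image libraries accept for a crop region, so only that small rectangle needs to go to the OCR engine.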

As I say, the font starts life as Tahoma, but it gets scaled and rendered by the software that produces the touch panel images so it's not 100% the same as standard system Tahoma by the time it gets there. The colours used also have a big effect. Foreground and background colours on the buttons are user configurable - but there will be limits around the colours you can use if you plan to use OCR. White text on a yellow background = not good!

So I've built myself a little test harness that loops through a lot of the possible options and tries each of my 40 saved button images so I can establish the most reliable settings for us. There are a ton of settings. Besides the actual OCR ones, there are also quite a few you can apply to the image file (saving as greyscale, re-sizing, compressing etc etc) before you run the OCR over it. Way too many permutations to figure it out manually as it's by no means an exact science.
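A harness like this can be sketched in a few lines of Python. To be clear, this is not the actual harness described above, just one possible shape for it: `ocr_fn` is a hypothetical callable standing in for whatever OCR call is really used, and the option names are placeholders.

```python
from itertools import product


def run_sweep(images, option_grid, ocr_fn):
    """Try every combination of options against every saved image.

    images:      dict of image name -> (image_data, expected_text)
    option_grid: dict of option name -> list of values to try
    ocr_fn:      hypothetical callable(image_data, **options) -> text
    Returns a list of (options, matched_image_names), best first.
    """
    names = sorted(option_grid)
    results = []
    # product() yields one tuple per permutation of the option values.
    for values in product(*(option_grid[n] for n in names)):
        options = dict(zip(names, values))
        matched = {
            name
            for name, (data, expected) in images.items()
            if ocr_fn(data, **options).strip() == expected
        }
        results.append((options, matched))
    # Most matches first; the sort is stable, so ties keep sweep order.
    results.sort(key=lambda item: len(item[1]), reverse=True)
    return results
```

Keeping the set of matched image names for each run, rather than only a count, makes it possible to see afterwards which images a given setting wins or loses.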

The bit I can't figure out is how using:

font size 14 = 13 matches on large text

font size 16 = 13 matches on large text

font size 24 = 13 matches on large text

font size 30 = 5 matches on small text (?!?!? no idea how 30 is most effective - the text is TINY!)

But ....

With font sizes 14/16/24/30 all enabled in one go, you would expect to match at least 13 large and 5 small. But it doesn't?!?!? Instead I get 12 large matches and only 3 small. No idea how that's happening. The font sizes available to the OCR engine are the only parameter changing between runs ...
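One way to dig into this is to compare which images match, not just how many. The 13 matches at sizes 14, 16 and 24 are not necessarily the same 13 images, and a count hides which ones drop out when all the sizes are enabled together. A minimal sketch, assuming each run has been recorded as a set of matched image names (the function and the labels are hypothetical):

```python
def compare_runs(single_runs, combined):
    """Compare single-font-size runs against the all-sizes run.

    single_runs: dict of run label -> set of matched image names
    combined:    set of image names matched with all sizes enabled
    """
    # What you would expect if enabling more sizes never hurt a match.
    expected = set().union(*single_runs.values())
    return {
        "expected": expected,
        "lost": expected - combined,    # matched alone, missed when combined
        "gained": combined - expected,  # matched only when combined
    }
```

The "lost" set gives the specific images to look at, which is more useful for spotting a pattern than the totals.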

EnergizerBunny

Contributor

8 years ago

One thing you could try is to turn OFF Clear Type Text on the test machines.

When Clear Type Text is set for your display device, it ‘smooths’ the pixels of the LCD monitor so the image looks better to the human eye, but it ‘skews’ the bitmap image on the screen and can interfere with the OCR function. It can also interfere with the TextObject identification.

On your machine, go to Control Panel\Display\Adjust Clear Type text to make the change. You cannot do this via remote desktop; you must be on the console.

We do this as the standard setup on all our test machines.

Colin_McCrae

Community Hero

8 years ago

Thanks for that. Produced some interesting results.

I hadn't even looked at the Clear Type settings as I wasn't aware they could affect images. But I tried it, and you're right, it does.

With it off, the aliasing was less pronounced, and fewer letters appeared "joined" together by the aliasing artifacts.

Unfortunately, it didn't translate into a big gain in the accuracy of the OCR. Maybe 10%? If you're lucky?

I think I'm simply going to have to give a few caveats around the colours used (stick to white text on dark backgrounds, which most of them are anyway) and keep the text short and simple. We can control the colours and text used so it's not a problem. Just need to make sure people are aware of it. But follow a few simple rules, and it should be accurate enough for my purposes.
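If the colour rule ever needs to be checked automatically, one option is to score each foreground/background pair with the WCAG contrast ratio. That formula was designed for human readability, so treating it as a proxy for OCR-friendliness is purely an assumption here, and any cut-off value would have to come from testing against the real button images.

```python
def _linear(channel):
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) colour, from 0.0 to 1.0."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """WCAG contrast ratio between two colours, from 1.0 up to 21.0."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)
```

For example, `contrast_ratio((255, 255, 255), (255, 255, 0))` for white on yellow comes out at about 1.07, while white on black gives 21, which at least lines up with white on yellow being a bad combination.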
