Thanks for the suggestions. :)
I'm already cutting out small areas of the image. It's a touchscreen panel built up of multiple buttons arranged in a grid, so I can extract only the button I want.
So I'm already working on a small area. Running it on the entire image was WAY too slow!
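For anyone wanting to do the same, here is a minimal sketch of working out the crop rectangle for one button. It assumes a uniform grid with no margins or gaps between buttons, which may not hold for every panel layout; the function name and parameters are made up for illustration.

```python
def button_box(row, col, rows, cols, width, height):
    """Return (left, upper, right, lower) for one cell of a uniform grid.

    Assumption: the panel image is divided evenly into rows x cols buttons
    with no borders or spacing between them.
    """
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError("button index outside the grid")
    # Integer division keeps neighbouring cells flush with each other.
    left = col * width // cols
    upper = row * height // rows
    right = (col + 1) * width // cols
    lower = (row + 1) * height // rows
    return (left, upper, right, lower)
```

The tuple is in (left, upper, right, lower) order, which is the order Pillow's `Image.crop` accepts if that is the imaging library in use.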
As I say, the font starts life as Tahoma, but it gets scaled and rendered by the software that produces the touch panel images, so it's not 100% the same as standard system Tahoma by the time it gets there. The colours used also have a big effect. Foreground and background colours on the buttons are user-configurable, but there will be limits on the colours you can use if you plan to use OCR. White text on a yellow background = not good!
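One rough way to put a number on "not good" is the WCAG contrast ratio between the two colours. This is only a proxy: it is an accessibility measure for human readers, and nothing here says the OCR engine uses it, but a very low ratio is a fair warning sign. A sketch:

```python
def relative_luminance(rgb):
    """WCAG 2.x relative luminance of an 8-bit sRGB colour (r, g, b)."""
    def linear(channel):
        c = channel / 255
        # Undo the sRGB gamma curve to get linear light.
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """Contrast ratio between two colours, from 1 (identical) to 21."""
    lum_a = relative_luminance(fg)
    lum_b = relative_luminance(bg)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)
```

White text on pure yellow comes out only a little above 1, whereas black on white is close to 21. Where the cut-off for reliable OCR sits would have to be found by testing.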
So I've built myself a little test harness that loops through a lot of the possible options and tries each of my 40 saved button images, so I can establish the most reliable settings for us. There are a ton of settings. Besides the actual OCR ones, there are also quite a few you can apply to the image file (saving as greyscale, re-sizing, compressing etc.) before you run the OCR over it. There are way too many permutations to figure it out manually, and it's by no means an exact science.
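The general shape of a harness like that can be sketched as below. This is an illustration, not the actual harness: `recognise` is a placeholder for whatever OCR call is in use, and the option names are hypothetical.

```python
from itertools import product


def run_harness(images, option_grid, recognise):
    """Try every combination of options on every image and rank the results.

    images      -- list of (image, expected_text) pairs
    option_grid -- dict mapping an option name to the list of values to try
    recognise   -- callable(image, **options) -> recognised text
                   (placeholder for the real OCR call)
    """
    names = list(option_grid)
    results = []
    # product() yields one tuple of values per permutation of the options.
    for values in product(*(option_grid[name] for name in names)):
        options = dict(zip(names, values))
        matches = sum(
            1
            for image, expected in images
            if recognise(image, **options).strip() == expected
        )
        results.append((matches, options))
    # Best-scoring combinations first.
    results.sort(key=lambda result: result[0], reverse=True)
    return results
```

The number of runs is the product of the number of values per option multiplied by the number of images, so it grows quickly as options are added.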
The bit I can't figure out is this. Using one font size at a time, I get:
font size 14 = 13 matches on large text
font size 16 = 13 matches on large text
font size 24 = 13 matches on large text
font size 30 = 5 matches on small text (?!?!? no idea how 30 is most effective - the text is TINY!)
But ....
With font sizes 14/16/24/30 all enabled in one go, you would expect to match at least 13 large and 5 small. But it doesn't. Instead I get 12 large matches and only 3 small. No idea how that's happening. The font size(s) available to the OCR engine is the only parameter changing between runs ...
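Counts alone don't show which buttons are being lost. A minimal sketch of comparing the runs by image name rather than by total, assuming each run records the set of images it matched (the function and key names are hypothetical):

```python
def compare_runs(single_runs, combined):
    """Compare single-font-size runs against the all-sizes run.

    single_runs -- dict mapping font size -> set of image names that matched
    combined    -- set of image names that matched with every size enabled
    """
    # Everything matched by at least one single-size run; naively, the
    # combined run "should" match at least this set.
    union = set().union(*single_runs.values())
    return {
        "expected_at_least": len(union),
        "lost": sorted(union - combined),    # matched alone, missed combined
        "gained": sorted(combined - union),  # only matched when combined
    }
```

The "lost" list would point at the specific buttons where having more sizes available makes the result worse, which are the ones worth looking at by eye.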