
tbom
Contributor
9 years ago

OCR Recognition help

Hi

 

I'm trying to decode a simple "captcha" type image, which is pure text, without any added noise. I'm using the latest and greatest TC 11.3 (downloaded yesterday).

 

I have tried different hoops and loops, but to no avail.

 

The target is a website running in IE, and I have no control over the target software or image generation.

 

The following code is what I have experimented with so far (Python):

def Test1():
  captchaImg = Aliases.browser.pageCommonsaas.formFmLocallogin.Image("vertifyCodeImg").Picture()
  Log.Picture(captchaImg, "Captcha image")
  ocrCaptcha = OCR.CreateObject(captchaImg)
  OCROptions = ocrCaptcha.CreateOptions()
  OCROptions.ActiveRecognitionSet = 1
  OCROptions.ExactSearch = False
  font = OCROptions.Fonts.Add()
  font.Name = "Consolas"
  font.Sizes.Add(18)
  font.Sizes.Add(16)
  font.Sizes.Add(20)
  capCode = ocrCaptcha.GetText(OCROptions)
  Log.Message(capCode)

An example image is attached (as captured by the Log.Picture() line above).

 

Any pointers are most welcome.

 

 

4 Replies

  • Silmaril
    SmartBear Alumni (Retired)

    tbom, the following code recognizes your sample captcha in TestComplete 11.31:

    def test():
      captcha = Sys.Browser("iexplore").Page("http://community.smartbear.com/nwkab66374/attachments/nwkab66374/Getting_Started_with_TestComplete/22087/1/captcha.png").Image("captcha_png") 
      Log.Message(recognizeCaptcha(captcha))
    
    def recognizeCaptcha(captcha):
      recognizer = OCR.CreateObject(captcha)
      options = recognizer.DefaultOptions
      options.GrayScaleBinarization = True
      return recognizer.GetText(options)
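
The reply above does not say how TestComplete's GrayScaleBinarization option works internally. As a rough illustration of the general idea only (convert each pixel to a gray level, then force it to pure black or white), here is a minimal sketch in plain Python. The function names, the fixed threshold of 128 and the luma weights are assumptions made for this example and are not part of the TestComplete API.

```python
def to_grayscale(pixel):
    """Convert an (R, G, B) tuple to a single 0-255 gray level."""
    r, g, b = pixel
    # Common luma weights (ITU-R BT.601); an assumption for this sketch.
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def binarize(pixels, threshold=128):
    """Map every RGB pixel in a 2D grid to 0 (black) or 255 (white)."""
    return [
        [255 if to_grayscale(p) >= threshold else 0 for p in row]
        for row in pixels
    ]


# Hypothetical 2x2 image: black, white, light gray, dark gray.
sample = [[(0, 0, 0), (255, 255, 255)],
          [(200, 200, 200), (50, 50, 50)]]
result = binarize(sample)
```

Removing color and anti-aliasing in this way generally leaves cleaner glyph edges for a recognizer to match, which may be why the option helps with this image.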

     

  • This response is not really going to help, but I think you are missing the obvious: a CAPTCHA is intended to prevent automation from decoding the message!

    • tbom
      Contributor

      I am aware that a CAPTCHA is normally designed to keep bots out. But this captcha is simple text; there is no fancy distortion in the images, or anything. Just plain text. (In this context I see it as an extra layer of annoyance for the end user.)

       

      And since it's just plain text, without any distortion added, I thought it should be possible to run OCR on it.

       

      Oh well, I'll just have to ask the system vendor to set it to a fixed value in all our test environments.. 

       

      / Thomas

      • mes6073
        Contributor

        While I have not personally dealt with solving CAPTCHAs, I did have to engineer a solution for reading text in an owner-drawn control written in C++. I was able to leverage the OCR library packaged with Microsoft's PowerPoint, and typically found that I had to enlarge the image first, as the MS OCR library had issues with the default font and text size. Also, a quick web search turns up numerous links that outline how to solve the type of CAPTCHA you are working with, and that offer code for doing so:

         

        Python OCR… or how to break CAPTCHAs

        Solving CAPTCHA with OCR
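
The enlarging step mentioned in the reply above can be sketched without any imaging library. This is a minimal nearest-neighbor upscale over a 2D grid of pixel values. The function name and the list-of-lists image representation are assumptions made for illustration; in practice you would more likely resize the captured picture with whatever imaging tool is available before passing it to the OCR engine.

```python
def upscale_nearest(pixels, factor):
    """Enlarge a 2D pixel grid by an integer factor (nearest neighbor)."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    scaled = []
    for row in pixels:
        # Repeat every pixel horizontally...
        wide_row = [value for value in row for _ in range(factor)]
        # ...then repeat the widened row vertically (as separate copies).
        scaled.extend(list(wide_row) for _ in range(factor))
    return scaled


# Hypothetical 2x2 image enlarged to 4x4.
small = [[1, 2],
         [3, 4]]
large = upscale_nearest(small, 2)
```

Nearest-neighbor scaling keeps edges hard rather than blurring them. Whether that or a smoother interpolation works better depends on the OCR engine, so it is worth trying both.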