How to validate the content of a PDF file using TestComplete

Marsha_R
Champion Level 3
7 years ago
Did you try it in VBScript? What happened when you did?
- nimishbhuta
  Frequent Contributor
  7 years ago
  Hello Marsha,
  
  I have tried using VBScript and was able to retrieve the text from the PDF, but the way the text is retrieved doesn't let me compare the text values.
  
  Example
  
  In the PDF file there is a heading, say Supplier, with some text related to the supplier below it. In the same row there is a Ship to Address heading, with text related to the ship-to address below it. When I extract the text, it comes out like this:
  
  SupplierShiptoAddress
  
  some text of supplier + some text of shiptoaddress
  
  some text of supplier + some text of shiptoaddress
  
  and so on.
  
  Please see the attached screenshot; the blue lines indicate text (I have hidden the text for confidentiality).
  
  This makes it difficult to verify the supplier text and the ship-to address, because both are combined.
  
  Ideally, I need something like "Supplier: all corresponding text", and the same for Ship to Address. I thought that if we could export to Excel, we might get the text in a format that is easy to compare, but unfortunately I don't have an option to export the PDF file to Excel. I tried paragraph extraction using PDFBox, but it still returns the text line by line, which doesn't help.
  
  I need a way to extract the text correctly for comparison. Is there any way to convert the PDF into Excel programmatically, or any other idea you can think of?
  
  Regards,
  
  Nimish
  
  Screenshot.docx (272 KB)
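Because the Supplier and Ship to Address blocks sit side by side, one way to separate them is to extract each word together with its position and split the words into columns by x-coordinate, then rebuild the lines within each column. The Python sketch below assumes you already have an `(x, y, text)` tuple per word from some position-aware extractor; the sample data, the column boundaries, and the names `split_columns` and `columns_to_csv` are hypothetical. Writing the columns side by side as CSV produces a file Excel can open, which covers the "convert to Excel" idea. On the Java side, PDFBox also ships `PDFTextStripperByArea`, which extracts text per named rectangle and can achieve a similar split without manual word grouping.

```python
import csv
import io
from itertools import zip_longest

# Hypothetical sample: (x, y, text) for each word, as a position-aware PDF
# text extractor might report it. Assumes y grows DOWN the page; if your
# extractor uses PDF user space (y grows up), negate y before calling.
SAMPLE_WORDS = [
    (50, 100, "Supplier"), (300, 100, "Ship"), (330, 100, "to"), (350, 100, "Address"),
    (90, 115, "Ltd"), (50, 115, "Acme"), (300, 115, "12"), (320, 115, "Main"), (360, 115, "St"),
    (50, 130, "Springfield"), (300, 130, "Shelbyville"),
]

# Hypothetical column layout: (name, x_min, x_max) in the same units as x.
BOUNDARIES = [("Supplier", 0, 250), ("ShipTo", 250, 600)]


def split_columns(words, boundaries, row_tolerance=2.0):
    """Return {column name: [line text, ...]} read top to bottom."""
    columns = {}
    for name, x_min, x_max in boundaries:
        # Keep only the words whose x position falls inside this column.
        in_col = sorted((w for w in words if x_min <= w[0] < x_max), key=lambda w: w[1])
        rows, row_y = [], None
        for x, y, text in in_col:
            # Start a new line when y moves by more than the tolerance.
            if row_y is None or abs(y - row_y) > row_tolerance:
                rows.append([])
                row_y = y
            rows[-1].append((x, text))
        # Within each line, read the words left to right.
        columns[name] = [" ".join(t for _, t in sorted(r)) for r in rows]
    return columns


def columns_to_csv(columns):
    """Lay the columns side by side as CSV text that Excel can open."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in zip_longest(*columns.values(), fillvalue=""):
        writer.writerow(row)
    return buf.getvalue()


if __name__ == "__main__":
    cols = split_columns(SAMPLE_WORDS, BOUNDARIES)
    print(cols["Supplier"])  # ['Supplier', 'Acme Ltd', 'Springfield']
    print(cols["ShipTo"])    # ['Ship to Address', '12 Main St', 'Shelbyville']
```

The column boundaries would have to be measured once from a sample PDF; this works only if the layout is stable across the documents under test.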
  - jab4743
    Contributor
    7 years ago
    OCR has its own potential issues, but have you tried using OCR to retrieve all the text in the PDF and then checking whether the OCR output contains the data in your Excel file? The most common OCR problem is font smoothing, so be sure to turn off font smoothing on the machine if you try OCR. You can also take a 'picture' of a section of the PDF document and get the text from just that picture, but then you may run into issues with resolution and with the section's location changing. If you use OCR, it is best to pull all the text from the document if possible.
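The "does the OCR text contain my expected data" check can be sketched as a normalized substring search. OCR output tends to vary in spacing and line breaks, so this Python sketch case-folds both sides and collapses all whitespace before comparing. The function names and sample strings are hypothetical; how you obtain the OCR string and read the expected values from Excel is left to your TestComplete setup. It does not correct character-level misreads (for example `0` read as `O`).

```python
import re


def normalize(text):
    """Case-fold and collapse runs of whitespace (spaces, tabs, line breaks)."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def missing_values(ocr_text, expected_values):
    """Return the expected values that do not appear in the OCR output."""
    haystack = normalize(ocr_text)
    return [value for value in expected_values if normalize(value) not in haystack]


if __name__ == "__main__":
    # Hypothetical OCR output and expected values (e.g. read from your Excel sheet).
    ocr_text = "Supplier   Ship to Address\nAcme Ltd    12 Main St\n"
    print(missing_values(ocr_text, ["acme ltd", "12 Main St", "99 Elm St"]))  # ['99 Elm St']
```

Returning the list of missing values, rather than a single pass/fail flag, makes it easier to log exactly which expected entries the OCR step did not find.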
