Forum Discussion

Frequent Contributor

6 years ago

How to validate the content in PDF File using TestComplete

Hello All, I need to validate the content displayed in a PDF against the data in an Excel sheet. Please let me know the following: 1) How can I read the PDF file content? 2) How ca...

nimishbhuta

Frequent Contributor

Hello Marsha,

I have tried using VBScript and was able to retrieve the text from the PDF, but the way the text is retrieved does not help me compare the text values.

Example

In the PDF file there is a heading, say Supplier, with some supplier-related text below it. In the same row there is a Ship to Address heading with the ship-to address text below it. When I extract the text, it comes out like this:

SupplierShiptoAddress

some text of supplier + some text of shiptoaddress

some text supplier+ some text of shiptoaddress

and so on.

Please see the attached screenshot. The blue lines indicate text (I have hidden the actual text for confidentiality).

It is difficult to verify the supplier text and the ship-to address separately because the two are merged on each line.

Ideally, I need the output in the form Supplier: all corresponding text, and the same for Ship to Address. I wondered whether exporting to Excel would give the text in a format that is easy to compare, but unfortunately I have no option to export the PDF file to Excel. I also tried paragraph extraction using PDFBox, but it returns the text line by line, which does not help.

I need a way to extract the text so that it can be compared correctly. Is there any way to convert the PDF into Excel programmatically, or any other approach you can think of?

Regards,

Nimish

Screenshot.docx (272 KB)
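
One possible way to untangle the merged Supplier / Ship to Address lines described above is to work from word positions rather than plain text. The sketch below is only an illustration under stated assumptions: it assumes some position-aware extractor has already produced `(x, y, text)` tuples for each word, that `y` increases down the page, and that the x coordinate separating the two columns (`boundary_x`) is known from the layout. The function name, parameters, and sample data are hypothetical and are not part of TestComplete or PDFBox.

```python
from collections import defaultdict


def split_columns(words, boundary_x, line_tolerance=2.0):
    """Split positioned words into left-column and right-column lines.

    words: iterable of (x, y, text) tuples (assumed extractor output).
    boundary_x: x coordinate that separates the two columns (assumed known).
    line_tolerance: y values are bucketed by this size to form lines.
    """
    lines = defaultdict(lambda: {"left": [], "right": []})
    for x, y, text in words:
        key = round(y / line_tolerance)  # words with close y share a line
        side = "left" if x < boundary_x else "right"
        lines[key][side].append((x, text))

    left, right = [], []
    for key in sorted(lines):  # top to bottom, assuming y grows downward
        for side, out in (("left", left), ("right", right)):
            cells = sorted(lines[key][side])  # left to right within the line
            if cells:
                out.append(" ".join(text for _, text in cells))
    return left, right


# Made-up sample positions mimicking the two-column layout in the post.
sample_words = [
    (50, 100, "Supplier"), (300, 100, "Ship"), (330, 100, "to"), (350, 100, "Address"),
    (50, 115, "Acme"), (90, 115, "Ltd"), (300, 115, "12"), (320, 115, "High"), (350, 115, "Street"),
]
supplier_lines, ship_to_lines = split_columns(sample_words, boundary_x=200)
print(supplier_lines)
print(ship_to_lines)
```

If the column rectangles are fixed on the page, PDFBox's `PDFTextStripperByArea` class, which extracts text from a named rectangular region, may be a more direct route than post-processing coordinates.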

jab4743

Contributor

6 years ago

OCR has some potential issues, but have you tried using OCR to retrieve all the text in the PDF and then checking whether the returned text contains the data from your Excel sheet? The most common OCR issue is font smoothing, so be sure to turn off font smoothing on the machine if you try OCR. You can also take a 'picture' of one section of the PDF document and get the text from just that picture, but then you may run into problems when the resolution or the location of the section changes. If you use OCR, it is best to pull all the text from the document if possible.
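
The "does the extracted text contain the Excel data" check suggested above could be sketched roughly as follows. This is a minimal illustration, not a TestComplete API: it assumes the OCR or PDF text is already available as a single string and that the expected values have already been read from the Excel sheet into a list. The function names and sample values are hypothetical.

```python
import re


def normalize(text):
    """Collapse whitespace runs and lowercase the text, so line breaks
    and spacing differences in the extracted output do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_missing(expected_values, extracted_text):
    """Return the expected values that do not occur in the extracted text."""
    haystack = normalize(extracted_text)
    return [value for value in expected_values if normalize(value) not in haystack]


# Made-up data standing in for OCR output and Excel cells.
extracted = "Supplier   Ship to Address\nAcme Ltd    12 High Street\n"
expected = ["Acme Ltd", "12 High Street", "Globex"]
print(find_missing(expected, extracted))
```

A containment check like this sidesteps the column ordering problem, since it only asks whether each expected value appears somewhere in the document. It cannot tell whether a value appeared under the right heading, and values that wrap across lines in a multi-column layout may still be missed.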

nimishbhuta
Frequent Contributor
6 years ago
Hello,

I don't know how to use OCR; could you please guide me? I am using PDFBox to retrieve the text from the PDF file. I am able to retrieve it, but I cannot compare it because it is not organized correctly, as mentioned previously.

Kindly provide step by step instructions to do so.

Regards,

Nimish
- jab4743
  Contributor
  6 years ago
  https://support.smartbear.com/testcomplete/docs/testing-with/advanced/ocr/index.html
  - nimishbhuta
    Frequent Contributor
    6 years ago
    Hello,
    
    Based on the OCR article linked above, my understanding is that it works by taking a picture of the window and then validating the content inside the window. In my case, I opened the PDF file in a browser window and tried the OCR get text method, but it does not capture the content.
    
    I don't know if I am missing something here. Could you please open any PDF file in IE or Chrome and provide the actual steps for capturing the text?
    
    One more question: can OCR retrieve the text when the PDF file name is provided as an argument?
    
    Regards,
    
    Nimish
