How to validate the content of a PDF file using TestComplete

nimishbhuta
Frequent Contributor
7 years ago
Hello,

Based on the article below on OCR, I understand that it works by taking a picture of the window and then validating the content inside that window. In my case, I opened the PDF file in a browser window and tried to use the OCR get-text method, but it is not capturing the content.

I don't know if I am missing something here. Could you please open any PDF file in IE or Chrome and provide the actual steps for capturing its content?

One more question: will OCR work if I provide the PDF file name as an argument to retrieve the text?

Regards,

Nimish
Marsha_R
Champion Level 3
7 years ago
I suggest that you contact Support directly about this. They can help you select the best way to test your PDF. Here's the link:

https://support.smartbear.com/testcomplete/
tristaanogre
Esteemed Contributor
7 years ago
I'm looking into doing PDF testing myself right now. One thing that PDFBox offers is the ability to break up the text of the PDF document into pages and, within the pages, into paragraphs. It MIGHT be possible to find the specific information you want to validate by referencing a particular paragraph ID within the page that you're testing. Investigate, based on the documentation for TestComplete and PDFBox, whether that will work for you.
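As a rough sketch of the idea (not tested against TestComplete or PDFBox), the Python below assumes the PDF text has already been extracted into one string, with a marker string written at the end of each page and each paragraph. The markers `[[PAGE_END]]` and `[[PARA_END]]`, the function names, and the sample text are all made up for illustration; how the markers get into the extracted text depends on how you configure the extractor.

```python
import re

# Hypothetical markers, assumed to be emitted by the text extractor
PAGE_END = "[[PAGE_END]]"
PARA_END = "[[PARA_END]]"


def normalize(text):
    """Collapse line breaks and repeated whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def split_pages(text):
    """Return a list of pages, each a list of normalized paragraphs."""
    chunks = text.split(PAGE_END)
    if chunks and not chunks[-1].strip():
        chunks.pop()  # the text ended with a page marker
    pages = []
    for chunk in chunks:
        paragraphs = [normalize(p) for p in chunk.split(PARA_END)]
        # Keep empty pages so page numbers still line up with the PDF
        pages.append([p for p in paragraphs if p])
    return pages


def paragraph_matches(pages, page_no, para_no, expected):
    """Compare one paragraph (1-based numbers) with the expected text."""
    if page_no < 1 or para_no < 1:
        return False
    try:
        actual = pages[page_no - 1][para_no - 1]
    except IndexError:
        return False
    return actual == normalize(expected)


# Made-up sample standing in for the extracted PDF text
sample = (
    "Invoice 1001[[PARA_END]]Total due:\n  42.00 USD[[PARA_END]][[PAGE_END]]"
    "Terms and conditions[[PARA_END]][[PAGE_END]]"
)
pages = split_pages(sample)
print(paragraph_matches(pages, 1, 2, "Total due: 42.00 USD"))
```

Normalizing whitespace on both sides keeps the comparison from failing on line wraps that the extractor introduces.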
nimishbhuta
Frequent Contributor
7 years ago
Hello,

Thanks for your response. I was going through the documentation and tried using the paragraph feature, but it requires knowing the paragraph start and paragraph end. I tried entering specific text as mentioned in the text, but no luck. I am not sure how we can find out the paragraph ID. If you are working on PDFs and come across a way to obtain the paragraph, please share your code.

As another approach, I was thinking of using PDFTextStripperByArea, which lets you mark an area and retrieve the text from it. But it seems this class is not supported by TestComplete, as it requires PDFBox 2.0 and is not available in PDFBox 1.8.12.

Here is an example:

https://www.programcreek.com/java-api-examples/?api=org.apache.pdfbox.util.PDFTextStripperByArea

Regards,

Nimish
tristaanogre
Esteemed Contributor
7 years ago
You can probably use PDFBox 2.0... but keep in mind that you'll need to make sure you have the proper version of the JRE, and that the methods and properties available may be different from what is in that article. You can give it a try... there's no "explicit" thing in PDFBox that prevents you from using a more recent version.

As for "knowing the paragraph"... you know the PDF. You have access to your "baseline" of what the PDF is. So, just write a bit of "throw-away" code to cycle through all the paragraphs in your desired page to find the ones you want, and then use those IDs in your actual test code... that's my intent with my own project at least.
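Something along these lines is what I mean by "throw-away" code. It is only a minimal Python sketch: it assumes the baseline PDF text is already available as a list of pages, each a list of paragraph strings. The `find_paragraphs` name and the sample data are made up for illustration, and nothing here calls TestComplete or PDFBox.

```python
def find_paragraphs(pages, snippet):
    """Return (page_no, para_no, text) for every paragraph containing snippet.

    Page and paragraph numbers are 1-based; matching ignores case.
    """
    needle = snippet.casefold()
    hits = []
    for page_no, paragraphs in enumerate(pages, start=1):
        for para_no, para in enumerate(paragraphs, start=1):
            if needle in para.casefold():
                hits.append((page_no, para_no, para))
    return hits


# Made-up baseline: text already extracted and split into pages and paragraphs
baseline_pages = [
    ["Invoice 1001", "Total due: 42.00 USD"],
    ["Terms and conditions", "Payment is due within 30 days."],
]

# Run once against the baseline, note the IDs, then hard-code them in the real test
for page_no, para_no, para in find_paragraphs(baseline_pages, "due"):
    print(f"page {page_no}, paragraph {para_no}: {para}")
```

Inside a TestComplete script you would probably write the hits to the test log with `Log.Message` instead of `print`.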
