Forum Discussion
Hello,
I am following the article on testing PDFs with TestComplete. I have configured the JVM and the class path as per the instructions. But when I write the loadDocument function in a TestComplete script unit, it does not recognize the code, as it is written in Java.
Can you let me know how I can use this code?
function loadDocument(fileName)
{
  var docObj;
  // Load the PDF file to the PDDocument object
  docObj = JavaClasses.org_apache_pdfbox_pdmodel.PDDocument.load_3(fileName);
  // Return the resulting PDDocument object
  return docObj;
}
Hello,
The article uses JavaScript to call the Java classes. I am using VBScript, so could somebody send across the equivalent code for VBScript?
Regards
Nimish
- Marsha_R, 7 years ago, Champion Level 3
Did you try it in vbscript? What happened when you did?
- nimishbhuta, 7 years ago, Frequent Contributor
Hello Marsha,
I have tried using VBScript and was able to retrieve the text from the PDF, but the way the text is retrieved does not help me compare the text values.
Example
In the PDF file there is a heading, say Supplier, with some supplier text below it. In the same row there is a Ship to Address heading, with the ship-to address text below it. When I extract the text, it comes out like this:
SupplierShiptoAddress
some text of supplier + some text of shiptoaddress
some text of supplier + some text of shiptoaddress
and so on.
Please see the attached screenshot; the blue lines indicate the text (I have hidden the text itself for confidentiality).
It is difficult to verify the supplier text or the ship-to address text, because the two are combined on each line.
Ideally, I would like the output in the form Supplier: all corresponding text, and the same for ShiptoAddress. I was thinking that if we could export to Excel, we might get the text in a format that is easy to compare, but unfortunately I don't have an option to export from the PDF file to Excel. I tried paragraph extraction with PDFBox, but it returns the text line by line, which does not help.
I need a way to extract the text correctly for comparison purposes. Is there any way we can convert the PDF into Excel programmatically, or any other idea you can think of?
Regards,
Nimish
- jab4743, 7 years ago, Contributor
OCR has potential issues of its own, but have you tried using OCR to retrieve all the text in the PDF and then checking whether the returned text contains the data/text in your Excel? The most common issue with OCR is font smoothing, so be sure to turn off font smoothing on the machine if you choose to try it. You can also take a 'picture' of a section of the PDF document and get the text from just that picture, but then you may run into problems when the resolution or the location of the section changes. If you use OCR, it is best to pull all the text from the document if possible.
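The "check that the returned text contains your data" idea could be sketched roughly as below, in JavaScript to match the snippet earlier in the thread. This is only an illustration under a few assumptions: the PDF text has already been pulled into a string (by PDFBox or by OCR), normalize and findMissing are hypothetical helper names that are not part of TestComplete or PDFBox, and the sample text is made up. Because the extracted headings come back with the spaces dropped ("SupplierShiptoAddress"), the sketch strips all whitespace and ignores case before comparing.

```javascript
// Hypothetical helper: remove all whitespace and lowercase the text, so that
// "Ship to Address" and "ShiptoAddress" compare as equal.
function normalize(text) {
  return String(text).replace(/\s+/g, "").toLowerCase();
}

// Hypothetical helper: return the expected values that do NOT occur
// anywhere in the extracted text (an empty array means all were found).
function findMissing(extractedText, expectedValues) {
  var haystack = normalize(extractedText);
  var missing = [];
  for (var i = 0; i < expectedValues.length; i++) {
    if (haystack.indexOf(normalize(expectedValues[i])) === -1) {
      missing.push(expectedValues[i]);
    }
  }
  return missing;
}

// Made-up sample: two columns merged line by line, as described above.
var extracted = "SupplierShiptoAddress\n" +
                "Acme Ltd Warehouse 7\n" +
                "12 High Street Dock Road";

// Expected values would normally come from the Excel sheet.
var expected = ["Supplier", "Ship to Address", "Acme Ltd", "Dock Road", "Unit 9"];

var missing = findMissing(extracted, expected);
```

Checking each expected value one line at a time (for example "Acme Ltd" and "12 High Street" separately rather than the whole supplier block) sidesteps the interleaving of the two columns. The trade-off is that this only confirms the value appears somewhere in the document, not that it sits under the right heading, and stripping whitespace means a match can in principle span two adjacent lines.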