Forum Discussion

jasmeenkaur27
Contributor
8 years ago

PDF Validation

Hi,

My PDF contains the details of 4 clients, but I have to validate the data of only one client, whose data is on the 3rd page. I know the account number of the required client. How can I jump to that page and validate the details?

  • shankar_r
    Community Hero

    To get started with PDF validation you can use this link https://support.smartbear.com/articles/testcomplete/testing-pdf-files-with-testcomplete/

    In order to verify the text on a particular page, you can do as described in the link below.

    For example, if you know which page the client details appear on, set the start and end pages to the same value and search for the client details in that page's text:

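    // textStripperObj and docObj are assumed to have been created earlier
    // (the text stripper and the loaded PDF document)

    // Set the page number as the start page
    // Note that page numbers here are 1-based, not zero-based
    textStripperObj.setStartPage(3);
    
    // Set the same page number as the end page
    textStripperObj.setEndPage(3);
    
    // Get the text of that single page
    var text = textStripperObj.getText_2(docObj);
    
    // Use the text to compare against your expected values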
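As a rough sketch of the comparison step mentioned in the last comment: once the page text is available as a string, one option is to check that every expected value occurs in it. The helper name `findMissingValues`, the sample text, and the expected values below are all hypothetical and only illustrate the idea. In TestComplete the text would come from `getText_2` as in the snippet above.

```javascript
// Hypothetical helper (not part of TestComplete or PDFBox):
// returns the expected values that do NOT appear in the page text.
function findMissingValues(pageText, expectedValues) {
  // Collapse runs of whitespace, because text extracted from a PDF often
  // contains line breaks and extra spaces in unpredictable places.
  var normalized = String(pageText).replace(/\s+/g, " ");
  var missing = [];
  for (var i = 0; i < expectedValues.length; i++) {
    var expected = String(expectedValues[i]).replace(/\s+/g, " ");
    if (normalized.indexOf(expected) === -1) {
      missing.push(expectedValues[i]);
    }
  }
  return missing;
}

// Stand-in page text for illustration only
var sampleText = "Account No:  1003\nName: Client C\nCity:\n Springfield";

// "Balance: 10" is deliberately absent from the sample text
var missing = findMissingValues(sampleText,
  ["1003", "Client C", "City: Springfield", "Balance: 10"]);
// missing now contains only "Balance: 10"
```

An empty result means every expected value was found. Inside a TestComplete script, a non-empty result could be reported with `Log.Error`, for instance. Plain substring matching is a simplification: it would also accept `1003` inside a longer number such as `10031`.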
  • The issue is that I do not know the page number that contains the client details. I only know the client's account number, so I need to search for the account number in the PDF and then verify the remaining details.
    • AlexKaras
      Champion Level 3

      Hi,

      Basically, the approach remains the same as the one shankar_r described: find the required page by searching for the account number, then analyze the found account in more detail. You can search either page by page or extract the whole text of the document and search that; which is better depends on the structure of the document and on how it was created. The exact implementation steps depend on the target document.
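The page-by-page variant could look roughly like the sketch below. It is only an illustration under stated assumptions: `findPageByAccountNumber` and the `getPageText` callback are hypothetical names, and the sample pages are stand-in data. In TestComplete, `getPageText` would presumably wrap the `setStartPage` / `setEndPage` / `getText_2` calls from shankar_r's snippet, and the page count would come from the loaded document (PDFBox's `PDDocument` has `getNumberOfPages()`).

```javascript
// Hypothetical helper: returns the 1-based number of the first page whose
// text contains the account number, or -1 if no page contains it.
// getPageText(pageNumber) is an assumed callback that returns the text
// of a single page as a string.
function findPageByAccountNumber(pageCount, getPageText, accountNumber) {
  for (var page = 1; page <= pageCount; page++) {
    var text = String(getPageText(page));
    if (text.indexOf(accountNumber) !== -1) {
      return page;
    }
  }
  return -1;
}

// Stand-in data for illustration only; a real run would read pages from the PDF
var samplePages = [
  "Account: 1001\nName: Client A",
  "Account: 1002\nName: Client B",
  "Account: 1003\nName: Client C",
  "Account: 1004\nName: Client D"
];

// Stand-in for the real per-page text extraction
function samplePageText(pageNumber) {
  return samplePages[pageNumber - 1];
}

var foundPage = findPageByAccountNumber(samplePages.length, samplePageText, "1003");
// foundPage is 3 for the sample data above
```

Once the page is known, the text of that page can be checked for the remaining client details. If one client's data can span several pages, or the account number can be split across lines in the extracted text, this simple substring search would need to be adapted.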

      However, the usual question in situations like yours is: do you really need to bother with PDF parsing? Wouldn't it be acceptable, for example, to split this test into three smaller ones? The first gets the raw data used to generate the document and verifies that the data are as expected; this is usually easy to automate. The second checks that the generated document has the correct layout: headers, logos, text columns, etc. This can be automated more or less easily using either image comparison or visual testing tools. Finally, the third, which may be executed manually and less often, verifies that the document contains correct data and has the correct layout from a human point of view.

      Thoughts?

      • tristaanogre
        Esteemed Contributor

        Right with you, AlexKaras... it again goes along the lines of the statement: everything CAN be automated, true, but not everything SHOULD be automated. And that SHOULD is not a moral question but more a matter of ROI. You can write a LOT of code to parse the PDF, get it working, etc., when what is really important is that the back-end data is right, that the PDF layout is right, and that the data is plugged into the PDF correctly. The first two are VERY easily automated, the last one not so easily... but a manual check along the lines of "Hey, the PDF looks right" is really quick.

        Automation supplements manual testing; it should not replace it.