Forum Discussion

ykrrishna
Occasional Contributor
8 years ago

Is it possible to extract data from a table in a PDF file with Python in TestComplete?

I am automating the extraction of particular data from several PDF files that share the same or a similar format.

 

Note: The data is in a table; the PDFs contain images; Python scripting is preferred.

  • tristaanogre
    Esteemed Contributor

    Unfortunately, PDFs are a real bear to work with.  They are usually not open to having information pulled straight out of them.

    Fortunately, a couple of versions back, the SmartBear folks wrote up a nice document on how to use the PDFBox library to parse text out of a PDF.  You can check it out at https://support.smartbear.com/articles/testcomplete/testing-pdf-files-with-testcomplete/

     

    One note: This will simply extract text.  It will not bring the information in as a "table" structure or anything like that, at least not that I can tell at first glance.  So, you will need to extract the text and then use an aqString find method to locate the text you want in its particular context.
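To illustrate the "extract the text, then search it" idea, here is a minimal sketch in plain Python. It assumes the PDF text has already been extracted into a string and that each table row ends up on its own line with cells separated by whitespace; real PDFBox output may be laid out differently. The function name `find_row` and the sample text are hypothetical, and Python's built-in string methods stand in for the aqString methods mentioned above.

```python
# Hypothetical sample of what text extracted from a PDF table might look like.
# Assumption: one table row per line, cells separated by whitespace.
SAMPLE_TEXT = (
    "Invoice summary\n"
    "Item Qty Price\n"
    "Widget 4 19.99\n"
    "Gadget 10 5.00\n"
    "Total 14 129.96\n"
)


def find_row(text, row_label):
    """Return the cells that follow row_label on the first line starting with it.

    Returns None when no line starts with row_label.
    """
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(row_label):
            # Everything after the label is treated as whitespace-separated cells.
            return stripped[len(row_label):].split()
    return None


if __name__ == "__main__":
    print(find_row(SAMPLE_TEXT, "Gadget"))
    print(find_row(SAMPLE_TEXT, "Missing"))
```

Splitting on whitespace breaks down when a cell itself contains spaces, so for such tables you would need a more specific rule, for example a regular expression matched against the known column formats.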

    • ykrrishna
      Occasional Contributor

      Doesn't this use Java scripting?

       

      I am currently working in a Python project, so I am not sure about using something that is Java only.  Will it work?

      I did run across it in my initial search, but I skipped it because it said it is handled with Java classes!

       

      Kindly bear with me if this is a lame question, but I am still not clear whether I can use this in my project!

      • tristaanogre
        Esteemed Contributor

        As far as I know, you can use the JavaBridge even in a Python project.  The idea of the JavaBridge is similar to the .NET CLR stuff in TestComplete: you can bring these kinds of classes and objects into your project and then use them, regardless of your scripting language of choice.  Obviously, you would need to adapt the code in the article for Python, something that I am not qualified to do because the only thing I really know about Python is that it exists and is supported in TestComplete.  Sorry.