Forum Discussion

kflood
Occasional Contributor
3 years ago
Solved

Using wildcards in CheckPDFText

Hello,

 

Does anyone know if there is a way to use wildcards in PDF checks? More specifically, I'm comparing a PDF generated by the software against a stored one. The generation date in the PDF changes depending on when the test is run, so I was hoping to use a wildcard for that part of the text interpreted by OCR.

  • Yes. There may be other ways to do this, but the example I have uses Python. It first checks that both paths point to existing PDF files, then converts each file to text, replaces the dates with a constant placeholder, and finally logs whether the resulting texts are the same or different.

    import re

    def MainTest():
        path1 = "C:\\pdf_test\\test1.pdf"
        #path2 = "C:\\pdf_test\\test1.pdf"
        path2 = "C:\\pdf_test\\test2.pdf"

        ComparePDF(path1, path2)

    def ComparePDF(path1, path2):
        # Make sure both parameters are valid paths to existing PDF files
        if (path1 != "" and aqFile.Exists(path1) and aqFileSystem.GetFileExtension(path1) == "pdf" and
                path2 != "" and aqFile.Exists(path2) and aqFileSystem.GetFileExtension(path2) == "pdf"):
            # Get the text contents of the PDF files
            str1 = PDF.ConvertToText(path1)
            str2 = PDF.ConvertToText(path2)

            # Regular expression matching dates such as 1/2/2023 or 12/31/2023
            # (raw string so the backslashes reach the regex engine unchanged)
            regEx = r"\d{1,2}/\d{1,2}/\d{4}"

            # Use re.sub to replace every date with a constant string
            str1 = re.sub(regEx, "<ignore_date>", str1)
            str2 = re.sub(regEx, "<ignore_date>", str2)

            # Log the full text with the dates replaced to show that the
            # regular expression filtering worked for both PDF texts
            Log.Message(str1)
            Log.Message(str2)

            # Compare the resulting contents
            if str1 == str2:
                Log.Message("The text contents of the specified PDF files are the same")
            else:
                Log.Message("The text contents are different")
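
    To try the date-masking step on its own, here is a minimal sketch in plain Python, outside TestComplete (so it uses no `PDF`, `aqFile`, or `Log` objects). The names `mask_dates` and `texts_match_ignoring_dates` and the sample strings are hypothetical, made up for illustration. It assumes the dates look like `M/D/YYYY`, as in the pattern above. If your PDF uses another date format, the pattern would need adjusting.

```python
import re

# Matches dates such as 1/2/2023 or 12/31/2023 (assumed M/D/YYYY layout).
DATE_PATTERN = re.compile(r"\d{1,2}/\d{1,2}/\d{4}")


def mask_dates(text, placeholder="<ignore_date>"):
    """Replace every date matched by DATE_PATTERN with a constant placeholder."""
    return DATE_PATTERN.sub(placeholder, text)


def texts_match_ignoring_dates(text1, text2):
    """Compare two strings after masking the dates in both."""
    return mask_dates(text1) == mask_dates(text2)


# Hypothetical strings standing in for the text extracted from two PDFs.
sample1 = "Report generated on 3/14/2023 for account 42"
sample2 = "Report generated on 11/2/2024 for account 42"
sample3 = "Report generated on 11/2/2024 for account 43"

print(mask_dates(sample1))                            # Report generated on <ignore_date> for account 42
print(texts_match_ignoring_dates(sample1, sample2))   # True: only the dates differ
print(texts_match_ignoring_dates(sample1, sample3))   # False: the account number differs
```

    Masking both texts with the same placeholder before comparing works like a wildcard: anything the pattern matches is ignored, and every other difference still causes a mismatch.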
