
Using wildcards in CheckPDFText





Does anyone know if there is a way to use wildcards in PDF checks? More specifically, I'm checking a PDF generated by the software, and I know the generation date in it will differ depending on when the test is run, so I was hoping to use a wildcard for that part of the text recognized by OCR.


Yes. There may be other ways to do this, but the example I have uses Python. It first checks that both paths point to existing PDF files, converts each file to text, replaces the dates with a constant placeholder, and then logs whether the resulting texts are the same or different.

import re

def MainTest():
    path1 = "C:\\pdf_test\\test1.pdf"
    #path2 = "C:\\pdf_test\\test1.pdf"
    path2 = "C:\\pdf_test\\test2.pdf"

    ComparePDF(path1, path2)

def ComparePDF(path1, path2):
    # Make sure the parameters are valid paths to PDF files
    if (path1 != "" and aqFile.Exists(path1) and aqFileSystem.GetFileExtension(path1) == "pdf" and
            path2 != "" and aqFile.Exists(path2) and aqFileSystem.GetFileExtension(path2) == "pdf"):
        # Get the text contents of the PDF files
        str1 = PDF.ConvertToText(path1)
        str2 = PDF.ConvertToText(path2)

        # Regular expression matching the date stamp, e.g. 3/7/2024 or 12/31/2023
        regEx = r"\d{1,2}/\d{1,2}/\d{4}"

        # Use re.sub to replace the dates with a constant string
        str1 = re.sub(regEx, "<ignore_date>", str1)
        str2 = re.sub(regEx, "<ignore_date>", str2)

        # Log the full text with replaced date values to show that the
        # regular expression filtering worked for both PDF texts
        Log.Message("Text of the first PDF with dates replaced", str1)
        Log.Message("Text of the second PDF with dates replaced", str2)

        # Compare the resulting contents
        if str1 == str2:
            Log.Message("The text contents of specified PDF files are the same")
        else:
            Log.Message("The text contents are different")
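
If you want to check the date pattern before running it against real PDFs, the masking step can be tried in plain Python outside TestComplete. This is only a rough sketch: the helper names `mask_dates` and `same_ignoring_dates` and the sample strings are made up for illustration, and it assumes the dates appear as month/day/year with slashes, as in the script above.

```python
import re

# Same pattern as in the script above: 1-2 digits / 1-2 digits / 4 digits.
DATE_PATTERN = re.compile(r"\d{1,2}/\d{1,2}/\d{4}")

def mask_dates(text, placeholder="<ignore_date>"):
    """Replace every date matching DATE_PATTERN with a fixed placeholder."""
    return DATE_PATTERN.sub(placeholder, text)

def same_ignoring_dates(text1, text2):
    """Return True if the two texts are equal once dates are masked."""
    return mask_dates(text1) == mask_dates(text2)

# Hypothetical stand-ins for the text returned by PDF.ConvertToText
sample1 = "Invoice 1001\nGenerated on 3/7/2024\nTotal: 250.00"
sample2 = "Invoice 1001\nGenerated on 12/31/2023\nTotal: 250.00"
sample3 = "Invoice 1002\nGenerated on 3/7/2024\nTotal: 250.00"

print(mask_dates(sample1))
print(same_ignoring_dates(sample1, sample2))  # only the date differs
print(same_ignoring_dates(sample1, sample3))  # the invoice number differs
```

If your PDFs use a different date format (dashes, month names, a time stamp), the pattern would need to be adjusted to match it.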
