Any suggestions??
Hi,
We have native methods to compare the files, so I think that part is easy. The harder part will be masking the dynamic strings. What I have done in the past is remove the data that matches a certain pattern, resave the file, and then compare. An example in Python that masks the dates is provided below:
import re

def ComparePDF(path1, path2):
    # Make sure both parameters are valid paths to existing PDF files
    if (path1 != "" and aqFile.Exists(path1) and aqFileSystem.GetFileExtension(path1) == "pdf" and
            path2 != "" and aqFile.Exists(path2) and aqFileSystem.GetFileExtension(path2) == "pdf"):
        # Get the text contents of the PDF files
        str1 = PDF.ConvertToText(path1)
        str2 = PDF.ConvertToText(path2)
        # Regular expression that matches date stamps such as 1/2/2023 or 11/12/2024
        regEx = r"\d{1,2}/\d{1,2}/\d{4}"
        # Use re.sub to replace every date with a constant string
        str1 = re.sub(regEx, "<ignore_date>", str1)
        str2 = re.sub(regEx, "<ignore_date>", str2)
        # Log the masked text of both PDFs to show that the filtering worked
        Log.Message(str1)
        Log.Message(str2)
        # Compare the resulting contents
        if str1 == str2:
            Log.Message("The text contents of specified PDF files are the same")
        else:
            Log.Message("The text contents are different")
    else:
        # Report invalid input instead of silently skipping the comparison
        Log.Warning("Both parameters must be paths to existing PDF files")
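If you have more than one kind of dynamic value, the same masking idea can be extended to a list of patterns, and a diff of the masked text shows where the files differ instead of only that they differ. Below is a minimal sketch in plain Python, outside of TestComplete. The names `MASKS`, `mask_dynamic` and `diff_masked` are hypothetical, and the time pattern is only an assumption about what your reports might contain; adjust both to your data.

```python
import difflib
import re

# Hypothetical mask list: (compiled pattern, placeholder) pairs.
# The date pattern is the one from the example above; the time pattern
# is an assumed format such as "10:15", "10:15:30" or "9:05 PM".
MASKS = [
    (re.compile(r"\d{1,2}/\d{1,2}/\d{4}"), "<ignore_date>"),
    (re.compile(r"\d{1,2}:\d{2}(?::\d{2})?(?:\s?[AP]M)?"), "<ignore_time>"),
]


def mask_dynamic(text):
    """Replace every match of every pattern in MASKS with its placeholder."""
    for pattern, placeholder in MASKS:
        text = pattern.sub(placeholder, text)
    return text


def diff_masked(text1, text2):
    """Return unified-diff lines for the masked texts; empty list if equal."""
    lines1 = mask_dynamic(text1).splitlines()
    lines2 = mask_dynamic(text2).splitlines()
    # n=0 drops context lines so only the differing lines are reported
    return list(difflib.unified_diff(lines1, lines2, lineterm="", n=0))
```

In a TestComplete script you could presumably pass the strings returned by PDF.ConvertToText to `diff_masked` and log each returned line; an empty list means the masked contents match.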