Forum Discussion
I am able to extract the contents of the Word and PDF files to text files. The Word document is a template that defines the outline (how the PDF should be generated). A sample is given below.
Word doc:
<firstname>,<LastName>
<ID>,<organisation>
<salary>,<place>
Dear <firstname>,
you are working in the department of <organisation> and we are really honored to have you here. Expecting many more successful years of service from you.
Thanks,
Actual PDF:
John,Kennedy
234,google
USD1245,CA
Dear John,
you are working in the department of google and we are really honored to have you here. Expecting many more successful years of service from you.
Thanks,
Can someone help with the comparison logic to validate that both the static and the dynamic content are generated as expected?
Hi,
What language are you using for scripting?
- kathir_43 (Contributor), 3 years ago
JavaScript
- kathir_43 (Contributor), 3 years ago
Any suggestions?
- mattb (Staff), 3 years ago
Hi,
We have native methods for comparing files, so I think that part is easy. The harder part will be masking the dynamic strings. What I have done in the past is remove the data that matches a certain pattern, resave the file, and then compare. An example in Python where I mask the dates is provided below:

import re

def ComparePDF(path1, path2):
    # Make sure both parameters are valid paths to PDF files
    if (path1 != "" and aqFile.Exists(path1) and aqFileSystem.GetFileExtension(path1) == "pdf" and
            path2 != "" and aqFile.Exists(path2) and aqFileSystem.GetFileExtension(path2) == "pdf"):
        # Get the text contents of the PDF files
        str1 = PDF.ConvertToText(path1)
        str2 = PDF.ConvertToText(path2)

        # Regular expression that matches date stamps such as 1/2/2021 or 12/31/2021
        regEx = r"[\d]{1,2}/[\d]{1,2}/[\d]{4}"

        # Use re.sub to replace every date with a constant string
        str1 = re.sub(regEx, "<ignore_date>", str1)
        str2 = re.sub(regEx, "<ignore_date>", str2)

        # Log the full text with the dates replaced, to show that the
        # regular expression filtering worked for both PDF texts
        Log.Message(str1)
        Log.Message(str2)

        # Compare the resulting contents
        if str1 == str2:
            Log.Message("The text contents of specified PDF files are the same")
        else:
            Log.Message("The text contents are different")
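
For the template case in the original question, the same masking idea can be turned around: instead of replacing the dynamic values, build one regular expression from the template and match the PDF text against it. The sketch below is plain Python and only an illustration, not a TestComplete feature. The helper names (normalize, template_to_regex, match_template) are made up for this example. It assumes that placeholders look like <name> with identifier-style names, that a dynamic value never spans more than one line, and that whitespace differences introduced by text extraction can be ignored. A placeholder that appears twice (such as <firstname>) must carry the same value both times.

```python
import re

# Placeholders are assumed to look like <firstname>, <ID>, <organisation>, ...
PLACEHOLDER = re.compile(r"<([A-Za-z_]\w*)>")


def normalize(text):
    """Collapse runs of spaces/tabs, trim each line, and drop blank lines."""
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def template_to_regex(template):
    """Build a regex in which static text is literal and placeholders are groups."""
    parts = []
    seen = set()
    pos = 0
    for m in PLACEHOLDER.finditer(template):
        # Static text between placeholders must match exactly
        parts.append(re.escape(template[pos:m.start()]))
        name = m.group(1)
        if name in seen:
            # Repeated placeholder: must equal the value captured earlier
            parts.append("(?P=%s)" % name)
        else:
            # First occurrence: capture a non-empty value within one line
            parts.append("(?P<%s>.+?)" % name)
            seen.add(name)
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts))


def match_template(template, actual):
    """Return a dict of placeholder values, or None if the text does not fit."""
    pattern = template_to_regex(normalize(template))
    m = pattern.fullmatch(normalize(actual))
    return m.groupdict() if m else None


TEMPLATE = """<firstname>,<LastName>
<ID>,<organisation>
<salary>,<place>
Dear <firstname>,
you are working in the department of <organisation> and we are really honored to have you here.
Thanks,"""

ACTUAL = """John,Kennedy
234,google
USD1245,CA
Dear John,
you are working in the department of google and we are really honored to have you here.
Thanks,"""

if __name__ == "__main__":
    print(match_template(TEMPLATE, ACTUAL))
```

In a TestComplete script, the actual text would presumably come from PDF.ConvertToText as in the function above, and the returned dictionary could then be checked against the expected data for each record. The same approach should port to JavaScript, since its regular expressions also support named groups and backreferences.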