Forum Discussion
2 Replies
- AlexKaras (Champion Level 3)
Hi,
Both file formats you mentioned are containers for different kinds of data (as you noted: text, pictures, formulas, etc.), so there is no single best way to compare them. Everything depends on your actual needs.
In the simplest case, when you need to compare only the text, you can extract the text from the file (for example, using the PDFBox library for PDF files, or using Excel's Save As functionality to save the workbook as a CSV file), pre-process the extracted text to exclude dynamic information you are not interested in (e.g. the date when the document was generated), and then do the comparison.
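As a rough illustration of the pre-process-and-compare step, here is a minimal Python sketch. It assumes the text has already been extracted from both files by whatever tool you use. The date pattern and the names `DATE_PATTERN`, `normalize` and `diff_texts` are hypothetical and only for illustration; adjust the pattern to whatever dynamic data your documents actually contain.

```python
import difflib
import re

# Assumed pattern for the dynamic parts to ignore: ISO dates (2024-01-05)
# and dd/mm/yyyy or dd.mm.yyyy style dates. Extend it for your documents.
DATE_PATTERN = re.compile(r"\b(\d{4}-\d{2}-\d{2}|\d{1,2}[./]\d{1,2}[./]\d{2,4})\b")


def normalize(text: str) -> list[str]:
    """Mask dates, collapse whitespace and drop empty lines."""
    lines = []
    for line in text.splitlines():
        line = DATE_PATTERN.sub("<DATE>", line)
        line = " ".join(line.split())
        if line:
            lines.append(line)
    return lines


def diff_texts(expected: str, actual: str) -> list[str]:
    """Return unified-diff lines; an empty list means the texts match."""
    return list(
        difflib.unified_diff(
            normalize(expected),
            normalize(actual),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )
```

If `diff_texts(baseline_text, current_text)` returns an empty list, the documents match after normalization; otherwise the returned lines can be written to the test log as they are.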
Alternatively, you can open the document in its native application (a PDF reader or Excel), take screenshots, and compare them with the expected baseline using a third-party service (e.g. https://applitools.com/).
In more complex cases, for example when you need to compare formulas in an Excel file, you will need to use the object model of the given file format (e.g. the Excel object model) to access the elements you want to work with.
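For formulas specifically, one possible alternative to automating Excel is to read them straight from the workbook file. The sketch below is not the Excel object model: it relies on the fact that an .xlsx file is a ZIP package whose worksheet parts store each formula in an `<f>` element, and it uses only the Python standard library. It works for .xlsx only (not .xls), and the function names `read_formulas` and `compare_formulas` are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the SpreadsheetML worksheet parts inside an .xlsx package.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def read_formulas(xlsx_path: str) -> dict[tuple[str, str], str]:
    """Map (worksheet part name, cell reference) to the formula text.

    Keys use part names such as 'xl/worksheets/sheet1.xml', not the sheet
    names shown in Excel. Formula text is stored without the leading '='.
    Cells that reuse a shared formula may have empty text here.
    """
    formulas = {}
    with zipfile.ZipFile(xlsx_path) as archive:
        for name in sorted(archive.namelist()):
            if not (name.startswith("xl/worksheets/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(archive.read(name))
            for cell in root.iterfind(".//m:c", NS):
                formula = cell.find("m:f", NS)
                if formula is not None:
                    formulas[(name, cell.get("r"))] = formula.text or ""
    return formulas


def compare_formulas(expected_path: str, actual_path: str) -> dict:
    """Return {key: (expected, actual)} for every formula that differs."""
    expected = read_formulas(expected_path)
    actual = read_formulas(actual_path)
    return {
        key: (expected.get(key), actual.get(key))
        for key in expected.keys() | actual.keys()
        if expected.get(key) != actual.get(key)
    }
```

An empty result from `compare_formulas` means both workbooks contain the same formulas in the same cells; a value of `None` on one side means the formula exists in only one of the files.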
All of the above topics have been discussed here several times, so you can search the forum for more ideas and approaches.
- rusantos (Contributor)
Thanks, I will try all your suggestions!