NisHera
8 years ago, Valued Contributor
splitting PDF text to array
Hi,
I'm testing a PDF file using the PDFBox Java class.
Everything is set up properly and I can strip the text from the PDF.
The problem is converting the text to an array. My function is below:
function ABCD(){
  var docObj = loadDocument("E:\\Temp\\Report100.pdf");
  //Create a text stripper object to get text
  var textStripperObj = JavaClasses.org_apache_pdfbox_util.PDFTextStripper.newInstance();
  var text = textStripperObj.getText_2(docObj);
  Log.Message('',text);
  var textArray = text.split('\r');
  Log.Message(textArray.Length);
  for (var i=0; i<25; i++){
    Log.Message( String(textArray[i])+ String(i));
  }
}
In the log message I can see the correct text,
but not in textArray. When I debug, it shows like below.
I tried split('\n') and split('\b') as well, but I still can't read the array values,
although I can see that it is breaking the text into an array.
It is not possible to compare the old PDF with the new PDF directly because the page structure is different.
The contents are the same (except for dates), but, for example, the old PDF has 6 pages and the new PDF has 5 pages.
Hi NisHera,
This looks very similar to the array issue discussed in this thread. Try replacing
var textArray = text.split('\r');
with
var textArray = text.split('\r').OleValue.toArray();
and see if it helps.
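If that still gives trouble, another thing that might be worth trying is to turn the text into a plain script string first and split it with a regular expression that accepts any line ending. This is only a sketch: splitLines is a made-up helper name, not something from TestComplete or PDFBox, and it assumes that String() gives you a native script string from the value returned by getText_2, which I have not verified.

```javascript
// Hypothetical helper (not part of TestComplete or PDFBox).
// Splits text on \r\n, \r or \n and drops empty lines.
function splitLines(text) {
  // Assumption: String() converts the wrapped value into a native script string.
  var lines = String(text).split(/\r\n|\r|\n/);
  var result = [];
  for (var i = 0; i < lines.length; i++) {
    if (lines[i] !== "") {
      result.push(lines[i]);
    }
  }
  return result;
}
```

In your function you would then call var textArray = splitLines(text); and loop up to textArray.length. Note that a native script array uses length with a lowercase "l", not Length.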