splitting PDF text to array

SOLVED
NisHera
Valued Contributor

Hi,

I'm testing a PDF file using the PDFBox Java classes.

Everything is set up properly, and I can strip the text from the PDF.

The problem is converting the text to an array. My function is below:

function ABCD() {
  var docObj = loadDocument("E:\\Temp\\Report100.pdf");
  // Create a text stripper object to get the text
  var textStripperObj = JavaClasses.org_apache_pdfbox_util.PDFTextStripper.newInstance();
  var text = textStripperObj.getText_2(docObj);
  Log.Message('', text);
  var textArray = text.split('\r');
  Log.Message(textArray.Length);
  for (var i = 0; i < 25; i++) {
    Log.Message(String(textArray[i]) + String(i));
  }
}

The log message shows the correct text, but textArray does not. In the debugger it looks like the screenshot below.

I also tried split('\n') and split('\b'), but I still can't get the array values, even though I can see that the text is being broken into an array.

[Image: debug results]

It is not possible to compare the old PDF with the new PDF directly because the page structure is different.

The contents are the same (except for dates), but, for example, the old PDF has 6 pages while the new PDF has 5.

2 REPLIES
baxatob
Community Hero

Hi,

Can you show the value of the string variable text (as is)?

HKosova
SmartBear Alumni (Retired)

Hi NisHera,

This looks very similar to the array issue discussed in this thread. Try replacing

var textArray = text.split('\r');

with

var textArray = text.split('\r').OleValue.toArray();

and see if it helps.
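
For readers who also run into the line-ending guesswork from the question ('\r' versus '\n'), here is a minimal sketch in plain JavaScript of splitting on any common line ending. It is not part of the original reply. The helper name splitLines and the sample text are made up for illustration, and it assumes that String(text) turns the value returned by PDFBox into a native script string, which has not been verified in TestComplete. In a TestComplete script, Log.Message would take the place of console.log.

```javascript
// Hypothetical helper (not from the thread): split text into lines
// regardless of which line-ending style the extracted text uses.
function splitLines(text) {
  // Assumption: String(...) coerces a wrapped, non-native string into a
  // native JavaScript string, so the native split() returns a native array.
  var nativeText = String(text);
  // "\r\n" is listed first so a Windows line ending is treated as one break.
  return nativeText.split(/\r\n|\r|\n/);
}

// Made-up sample text mixing all three line-ending styles.
var sample = "Report 100\r\nPage 1\nTotal: 42\rEnd";
var lines = splitLines(sample);

// Bound the loop by the real array length instead of a fixed count.
for (var i = 0; i < lines.length; i++) {
  console.log(i + ": " + lines[i]);
}
```

Bounding the loop by lines.length rather than a fixed 25 avoids logging undefined entries when the text has fewer lines than expected.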


Helen Kosova
SmartBear Documentation Team Lead