NisHera
Valued Contributor
8 years ago

Splitting PDF text into an array

Hi,

I'm testing a PDF file using the PDFBox Java classes.

Everything is set up properly and I can strip the text from the PDF.

The problem is converting the text to an array. My function is below:

 

function ABCD(){
  var docObj = loadDocument("E:\\Temp\\Report100.pdf");
  // Create a text stripper object to get the text
  var textStripperObj = JavaClasses.org_apache_pdfbox_util.PDFTextStripper.newInstance();
  var text = textStripperObj.getText_2(docObj);  
  Log.Message('',text);
  var textArray = text.split('\r');
  Log.Message(textArray.Length);
  for (var i=0; i<25; i++){
    Log.Message( String(textArray[i])+ String(i));
  } 
}

In the log message I can see the correct text,

but not in textArray. When I debug, it shows the values below.

I also tried split('\n') and split('\b'), but I still can't get the array values,

although I can see that the text is being broken into an array.

 

debug results

It is not possible to compare the old PDF with the new PDF directly because the page structure is different.

The contents are the same (except for dates), but, for example, the old PDF has 6 pages while the new PDF has 5 pages.
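
Since splitting on a single character such as '\r' or '\n' depends on which line ending the stripped text really uses, a regular-expression split that accepts all three common endings may be more robust. The following is only a minimal sketch in plain JavaScript: the helper name splitLines and the sample string are hypothetical, and it assumes the value returned by the text stripper can be coerced to a native script string with String(), which may not hold for every object returned through the Java bridge.

```javascript
// Hypothetical helper: split text on any common line ending.
// Assumption: `text` can be coerced to a native JavaScript string.
function splitLines(text) {
  var nativeText = String(text);
  // Match \r\n first so a Windows line ending is not split twice.
  return nativeText.split(/\r\n|\r|\n/);
}

// Made-up sample mixing Windows, Unix, and old Mac line endings.
var lines = splitLines("Page 1\r\nTotal: 100\nDate: 2017\rEnd");
```

Because the result here is a native array, its element count is read from the lowercase length property.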

 

  • HKosova
    SmartBear Alumni (Retired)

    Hi NisHera,

     

    This looks very similar to the array issue discussed in this thread. Try replacing

    var textArray = text.split('\r');

    with

    var textArray = text.split('\r').OleValue.toArray();

    and see if it helps.

  • baxatob
    Community Hero

    Hi,

     

    Can you show the value of the string variable text (as is)?
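
One possible way to show the string "as is" is to make its line-ending characters visible before logging it, so it is clear whether the text contains \r, \n, or both. This is a rough sketch in plain JavaScript, not something taken from the thread: the helper name showControlChars and the sample string are hypothetical, and it assumes the stripped text can be coerced to a native string with String().

```javascript
// Hypothetical helper: replace carriage returns, line feeds, and tabs
// with visible escape sequences so they show up in a log line.
function showControlChars(text) {
  return String(text)
    .replace(/\r/g, "\\r")
    .replace(/\n/g, "\\n")
    .replace(/\t/g, "\\t");
}

// Made-up sample standing in for the stripped PDF text.
var sample = "Report 100\r\nPage 1\n";
var visible = showControlChars(sample);
```

In the test script, the result could then be passed to Log.Message in place of the raw text.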