alibaba82
Super Contributor
16 years ago

Error while parsing XML file using Groovy

Hello,
I have a URL that returns a page of XML. I have some Groovy code that writes the XML to a local file and then parses that file to store certain elements in a database.

The code is:

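Object ReturnTopXAA(int num)
{
    // AATopXItems (the base URL) is assumed to be defined elsewhere in the script
    def address = AATopXItems + num.toString()
    // Download the response and save the raw bytes to a local file
    def file = new FileOutputStream("AAXResults.xml")
    def out = new BufferedOutputStream(file)
    def input = new URL(address).openStream()
    out << input
    input.close()
    out.close()
    // Parse XML from AAXResults.xml
    def records = new XmlSlurper().parse(new File("AAXResults.xml"))
    def recordsSize = records.response.items.item.id.size()
    def ItemsList = []
    // Collect the id of each item; the exclusive range also copes with zero items
    for (i in 0..<recordsSize)
        ItemsList.add(records.response.items.item[i].id)

    return ItemsList
}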

This code works fine when returning up to 478 items. However, if I try to return any more, I get the following error:

Apache Xerces MalformedByteSequenceException: Invalid byte 2 of 3-byte UTF-8 sequence

Any idea what I can do to fix this?

Thanks

Ali

4 Replies

  • alibaba82
    Super Contributor
    An update: I checked what item 479 was. It was
    Celebrity Exposé

    I guess the é is causing the issue. How can I modify my code so that these kinds of characters can also be parsed? It would be OK to skip these values.

    Thanks

    Ali
  • alibaba82
    Super Contributor
    Would it be possible to get an expedited answer to this, please?
    I am in a hurry to resolve this issue.

    Thanks in advance

    Ali
  • omatzura
    Super Contributor
    Hi Ali,

    Is the file correctly encoded? Does it have the encoding set to UTF-8 in the XML header?

    regards!

    /Ole
    eviware.com
  • alibaba82
    Super Contributor
    Hi,
    It seems it was the encoding. The encoding was set to UTF-8 and that did not work. I then changed the encoding to ISO-8859-1 and that seemed to work.
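
The thread's diagnosis can be sketched as follows. In ISO-8859-1 the character é is the single byte 0xE9, which a UTF-8 decoder reads as the first byte of a 3-byte sequence; the next byte (here the `<` of the closing tag) is not a valid continuation byte, which matches the "invalid byte 2 of 3-byte" message. The sample document, the `root` wrapper element, and the helper names `decodesCleanly` and `parseWithCharset` are hypothetical and not taken from the original feed. The sketch assumes the feed's bytes really are ISO-8859-1 and hands the parser a Reader with that charset, so the parser never has to guess the encoding.

```groovy
import groovy.xml.XmlSlurper   // Groovy 3+; on older versions drop this import (groovy.util.XmlSlurper is auto-imported)
import java.nio.ByteBuffer
import java.nio.charset.CharacterCodingException
import java.nio.charset.Charset
import java.nio.charset.CodingErrorAction
import java.nio.charset.StandardCharsets

// Hypothetical stand-in for the feed: it has no encoding declaration, so an
// XML parser assumes UTF-8, but the bytes are actually ISO-8859-1.
def xml = '<root><response><items><item><id>Celebrity Expos\u00e9</id></item></items></response></root>'
byte[] raw = xml.getBytes(StandardCharsets.ISO_8859_1)

// Returns true only if the bytes are a valid sequence in the given charset.
def decodesCleanly = { byte[] bytes, Charset cs ->
    def decoder = cs.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT)
    try {
        decoder.decode(ByteBuffer.wrap(bytes))
        return true
    } catch (CharacterCodingException e) {
        return false
    }
}

// Decodes the bytes with an explicit charset and parses the resulting characters.
def parseWithCharset = { byte[] bytes, String charsetName ->
    def reader = new InputStreamReader(new ByteArrayInputStream(bytes), charsetName)
    try {
        return new XmlSlurper().parse(reader)
    } finally {
        reader.close()
    }
}

println "Valid UTF-8?      ${decodesCleanly(raw, StandardCharsets.UTF_8)}"
println "Valid ISO-8859-1? ${decodesCleanly(raw, StandardCharsets.ISO_8859_1)}"

def records = parseWithCharset(raw, 'ISO-8859-1')
def ids = records.response.items.item.collect { it.id.text() }
println ids
```

In the code from the question, the equivalent change would be to parse `new InputStreamReader(new FileInputStream("AAXResults.xml"), "ISO-8859-1")` instead of the `File`. This only helps if the feed truly is ISO-8859-1; if the server sends some other encoding, that charset name would be needed instead.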