Forum Discussion

fa
Occasional Contributor
7 years ago

XmlSlurper/XmlParser to store data from HTML content

Hello,

 

I am a new user of SoapUI community version (5.0.0).

I need to store the action value from the (HTML) response (below):

 

<html>
   <head>
      <meta content="HTML Tidy for Java (vers. 26 sept. 2004), see www.w3.org" name="generator"/>
      <title>201 Created</title>
   </head>
   <body>
      <h1>TGT Created</h1>
      <form method="post" action="URI">
         Service:
         <input value="" name="service" type="text"/>
         <br/>
         <input value="Submit" type="submit"/>
      </form>
   </body>
</html>

 

I am trying to extract this value with Groovy (XmlSlurper or XmlParser), parsing the response in a Script Assertion on the same step with the following code:

 

import groovy.util.XmlSlurper;

def response = context.expand( '${Login#Response}' ).toString();
def slurper = new XmlSlurper();
def xmldata = slurper.parseText response;

...but I get "Premature end of file".

Could someone provide some information about this problem?

 

Thank you in advance

  • PaulMS
    Super Contributor

    The response variable is probably null. Is the name of the test step Login correct?

    You could use

    def response = messageExchange.response.responseContent

     

    Usually you would not need to convert the response to string.

  • JHunt
    Community Hero
    def xml = '''
        <html><head><title>201 Created</title></head>
        <body><h1>TGT Created</h1><form action="URI" method="POST">
        Service:<input value="" name="service" type="text"/>
        <br/>
        <input value="Submit" type="submit"/></form></body></html>
        '''
    def html = new XmlSlurper().parseText(xml)
    assert html.body.form.@action == "URI"

    However, your "Raw view" XML wouldn't parse for me because the input tags are missing the / at the end of the tag so I had to add them manually.

  • JHunt
    Community Hero

    Sorry, I'm editing this post because I was way off.

     

    It looks like HTML is not a true subset of XML, and some HTML tags are invalid when read as XML.

     

    It goes as far as this: in HTML, input tags must NOT have a closing tag, whereas in XML it is illegal to leave them unclosed.

     

    This might be useful to you:

    http://www.frommknecht.net/robust-html-parsing-the-groovy-way/

     

    Edit again: This might still be of use to you...

     

    def html = '''
        <html><head><title>201 Created</title></head>
        <body><h1>TGT Created</h1><form action="URI" method="POST">
        Service:<input value="" name="service" type="text">
        <br/>
        <input value="Submit" type="submit"></form></body></html>
    '''
    def matches = html =~ /.*form action\=\"([^\"]*)\".*/
    matches.find()
    assert matches.group(1) == 'URI'
    • PaulMS
      Super Contributor

      SoapUI has already converted HTML to standard XML format (with closed tags). By changing the script to responseContentAsXml you could get the action value

       

      import groovy.util.XmlSlurper;
      
      def response = messageExchange.responseContentAsXml;
      def slurper = new XmlSlurper();
      def xmldata = slurper.parseText response;

      def action = xmldata.body.form.@action
      log.info action

       

       

      • fa
        Occasional Contributor

        Hello Paul.

         

        With this Groovy script, I managed to retrieve the value of the action attribute!

        Thank you very much!

  • fa
    Occasional Contributor

    Hello Paul,

     

    You were right, thank you for the feedback. The test step name was misspelled.

     

    I corrected the step name and adapted the script as follows:

     

    import groovy.util.XmlSlurper;
    
    def response = messageExchange.response.responseContent;
    def slurper = new XmlSlurper();
    def xmldata = slurper.parseText response;

     

    However, I hit a different issue:

    The public identifier must begin with either a single or double quote character.

     

    Should the content be parsed in a different manner?

    • PaulMS
      Super Contributor

      There shouldn't be any problem with that Groovy script, but there may be one with the response.

       

      Is that the full response message above?

  • fa
    Occasional Contributor

    As a matter of fact, that is the content displayed in the XML view.

     

    However, I also provide the content of the Raw view (below):

    HTTP/1.1 201 Created
    Date: Wed, 14 Feb 2018 12:49:42 GMT
    Server: Apache-Coyote/1.1
    X-Frame-Options: SAMEORIGIN
    Cache-Control: no-cache, no-store, max-age=0, must-revalidate
    Pragma: no-cache
    Expires: 0
    X-Content-Type-Options: nosniff
    X-Frame-Options: DENY
    X-XSS-Protection: 1; mode=block
    Location: URI
    Content-Type: text/html;charset=UTF-8
    Content-Length: 383
    Connection: close
    
    <!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\"><html><head><title>201 Created</title></head><body><h1>TGT Created</h1><form action="URI" method="POST">Service:<input type="text" name="service" value=""><br><input type="submit" value="Submit"></form></body></html>

    Do you believe that any escaping may be involved?
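
    On the escaping question, here is a minimal sketch, assuming the body really contains the backslash-escaped quotes shown in the Raw view. The helper name toWellFormed is made up for illustration, and its regexes only cover the DOCTYPE and the input/br tags seen in this response, so it is a workaround for this one page rather than a general HTML parser.

    ```groovy
    import groovy.util.XmlSlurper  // on newer Groovy versions the class lives in groovy.xml

    // Body copied from the Raw view, including the backslash-escaped quotes in the DOCTYPE.
    def raw = '<!DOCTYPE HTML PUBLIC \\"-//IETF//DTD HTML 2.0//EN\\"><html><head><title>201 Created</title></head>' +
              '<body><h1>TGT Created</h1><form action="URI" method="POST">Service:' +
              '<input type="text" name="service" value=""><br><input type="submit" value="Submit"></form></body></html>'

    // Hypothetical helper: drop the DOCTYPE and self-close the void tags used in this response.
    String toWellFormed(String html) {
        html.replaceAll(/(?is)<!DOCTYPE[^>]*>/, '')                      // remove the malformed DOCTYPE
            .replaceAll(/(?i)<(input|br)\b([^>]*?)\s*\/?>/, '<$1$2/>')   // <input ...> becomes <input .../>
    }

    def html = new XmlSlurper().parseText(toWellFormed(raw))
    assert html.body.form.@action.text() == 'URI'
    ```

    In a Script Assertion the raw variable would presumably come from messageExchange.response.responseContent instead of a literal.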

  • fa
    Occasional Contributor

    Hello,

     

    Indeed, you have a good point in terms of the raw content.

    There are some parsing issues.

     

    The problem is that the URI is built from dynamic data, and I need to store it on every call of this step (for further processing), so I cannot simply hard-code a static value.

     

    Any ideas how to approach this?
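
    One possible approach, sketched under assumptions: read the action attribute with a regular expression on each call, which sidesteps XML parsing entirely, and store the result for later steps. The helper extractFormAction and the property name tgtUri are hypothetical, and a plain map stands in for the test case properties so the snippet runs outside SoapUI.

    ```groovy
    import java.util.regex.Matcher

    // Hypothetical helper: returns the action attribute of the first <form> tag, or null if none is found.
    // The attribute may appear anywhere inside the tag, quoted with single or double quotes.
    String extractFormAction(String html) {
        Matcher m = html =~ /(?is)<form\b[^>]*?\baction\s*=\s*["']([^"']*)["']/
        return m.find() ? m.group(1) : null
    }

    // Shortened copy of the raw body from this thread; the input tag is left unclosed on purpose.
    def body = '<html><body><form action="URI" method="POST">Service:<input type="text" name="service" value=""><br></form></body></html>'

    // Stand-in for the test case properties. Inside a SoapUI Script Assertion the equivalent
    // would be something like: context.testCase.setPropertyValue('tgtUri', action)
    def properties = [:]
    properties['tgtUri'] = extractFormAction(body)

    assert properties['tgtUri'] == 'URI'
    ```

    Because the value is extracted again on every run, a changing URI is picked up automatically.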

  • JHunt
    Community Hero

    Paul, it doesn't work.

     

    Here's how getResponseContentAsXml() is implemented:

     

     

        public String getResponseContentAsXml() {
            if (hasResponse() && XmlUtils.seemsToBeXml(getResponseContent())) {
                return getResponseContent();
            } else {
                return null;
            }
        }

     

    def response = messageExchange.responseContentAsXml;
    def slurper = new XmlSlurper();
    def failure
    try {
        def xmldata = slurper.parseText response
    } catch (e) { failure = e.message }
    assert failure == 'The element type "input" must be terminated by the matching end-tag "</input>".'

     

     

    • PaulMS
      Super Contributor

      It worked with the raw response above.

       

      What result do you see in the log?

      log.info messageExchange.responseContentAsXml

       

  • JHunt
    Community Hero

    Hi Paul,

     

    fa has posted two different HTML documents. The one in message 1 does work, even with the earlier answers, because its input tags are closed (empty elements).

     

    But the one in message 5 is what they're actually getting in the raw response. The input tags are opened but not closed.

    <html><head><title>201 Created</title></head><body><h1>TGT Created</h1><form action="URI" method="POST">Service:<input type="text" name="service" value=""><br><input type="submit" value="Submit"></form></body></html>

    So when I put that on a mock server and run it as per your answer, it doesn't work.
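
    The difference between the two variants can be reproduced outside SoapUI. A small sketch, using shortened fragments of the two forms (the closure name tryParse is only illustrative):

    ```groovy
    import groovy.util.XmlSlurper  // on newer Groovy versions the class lives in groovy.xml
    import org.xml.sax.SAXParseException

    // Variant from message 1: input and br are written as empty elements.
    def closed = '<form action="URI" method="POST">Service:<input type="text" name="service" value=""/><br/></form>'
    // Variant from the raw response: input and br are opened but never closed.
    def unclosed = '<form action="URI" method="POST">Service:<input type="text" name="service" value=""><br></form>'

    // Reports whether XmlSlurper accepts the text as well-formed XML.
    def tryParse = { String text ->
        try {
            new XmlSlurper().parseText(text)
            return 'parsed'
        } catch (SAXParseException e) {
            return 'failed'
        }
    }

    assert tryParse(closed) == 'parsed'
    assert tryParse(unclosed) == 'failed'
    ```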