Extracting Image File Path from HTML Tag - Regular Expressions

SOLVED
SuperTester
Contributor

Hello,

I am trying to come up with a regular expression that matches an image file path contained within an HTML tag:

<p><img src="file:///C:\Users\UserName\AppData\Local\Temp\ABC_Image_DE_tp07165ac8_2b0c_44e2_a4d8_1deabe5fb73e.png" width = ""height = ""></p>

Now, I have a regular expression that matches the file path on its own, but not when the path is contained in the HTML tag string.

RegEx:  /^(?:[\w]\:|\\)(.*png$)/gim

File Path: C:\Users\UserName\AppData\Local\Temp\ABC_Image_DE_tp07165ac8_2b0c_44e2_a4d8_1deabe5fb73e.png

I think there are two problems with my regular expression:
1) It only matches when the string starts with "C:". If other characters come before "C:", it no longer recognizes "C:" as the "begins with" part.
2) The file path is enclosed in quotes. I'm not taking this into account, and I'm not sure whether it throws off the regular expression.
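Both suspicions can be checked directly. The following is a small plain-JavaScript sketch (it runs in Node.js or a browser console, not specifically in TestComplete; the variable names are illustrative). It applies the pattern above to the bare path and to the full tag, then tries a variant without the ^ and $ anchors. It also shows that the quote does matter: with the m flag, $ only matches at the end of a line, so "png" followed by a closing quote cannot satisfy png$.

```javascript
// The pattern from the question, unchanged.
const anchored = /^(?:[\w]\:|\\)(.*png$)/gim;
// Same idea without ^ and $: a drive letter, a backslash, then non-quote characters up to ".png".
const unanchored = /[A-Za-z]:\\[^"]*?\.png/i;

// String.raw keeps the backslashes exactly as typed.
const barePath = String.raw`C:\Users\UserName\AppData\Local\Temp\ABC_Image_DE_tp07165ac8_2b0c_44e2_a4d8_1deabe5fb73e.png`;
const tagString = `<p><img src="file:///${barePath}" width = ""height = ""></p>`;

// With the g flag, String.prototype.match returns every full match, or null.
console.log(barePath.match(anchored));   // one match: the whole path
console.log(tagString.match(anchored));  // null: "<" fails the start, and png is followed by a quote, not end of line
console.log(tagString.match(unanchored)[0] === barePath); // true
```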

Any insight would be appreciated!

Notes

- I'm using regex101 to develop my regular expressions: https://regex101.com/

- The characters "07165ac8_2b0c_44e2_a4d8_1deabe5fb73e" within the string are randomly generated and change on every test execution, while the substring "ABC_Image_DE" stays the same.

- I'm scripting in JavaScript.
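Since "ABC_Image_DE" is the only stable part of the file name, one option is to anchor the pattern on it and on the src="file:///..." attribute instead of on the start of the line. This is a minimal sketch in plain JavaScript, assuming the path is always a Windows drive path inside a double-quoted src attribute and always ends in .png; extractImagePath is a hypothetical helper name.

```javascript
// Hypothetical helper: returns the Windows path of the ABC_Image_DE image, or null.
// Assumes: src="file:///<drive>:\...\ABC_Image_DE<random part>.png" with double quotes.
function extractImagePath(htmlTag) {
  const pattern = /src="file:\/\/\/([A-Za-z]:\\[^"]*\\ABC_Image_DE[^"\\]*\.png)"/i;
  const match = pattern.exec(htmlTag);
  // Capture group 1 holds the path without the file:/// prefix or the quotes.
  return match ? match[1] : null;
}

// String.raw is used so the backslashes in this test string survive.
const html = String.raw`<p><img src="file:///C:\Users\UserName\AppData\Local\Temp\ABC_Image_DE_tp07165ac8_2b0c_44e2_a4d8_1deabe5fb73e.png" width = ""height = ""></p>`;
console.log(extractImagePath(html));
// C:\Users\UserName\AppData\Local\Temp\ABC_Image_DE_tp07165ac8_2b0c_44e2_a4d8_1deabe5fb73e.png
```

Because the match must end at the closing quote, the pattern never runs past the src attribute into width or height.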

3 replies
BenoitB
Community Hero

I like regex, but sometimes simple text parsing works just as well.

function getImagePath(HtmlTag, Extension = ".png") {
  let posStart = 8 + aqString.Find(HtmlTag, 'file:///', 0, false);                      // +8 to exclude 'file:///'
  let posEnd   = Extension.length + aqString.Find(HtmlTag, Extension, posStart, false); // To include the extension
  return HtmlTag.substr(posStart, posEnd-posStart);
}  

function testIt() {
  Log.Message(getImagePath('<p><img src="file:///C:\Users\UserName\AppData\Local\Temp\ABC_Image_DE_tp07165ac8_2b0c_44e2_a4d8_1deabe5fb73e.png" width = ""height = ""></p>'));
}

A smile and we're off again

Hey!

Thank you very much for the solution! I appreciate the suggestion to switch from regex to plain string parsing.

One thing that's not quite right is that the result string is missing its backslashes:

C:UsersUserNameAppDataLocalTempABC_Image_DE_tp07165ac8_2b0c_44e2_a4d8_1deabe5fb73e.png

While debugging, I noticed that the HTML tag string is also missing its backslashes during test execution (see the attached screenshot). Is this caused by the backslash being an escape character?

Thanks again!

Yep, I made a big mistake. As soon as the string is assigned, it becomes a processed literal, and since \ is an escape character, the backslashes no longer exist.

No easy solution yet.
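The escaping behaviour described here can be reproduced in any JavaScript engine. A tiny sketch with shortened, made-up paths: a backslash followed by a character that is not a recognised escape is silently dropped, a doubled backslash keeps one, and a backslash that does form an escape changes the character entirely.

```javascript
// "\U" and "\T" are not escape sequences, so JavaScript silently keeps only "U" and "T".
const cooked = "C:\Users\Temp";
// "\\" is the escape sequence for one literal backslash.
const escaped = "C:\\Users\\Temp";

console.log(cooked);            // C:UsersTemp
console.log(escaped);           // C:\Users\Temp
// Recognised escapes are worse: "\t" (lowercase) turns into a tab character.
console.log("C:\temp".length);  // 6: "C", ":", tab, "e", "m", "p"
```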

You can try String.raw, but the catch is where the value gets assigned: String.raw only applies to a template literal written directly in the script.

https://developer.mozilla.org/fr/docs/Web/JavaScript/Reference/Objets_globaux/String/raw

function getImagePath(HtmlTag, Extension = ".png") {
  const prefix = 'file:///';
  let prefixPos = aqString.Find(HtmlTag, prefix, 0, false);          // -1 if 'file:///' is not found
  if (prefixPos < 0) return "";
  let posStart = prefixPos + prefix.length;                            // Exclude 'file:///'
  let extPos   = aqString.Find(HtmlTag, Extension, posStart, false);   // -1 if the extension is not found
  if (extPos < 0) return "";
  return HtmlTag.substring(posStart, extPos + Extension.length);       // Include the extension
}

function testIt() {
  // String.raw keeps every backslash; an ordinary quoted literal would drop them
  const htmlTag = String.raw`<p><img src="file:///C:\Users\UserName\AppData\Local\Temp\ABC_Image_DE_tp07165ac8_2b0c_44e2_a4d8_1deabe5fb73e.png" width = ""height = ""></p>`;
  Log.Message(getImagePath(htmlTag));
}
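aqString and Log are TestComplete objects, so the code above only runs inside TestComplete. To try the same parsing idea in plain JavaScript (for example in Node.js), a sketch using the built-in indexOf and substring could look like this. getImagePathPlain is a hypothetical name, and returning null when a marker is missing is an assumption made for this sketch, not part of the original answer.

```javascript
// Same idea as getImagePath above, using only built-in String methods.
// Note: indexOf is case-sensitive, unlike aqString.Find(..., false).
function getImagePathPlain(htmlTag, extension = ".png") {
  const prefix = "file:///";
  const prefixPos = htmlTag.indexOf(prefix);
  if (prefixPos < 0) return null;                              // no file:/// URL in the tag
  const start = prefixPos + prefix.length;                     // first character after 'file:///'
  const extPos = htmlTag.indexOf(extension, start);
  if (extPos < 0) return null;                                 // extension not found after the prefix
  return htmlTag.substring(start, extPos + extension.length);  // keep the extension
}

const htmlTag = String.raw`<p><img src="file:///C:\Users\UserName\AppData\Local\Temp\ABC_Image_DE_tp07165ac8_2b0c_44e2_a4d8_1deabe5fb73e.png" width = ""height = ""></p>`;
console.log(getImagePathPlain(htmlTag));
```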

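Whether you use a regex or string manipulation, the problem remains as long as \ is treated as an escape character, that is, whenever the tag is written into the script as an ordinary quoted string literal.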
