Extracting Image File Path from HTML Tag - Regular Expressions
SOLVED
Hello,
I am trying to come up with a regular expression that will match an image file path contained within an HTML tag.
<p><img src="file:///C:\Users\UserName\AppData\Local\Temp\ABC_Image_DE_tp07165ac8_2b0c_44e2_a4d8_1deabe5fb73e.png" width = ""height = ""></p>
Now, I have a regular expression that will match the file path, but not when the file path is contained in the HTML Tag string.
RegEx: /^(?:[\w]\:|\\)(.*png$)/gim
File Path: C:\Users\UserName\AppData\Local\Temp\ABC_Image_DE_tp07165ac8_2b0c_44e2_a4d8_1deabe5fb73e.png
I think there are two problems with my regular expression. 1) It is anchored so that the string has to start with "C:"; if other characters come before "C:", then the match fails. 2) The file path is contained within quotes. I'm not taking this into account, and I'm not sure whether it would throw off the regular expression.
Any insight would be appreciated!
Notes
- I'm using regex101 to develop my regular expressions. https://regex101.com/
- The numbers "07165ac8_2b0c_44e2_a4d8_1deabe5fb73e" within the string are randomly generated and will change every test execution while the sub-string "ABC_Image_DE" will remain the same.
- I'm scripting in JavaScript.
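For reference, the two problems described above can also be handled with a regex alone. The sketch below is not from the thread: extractImagePath is a hypothetical helper name, and the pattern assumes the path always follows 'file:///' and ends with '.png' right before the closing double quote. Dropping the ^ anchor lets the match start anywhere in the tag, and excluding the quote character keeps the match inside the src attribute.

```javascript
// Hypothetical helper (not from the thread): extract the path with a regex.
function extractImagePath(htmlTag) {
  // No ^ anchor, so the match may begin anywhere in the string.
  // The capture group starts at a drive letter ("C:") or a UNC prefix ("\\")
  // and runs up to ".png" immediately before the closing double quote.
  const pattern = /file:\/\/\/((?:[a-z]:|\\\\)[^"]*\.png)"/i;
  const match = pattern.exec(htmlTag);
  return match ? match[1] : null; // null when the tag holds no matching path
}

// String.raw keeps the backslashes of the sample tag intact.
const sampleTag = String.raw`<p><img src="file:///C:\Users\UserName\AppData\Local\Temp\ABC_Image_DE_tp07165ac8_2b0c_44e2_a4d8_1deabe5fb73e.png" width = ""height = ""></p>`;
const imagePath = extractImagePath(sampleTag);
```

Because the randomly generated part of the file name is matched by [^"]*, the pattern does not depend on it.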
I like regex but sometimes simple text parsing is good too.
function getImagePath(HtmlTag, Extension = ".png") {
  let posStart = 8 + aqString.Find(HtmlTag, 'file:///', 0, false); // +8 to exclude 'file:///'
  let posEnd = Extension.length + aqString.Find(HtmlTag, Extension, posStart, false); // To include the extension
  return HtmlTag.substr(posStart, posEnd-posStart);
}
function testIt() {
  Log.Message(getImagePath('<p><img src="file:///C:\Users\UserName\AppData\Local\Temp\ABC_Image_DE_tp07165ac8_2b0c_44e2_a4d8_1deabe5fb73e.png" width = ""height = ""></p>'));
}
A smile and off we go again
Hey!
Thank you very much for the solution! I appreciate the insight of changing from regex to parsing.
One thing that's not quite right is that the result string is missing its backslashes:
C:UsersUserNameAppDataLocalTempABC_Image_DE_tp07165ac8_2b0c_44e2_a4d8_1deabe5fb73e.png
While debugging, I noticed that the HTML tag string is also missing its backslashes during test execution (see attached screenshot). Is this caused by the backslash being an escape character?
Thanks again!
Yep, I made a big mistake. As soon as the string is assigned, it becomes a processed literal, and because \ is an escape character, the backslashes no longer exist in the value.
No easy solution yet.
You can try String.raw, but the problem is still how the value gets assigned in the first place...
https://developer.mozilla.org/fr/docs/Web/JavaScript/Reference/Objets_globaux/String/raw
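The effect described above can be seen in a small standalone sketch. The variable names and the shortened sample path are illustrative assumptions, not part of the original script. It compares a regular string literal, a literal with doubled backslashes, and a String.raw template.

```javascript
// In a regular literal, "\U" and "\T" are treated as escape sequences for
// the characters "U" and "T", so the backslashes disappear from the value.
const processed = 'C:\Users\Temp';

// Doubling each backslash keeps one backslash in the resulting string.
const doubled = 'C:\\Users\\Temp';

// String.raw returns the template text without processing escape sequences.
const raw = String.raw`C:\Users\Temp`;

const sameValue = doubled === raw;             // both spell the path with backslashes
const lostBackslashes = processed !== doubled; // the plain literal does not
```

This only concerns string literals written in the script. A value that arrives at run time, for example text read from a file or from a page, is not parsed as a literal, so its backslashes are not affected.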
function getImagePath(HtmlTag, Extension = ".png") {
  let posStart = 8 + aqString.Find(HtmlTag, 'file:///', 0, false); // +8 to exclude 'file:///'
  let posEnd = Extension.length + aqString.Find(HtmlTag, Extension, posStart, false); // To include the extension
  return HtmlTag.substr(posStart, posEnd-posStart);
}
function testIt() {
  // String.raw keeps the backslashes that a regular string literal would drop
  const htmlTag = String.raw`<p><img src="file:///C:\Users\UserName\AppData\Local\Temp\ABC_Image_DE_tp07165ac8_2b0c_44e2_a4d8_1deabe5fb73e.png" width = ""height = ""></p>`;
  Log.Message(getImagePath(htmlTag));
}
Whether you use regex or string manipulation, the problem exists as long as the path is written in a string literal where \ is an escape character.
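For trying the parsing approach outside TestComplete, here is a minimal plain-JavaScript sketch of the same idea. It swaps aqString.Find for the standard indexOf and adds a check for the "not found" case; getImagePathPlain is a hypothetical name, and the lowercase comparison assumes an ASCII extension such as ".png".

```javascript
// Hypothetical plain-JavaScript variant of getImagePath (no aqString).
function getImagePathPlain(htmlTag, extension = ".png") {
  const prefix = "file:///";
  const prefixIndex = htmlTag.indexOf(prefix);
  if (prefixIndex === -1) return null;        // no file URL in the tag
  const start = prefixIndex + prefix.length;  // skip 'file:///'
  // Case-insensitive search for the extension, starting after the prefix.
  const extIndex = htmlTag.toLowerCase().indexOf(extension.toLowerCase(), start);
  if (extIndex === -1) return null;           // extension not found
  return htmlTag.slice(start, extIndex + extension.length); // include the extension
}

// The tag is written with String.raw so its backslashes survive.
const tag = String.raw`<p><img src="file:///C:\Users\UserName\AppData\Local\Temp\ABC_Image_DE_tp07165ac8_2b0c_44e2_a4d8_1deabe5fb73e.png" width = ""height = ""></p>`;
const plainResult = getImagePathPlain(tag);
```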
