Posts: 2
Threads: 1
Joined: May 2006
I'm trying to search through HTML using findrx and pull out a numeric string within quotes. Obviously, I can't include the double quote character directly into the search pattern, so the code I'd LIKE to use is below, though it isn't working. \042 is the octal code for the double quote.
Any ideas?
Htm el=htm("BODY" "" "" " Internet Explorer" "" 0 0x20)
findrx(el.HTML "value=\042(\d+)\042" 0 4 tmp)
Posts: 175
Threads: 43
Joined: Jun 2004
I tested your regex in other software and it works. Here are some things to try:
- Make sure "tmp" is an array, or flag 4 will not work. Index 0 of "tmp" contains the entire match; index 1 contains your value.
- Check your input.
- Use this regex to allow for possible spaces: "value\s?=\s?\042(\d+)\042".
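For anyone who wants to check the pattern outside QM, here is a minimal sketch in Python rather than QM. It assumes Python's re module treats \042 and \d the same way findrx does, and the sample HTML string is made up for illustration, since the real page markup isn't shown in this thread.

```python
import re

# Hypothetical sample input; the actual page source is not shown in the thread.
html = '<input type="hidden" name="id" value = "12345">'

# \042 is the octal escape for the double quote character.
# \s? allows one optional whitespace character on each side of "=".
pattern = re.compile(r'value\s?=\s?\042(\d+)\042')

match = pattern.search(html)
whole = match.group(0) if match else None  # entire match, like index 0 of tmp
value = match.group(1) if match else None  # captured digits, like index 1 of tmp
print(whole, value)
```

If the pattern matches here but not in the macro, the input text is the more likely culprit than the regex.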
Matt B
Posts: 2
Threads: 1
Joined: May 2006
Well, after struggling, I dumped out the contents of el.HTML and saw that it did not match what I got when I viewed the HTML source through IE. IE had rearranged the attributes and removed the quotes, which is why my pattern wouldn't match. After that, I was able to code up something that worked.
My assumption is that the IE DOM doesn't return the exact source that was loaded to create the page?
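The working code isn't posted here, but one way to cope with markup that may or may not keep its quotes is to make the quote optional in the pattern. The following is only a sketch in Python, not QM: extract_value is a hypothetical helper name, and the sample strings in the usage note are invented to mimic quoted and unquoted attributes.

```python
import re

def extract_value(html):
    """Return the digits of the first value attribute, or None if there is none."""
    # ["']? accepts a double quote, a single quote, or no quote at all.
    # The trailing \b rejects values such as 12abc that only start with digits.
    # IGNORECASE is used on the assumption that the serialized HTML may change case.
    m = re.search(r'''\bvalue\s*=\s*["']?(\d+)\b''', html, re.IGNORECASE)
    return m.group(1) if m else None
```

For example, extract_value('<INPUT value=42 name=id>') and extract_value('<input name="id" value="42">') both give '42', while a tag without a value attribute gives None. Because the pattern does not depend on where the attribute sits in the tag, reordered attributes don't matter either.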
Posts: 12,140
Threads: 142
Joined: Dec 2002
Quote:My assumption is that the IE DOM doesn't return the exact source that was loaded to create the page?
Yes, even if you use el.DocText(1).
Quote:I can't include the double quote character directly into the search pattern
You can. In a QM string, two single quotes ('') stand for a double quote:
findrx(el.HTML "value=''(\d+)''" 0 4 tmp)