Extract div block from HTML using xPath or QM string functions
#1
Code:
<!DOCTYPE html>
<html>
    <head>
        <title>Page Title</title>
    </head>
    <body>
        <div class="title"> <---------------------------------------------------- EXTRACT BEGIN
            <a href="https://www.download/book.pdf" target="_blank">
                Book title
            </a>
            <div class="js-subproduct-admin-edit" data-entity-kind="subproduct" data-machine-name="booktitle_1"></div>
        </div> <---------------------------------------------------- EXTRACT END
        <div class="some_other_block">
            test
        </div>
    </body>
</html>

Is it possible to use xPath in QM to extract the <div class="title"> block marked by the arrows above?
Or can it be done with QM string/regex functions?
I have made some attempts, but the solutions are ugly and inefficient.

I can do it with QM's Acc functions, but I would like to avoid Acc if possible.
I searched for "xPath on html" -> https://www.google.com/search?client=fir...th+on+html
But when I try it in QM, I get errors.
#2
You can use the HtmlDoc class.

Function Function35
Code:
str html=
;<!DOCTYPE html>
;<html>
;;;;<head>
;;;;;;;;<title>Page Title</title>
;;;;</head>
;;;;<body>
;;;;;;;;<div class="title">
;;;;;;;;;;;;<a href="https://www.download/book.pdf" target="_blank">
;;;;;;;;;;;;;;;;Book title
;;;;;;;;;;;;</a>
;;;;;;;;;;;;<div class="js-subproduct-admin-edit" data-entity-kind="subproduct" data-machine-name="booktitle_1"></div>
;;;;;;;;</div>
;;;;;;;;<div class="some_other_block">
;;;;;;;;;;;;test
;;;;;;;;</div>
;;;;</body>
;</html>

HtmlDoc d.InitFromText(html)
ARRAY(MSHTML.IHTMLElement) div
int i
d.GetHtmlElements(div "div")
for i 0 div.len
,str cn=div[i].className
,if cn="title"
,,out "------------------InnerHtml------------------"
,,out div[i].innerHTML
,,out "------------------OuterHtml------------------"
,,out div[i].outerHTML
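
If the goal is to keep the block for further processing rather than only print it, the loop above could be adapted roughly as follows. This is an untested sketch that uses only the HtmlDoc calls already shown in this post; the variable name block and the shortened sample HTML are made up for illustration.

```qm
str html=
;<html><body>
;<div class="title">
;<a href="https://www.download/book.pdf" target="_blank">Book title</a>
;</div>
;<div class="some_other_block">test</div>
;</body></html>

HtmlDoc d.InitFromText(html)
ARRAY(MSHTML.IHTMLElement) div
d.GetHtmlElements(div "div")
str block ;;hypothetical variable that receives the extracted markup
int i
for i 0 div.len
,str cn=div[i].className
,if cn="title"
,,block=div[i].outerHTML
,,break ;;stop at the first match
if block.len
,out block
else
,out "div.title not found"
```

Note that cn="title" compares the whole class attribute, so a div with several classes (for example class="title big") would probably be skipped by this check.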
#3
Thanks!
#4
Here is an example that does this without having a browser open.
It extracts the desired text from your first post in this thread.
Code:
HtmlDoc doc doc2
doc.InitFromWeb("https://www.quickmacros.com/forum/showthread.php?tid=7213")
str s=doc.d3.getElementById("pid_35428").innerText
int i
out
doc2.InitFromText(s)
ARRAY(MSHTML.IHTMLElement) div
doc2.GetHtmlElements(div "div")
for i 0 div.len
,str cn=div[i].className
,if cn="title"
,,out "------------------InnerHtml------------------"
,,out div[i].innerHTML
,,out "------------------OuterHtml------------------"
,,out div[i].outerHTML
,,break
#5
Thank you!

