07-14-2009, 09:19 AM
Hello,
I'm struggling with HTML text extraction.
I have several different types of data to extract. I tried hard with the HTMLDoc class and the provided examples, but that was not enough.
NB: the red line is the text I want.
1)
<div class='text'>
<b>Covers</b><br/>
http://xxxyyyyzzzz.com/somefile.html <- I want that
</div>
2)
<a href="http://xxxyyyyzzzz.com/somefile.html" target="_blank">http://xxxyyyyzzzz.com/somefile.html</a></div>
3)
<div class="image">
<a href="http://xxxyyyyzzzz.com/somefile" target="_blank"><img src="http://xxxyyyyzzzz.com/somefile.jpeg"
4)
<a href="http://xxxyyyyzzzz.com/somefile" target="_top">Download</a><br>
5)
dd.d3.getElementById("lgpd").outerHTML: why d3, and what is it?
6) Where can I find a reference for containerTag and containerNameOrIndex?
Sorry for the long post, but I have been searching for a long time :/
Kind regards,
Laurent.
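
For comparison, here is a minimal sketch of extraction cases 1 to 4 written in Python with the standard library's html.parser module. It is not the HTMLDoc class and says nothing about how HTMLDoc works internally. The names LinkTextExtractor, extract and SAMPLE are hypothetical, and the sample assumes closing tags for the truncated snippet in case 3 and leaves out the "<- I want that" annotation.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they are not tracked on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class LinkTextExtractor(HTMLParser):
    """Collects loose text in <div class="text">, link targets, and image sources."""

    def __init__(self):
        super().__init__()
        self.stack = []       # currently open tags as (tag, attrs) pairs
        self.div_texts = []   # case 1: text placed directly inside <div class="text">
        self.links = []       # cases 2-4: [href, link text] pairs
        self.images = []      # case 3: <img src="..."> values

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            self.images.append(attrs.get("src"))
        if tag in VOID_TAGS:
            return
        if tag == "a":
            self.links.append([attrs.get("href"), ""])
        self.stack.append((tag, attrs))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Close the nearest matching open tag; ignore stray closing tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.stack:
            return
        tag, attrs = self.stack[-1]
        if tag == "a":
            self.links[-1][1] += text
        elif tag == "div" and attrs.get("class") == "text":
            self.div_texts.append(text)


def extract(html):
    """Parse an HTML string and return the filled extractor."""
    parser = LinkTextExtractor()
    parser.feed(html)
    parser.close()
    return parser


# Sample markup modelled on cases 1 to 4 (closing tags in case 3 are assumed).
SAMPLE = """
<div class='text'>
<b>Covers</b><br/>
http://xxxyyyyzzzz.com/somefile.html
</div>
<a href="http://xxxyyyyzzzz.com/somefile.html" target="_blank">http://xxxyyyyzzzz.com/somefile.html</a>
<div class="image">
<a href="http://xxxyyyyzzzz.com/somefile" target="_blank"><img src="http://xxxyyyyzzzz.com/somefile.jpeg"></a>
</div>
<a href="http://xxxyyyyzzzz.com/somefile" target="_top">Download</a><br>
"""

if __name__ == "__main__":
    result = extract(SAMPLE)
    print(result.div_texts)
    print(result.links)
    print(result.images)
```

The stack of open tags is what lets case 1 work: the text inside <b> is skipped because <b> is the innermost open tag at that point, while the URL after <br/> is kept because the innermost open tag is then the <div class='text'> itself.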