Web 3.0 - the real solution
The real solution would be to implement a headless browser that renders
the page into memory exactly, pixel by pixel, as the current version of
whatever-popular-browser does. This last part is important because
those "web designers" use those browsers as reference. After the render
is produced, a script would be interpreted against it, emulating user input and extracting
data. The idea is that page internals may change while the general
look and feel remains the same, so the script would describe things
like "find a table that has 'product' in its header, then find column
'price' and row 'tea' and print the value at their intersection"
instead of trying to parse the HTML itself. We need in-memory rendering
for this, since spatial operators such as "near", "above" and "to the
left of" would be common.
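
To make the table example concrete, here is a rough sketch of what such a
lookup could look like once the render is available. It assumes the
renderer has already produced a flat list of text boxes with pixel
coordinates; the Box type, the LAYOUT data and the helper names (find,
above, left_of, cell_at) are all made up for illustration and are not
part of any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A piece of rendered text and its bounding box in pixels (hypothetical)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    @property
    def cx(self) -> float:
        return self.x + self.w / 2

    @property
    def cy(self) -> float:
        return self.y + self.h / 2


def find(boxes, text):
    """Return the first box whose text matches, ignoring case and padding."""
    for b in boxes:
        if b.text.strip().lower() == text.lower():
            return b
    raise LookupError(f"no box with text {text!r}")


def above(a, b):
    """True if box a ends at or before the top edge of box b."""
    return a.y + a.h <= b.y


def left_of(a, b):
    """True if box a ends at or before the left edge of box b."""
    return a.x + a.w <= b.x


def cell_at(boxes, column, row):
    """Find the box under the column header and level with the row label."""
    col = find(boxes, column)
    lab = find(boxes, row)
    for b in boxes:
        if b is col or b is lab:
            continue
        in_column = col.x <= b.cx <= col.x + col.w
        in_row = lab.y <= b.cy <= lab.y + lab.h
        if in_column and in_row:
            return b
    raise LookupError(f"no cell at column {column!r}, row {row!r}")


# Made-up render output: a two-column table with a header and two rows.
LAYOUT = [
    Box("product", 0, 0, 100, 20), Box("price", 100, 0, 60, 20),
    Box("coffee", 0, 20, 100, 20), Box("4.50", 100, 20, 60, 20),
    Box("tea", 0, 40, 100, 20),    Box("3.20", 100, 40, 60, 20),
]

if __name__ == "__main__":
    print(cell_at(LAYOUT, "price", "tea").text)
```

Note that nothing here looks at tags or attributes: the lookup depends
only on where the text ended up on the rendered page, which is the whole
point. A real implementation would also have to cope with merged cells,
wrapped text and headers that are not perfectly aligned.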