Web 3.0

What is web 3.0?

Web 3.0 is a set of shell and awk libs (scripts designed for reuse) that make programming "wget | process | display | wget ..." loops more efficient. The goal is to provide a command-line interface for interactive web pages. For more information, refer to the reference manual.
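One iteration of such a loop can be sketched with plain wget and awk. This is a minimal, hypothetical illustration that does not use the web3.0 libs themselves (their functions are covered in the reference manual, not in this section). The function names fetch, extract_links and show are made up for the example, and the link extraction is a crude regex scan, not a real HTML parser.

```shell
#!/bin/sh
# Hypothetical sketch of one "wget | process | display" iteration,
# written with plain wget and awk rather than the web3.0 libs.

# wget step: -q silences progress output, -O - writes the page to stdout
fetch() {
    wget -q -O - "$1"
}

# process step: print every href="..." target found on stdin, one per line
# (a crude regex scan; it does not parse HTML properly)
extract_links() {
    awk '{
        line = $0
        while (match(line, /href="[^"]*"/)) {
            # skip the 6 chars of href=" and drop the closing quote
            print substr(line, RSTART + 6, RLENGTH - 7)
            line = substr(line, RSTART + RLENGTH)
        }
    }'
}

# display step: number the extracted links so the user can pick one
show() {
    nl -ba
}
```

For example, `fetch https://example.com/ | extract_links | show` prints a numbered list of the links on that page (example.com is only a placeholder). Passing the chosen link back to fetch closes the loop; relative links such as /about would first have to be resolved against the page URL, which this sketch does not do.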

Why do we need this?

In most cases such a simple CLI is much more efficient to use than a web 2.0 GUI. No, javascript/ajax doesn't help; it has even made this aspect worse.
Web pages traditionally look and work the way the designer designed them, not the way that is comfortable or efficient for the user (which varies from user to user).
Automation: you can check for new content, or create content on the web, from crontab, at, or an IRC bot.
Working around broken services: if your favorite web shop offers only a very limited search facility, you can just download the full list and grep it.
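The last point, grepping a downloaded list instead of relying on a weak search form, might look like the sketch below. It is hypothetical: strip_tags and search_list are names made up for illustration, the URL in the usage line is a placeholder, and the tag stripping works line by line, so tags spanning several lines are left in place.

```shell
#!/bin/sh
# Hypothetical sketch of the "download the full list and grep it" workaround.

# crude tag stripper: delete anything between < and > on each line,
# then drop lines that end up blank (multi-line tags are not handled)
strip_tags() {
    sed -e 's/<[^>]*>//g' -e '/^[[:space:]]*$/d'
}

# search_list URL PATTERN: fetch the whole listing once, then let grep
# do the case-insensitive searching the site's own search form cannot
search_list() {
    wget -q -O - "$1" | strip_tags | grep -i -e "$2"
}
```

For example, `search_list 'https://shop.example.com/all-products' 'usb.*hub'` would print every line of the listing that matches the pattern. Since the function needs no interaction, the same call can also be run from crontab or at, as in the automation point above.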

Rationale

Web 1.0, also known simply as "web" or "world wide web", was a relatively simple technology built around the idea that some people want to publish documents and others want to download and display those documents. The two main components, http and html, served exactly those purposes. They even provided some interactivity through web forms, but transactions were still simple: normally one connection per request, one document per request. In the early days of low bandwidth, most authors optimized their pages for content.

After some time pages became fancier. So-called "web designers" appeared; they broke content up into tables and stuffed pages with lots of tiny pictures forming frames and other design elements. The interactivity of forms was no longer enough, and sites started to use javascript heavily to run all sorts of programs client side. Simple, content-serving sites became rarer and rarer in the age of web 2.0. Things became "ajaxy": sometimes even a simple user/password login form requires javascript and downloading several tens of kilobytes of "web design".

Web 3.0 is a new interface for the same sites, one that is optimized for content. As most of the content we are talking about is made of letters, this interface is a CLI, not a GUI. It takes simple commands, operates the ugly web 2.0 interface on the user's behalf, and returns simple, content-only results.

I have been doing this for ages, ever since web 2.0 first started to slow me down. There are great tools out there: wget for handling everything at the http level, and awk/grep/sed for processing the result. Unfortunately this is not entirely true: processing the result is a PITA, especially when you finally have a long, cryptic script that extracts the tiny bit of information you need from the ocean of senseless web design, and then one day the web designer changes some divs/tables and your script fails. In such cases it was sometimes easier to rewrite the script from scratch...

After a while I had enough of this. I had an idea for an ultimate solution, but I am not good enough to implement it. Instead, I implemented a set of shell and awk libs called web3.0 that make processing all that html easy.