Web 3.0 reference manual

Introduction

Web 3.0 consists of the web3 file format, tools that convert HTML and related data to this format and back to their native format, and tools that process files in web3 format.

All of these functions are designed to be used from the shell, sometimes with the option of using them from awk, so they are implemented as shell functions or awk functions in files that can be included from shell or awk. Shell means POSIX shell (compatible with /bin/sh, bash, ksh, zsh, etc.). A shell lib is included with the source command (a single dot in POSIX shell). Awk scripts should run on any awk implementation (tested with mawk and gawk). Awk libs can be included using -f or the provided web3 awk wrapper. Most awk functions have a shell wrapper, so they are directly usable from the shell. As a third option, a shell script (CLI wrapper) is provided that can call any of the functions directly from the command line, from a non-POSIX shell, or from any other language.
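
The two inclusion mechanisms can be sketched as follows. This is only an illustration: the file names and the functions demo_hello and demo_double are hypothetical and not part of web3. Only the dot command and awk's -f option are standard.

```shell
#!/bin/sh
# Hypothetical throwaway libs; the real web3 lib file names are not given here.
tmpdir=$(mktemp -d)

# A shell lib is just a file that defines shell functions.
cat > "$tmpdir/demo_lib.sh" <<'EOF'
demo_hello() { printf 'hello %s\n' "$1"; }
EOF

# An awk lib is a file that defines awk functions.
cat > "$tmpdir/demo_lib.awk" <<'EOF'
function demo_double(x) { return 2 * x }
EOF

# The main awk program lives in its own file and uses the lib function.
cat > "$tmpdir/main.awk" <<'EOF'
{ print demo_double($1) }
EOF

# Include the shell lib with the POSIX dot command, then call its function.
. "$tmpdir/demo_lib.sh"
shell_out=$(demo_hello world)

# Include the awk lib with -f; a second -f supplies the main program.
awk_out=$(echo 21 | awk -f "$tmpdir/demo_lib.awk" -f "$tmpdir/main.awk")

echo "$shell_out"
echo "$awk_out"
rm -rf "$tmpdir"
```

Once -f is used, awk takes its program only from files, which is why the main program is passed with a second -f here.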

The preferred way of developing web3 applications:

Using web3

Most web3 applications implement the following steps in a single pipeline:
  1. web3_wget to download a page
  2. html_to_web3 to convert it to web3 format
  3. web3_* calls to extract the HTML nodes containing the target information
  4. a custom script to reformat the result
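
The shape of such a pipeline can be sketched as below. The names web3_wget and html_to_web3 come from this manual; everything else is an assumption made so that the sketch runs on its own: the way arguments are passed, the stub bodies, the name web3_select_demo, the URL and the sample page. The stubs do not reproduce the real tools or the real web3 file format. With the real libs, the stub definitions would be replaced by including the web3 shell lib.

```shell
#!/bin/sh
# Stand-in stubs (assumptions, not the real web3 implementation).

# Step 1 stand-in: assumed to take a URL and write the page to stdout.
web3_wget() { printf '<html><body><p class="price">42</p></body></html>\n'; }

# Step 2 stand-in: assumed to be a stdin-to-stdout filter; the real tool
# emits web3 format, this stub passes the input through unchanged.
html_to_web3() { cat; }

# Step 3 stand-in: hypothetical replacement for a web3_* extraction call.
web3_select_demo() { sed -n 's/.*class="price">\([^<]*\)<.*/\1/p'; }

# Steps 1 to 4 in a single pipeline; the final awk is the custom reformatter.
result=$(web3_wget "http://example.com/item" |
  html_to_web3 |
  web3_select_demo |
  awk '{ print "price=" $0 }')

echo "$result"
```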

Some pages require login before operation. Web3 has strong form support: after the third step of the above procedure, the login form can be piped to web3_form_extract, which extracts the raw form data and outputs it as another web3 file. Replacing value fields in that web3 file is trivial with tools like sed. The result can then be fed to web3_form_to_post, which outputs an HTTP POST request that web3_wget can submit. The output of this second web3_wget is the page the user gets after login.
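
The form-filling part of this flow might look like the sketch below. The names web3_form_extract and web3_form_to_post come from this manual, but their bodies here are stand-in stubs. The one-field-per-line layout, the field names, the credentials and the output encoding are all made up for illustration and do not reproduce the real web3 file format.

```shell
#!/bin/sh
# Stand-in stubs (assumptions, not the real web3 implementation).

# Assumed: reads the login form on stdin, writes raw form data to stdout.
# The "name value=..." layout is invented for this sketch.
web3_form_extract() { cat > /dev/null; printf 'user value=\npass value=\n'; }

# Assumed: turns form data into the body of a POST request.
web3_form_to_post() { sed 's/ value=/=/' | paste -s -d '&' -; }

# Extract the form, fill in the value fields with sed, build the POST data.
post=$(printf 'login form placeholder\n' |
  web3_form_extract |
  sed -e '/^user /s/value=.*/value=alice/' \
      -e '/^pass /s/value=.*/value=secret/' |
  web3_form_to_post)

echo "$post"
# With the real tools, this output would be handed to web3_wget for submission.
```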

A similar method can be used in a loop to retrieve multiple pages of search results from search engines or web shops, by extracting the "next" link from each page and requesting it in turn.
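
Such a loop can be sketched as follows. Again, web3_wget and html_to_web3 are stand-in stubs, and the page names and the "item"/"next" line layout are made up so that the sketch runs without network access. Real pages would need web3_* calls to locate the "next" link.

```shell
#!/bin/sh
# Stand-in stub: serves three fake result pages keyed by a made-up page name.
web3_wget() {
  case "$1" in
    page1) printf 'item A\nnext page2\n' ;;
    page2) printf 'item B\nnext page3\n' ;;
    *)     printf 'item C\n' ;;
  esac
}

# Stand-in stub: passes the input through instead of emitting web3 format.
html_to_web3() { cat; }

url=page1
items=""
while [ -n "$url" ]; do
  page=$(web3_wget "$url" | html_to_web3)

  # Collect the target information from this page.
  found=$(printf '%s\n' "$page" | sed -n 's/^item //p')
  items="${items:+$items }$found"

  # Extract the "next" link; it is empty on the last page, ending the loop.
  url=$(printf '%s\n' "$page" | sed -n 's/^next //p')
done

echo "$items"
```

A real script would probably also cap the number of iterations, so that a site whose "next" link loops back on itself cannot keep the loop running forever.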