Subject: Email-Based Web Access: Project Design
Date: Thu, 24 Sep 1998 05:15:41 +0800
From: Robert Van Buskirk
Organization: Calmar Online
To: Eritrea Internet Development Group, robert@calmaronline.com

Dear Eritrea Internet folks:

Over the past few days we have all come to agreement that we should
get started on developing full Internet access using the current
dial-up email system. The way we will do this is to design and
implement what I call an email-based proxy web service. For this
service, there will be a set of servers in Eritrea which store popular
web pages. Then, when anyone in Eritrea wants a web page, the server
will get it via email and provide it to Eritrean users a few hours
later. It is not real Internet, but it is a start and will ease the
transition to full Internet access in Eritrea.

I would now like to propose a project design for our email-mediated
Internet access project. I think the steps to follow are:

(1) Define services and functions for the Internet access system.
(2) Formulate software design.
(3) Distribute/delegate software development work.
(4) Integrate, test, and implement new service systems.

---------------

Now that is an abstract formulation of what needs to be done. I will
now try to define some of the specifics of each of these steps.

DEFINING SERVICES AND FUNCTIONS

There are several services and functions that should be provided by
this new email-based Internet access system. They include:

A) Direct User Web Access
   i)   fetching text web pages
   ii)  fetching HTML web pages
   iii) fetching images
   iv)  fetching the results of form posts
   v)   fetching FTP downloads
   vi)  getting size or volume information on images and downloads
   vii) providing documentation or help

For direct user access, we essentially develop a program or series of
Perl scripts which check incoming email for a series of commands (such
as GETTEXT, GETHTML, GETFTP, GETIMG, etc.).
The program then performs the desired information retrieval service
and emails the results to the sender. Note that if we have a good set
of email-based information retrieval tools running on a server or two
on the Internet, then people in Eritrea can program applications which
automatically get certain types of information and post them on a
local web server or email them to a list of local users.

B) Discussion List Server Tools

The discussion list server tools will be a series of scripts and
programs that allow Linux server administrators to set up a discussion
group web site. This web site will receive discussion list email (like
dehai and dehai news) and organize it into a user-friendly web site.
This project has been proposed by Daniel, and it should be possible to
implement it in simple form in a couple of weeks.

C) An Email-based Web Proxy Server

This tool will enable a server in Eritrea to keep copies of requested
web pages and maintain a proxy web site. It includes a web page
request form where users can ask for new web pages to be brought to
the server for viewing. The server can also maintain a small,
primitive search facility, and it can periodically fetch updated
copies of popular web sites. For example, the server can fetch a copy
of a news summary page every day and post it in a news directory of
the web tree. The server could then provide a local Eritrea Internet
news service.

We will have to concentrate on how to make this web proxy service as
efficient as possible. For example, when we fetch images and other
components of a page, we will need some criteria for sorting through
the different page components and sending the smallest components of
the page, while avoiding the big, expensive images. By making it very
efficient we will be able to provide the service to more people at
lower cost.
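To make the command-checking step under (A) concrete, here is a
minimal sketch of how a script might pull commands out of an incoming
email body. It is written in Python purely for illustration (the
project itself proposes Perl). The four command names come from this
message; the one-command-per-line "COMMAND argument" syntax, the
function name parse_commands, and the rule of skipping quoted lines
are assumptions, not an agreed design.

```python
# Command names are taken from the email; the line syntax is assumed.
KNOWN_COMMANDS = {"GETTEXT", "GETHTML", "GETFTP", "GETIMG"}


def parse_commands(body):
    """Return (commands, errors) found in an email body.

    commands -- list of (name, argument) tuples, in message order
    errors   -- list of lines naming a known command with no argument
    """
    commands, errors = [], []
    for raw in body.splitlines():
        line = raw.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and quoted reply text
        parts = line.split(None, 1)
        name = parts[0].upper()  # accept lower-case commands too
        if name not in KNOWN_COMMANDS:
            continue  # ordinary text (greeting, signature) is ignored
        if len(parts) == 2:
            commands.append((name, parts[1].strip()))
        else:
            errors.append(line)  # known command, but nothing to fetch
    return commands, errors
```

A reply script could then loop over the returned commands, perform
each fetch, and mail back both the results and any lines reported in
errors, so the sender learns why a request was not carried out.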

FORMULATING SOFTWARE DESIGN:

In formulating the software design, I will provide a rough flow
diagram for the email-based web proxy, since this is the most
complicated component of the project, and it contains most of the
other project components as operational subcomponents.

FLOW DIAGRAM
EMAIL-BASED WEB PROXY SERVICE

----------------------------------------------
|User connects to web server in Eritrea.
|Home page is a CGI (Common Gateway Interface)
|form that asks user for a URL, or asks user
|for a search of web data downloaded to Eritrea.
----------------------------------------------
        |
        V
-------------------------
|Checks format          |--------> If invalid, return home
-------------------------          with error message
        |
        V
--------------------------
|If valid, check for page|
|copy on local server    |--------> If local, go to page with
--------------------------          Update button added
        |
        V
--------------------------------
|If not local, or if Update    |
|button is pushed, go to page  |
|request form.                 |
|Request options:              |
|(1) Get default (i.e. with    |
|    only small images)        |
|(2) Crawl (i.e. get links too)|
|(3) Get all images            |
--------------------------------
        |
        V
----------------
|Form submitted|-----> User sent back to home page
----------------
        |
        V
----------------------------------------------------
|Email is sent to web-page fetching server in the  |
|U.S. or elsewhere on the Internet. The email body |
|has the appropriate Majordomo-style web fetch     |
|commands.                                         |
----------------------------------------------------
        |
        |
        V
------------------------------------------------------
|Email received at U.S. server; server fetches the
|requested Internet data as follows:
|GETTEXT:        Gets text version of page
|GETHTML:        Gets only HTML source of page
|GETIMG:         Gets image
|GETHTML+IMG:    Gets HTML source and linked images
|                below a default size
|GETHTML+ALLIMG: Gets HTML source and all images, no
|                matter the size
|GETHTML+LINKS:  Gets HTML source and all of the links
|                referenced in the source
|GETPOST:        Retrieves result of form posting
|  POSTDATA:     Data used in a form posting
------------------------------------------------------
        |
        V
-------------------------------------------
|Access log and error log records         |
|are made for the request, and data is    |
|sent in a standardized format to the     |
|requester and server                     |
-------------------------------------------
        |
        V
------------------------------------------------------
Mail received at Eritrea server. Local processing
done, modifying retrieved source:
1) Links to previously retrieved pages converted to
   local links
2) Non-locally available links converted to form
   action for new request
3) Files stored on server
4) User notified via email that requested web page
   has arrived
------------------------------------------------------

Please review this software design and send me any comments or
questions that you might have.

DISTRIBUTING AND DELEGATING SOFTWARE DEVELOPMENT TASKS:

Initially we have to agree on a platform and programming language. I
propose that we do everything on Linux servers, using Perl as the
scripting language for all of our different tools and routines. I
think Perl is the most powerful language for this, and it is
relatively easy to learn.

All of the Eritrea-side collaborators need to tell me if they have
sufficient Perl programming references. If not, then we should find a
way to get some more Perl reference books to all of you.

The next thing is we need to make sure that everyone has a Linux
server configured as an HTTP server, and that everyone has the
capability of browsing web pages on their server.

Last, but not least, we need to make sure that everyone has their
servers set up to support the Common Gateway Interface (CGI) for Perl.
If not, contact me, and I will provide you with the necessary
instructions and Perl routines.

Once we have this foundation set, then we can start the software
development work.
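
As one illustration of the kind of routine this work will involve, the
local-processing step at the end of the flow diagram (items 1 and 2,
converting links) could look roughly like the sketch below. It is in
Python only for illustration (the proposal above is Perl). The
request-form path /cgi-bin/request.cgi, the "url" query parameter, the
function name localize_links, and the dictionary of local copies are
all hypothetical. A simple regular expression stands in for real HTML
parsing and only handles quoted href attributes.

```python
import re
from urllib.parse import quote, urljoin

# Hypothetical location of the page request form on the local server.
REQUEST_FORM = "/cgi-bin/request.cgi"

# Matches href="..." or href='...'; unquoted attributes are not handled.
HREF_RE = re.compile(r'(href\s*=\s*)(["\'])(.*?)\2', re.IGNORECASE)


def localize_links(html, page_url, local_copies):
    """Rewrite link targets in a retrieved page.

    page_url     -- original URL of the page, used to resolve relative links
    local_copies -- dict mapping absolute URLs to paths already stored locally
    """
    def rewrite(match):
        prefix, q, target = match.groups()
        if target.startswith(("#", "mailto:")):
            return match.group(0)  # leave in-page anchors and mail links alone
        absolute = urljoin(page_url, target)
        if absolute in local_copies:
            new_target = local_copies[absolute]  # item 1: local link
        else:
            # item 2: point the link at the request form instead
            new_target = REQUEST_FORM + "?url=" + quote(absolute, safe="")
        return prefix + q + new_target + q

    return HREF_RE.sub(rewrite, html)
```

With this approach a user who clicks a link to a page that has not yet
been retrieved lands on the request form with the URL already filled
in, rather than on a broken link.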

In a follow-up email I will try to describe an initial exercise that
everyone can try to implement, to make sure that they have the
foundation to proceed.

Sincerely,

Robert Van Buskirk
Eritrea Technical Exchange