Analyzing Web Sites

In this chapter we will consider two very simple but useful Web agents that I have written in Actor Prolog for my own needs.

Example 1. Search for obsolete and erroneous hyperlinks on a Web Site.

Let us consider the Holes7_d.A example in the Web\Holes directory. If you study this program, you will discover that the Holes7_d.A source file contains nothing except comments. The point is that the program was created entirely by means of visual component-oriented programming. Therefore, it contains only some SADT diagrams (see the Holes7_d.IDL file) that refer to classes in the MOROZOV component library supplied with the examples. The Holes7_d.DLG file also contains no dialog box definitions: all the definitions are loaded from the component library.

Let us start the Web agent.



Fig. 1.1. Start of the Web agent.

The Context diagram consists of three blocks: "Logic Programming Sites", "Edit", and "Check". These names correspond to the '_Logic_Programming_Sites', '_Edit', and '_Check' classes in the component library.



Fig. 1.2. Schema of the agent's operation.

Click the left mouse button on the "Edit" block. A dialog box will open where you can enter the address of the Web Site to be checked. Several Web Sites related to logic programming and Prolog are listed in the drop-down list (these addresses are passed to the "Edit" block from the "Logic Programming Sites" block). The "Select" button opens a standard file selection dialog box, so you can also select a file on your hard disk to be checked. The "Process" button passes the selected address to the "Check" block for further processing.

Let us select the

http://techref.massmind.org/techref/language/prolog/index.htm

Web Site and press the "Process" button.



Fig. 1.3. Selection of the Web Site to be checked.

Now press the "Check" block. It opens a dialog box where you can set the parameters for checking the selected Web Site. The selected address is shown in the "Start point of analysis" edit control. The "Output mode" radio button group selects the type of report: the agent can output the report to a text window or to a file. The file name is entered in the edit control next to the "create file" radio button. Enter the maximum time to wait for a response from the checked Site in the "Maximal waiting time" edit control. If the waiting time exceeds this value, the agent considers the corresponding Web page temporarily inaccessible and outputs an error message.
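The "Maximal waiting time" rule can be illustrated outside Actor Prolog. The following is a minimal Python sketch, not the code of the '_Check' component: the hypothetical probe function requests a page with a timeout (assumed to be given in seconds) and classifies the result as "ok", "timeout" (the page is treated as temporarily inaccessible) or "error". It relies only on the standard urllib.request module.

```python
import http.client
import socket
import urllib.error
import urllib.request


def probe(url: str, max_wait: float) -> str:
    """Hypothetical helper: classify a link as 'ok', 'timeout' or 'error'.

    max_wait plays the role of the "Maximal waiting time" parameter
    (assumed to be in seconds).
    """
    try:
        with urllib.request.urlopen(url, timeout=max_wait) as response:
            response.read()
        return "ok"
    except socket.timeout:
        # The server stopped answering in time: temporarily inaccessible.
        return "timeout"
    except urllib.error.URLError as exc:
        # Timeouts while connecting arrive wrapped in URLError;
        # HTTP errors (404 and so on) and unknown hosts are plain errors.
        if isinstance(exc.reason, socket.timeout):
            return "timeout"
        return "error"
    except (OSError, ValueError, http.client.HTTPException):
        # Other I/O or protocol failures, or a malformed address.
        return "error"
```

For example, probe("http://techref.massmind.org/techref/language/prolog/index.htm", 30.0) would return "ok" if the page answers within 30 seconds.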

The "Web domain to be analyzed" edit control restricts the search space: the agent does not check the references on a Web page unless the address of that page contains the string given in this edit control. The "Search depth limit" checkbox switches on a restriction on the depth to which the tree of Web pages is examined. The maximum depth, a number, is entered in the edit control next to this checkbox.



Fig. 1.4. Input of check parameters.
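The domain restriction and the depth limit described above amount to a breadth-first walk over the tree of Web pages. The following minimal Python sketch is not the Actor Prolog implementation. It assumes a hypothetical fetch function that returns the hyperlinks found on a page, or None when the page cannot be retrieved, and it assumes that the depth limit counts link steps from the start page. Every link is fetched once to see whether it is valid, but only the links found on pages whose address contains the domain string are followed.

```python
from collections import deque
from typing import Callable, Optional

# Hypothetical page loader: returns the hyperlinks found on a page,
# or None if the page cannot be retrieved (a bad link).
Fetch = Callable[[str], Optional[list[str]]]


def check_site(start: str, fetch: Fetch, domain: str,
               max_depth: Optional[int] = None) -> list[tuple[str, str]]:
    """Breadth-first link check; returns (referring page, bad link) pairs."""
    broken: list[tuple[str, str]] = []
    seen = {start}
    queue = deque([(start, "", 0)])  # (address, referring page, depth)
    while queue:
        url, referrer, depth = queue.popleft()
        links = fetch(url)
        if links is None:
            broken.append((referrer, url))
            continue
        # References on a page are checked only if its address contains
        # the domain string ("Web domain to be analyzed").
        if domain not in url:
            continue
        # Assumed meaning of "Search depth limit": pages at depth
        # max_depth or deeper are checked but not expanded further.
        if max_depth is not None and depth >= max_depth:
            continue
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append((link, url, depth + 1))
    return broken


# Usage with a toy site held in a dictionary: dict.get returns None
# for addresses that do not exist.
site = {
    "http://a.org/": ["http://a.org/p1", "http://a.org/missing", "http://b.org/"],
    "http://a.org/p1": ["http://a.org/p2"],
    "http://a.org/p2": [],
    "http://b.org/": ["http://b.org/dead"],
}
print(check_site("http://a.org/", site.get, "a.org"))
# prints [('http://a.org/', 'http://a.org/missing')]
```

Note that http://b.org/dead is never reported: http://b.org/ itself is valid, but its address does not contain "a.org", so its references are not checked.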

The "Do check only one of the variants" edit control optimizes the search on servers that automatically convert character encodings. For instance, the Web resources published on the server of our institute at the addresses www.cplire.ru/win, www.cplire.ru/koi, www.cplire.ru/mac, www.cplire.ru/alt, and www.cplire.ru/iso are the same files, automatically converted into different encodings. You can list several possible ways of addressing the same resource in the "Do check only one of the variants" edit control to tell the agent that it is enough to check only one of the variants during the inspection of the Web Site.



Fig. 1.5. Input of check parameters.
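One simple way to honour the "Do check only one of the variants" setting is to map every variant of an address to a single canonical form before the duplicate check. The Python sketch below is an assumption about how such a rule could work, not a description of the component's code: the hypothetical canonical function replaces any later variant found in an address with the first variant in the list, using plain substring matching. The page names a.html and b.html are illustrative only.

```python
def canonical(url: str, variants: list[str]) -> str:
    """Map an address to one canonical variant (the first in the list)."""
    for variant in variants[1:]:
        if variant in url:
            return url.replace(variant, variants[0], 1)
    return url


def unique_pages(urls: list[str], variants: list[str]) -> list[str]:
    """Keep only the first address seen for each canonical form."""
    seen: set[str] = set()
    result: list[str] = []
    for url in urls:
        key = canonical(url, variants)
        if key not in seen:
            seen.add(key)
            result.append(url)
    return result


variants = ["www.cplire.ru/win", "www.cplire.ru/koi", "www.cplire.ru/mac",
            "www.cplire.ru/alt", "www.cplire.ru/iso"]
print(unique_pages(["http://www.cplire.ru/win/a.html",
                    "http://www.cplire.ru/koi/a.html",
                    "http://www.cplire.ru/koi/b.html"], variants))
# prints ['http://www.cplire.ru/win/a.html', 'http://www.cplire.ru/koi/b.html']
```

Substring matching is the simplest choice; a stricter version would require a "/" after the variant so that, say, a directory named /window is not mistaken for /win.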

The "Start analysis" button starts checking the given Web Site with the indicated parameters.



Fig. 1.6. Web Site check.

During the check, the agent creates a report. The report is printed to a text window or written to an HTML file.



Fig. 1.7. The agent outputs the report to the text window.

You can view the report file with any Web browser. To open it, press the "Show report file" button in the "Parameters of analysis" dialog box.



Fig. 1.8. Agent report recorded in the file.
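The HTML report is an ordinary Web page. The following Python sketch shows one possible layout for a list of bad links; the names html_report and save_report and the table format are assumptions, not the format actually produced by the agent. The standard html.escape function protects addresses that contain characters such as & or <.

```python
import html
from pathlib import Path


def html_report(site: str, broken: list[tuple[str, str]]) -> str:
    """Render (referring page, bad link) pairs as a small HTML page."""
    title = "Check of " + html.escape(site)
    rows = "\n".join(
        f'<tr><td><a href="{html.escape(page)}">{html.escape(page)}</a></td>'
        f"<td>{html.escape(link)}</td></tr>"
        for page, link in broken
    )
    return (
        '<html><head><meta charset="utf-8">'
        f"<title>{title}</title></head><body>\n"
        f"<h1>{title}</h1>\n"
        f"<p>{len(broken)} bad link(s) found.</p>\n"
        "<table>\n<tr><th>Page</th><th>Bad link</th></tr>\n"
        f"{rows}\n</table>\n</body></html>\n"
    )


def save_report(path: str, site: str, broken: list[tuple[str, str]]) -> None:
    """Write the report so that it can be opened in any Web browser."""
    Path(path).write_text(html_report(site, broken), encoding="utf-8")
```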

During the first test of the agent we discovered a lot of obsolete hyperlinks, bad addresses, and plain syntax errors on our own Web Site. Check your Web Site with Actor Prolog right now.

Example 2. Search for keywords on a given Web Site.

The Site_d.A agent in the Web\ScanSite directory helps me investigate various Web resources. This program walks through a tree of hyperlinks and finds pages containing keywords that interest me. The program often turns out to be very useful because many Web Sites have no search engine of their own or implement search very poorly.

Let us start the agent.



Fig. 2.1. Start of the Web agent.

The Top-Most diagram consists of five blocks: "Logic Programming Keywords", "Edit Keywords", "Logic Programming Sites", "Edit", and "Scan Site".



Fig. 2.2. Schema of the agent's operation.

The purpose of the "Logic Programming Sites" and "Edit" blocks is the same as in the previous example. The "Edit Keywords" block is intended for editing the list of keywords (in particular, it can load a ready-made list from the "Logic Programming Keywords" block if necessary). The "Scan Site" block is the main one: it searches for the given keywords on the indicated Web Site.

Press the "Edit Keywords" block and load a list of keywords in the dialog box that opens (use the "Load" button).



Fig. 2.3. Editing the keyword list.

Close the dialog box with the "O.K." button. Press the "Edit" block and indicate the Web Site to be analyzed.



Fig. 2.4. Selection of the Web Site to be analyzed.

Close the dialog box with the "Process" button. Press the "Scan Site" block and enter the parameters of analysis.



Fig. 2.5. Selection of analysis parameters.

All the controls in the dialog box have the same meaning as in the previous example. Press the "Start analysis" button to begin the analysis.



Fig. 2.6. Search for keywords on a Web Site.

During the analysis, the agent reports the addresses of Web pages containing the given keywords, as well as the number of occurrences of each keyword.



Fig. 2.7. The results of the analysis.
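Counting keyword occurrences on a page can be sketched in a few lines of Python. This is not the code of the "Scan Site" component: the hypothetical count_keywords and scan functions assume that the page text has already been extracted from the HTML, and they count case-insensitive whole-word matches. The lookaround assertions (?<!\w) and (?!\w) are used instead of \b so that keywords ending in symbols, such as C++, are still matched.

```python
import re
from collections import Counter


def count_keywords(text: str, keywords: list[str]) -> Counter:
    """Count case-insensitive, whole-word occurrences of each keyword."""
    counts: Counter = Counter()
    for keyword in keywords:
        pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
        found = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if found:
            counts[keyword] = found
    return counts


def scan(pages: dict[str, str], keywords: list[str]) -> dict[str, Counter]:
    """Return only the pages that contain at least one keyword."""
    report: dict[str, Counter] = {}
    for url, text in pages.items():
        counts = count_keywords(text, keywords)
        if counts:
            report[url] = counts
    return report
```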

You can also direct the output to an HTML file:



Fig. 2.8. Selection of analysis parameters.

In this case, when the analysis is finished, the agent creates a file containing the addresses of the Web pages where the given keywords were found. Press the "Show report file" button to view the report file.



Fig. 2.9. The agent report viewed in a standard Web browser.

Note that both examples considered in this chapter were assembled from ready-made components by means of visual programming.

If you have created your own components and want to share them with other developers, we will be pleased to include them in the free distribution package of Actor Prolog.
