In this project a web server will be constructed based on the implementation of TCP and the socket interface introduced earlier in the course. The web server will support browsing with authentication of user access. In addition, a dynamic web page will be implemented which may be updated in a web form.
Another interesting exercise in this project is the file system of an embedded system. The solution is somewhat simplified because only read-only memory is available. In principle, the simple file system may be generated without knowledge of the details, but a general understanding will help in the implementation of the dynamic web page.
This project involves a lot of string manipulation. Anyone unfamiliar with the standard C-language string library will have to learn the basic functions available.
There are a number of requirements for the solution in order to consider this project as finished
There is some administration in order to obtain the source code. Refer to the suggested design of the class hierarchy as well as advice on how to approach the completion of the skeleton code. An executable which may be loaded into the ETRAX unit is compiled, linked, and loaded in the same manner as in the previous overview of the system. Finally, you will have to test your solution.
HTTP Made Really Easy is a practical guide to writing web clients and servers without previous experience.
The application-level protocol is described in RFC1945 covering the Hypertext Transfer Protocol HTTP/1.0. At least the introduction should be read by those unfamiliar with the HTTP protocol.
Read about the World Wide Web Consortium and use its home page to search for further references.
The directory ~inin/kurs/www contains a sample web structure which may be used as provided. Should you want to create your own web, consult the requirements in order to finish the project. The directory structure of the sample web is
/index.htm
/dynamic/dynamic.htm
/pict/aum.gif
/pict/big.jpg
/pict/small1.gif
/pict/small2.gif
/pict/small3.gif
/pict/small4.gif
/pict/small5.gif
/private/private.htm
where three different subdirectories are present.
The files dynamic.htm and private.htm are created dynamically by the server and cannot be stored in the read-only memory of the server. Instead, they must be handled separately by the file system. The file private.htm contains a form which allows the content of the file dynamic.htm to be edited dynamically. The form requires authentication prior to use. When text is entered into the form and submitted, the file dynamic.htm changes, and the changes should also be reflected in the form the next time text is entered in it.
The simple file system is based on unpacked LHarc archives and is implemented as a singleton class with two methods. The method readFile will need modifications in order to allow dynamic retrieval, and the method writeFile must be implemented in order to allow storage. The read-only memory of the server introduces a difficulty which may be solved by a linked list which holds the changes in the file system. A request for the file dynamic.htm in the readFile method then searches the list first before the actual file system is consulted.
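The linked list of changes could be organised roughly as in the sketch below. The names DynamicFile, DynamicFileList, write and read are invented for illustration and are not part of the skeleton code. The sketch only shows the idea of an overlay which is searched before the read-only archive.

```cpp
#include <cstring>
#include <cstddef>

// Hypothetical node holding one dynamically stored file.
struct DynamicFile
{
  char*        name;    // File name, e.g. "dynamic.htm"
  char*        data;    // Current content, not necessarily a C string
  std::size_t  length;  // Number of bytes in data
  DynamicFile* next;    // Next node in the list, or 0
};

// Hypothetical overlay which is consulted before the read-only archive.
class DynamicFileList
{
 public:
  DynamicFileList() : first(0) {}

  ~DynamicFileList()
  {
    while (first != 0)
    {
      DynamicFile* node = first;
      first = node->next;
      delete [] node->name;
      delete [] node->data;
      delete node;
    }
  }

  // Store a copy of the data, replacing any earlier version of the file.
  void write(const char* name, const char* data, std::size_t length)
  {
    DynamicFile* node = find(name);
    if (node == 0)
    {
      node = new DynamicFile;
      node->name = new char[std::strlen(name) + 1];
      std::strcpy(node->name, name);
      node->data = 0;
      node->next = first;
      first = node;
    }
    char* copy = new char[length];
    std::memcpy(copy, data, length);
    delete [] node->data;
    node->data = copy;
    node->length = length;
  }

  // Return the stored content, or 0 if the file has never been written.
  const char* read(const char* name, std::size_t& length) const
  {
    DynamicFile* node = find(name);
    if (node == 0) return 0;
    length = node->length;
    return node->data;
  }

 private:
  DynamicFile* find(const char* name) const
  {
    for (DynamicFile* node = first; node != 0; node = node->next)
    {
      if (std::strcmp(node->name, name) == 0) return node;
    }
    return 0;
  }

  DynamicFile* first;

  DynamicFileList(const DynamicFileList&);             // Not copyable
  DynamicFileList& operator=(const DynamicFileList&);  // Not assignable
};
```

With such a list, readFile would presumably call read first and fall back to the archive only when the result is 0, while writeFile would call write.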
Most of the implementation of the class SimpleApplication in the previous project may be reused when the class HTTPServer is defined and implemented. The method TCP::connectionEstablished will require modifications in order to start the new thread. In addition, the telnet ECHO port number used previously may have to be changed to the WWW port number 80.
One important difference between the HTTPServer and the SimpleApplication is a request buffer. HTTP requests are often transmitted in more than one TCP segment, so several segments must usually be received and buffered before a request is complete.
Different parts of a response may be sent in more than one segment. For example,
mySocket->Write(aResponseLine);
mySocket->Write(aHeaderLine);
mySocket->Write(aFile);
mySocket->Close();
All requests are terminated with an empty line, <CRLF>. The token <CRLF> means ASCII 0x0D, carriage return, followed by 0x0A, line feed. A complete GET request has been received when two consecutive <CRLF> tokens have been found.
GET / HTTP/1.0<CRLF>
From: user@domain.com<CRLF>
User-Agent: HTTPTool/1.0<CRLF>
<CRLF>
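A request buffer which collects segments until the terminating empty line has arrived might look like the sketch below. The class name RequestBuffer, its methods, and the capacity of 2048 bytes are assumptions made for this example. The skeleton code may organise the buffering differently.

```cpp
#include <cstring>
#include <cstddef>

// Hypothetical fixed-size buffer which collects the segments of one request.
class RequestBuffer
{
 public:
  RequestBuffer() : used(0) { data[0] = '\0'; }

  // Append one received segment. Returns false if it does not fit.
  bool append(const char* segment, std::size_t length)
  {
    if (length > capacity - used) return false;
    std::memcpy(data + used, segment, length);
    used += length;
    data[used] = '\0';  // Keep the buffer usable as a C string
    return true;
  }

  // True when the empty line which ends the header has arrived,
  // i.e. when two consecutive <CRLF> are present in the buffer.
  bool headerComplete() const
  {
    return std::strstr(data, "\r\n\r\n") != 0;
  }

  const char* text() const { return data; }

 private:
  enum { capacity = 2048 };    // Assumed upper limit for one request
  char        data[capacity + 1];
  std::size_t used;
};
```

Each call to the read method of the socket would be followed by append, and the request would be parsed only after headerComplete returns true. Note that the <CRLF><CRLF> sequence itself may be split between two segments, which is why the search is made in the accumulated buffer rather than in the latest segment.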
The response of the server will be a transfer of the resource if it is available
HTTP/1.0 200 OK<CRLF>
Content-Type: text/html<CRLF>
<CRLF>
<html><title>Home page</title><body>.....
The HEAD request is very similar to the GET but instead of returning the actual resource, the server returns response headers only which may be useful in order to check characteristics of a resource without actually downloading it.
A POST request is more complicated since data is sent after the <CRLF><CRLF> of the request. However, one of the header lines contains the token Content-Length: followed by the number of bytes of data in the request. It is the responsibility of the application to buffer the received data until the right number of bytes has arrived. Often, the data is transmitted in several segments after the request header has been received.
POST /path/script.cgi HTTP/1.0<CR><LF>
From: user@domain.com<CR><LF>
User-Agent: HTTPTool/1.0<CR><LF>
Content-Type: application/x-www-form-urlencoded<CR><LF>
Content-Length: 32<CR><LF>
<CR><LF>
home=Cosby&favorite+flavor=flies
A method called HTTPServer::contentLength is provided in the file http.cc. It finds the token Content-Length: and converts the string representation of the integer into an integer of type udword.
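The test for a complete POST request can be sketched as follows. The function parseContentLength is a hypothetical stand-in for the provided HTTPServer::contentLength, whose exact signature is not shown here, and postComplete is an invented helper. The sketch assumes that the header field is written exactly as Content-Length: and that the buffered request is terminated with '\0'.

```cpp
#include <cstring>
#include <cstdlib>
#include <cstddef>

// Hypothetical stand-in for the provided HTTPServer::contentLength.
// Returns the value of the Content-Length field, or 0 if it is absent.
unsigned long parseContentLength(const char* request)
{
  const char* field = std::strstr(request, "Content-Length:");
  if (field == 0) return 0;
  // strtoul skips the blanks between the colon and the digits.
  return std::strtoul(field + std::strlen("Content-Length:"), 0, 10);
}

// True when the header and Content-Length bytes of data have been received.
// received is the total number of bytes stored in request so far.
bool postComplete(const char* request, std::size_t received)
{
  const char* headerEnd = std::strstr(request, "\r\n\r\n");
  if (headerEnd == 0) return false;                 // Header still incomplete
  std::size_t headerSize = (headerEnd - request) + 4;
  return received - headerSize >= parseContentLength(request);
}
```

For the request above, postComplete would become true once all 32 bytes of home=Cosby&favorite+flavor=flies have been appended to the buffer.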
Below, a few examples of web-transactions are presented. Remember that a new connection is established in every new transaction and that it is the responsibility of the server to terminate the connection when the transaction is finished. Required parameters are denoted by <...> whereas optional parameters are shown within [...].
The client establishes a new connection to port 80 on the server and requests the first page by sending
GET / HTTP/1.0<CRLF>
[Header line 1]<CRLF>
[Header line 2]<CRLF>
<CRLF>
The server answers the client with
HTTP/1.0 200 OK<CRLF>
Content-Type: text/html<CRLF>
<CRLF>
<html><title>Home page</title><body>.....
The server closes the connection.
The client establishes a new connection to port 80 on the server and requests a private page by sending
GET /private/private.htm HTTP/1.0<CRLF>
[Header line 1]<CRLF>
[Header line 2]<CRLF>
<CRLF>
The server answers the client with
HTTP/1.0 401 Unauthorized<CRLF>
Content-Type: text/html<CRLF>
WWW-Authenticate: Basic realm="private"<CRLF>
<CRLF>
<html><head><title>401 Unauthorized</title></head><CRLF>
<body><h1>401 Unauthorized</h1></body></html>
The server closes the connection. At this point the web browser presents a dialog to the user in order to request a user name and a password.
The client establishes a second connection with the supplied user name and password
GET /private/private.htm HTTP/1.0<CRLF>
[Header line 1]<CRLF>
Authorization: Basic qWjfhjR124=<CRLF>
[Header line 3]<CRLF>
<CRLF>
The server answers the client with
HTTP/1.0 200 OK<CRLF>
Content-Type: text/html<CRLF>
<CRLF>
<html><title>Secret page</title><body>.....
The server closes the connection.
The implementation of the GET request requires a correct format of the path and file name. The root path / always translates into the path /index.htm.
Attempt to find the requested file in the file system. If it cannot be found by the server, the response
HTTP/1.0 404 Not found\r\n
Content-type: text/html\r\n
\r\n
<html><head><title>File not found</title></head>
<body><h1>404 Not found</h1></body></html>
must be sent. The characters \r\n are the C-language equivalent of the token <CRLF>, ASCII 0x0D, carriage return, followed by 0x0A, line feed. If the file is available, the response
HTTP/1.0 200 OK\r\n
Content-type: text/html\r\n
\r\n
<file content>
is sent. The content type header field is based on the extension of the file
File extension | Content-type: |
.htm | text/html |
.gif | image/gif |
.jpg | image/jpeg |
and the token <file content> should be replaced with the actual content of the requested resource. A reply to a HEAD request would not send anything after the header lines and the blank line
HTTP/1.0 200 OK\r\n
Content-type: text/html\r\n
\r\n
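The mapping from file extension to content type in the table above can be expressed as a small helper. The function name contentTypeFor is invented for this sketch, and returning 0 for unknown extensions is an assumption about how the caller wants to handle them.

```cpp
#include <cstring>

// Hypothetical helper: map a file name to the value of the Content-type field.
// Returns 0 for extensions which are not in the table.
const char* contentTypeFor(const char* fileName)
{
  const char* extension = std::strrchr(fileName, '.');  // Last dot in the name
  if (extension == 0) return 0;
  if (std::strcmp(extension, ".htm") == 0) return "text/html";
  if (std::strcmp(extension, ".gif") == 0) return "image/gif";
  if (std::strcmp(extension, ".jpg") == 0) return "image/jpeg";
  return 0;
}
```

The same helper serves both GET and HEAD, since the two requests differ only in whether the file content follows the blank line.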
Whenever a GET request is received by the server, a path that contains /private requires authorisation control. The authorisation is performed in two steps.
Try to find the header field Authorization: Basic in the request. If it cannot be found, the response
HTTP/1.0 401 Unauthorized<CRLF>
Content-Type: text/html<CRLF>
WWW-Authenticate: Basic realm="private"<CRLF>
<CRLF>
<html><head><title>401 Unauthorized</title></head><CRLF>
<body><h1>401 Unauthorized</h1></body></html>
is sent back to the client. If the request contains the header field as in the example below,
GET /private/private.htm HTTP/1.0<CRLF>
From: user@domain.com<CRLF>
User-Agent: HTTPTool/1.0<CRLF>
Authorization: Basic qWjfhjR124=<CRLF>
<CRLF>
the characters qWjfhjR124= must be decoded with the provided method HTTPServer::decodeBase64 in the file http.cc. The result from the method is a string of the form user:password. Compare the string with a few invented users with passwords stored in the class HTTPServer and decide whether the resource should be sent.
When the user is authorised to access the resource, the file content is sent with the same response type as in a standard GET request. An unauthorised user must be sent another 401 Unauthorized response.
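The two steps can be sketched as below. Both function names and the table of users are invented for illustration. The decoding itself is left to the provided HTTPServer::decodeBase64, so isAuthorised expects the already decoded string of the form user:password.

```cpp
#include <cstring>
#include <cstddef>

// Step 1: copy the characters which follow "Authorization: Basic " into token.
// Returns false if the header field is missing or the token does not fit.
bool findBasicCredentials(const char* request, char* token, std::size_t size)
{
  const char* key = "Authorization: Basic ";
  const char* start = std::strstr(request, key);
  if (start == 0) return false;                     // Answer with 401
  start += std::strlen(key);
  std::size_t length = std::strcspn(start, "\r\n"); // Up to the end of the line
  if (length >= size) return false;
  std::memcpy(token, start, length);
  token[length] = '\0';
  return true;
}

// Step 2: compare a decoded "user:password" string with invented users.
bool isAuthorised(const char* decoded)
{
  static const char* const users[] = { "alice:secret", "bob:hunter2" };
  for (std::size_t i = 0; i < sizeof(users) / sizeof(users[0]); ++i)
  {
    if (std::strcmp(decoded, users[i]) == 0) return true;
  }
  return false;
}
```

A request without the header field and a request with unknown credentials both lead to the same 401 Unauthorized response.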
In order to implement the dynamic web page the file system must support storage of new files. The updated content of the file /dynamic/dynamic.htm is transferred to the server with a POST request when the form in the resource /private/private.htm is submitted. The file content in the POST request is URL encoded and must be decoded with the provided method HTTPServer::decodeForm before the file is updated. A typical POST request is shown below
POST /private/private.htm HTTP/1.0<CRLF>
[Header line 1]<CRLF>
Content-Length: 339<CRLF>
[Header line n]<CRLF>
<CRLF>
dynamic.htm=%0D%3Chtml%3E%0D%3Chead%3E%0D%3Ctitle%3EDynamic+page%3C%2Ftitle%3E%0D%3C%2Fhead%3E%0D%3Cbody+b....
A method called HTTPServer::contentLength is provided in the file http.cc. It finds the token Content-Length: and converts the string representation of the integer into an integer of type udword. Use the value of the field in order to determine whether the entire request is received by the server.
The string dynamic.htm= is removed in the method HTTPServer::decodeForm.
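To make the URL encoding less mysterious, a minimal decoder is sketched below. It is not the provided HTTPServer::decodeForm. The names hexValue and urlDecode are invented, and unlike decodeForm the sketch does not remove the leading dynamic.htm= from the data.

```cpp
#include <cstring>

// Value of one hexadecimal digit, or -1 if c is not a hexadecimal digit.
static int hexValue(char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Decode in into out, which must hold at least strlen(in) + 1 characters.
// '+' becomes a space and %XX becomes the byte with hexadecimal value XX.
void urlDecode(const char* in, char* out)
{
  while (*in != '\0')
  {
    if (*in == '+')
    {
      *out++ = ' ';
      ++in;
    }
    else if (*in == '%' && hexValue(in[1]) >= 0 && hexValue(in[2]) >= 0)
    {
      *out++ = static_cast<char>(hexValue(in[1]) * 16 + hexValue(in[2]));
      in += 3;
    }
    else
    {
      *out++ = *in++;  // Ordinary character, or a % without two hex digits
    }
  }
  *out = '\0';
}
```

The decoded text is never longer than the encoded text, which is why a destination buffer of the same size is sufficient.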
The response to a POST request is sent in the same manner as with GET requests
HTTP/1.0 200 OK\r\n
Content-type: text/html\r\n
\r\n
You can either include the content of the resource dynamic.htm or add a page which confirms the submission of the form and the update of the file
HTTP/1.0 200 OK\r\n
Content-type: text/html\r\n
\r\n
<html><head><title>Accepted</title></head>
<body><h1>The file dynamic.htm was updated successfully.</h1></body></html>
Unpacked LHarc archives are created with the command
lha ao5z1 <archive file> [file...]
on the server linus. The command a, add to archive, is complemented by the options
All files specified with [file...] are added to the archive <archive file>. If no archive with the name <archive file> exists, a new one is created.
The archive file is a binary file and may be translated into the C-language by the Perl-script command hex2char <archive file>. The result on stdout is redirected into a file called lhafile.bin which is included in the file fs.cc containing the file system class.
For example, the commands below create a file system of the sample web
cd kurs/www
lha ao5z1 ../src/lab6/wwwarc.lzh *
cd ../src/lab6
hex2char wwwarc.lzh > lhafile.bin
An absolute path in the file system must be separated into the path and the file name. For example, the absolute path /dynamic/dynamic.htm is transformed into the path dynamic and the file name dynamic.htm. The integer -1, 0xff, is the delimiter and the termination token in the path. If the full path is /, the path is set to NULL. For all other absolute paths the initial / must be removed. For example,
Absolute path | Path | File name |
/pict/small.gif | pict<0xff> | small.gif |
/index.htm | NULL | index.htm |
An example of the transformation of the path name into something the file system will understand is presented below. The manual page for unfamiliar C-language functions may be read with the command man <function> on the computer <server name>. The method extractString is provided in the file http.cc.
The method findPathName expects a string like GET /private/private.htm HTTP/1.0
char*
HTTPServer::findPathName(char* str)
{
  char* firstPos = strchr(str, ' ');      // First space on line
  if (firstPos == 0)
  {
    return 0;                             // Malformed request line
  }
  firstPos++;                             // Pointer to first /
  char* lastPos = strchr(firstPos, ' ');  // Last space on line
  char* thePath = 0;                      // Result path
  if (lastPos == 0 || (lastPos - firstPos) == 1)
  {
    // Malformed request line, or / only
    thePath = 0;                          // Return NULL
  }
  else
  {
    // Is an absolute path. Skip first /.
    thePath = extractString((char*)(firstPos+1),
                            lastPos-firstPos);
    if ((lastPos = strrchr(thePath, '/')) != 0)
    {
      // Found a path. Insert -1 as terminator and cut off the file name.
      *lastPos = '\xff';
      *(lastPos+1) = '\0';
      while ((firstPos = strchr(thePath, '/')) != 0)
      {
        // Insert -1 as separator.
        *firstPos = '\xff';
      }
    }
    else
    {
      // Is a file in the root directory, e.g. /index.htm.
      // delete [] assumes that extractString allocates with new [].
      delete [] thePath; thePath = 0;     // Return NULL
    }
  }
  return thePath;
}
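The file name has to be extracted from the request line as well. A possible counterpart to findPathName is sketched below as a free function. The name findFileName and the choice to return index.htm for a path which ends in / are assumptions for this example, and the skeleton code may already solve this differently.

```cpp
#include <cstring>
#include <cstddef>

// Hypothetical counterpart to findPathName: return a newly allocated copy of
// the file name in a request line such as "GET /pict/small.gif HTTP/1.0".
// The caller releases the result with delete []. Returns 0 on malformed input.
char* findFileName(const char* requestLine)
{
  const char* first = std::strchr(requestLine, ' ');  // Space before the path
  if (first == 0) return 0;
  ++first;                                            // Points at the first /
  const char* last = std::strchr(first, ' ');         // Space after the path
  if (last == 0) return 0;

  // Search backwards from the end of the path for the last /.
  const char* name = last;
  while (name > first && *(name - 1) != '/')
  {
    --name;
  }

  std::size_t length = last - name;
  if (length == 0)
  {
    name = "index.htm";                               // Path ends in /
    length = std::strlen(name);
  }
  char* result = new char[length + 1];                // One extra for '\0'
  std::memcpy(result, name, length);
  result[length] = '\0';
  return result;
}
```

For the request line GET /pict/small.gif HTTP/1.0 the sketch yields small.gif, which matches the file name column in the table of absolute paths.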
A string constant, or string literal, is a sequence of zero or more characters surrounded by double quotes, as in
"I am a string"
or
"" /* The empty string */
The quotes are not part of the string. Technically, a string constant is an array of characters. The internal representation of a string has a null character '\0' at the end. Thus, the physical storage required is one more than the number of characters written between the quotes. It is essential to be consistent with the terminating null character.
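The extra character matters whenever memory for a string is allocated by hand. The helper below, with the invented name duplicateString, is a small sketch of the usual pattern.

```cpp
#include <cstring>

// Return a newly allocated copy of s. Release the copy with delete [].
char* duplicateString(const char* s)
{
  char* copy = new char[std::strlen(s) + 1];  // One extra for '\0'
  std::strcpy(copy, s);                       // Copies the '\0' as well
  return copy;
}
```

The function strlen does not count the terminating null character, whereas sizeof applied to a string constant does, so the two differ by one.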
The standard string library is described in most books on the C-language and is also described in the man pages of UNIX systems. On linus, e.g.
man strcpy
provides all the information relevant to string copying. The most common string functions are
char* strcpy(s, ct);
// Copy string ct to string s, including '\0', return string s.
char* strncpy(s, ct, n);
// Copy at most n characters of string ct to s, pad with '\0'
// if ct has fewer than n characters, return string s.
char* strcat(s, ct);
// Concatenate string ct to end of string s, return string s.
int strcmp(cs, ct);
// Compare string cs to string ct, return <0 if cs<ct, 0 if cs==ct, or >0 if cs>ct.
int strncmp(cs, ct, n);
// Compare at most n characters of string cs to string ct,
// return <0 if cs<ct, 0 if cs==ct, or >0 if cs>ct.
char* strchr(char* cs, char c);
// Return pointer to first occurrence of c in cs or NULL if not present.
char* strrchr(char* cs, char c);
// Return pointer to last occurrence of c in cs or NULL if not present.
char* strstr(char* cs, char* ct);
// Return pointer to first occurrence of string ct in cs, or NULL if not present.
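As an exercise in these functions, the sketch below assembles the status line and the content type header of a response with strcpy and strcat. The function name buildHeader and its parameters are invented for this example. Since strcpy and strcat do not check the size of the destination, the required size is computed with strlen first.

```cpp
#include <cstring>
#include <cstddef>

// Hypothetical helper: write
//   "HTTP/1.0 <status>\r\nContent-type: <contentType>\r\n\r\n"
// into buffer. Returns false, leaving buffer empty, if size is too small.
bool buildHeader(char* buffer, std::size_t size,
                 const char* status, const char* contentType)
{
  const char* version = "HTTP/1.0 ";
  const char* field   = "\r\nContent-type: ";
  const char* end     = "\r\n\r\n";
  std::size_t needed = std::strlen(version) + std::strlen(status) +
                       std::strlen(field) + std::strlen(contentType) +
                       std::strlen(end) + 1;  // +1 for the final '\0'
  if (size == 0) return false;
  buffer[0] = '\0';
  if (needed > size) return false;
  std::strcpy(buffer, version);
  std::strcat(buffer, status);
  std::strcat(buffer, field);
  std::strcat(buffer, contentType);
  std::strcat(buffer, end);
  return true;
}
```

A call such as buildHeader(header, sizeof(header), "404 Not found", "text/html") would produce the header part of the 404 response.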
Make sure your present working directory is ~/kurs/src. Remove the subfolder lab6 if it exists in ~/kurs/src. Then, copy all files from your solution in project 5 with the command
cp -r lab5 lab6
and change your present working directory into ~/kurs/src/lab6. This project will be an extension of your solution in project 5. Add the skeleton of project 6 to your previous files with the command
cp -r ~inin/kurs/src/lab6/* .
There should be three new files in addition to the ones from project 5 in the directory lab6,
The class HTTPServer is not declared in a header file. The file http.hh must be created and defined by you. Make sure that your present working directory is ~/kurs/make and type the commands:
Use the browser of your preference and connect to your web server by specifying the IP address only. Analyse the network traffic with the network monitor in order to learn more about the HTTP protocol. It might be very instructive to study commercially available web servers in the local network.
With the experience from previous projects you should be able to verify the requirements of the solution on your own.
The browser uses a cache in order to store web pages. Use the reload functionality when you get unexpected results. Occasionally, the browser stores states of failure as well and the computer will have to be restarted in order to browse the pages of your web. The authentication is only performed once, the first time the private page is requested. The browser has to be restarted in order to perform the authentication of user name and password again.