Dynamic Pages

The earliest web servers were primarily used to serve files stored on the server to web clients across the internet. These files were reached through hyperlinks from HTML documents the researchers already knew of, and one of the more important kinds of document quickly became the index - an HTML document listing where other HTML documents could be found.

While human-created (and curated) index pages were certainly valuable, they were also time-consuming to create. It is hardly surprising, then, that some of the first dynamically created web pages (i.e. pages created algorithmically by the web server) were index pages. In fact, Apache and most other web servers in use today can automatically create an index page listing the files in a directory unless the user creates an index.html file for the server to send instead.

We’re going to create a page that fulfills a similar role today - a webserver that creates a gallery of images by reading a directory’s contents.

Accessing the File System Through Node

Node provides the fs module for accessing the file system, based on POSIX (Portable Operating System Interface) commands (the same as in Unix, Linux, and OS X). We’ve already used its readFile() and readFileSync() methods, but these are only a small part of a tremendous range of functionality - a full API listing is found here: https://nodejs.org/api/fs.html

One important note is that in Node we typically use asynchronous operations when interacting with the file system. This ensures that our servers continue to run and handle new requests even if the file system is busy (remember, disk access is slow compared to memory access, and a request may have to wait its turn). Asynchronous functions will take some getting used to, but they are key in creating Node web servers that can scale well.
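To see what asynchronous means in practice, the following minimal sketch records the order in which things happen around a single fs.readdir() call. The helper name recordOrder() is hypothetical, chosen only for illustration, and the current directory (".") is used simply because it always exists.

```javascript
var fs = require('fs');

// Hypothetical helper: records the order of events around an async call.
function recordOrder(done) {
  var order = [];
  order.push('before readdir');
  fs.readdir('.', function(err, items) {
    // Runs later, once the event loop picks up the completed operation.
    order.push('inside callback');
    done(err, order);
  });
  // Runs immediately: readdir() has only been scheduled at this point.
  order.push('after readdir');
}

recordOrder(function(err, order) {
  if (err) { console.error(err); return; }
  console.log(order.join(' -> '));
  // prints "before readdir -> after readdir -> inside callback"
});
```

The line after the readdir() call runs before the callback does, which is exactly what lets a server keep handling other requests while the file system works.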

GOTCHA: While POSIX has long been the basis of file system operations across many operating systems, this has not been the case in Windows. Occasionally, fs commands work slightly differently, perform poorly, or are unavailable in Windows - the documentation usually calls these issues out, so read it carefully.

Listing the Contents of a Directory

If we’re going to algorithmically serve images stored on a server, one of the first steps we need to take is to identify what files exist there. In most cases, we’ll want to isolate the files we intend to serve in a separate directory - in our case, let’s use a subdirectory of the working directory, images. We can use fs.readdir() to retrieve an array of all the items in the directory:

var fs = require('fs');

// List every item in the images directory
fs.readdir('images', function(err, items) {
  if(err) {
    console.error(err);
    return;
  }
  console.log(items);
});

The readdir() method follows the same pattern we’ve seen in other asynchronous Node functions - it passes a (possible) error and the result of the request to a callback. In this case, the result is an array of directory items, which we are then writing to the console.

GOTCHA: Not all of the items in a directory are going to be files in the traditional sense. A directory can contain subdirectories, and POSIX file systems also use symbolic links - a kind of reference that points to a file in another location (similar to shortcuts in Windows). Occasionally the special entries “.” and “..” (meaning this directory and up one directory) will show up too; they are supposed to be filtered out by readdir(), but they have occasionally appeared in certain builds of Node.

Filtering Directory Contents

To get around this issue, we can filter the array of item names to hold onto only those which are files. The JavaScript Array object has a filter() method for doing just this. It takes a filter function as its argument and returns a new array containing only the items for which the filter function returned true. The filter function takes a single argument - the item from the array to consider. We can combine this with the fs.statSync() function, which returns an fs.Stats object. That object has a method, isFile(), that returns true if the item is actually a file:

var fs = require('fs');

fs.readdir('images', function(err, items) {
  if(err) { console.error(err); return; }
  // Keep only the items that are regular files
  var filenames = items.filter(function(item) {
    return fs.statSync('images/' + item).isFile();
  });
  console.log(filenames);
});

This reads the directory contents, filters the contents for files, and then prints only those files to the console. However, you may notice that we let a synchronous file operation slip into our code - fs.statSync(). If the filesystem is busy serving another request when this code is encountered, this code will block until it can finish. Making the process asynchronous is possible, but requires a shift in strategy, and, more importantly, how we think about programming.
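As an aside, newer versions of Node offer one way to avoid the per-item stat call in this particular case. The sketch below assumes Node 10.10 or later, where fs.readdir() accepts a withFileTypes option and passes fs.Dirent objects to the callback instead of plain names. The helper name listFiles() is hypothetical and is not used elsewhere in this walkthrough.

```javascript
var fs = require('fs');

// Hypothetical helper: lists only the regular files in a directory.
// Assumes Node 10.10 or later, where readdir() accepts withFileTypes.
function listFiles(dir, callback) {
  fs.readdir(dir, { withFileTypes: true }, function(err, entries) {
    if (err) { callback(err); return; }
    var filenames = entries
      // Each entry is an fs.Dirent describing the directory entry itself
      .filter(function(entry) { return entry.isFile(); })
      .map(function(entry) { return entry.name; });
    callback(null, filenames);
  });
}

listFiles('images', function(err, filenames) {
  if (err) { console.error(err); return; }
  console.log(filenames);
});
```

One difference to keep in mind: a Dirent describes the directory entry itself, so a symbolic link that points at a file is reported as a link rather than as a file, whereas fs.stat() follows the link. The more general technique of coordinating many asynchronous calls is still worth learning, since most operations have no such shortcut.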

Programming as Data Transformation

The functional programming paradigm encourages us to think of programming as a process of transforming input into output. This certainly holds true for the role a web server plays. Remember the request/response cycle we spoke of earlier?

request-response-pattern.png

While the process of retrieving a web page is cyclic, HTTP is stateless, so to the web server itself it is a simple, linear, transformational process, i.e.:

REQUEST => web server algorithms => RESPONSE

The request and all its associated data come into our web server (in our earlier examples, through the handleRequest() method), which returns a response and all its associated data. The handleRequest() method could pass the responsibility of creating the response further down the chain, through a series of functions, each representing a transformation of the original request, i.e.:

REQUEST => handleRequest() => enumerateImageFiles() => buildGallery() => RESPONSE

This also gives us a nice structure for architecting an increasingly complex program. Let’s see how an enumerateImageFiles() function could work asynchronously:

var fs = require('fs');

// Log the error and answer with a 500 response
function handleError(req, res, err) {
  console.error(err);
  res.writeHead(500, {'content-type': 'text/html'});
  res.end('Server Error');
}

function enumerateImageFiles(req, res) {
  // Get all items from our image directory
  fs.readdir('images', function(err, items) {
    if(err) { handleError(req, res, err); return; }
    var toProcess = items.length;
    var files = [];
    // An empty directory triggers no stat callbacks, so build the page now
    if(toProcess == 0) { buildGallery(req, res, files); return; }
    items.forEach(function(item) {
      fs.stat('images/' + item, function(err, stats) {
        if(err) console.error(err);
        else if(stats.isFile()) files.push(item);
        toProcess--;
        // have we processed the last item?
        if(toProcess == 0) buildGallery(req, res, files);
      });
    });
  });
}

The toProcess counter tracks how many of the asynchronous fs.stat() calls have yet to finish. In each callback, we first push the item into the files array - but only if it is a file. Then we decrement toProcess. If that value is 0, we’re ready to push our now-transformed data (the filtered file names) down the pipeline, to the buildGallery() function. Also, note that if we encounter an error in reading the directory, we never call buildGallery(); instead we send the error response at that point, short-circuiting the transformation pipeline.

GOTCHA: If you’ve learned multithreaded programming using a different language, our use of the toProcess and files in the callbacks probably raised some red flags for you - and with good reason. If each of these operations was happening in parallel, and potentially writing to toProcess at the same time, we could end up in a state where we leave toProcess at one, and have no more callbacks to trigger. We’d need to protect these shared variables with mutexes or some other locking strategy (or use thread-safe containers).

None of these options are available to us in Node - those structures don’t exist, because they aren’t needed. Remember the event loop we discussed previously? In Node, it is single-threaded; only one part of the code is ever executing at a time. Built-in asynchronous functions like fs.stat() do spin off processes in a separate thread, but they don’t simply “pick up” execution in the callback when they finish - rather, they add an event to the event queue (which is thread-safe), and when Node returns to the event loop and finds the event, then it triggers the callback. In other words, the callbacks only ever run one at a time, though they may do so in any order (there are other design patterns we can use to enforce order as well).
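One such pattern for enforcing order is to collect promises and wait on them together. The sketch below is a promise-based variant of the directory scan, not the approach used in the rest of this walkthrough. It assumes a Node version that provides fs.promises (10 or later), and the function name enumerateFiles() is hypothetical.

```javascript
var fs = require('fs');
var path = require('path');

// Hypothetical promise-based variant of the directory scan.
// Assumes a Node version that provides fs.promises (10 or later).
function enumerateFiles(dir) {
  return fs.promises.readdir(dir).then(function(items) {
    var checks = items.map(function(item) {
      return fs.promises.stat(path.join(dir, item)).then(
        function(stats) { return stats.isFile() ? item : null; },
        function(err) { console.error(err); return null; } // skip unreadable items
      );
    });
    // Promise.all resolves with results in the same order as `checks`,
    // regardless of which stat() call happens to finish first.
    return Promise.all(checks);
  }).then(function(results) {
    // Drop the placeholders left by non-files and failed stat() calls
    return results.filter(function(item) { return item !== null; });
  });
}

enumerateFiles('images').then(
  function(files) { console.log(files); },
  function(err) { console.error(err); }
);
```

Here Promise.all plays the role of the toProcess counter: it fires once every check has completed, and the resulting file names keep the order in which readdir() reported them.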

Now, to build the gallery page itself. We’ve already suggested a function form to handle this - buildGallery(), which takes a request, response, and array of filenames as arguments. From this information we need to build a valid HTML5 document - which is effectively a string. We can do so with string concatenation. However, before we do, we also want to transform our file names into img tags. We’ll use another array function for this - Array.map. This function takes a function as an argument, applies it to each member of the array, and returns a new array consisting of all the transformed elements. Then, we can use join to concatenate these img tag strings into a single string, and insert it into the document we’re building as a response.

function buildGallery(req, res, images) {
  // Wrap each file name in an img tag and join the tags into one string
  var imageTags = images.map(function(file) {
    return "<img src='/images/" + file + "'/>";
  }).join("\n");
  // Show a gallery page
  res.writeHead(200, {'content-type': 'text/html'});
  res.end(
    '<!doctype html>\n' +
    '<html>\n' +
    ' <head>\n' +
    ' <title>Photo Gallery</title>\n' +
    ' </head>\n' +
    ' <body>\n' +
    ' <h1>Photo Gallery</h1>\n' +
    imageTags + '\n' +
    ' </body>\n' +
    '</html>\n'
  );
}
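The map() and join() steps can be tried on their own, without a server. The sketch below also escapes the file name before placing it in the attribute, since buildGallery() above inserts names into the HTML as-is and a name containing a quote character would break the tag. The helper names escapeAttribute() and toImageTags() are hypothetical.

```javascript
// Hypothetical helper: escapes characters that are special inside HTML attributes.
function escapeAttribute(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Transform an array of file names into a single string of img tags.
function toImageTags(files) {
  return files.map(function(file) {
    return "<img src='/images/" + escapeAttribute(file) + "'/>";
  }).join('\n');
}

console.log(toImageTags(['city.jpg', "it's.png"]));
// prints:
// <img src='/images/city.jpg'/>
// <img src='/images/it&#39;s.png'/>
```

Because map() returns a new array and join() returns a new string, the original array of file names is left untouched, which fits the idea of each step being a transformation.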

The last step is our handleRequest() function, which starts our request down this pipeline:

function handleRequest(req, res) { enumerateImageFiles(req, res); }

And creating our webserver:

var http = require("http");
new http.Server(handleRequest).listen(80);

Now, if we start our server:

>node server.js

And visit http://localhost, we’ll see our gallery page:

But there’s an obvious problem - our images aren’t showing up. If we open Chrome’s developer tools and visit the network tab, we’ll see the reason:

The images are being served as HTML files! In fact, if we were to look at what they contain, it’s the same HTML that we are serving as our gallery. Remember that img tags have a src attribute, and the browser requests that src from the associated host. In our case, our image tags don’t specify a host, so the browser assumes it is the same host the original page came from and requests the image from our web server... which only knows how to serve our gallery page.

Distinguishing Between Resources

We can get around this issue by determining what resource the user is asking for, and serving it appropriately. If they just type localhost, the requested resource is the root (“/”). For the images it will be something like “/images/city.jpg”, with the actual value depending on which images the enumerateImageFiles() function found in our directory. Image requests should always begin with “/images/”, so we can use that knowledge to distinguish between the two. And, if the request doesn’t match either pattern, we should send a 404 error.

function handleRequest(req, res) {
  // The gallery page starts at the top of the pipeline
  if(req.url == "/") enumerateImageFiles(req, res);
  // "/images/" is 8 characters long
  else if(req.url.substring(0,8) == "/images/") serveImage(req, res);
  else {
    res.writeHead(404, {"content-type": "text/html"});
    res.end("not found");
  }
}
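The routing decision can be separated from the response handling, which makes it easy to try out by hand. The following is a minimal sketch under that assumption: classifyRequest() is a hypothetical helper that is not part of the server above, and it additionally ignores any query string, which the comparison in handleRequest() does not do.

```javascript
// Hypothetical helper: classifies a request URL into one of three routes.
function classifyRequest(url) {
  // Drop any query string so "/?page=2" still counts as the gallery page.
  var pathname = url.split('?')[0];
  if (pathname === '/') return 'gallery';
  if (pathname.indexOf('/images/') === 0) return 'image';
  return 'not found';
}

console.log(classifyRequest('/'));                // prints "gallery"
console.log(classifyRequest('/images/city.jpg')); // prints "image"
console.log(classifyRequest('/favicon.ico'));     // prints "not found"
```

The last case matters in practice: browsers commonly request /favicon.ico on their own, so a server that answers every URL with the gallery page will be asked for more than just images.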

Now we just need our serveImage() function. We could blindly trust that the resource requested exists on our server - but remember the user can type any resource they want into the address bar, not just those that show up in our gallery. And if the resource no longer exists - say it was deleted - we could crash our program. So we’ll also use fs.stat() to make sure it is actually there and is a file. We’ll also set the content-type to an image type, extracting the extension with String.split().

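function serveImage(req, res) {
  var filename = "." + req.url;
  fs.stat(filename, function(err, stats) {
    if(err) { handleError(req, res, err); return; }
    if(!stats.isFile()) {
      // Directories and other non-files are not served
      res.writeHead(404, {'content-type': 'text/html'});
      res.end('not found');
      return;
    }
    fs.readFile(filename, function(err, file) {
      if(err) { handleError(req, res, err); return; }
      // Use the text after the last "." as the image subtype, e.g. image/png
      var extension = filename.split('.').pop();
      res.writeHead(200, {'content-type': 'image/' + extension});
      res.end(file);
    });
  });
}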
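Building the content type directly from the file extension is convenient, but the extension and the registered type do not always match: a .jpg file, for example, is registered as image/jpeg. A lookup table is one common way to handle this. The sketch below is deliberately incomplete, and the names IMAGE_TYPES and contentTypeFor() are hypothetical.

```javascript
var path = require('path');

// A small, deliberately incomplete table of image extensions and MIME types.
var IMAGE_TYPES = {
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.png': 'image/png',
  '.gif': 'image/gif',
  '.svg': 'image/svg+xml',
  '.webp': 'image/webp'
};

// Hypothetical helper: picks a content type from a file name.
function contentTypeFor(filename) {
  // path.extname() returns the extension including the leading dot
  var ext = path.extname(filename).toLowerCase();
  // Fall back to a generic binary type for anything not in the table
  return IMAGE_TYPES[ext] || 'application/octet-stream';
}

console.log(contentTypeFor('images/city.JPG')); // prints "image/jpeg"
console.log(contentTypeFor('images/logo.png')); // prints "image/png"
console.log(contentTypeFor('images/notes'));    // prints "application/octet-stream"
```

The value returned by contentTypeFor() could be passed to res.writeHead() in place of the string built from the extension.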

Now when we visit http://localhost the gallery includes images.
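Since the user can type any resource into the address bar, it is also worth checking that a requested path really stays inside the images directory before reading it. The following is a minimal sketch of one way to do that, assuming the request has already been routed as an image request. The names IMAGE_ROOT and resolveImagePath() are hypothetical.

```javascript
var path = require('path');

// Absolute path of the directory we are willing to serve from.
var IMAGE_ROOT = path.resolve('images');

// Hypothetical helper: maps a URL such as "/images/city.jpg" to an absolute
// file path, or returns null if the result would fall outside IMAGE_ROOT.
function resolveImagePath(url) {
  var pathname = url.split('?')[0];
  var name;
  try {
    // Browsers percent-encode some characters (a space becomes %20)
    name = decodeURIComponent(pathname.slice('/images/'.length));
  } catch (err) {
    return null; // malformed percent-encoding
  }
  // path.resolve() collapses "." and ".." segments
  var resolved = path.resolve(IMAGE_ROOT, name);
  // Accept only paths that are still inside the images directory
  if (resolved.indexOf(IMAGE_ROOT + path.sep) !== 0) return null;
  return resolved;
}

console.log(resolveImagePath('/images/city.jpg'));     // a path inside the images directory
console.log(resolveImagePath('/images/../server.js')); // prints null
```

A null result would be answered with a 404 response, just like any other resource the server does not recognize.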