UNIX and Linux are the predominant platforms for serving web applications. According to survey data, 67% of the top one million web sites are served by either Linux or FreeBSD. Above the OS level, open-source web server software commands more than 80% of the market.

At scale, web applications do not run on a single system. Instead, a collection of software components distributed through a meshwork of systems cooperate to answer requests as quickly and as flexibly as possible. Each piece of this architecture must be resilient to server failures, load spikes, network partitions,
and targeted attacks.

Cloud infrastructure helps address these needs. Its ability to provision capacity quickly in response to demand is an ideal match for the sudden and sometimes unexpected tidal waves of users that materialize on the web. In addition, cloud
providers’ add-on services include a variety of convenient recipes that meet common requirements, greatly simplifying the design, deployment, and operation of web systems.


HTTP is the core network protocol for communication on the web. Lurking beneath a deceptively simple facade of stateless requests and responses lie layers of refinements that bring both flexibility and complexity. A well-rounded understanding of HTTP is a core competency for all system administrators.

In its simplest form, HTTP is a client/server, one-request/one-response protocol. Clients, also called user agents, submit requests for resources to an HTTP server. Servers receive incoming requests and process them by retrieving files from local disks, resubmitting them to other servers, querying databases, or performing any number of other possible computations. A typical page view on the web entails dozens or hundreds of such exchanges.

As with most Internet protocols, HTTP has adapted over time, albeit slowly. The centrality of the protocol to the modern Internet makes updates a high-stakes proposition. Official revisions are a slog of committee meetings, mailing list negotiations, public review periods, and maneuvering by stakeholders with vested and conflicting interests. During the long gaps between official revisions documented in RFCs, unofficial protocol extensions are born from necessity, become ubiquitous, and are eventually included as features in the next specification.

HTTP versions 1.0 and 1.1 are sent over the wire in plain text. Adventurous administrators can interact with servers directly by running telnet or netcat. They can also observe and collect HTTP exchanges by using protocol-agnostic packet capture software such as tcpdump. See the page for general information about TLS.
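Because HTTP/1.1 is plain text, the same exchange you would type into telnet can be reproduced in a few lines of code. The following is a minimal sketch, not a full client: the function names (build_request, parse_status_line, fetch) and the host example.com are assumptions made for illustration, and the parser assumes a well-formed status line that includes a reason phrase.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Compose a minimal HTTP/1.1 GET request exactly as it appears on the wire."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, target, version
        f"Host: {host}",          # the Host header is mandatory in HTTP/1.1
        "Connection: close",      # ask the server to close after one response
        "",                       # an empty line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(raw_response: bytes) -> tuple[str, int, str]:
    """Split the first line of a response into (version, status code, reason)."""
    status_line = raw_response.split(b"\r\n", 1)[0].decode("iso-8859-1")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    """Send the request over a plain TCP socket and return the raw reply."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while data := sock.recv(4096):
            chunks.append(data)
    return b"".join(chunks)
```

Calling parse_status_line(fetch("example.com")) against a reachable server performs the same steps as a manual telnet session on port 80: open a TCP connection, send the request text, and read the reply until the server closes the connection.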

The web is in the process of adopting HTTP/2, a major protocol revision that preserves compatibility with previous versions but introduces a variety of performance improvements. In an effort to promote the universal use of HTTPS (secure, encrypted HTTP) for the next generation of the web, major browsers such as Firefox and Chrome have elected to support HTTP/2 only over TLS-encrypted connections.

HTTP/2 moves from plain text to binary format in an effort to simplify parsing and improve network efficiency. HTTP’s semantics remain the same, but because the transmitted data is no longer directly legible to humans, generic tools such as telnet are no longer useful. The handy h2i command-line utility, part of the Go language networking repository, helps restore some interactivity and debuggability to HTTP/2 connections. Many HTTP-specific tools such as curl also support HTTP/2 natively.
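To give a sense of what "binary format" means in practice, here is a small sketch that decodes the fixed 9-byte header that precedes every HTTP/2 frame. The layout (24-bit payload length, 8-bit type, 8-bit flags, one reserved bit, 31-bit stream identifier) follows the framing section of RFC 7540. The function name parse_frame_header and the returned dictionary are assumptions made for illustration, and the sketch does not attempt to decode frame payloads.

```python
import struct

# Frame type codes defined by RFC 7540.
FRAME_TYPES = {
    0x0: "DATA",
    0x1: "HEADERS",
    0x2: "PRIORITY",
    0x3: "RST_STREAM",
    0x4: "SETTINGS",
    0x5: "PUSH_PROMISE",
    0x6: "PING",
    0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE",
    0x9: "CONTINUATION",
}


def parse_frame_header(header: bytes) -> dict:
    """Decode the 9-byte header that starts every HTTP/2 frame (hypothetical helper)."""
    if len(header) != 9:
        raise ValueError("an HTTP/2 frame header is exactly 9 bytes")
    # The 24-bit length is read as one byte plus one 16-bit value, big-endian.
    length_hi, length_lo, frame_type, flags, stream = struct.unpack(">BHBBI", header)
    return {
        "length": (length_hi << 16) | length_lo,
        "type": FRAME_TYPES.get(frame_type, hex(frame_type)),
        "flags": flags,
        "stream_id": stream & 0x7FFFFFFF,  # mask off the reserved high bit
    }
```

Nothing in those nine bytes is readable as text, which is why a tool that understands the framing is needed to inspect HTTP/2 traffic.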

Uniform Resource Locators (URLs)

A URL is an identifier that specifies how and where to access a resource. URLs are not HTTP-specific; they are used for other protocols as well. For example, mobile operating systems use URLs to facilitate communication among apps.

You may sometimes see the acronyms URI (Uniform Resource Identifier) and URN (Uniform Resource Name) used as well. The exact distinctions and taxonomic relationships among URLs, URIs, and URNs are vague and unimportant. Stick with “URL.”

The general pattern for URLs is scheme:address, where scheme identifies the protocol or system being targeted and address is some string that’s meaningful within that scheme. For example, a URL in the mailto scheme encapsulates an email address. If it’s invoked as a link target on the web, most browsers will bring up a preaddressed window for sending mail.

For the web, the relevant schemes are http and https. In the wild, you might also see the schemes ws (WebSockets), wss (WebSockets over TLS), ftp, ldap, and many others.

The address portion of a web URL allows quite a bit of interior structure. Here’s the overall pattern:

scheme://[username:password@]hostname[:port][/path][?query][#anchor]

All the elements are optional except scheme and hostname.
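The pattern can be explored with Python's standard urllib.parse module, which splits a URL into these components. This is a minimal sketch: the describe helper, the DEFAULT_PORTS table, and the example URL with its credentials are all made up for illustration. Note that urlsplit calls the anchor component the "fragment."

```python
from urllib.parse import urlsplit

# Default ports for the two web schemes; other schemes are left unresolved.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe(url: str) -> dict:
    """Break a URL into the components of the general pattern (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "username": parts.username,   # None when the URL carries no credentials
        "password": parts.password,
        "hostname": parts.hostname,
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
        "query": parts.query,
        "anchor": parts.fragment,
    }


if __name__ == "__main__":
    # Hypothetical URL that exercises every optional element.
    example = "https://alice:s3cret@www.example.com:8443/docs/index.html?lang=en&page=2#install"
    for name, value in describe(example).items():
        print(f"{name:9} {value}")
```

A URL that supplies only the required elements, such as http://example.com, yields None for the credentials, empty strings for the path, query, and anchor, and the scheme's default port.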

See page for more details about HTTP basic authentication.

The use of a username and password in the URL enables “HTTP basic authentication,” which is supported by most user agents and servers. In general, it’s a bad idea to embed passwords into URLs because URLs are apt to be logged, shared, bookmarked, visible in ps output, etc. User agents can get their credentials from a source other than the URL, and that is typically a better option. In a web browser, just leave the credentials out and let the browser prompt you for them separately.

HTTP basic authentication is not self-securing, which means that the password is accessible to anyone who listens in on the transaction. Therefore, basic authentication should really only be used over secure HTTPS connections.

The hostname can be a domain name or IP address as well as an actual hostname. The port is the TCP port number to connect to. The http and https schemes default to ports 80 and 443, respectively.

The query section can include multiple parameters separated by ampersands. Each parameter is a key=value pair. For example, Adobe InDesign users may find the following URL eerily familiar:

As with passwords, sensitive data should never appear as a URL query parameter because URL paths are often logged as plain text. The alternative is to transmit parameters as part of the request body. (You can’t really control this in other people’s web software, but you can make sure your own site behaves properly.)

The anchor component identifies a subtarget of a specific URL. For example, Wikipedia uses named anchors extensively as section headings, allowing specific parts of an entry to be linked to directly.
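The query and anchor components, and the alternative of sending parameters in the request body, can be sketched with the Python standard library. The helper names (dissect, form_post), the URLs, and the field values below are hypothetical. Building the Request object does not contact any server; it only shows where the parameters end up.

```python
from urllib.parse import parse_qs, urldefrag, urlencode, urlsplit
from urllib.request import Request


def dissect(url: str) -> tuple[str, dict, str]:
    """Return (URL without anchor, decoded query parameters, anchor)."""
    base, anchor = urldefrag(url)
    params = parse_qs(urlsplit(url).query)  # each key maps to a list of values
    return base, params, anchor


def form_post(url: str, fields: dict) -> Request:
    """Build a POST request that carries the fields in the body, not the URL."""
    body = urlencode(fields).encode("ascii")
    return Request(url, data=body, method="POST")


if __name__ == "__main__":
    # Hypothetical search URL with two key=value pairs and an anchor.
    print(dissect("https://www.example.com/search?q=web+hosting&lang=en#results"))

    # Hypothetical login form: the token travels in the body, so it does not
    # appear in the URL that access logs typically record.
    req = form_post("https://www.example.com/login",
                    {"user": "alice", "token": "hypothetical-secret"})
    print(req.get_method(), req.full_url, req.data)
```

Moving parameters into the body keeps them out of logged URLs, but they are still readable on the wire unless the connection uses HTTPS.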