Intermediate Level User Linux Course

System Services

Linux was born on the network of networks, the Internet, so it's no surprise that Linux's main strength lies in providing services over a network. These services include a web environment, email, file sharing and print services, and databases, along with a security system to make sure that everything stays air-tight within your organization.

Webservers

Although the Internet existed decades before it became popular with the public, that popularity is mainly due to the invention of the World Wide Web. The pages that make up the WWW are all served from machines running a type of software that has become known as a webserver.

Apache webserver

The most popular web server by far is the Apache web server. It
originated as a set of patches that added functionality to the
original NCSA httpd web server (the name
Apache is often said to come from
"a patchy webserver"). It is released under its own
open source license (called, unsurprisingly, the Apache License) and is
available as a free download.
At the time of writing, the combination of Linux and the Apache webserver accounts for over 60 percent
of the servers on the Internet.

Most major Linux distributions come with Apache and offer you the
option of installing it. What's even better is that most
distributions will now configure Apache during installation to
work together with other complementary web development packages that
you may have chosen to install as well, such as
PHP, mod_perl and
mod_python. These advances in ease of installation
are certainly welcome. I remember installing Apache from a tarball in
the early days of my Linux experience, and it was
somewhat time-consuming to get Apache to play well with all of these
add-ons. This should not be an issue anymore. You can, of course, still
install from a tarball and get some highly personalized configurations,
but that goes way beyond the scope of this course. Although I
normally don't like to use the expression "way beyond the
scope of...", it is a fact that entire books are dedicated
to Apache alone. What we will do instead is look at ways to take advantage
of some of Apache's features that you can get "out of the box".

The Apache version we will deal with in this section is 1.3.x, which is
the most widely used version out there. Apache has released 2.0, and
some of the configuration options are the same, but some have changed.
Do not use this as a guide to Apache 2.0 configuration.
httpd.conf

The main configuration file for the Apache webserver can normally be found
in /etc/httpd or /etc/apache (or a subdirectory such as /etc/httpd/conf), depending on where your distribution chooses
to place it. As I mentioned before, most distributions do a pretty good job
of configuring a working web server, but you may want to change some things
so Apache works more to your liking. Before making any changes, though, I
recommend making a copy of httpd.conf. It's a fairly large file, and it's
easy to make a change and then lose track of what you did. Then, if you
find Apache isn't working right, you can always go back to the original
file. I usually do something like this:

```shell
cp httpd.conf httpd.conf.YYYYMMDD
```
Here YYYYMMDD stands for the year, month and day. You are, of course, free to call it
httpd.conf.charlie if you choose. This is a good policy to follow whenever you
change a config file, especially if you're dealing with
services that are crucial to a company or organization: you can quickly get back to a working server and then figure out what went wrong later. Let's look at some
things you can do to get Apache working to suit your needs.

Some basic security

Apache is designed so that every directory where you have created web content
should have an index file. This is normally index.html, but
you may also add other names, such as index.php, index.htm or others.
The part of httpd.conf that determines this is:

```apache
#
# DirectoryIndex: Name of the file or files to use as a pre-written HTML
# directory index. Separate multiple entries with spaces.
#
<IfModule mod_dir.c>
DirectoryIndex index.html index.php3 index.php index.htm index.shtml index.cgi
</IfModule>
```
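To make the lookup order concrete, here is a small Python sketch of what the DirectoryIndex line asks Apache to do: try each listed name in turn and serve the first one that exists. It is a simplified model, not Apache's actual code, and the function name find_index is made up for illustration.

```python
import os
from typing import Optional, Sequence

# Same names, in the same order, as the DirectoryIndex line above
DIRECTORY_INDEX = ("index.html", "index.php3", "index.php",
                   "index.htm", "index.shtml", "index.cgi")


def find_index(directory: str,
               candidates: Sequence[str] = DIRECTORY_INDEX) -> Optional[str]:
    """Return the path of the first candidate file that exists, or None."""
    for name in candidates:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path  # first match wins; later names are never consulted
    return None  # no index file: Apache falls back to its Indexes setting
```

A return value of None corresponds to the case discussed next: there is no index file, so Apache either generates a directory listing (if the Indexes option is on) or refuses the request.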
Apache, by default, is going to show us the directory listing if we
don't have one of these files in a directory. That's probably not a
good idea from a security standpoint. We all get lazy, and we may place
temporary files on a webserver that we don't mean for the world to see.
The best thing is to nip this problem in the bud and keep Apache from showing
directory listings. You need to find this line in httpd.conf:

```apache
Options Indexes Includes FollowSymLinks MultiViews
```
It's a good idea to remove the Indexes option here. This
will prevent website visitors from seeing what's in the individual
directories.

Document root and cgi-bin

The document root is the directory from which Apache serves web pages by default. You will see a line like this in your httpd.conf:

```apache
#
# This should be changed to whatever you set DocumentRoot to.
#
<Directory /var/www/>
```
You'll find that the Apache developers are good at explaining what things
mean: if you prefer your web pages to be somewhere else, you
should change the DocumentRoot line and this Directory line to match. Even if you do want them elsewhere, you may
not want to change this right away. Further along, I'll explain the
concept of "virtual" websites, which means hosting more than one website on the same server.
However, if you're only going to be serving one set of pages, you may
change this to wherever you want. You may also want to have a look at
this line:

```apache
ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/
```
This is the directory where you can place your CGI scripts. Those
of us who have some web development experience will know what a CGI
script is. In case you don't, it's a program that the web server runs
on a visitor's behalf, typically when a form on a web page is submitted. If you've submitted data to a website in the
past, chances are you've used one. If you've created a
webpage with a form to submit information, you'll normally find
something like this on the page:

```html
<form action="/cgi-bin/myscript.cgi" method="get">
```
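For readers who haven't written one, here is a minimal sketch of what /cgi-bin/myscript.cgi could look like, written in Python 3. It assumes a form field called name (a hypothetical example) and relies on the CGI convention that, for method="get", the web server hands the form data to the script in the QUERY_STRING environment variable. The script has to be executable (for example chmod 755) and its first line must point at your Python interpreter.

```python
#!/usr/bin/env python3
"""Minimal CGI sketch that answers a GET form submission.

The form field "name" is a hypothetical example.
"""
import html
import os
from urllib.parse import parse_qs


def build_response(query_string: str) -> str:
    """Return the complete CGI output: header, blank line, then the HTML body."""
    params = parse_qs(query_string)
    # parse_qs maps each field to a list of values; use the first, if present
    name = params.get("name", ["stranger"])[0]
    # Escape user input before echoing it, to avoid cross-site scripting
    body = "<html><body><p>Hello, %s!</p></body></html>\n" % html.escape(name)
    # A CGI response starts with headers, followed by an empty line
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # For method="get", the server passes form data in QUERY_STRING
    print(build_response(os.environ.get("QUERY_STRING", "")), end="")
```

Submitting the form with name=Larry would produce a page saying "Hello, Larry!". The html.escape call matters: echoing form input back unescaped is exactly the kind of cross-site scripting hole that any script you expose should avoid.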
Your script is placed in the cgi-bin directory, and Apache knows where
to find it when the form calls it. If you change the ScriptAlias line above
so that the scripts are located someplace else, you also need to change a
section a little farther down:

```apache
<Directory /usr/lib/cgi-bin/>
AllowOverride None
Options ExecCGI
Order allow,deny
Allow from all
</Directory>
```
Again, as I mentioned above, you may not even need to make these changes if you're going to be maintaining several websites on the same server. More on that further ahead.

Personal user sites

If you give somebody an account on the machine running Linux and Apache,
that person can run his or her own personal website. I'm sure
many of you have seen addresses like http://www.domain.com/~larry/. This
works because the UserDir module is activated in httpd.conf:

```apache
LoadModule userdir_module /usr/lib/apache/1.3/mod_userdir.so
```

And farther down you will find this section:

```apache
#
# UserDir: The name of the directory which is appended onto a user's home
# directory if a ~user request is received.
#
<IfModule mod_userdir.c>
UserDir public_html
</IfModule>
```
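The mapping that mod_userdir performs can be sketched in a few lines of Python. This is a simplified model under two assumptions, both labeled in the code: every home directory is /home/username (real Apache looks up each user's actual home directory), and the URL path has already been cleaned of ".." segments, as Apache does before mapping. The function name userdir_path is made up for illustration.

```python
import posixpath
from typing import Optional

USER_DIR = "public_html"   # value of the UserDir directive
HOME_BASE = "/home"        # assumption: every home directory is /home/<user>


def userdir_path(url_path: str) -> Optional[str]:
    """Map a /~user/... URL path to a file under that user's public_html.

    Assumes url_path is already normalized (no '..' segments).
    """
    if not url_path.startswith("/~"):
        return None  # not a personal-site request
    user, _, rest = url_path[2:].partition("/")
    if not user:
        return None
    return posixpath.join(HOME_BASE, user, USER_DIR, rest)
```

So a request for http://www.domain.com/~larry/index.html would be served from /home/larry/public_html/index.html, which is why everything in that directory should be treated as public.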
By default, Apache expects a user's public web files
(and remember, these are public!) to be in a directory called public_html. There's no reason why you can't change this name to website
or any other meaningful name. You could even comment these lines out if
you don't want the users on your system to have personal websites. If
you do allow them, you may want to look a little further down at this section:

```apache
#
# Control access to UserDir directories. The following is an example
# for a site where these directories are restricted to read-only.
#
<Directory /home/*/public_html>
```
There are some options here as well that control how the site will work. You
should remove the Indexes option from here too,
as we did earlier.

Server-side includes

As most of us webmasters know, websites, even fairly small ones, can become
unruly and hard to maintain. One of the ways to keep your website management
tasks to a minimum is to use server-side includes. A server-side include is
a way to make other files part of the page. That way, for example, you can
have the same navigation buttons on every page of
your website without having to paste the HTML code for them
into every page you create. You would just insert something like this in
your pages:

```html
<!--#include virtual="/navbar.shtml" -->
```
By default, Apache uses the .shtml extension to mark pages that contain server-side
includes. If you want plain .html files to be parsed for server-side includes as well, you need to change httpd.conf. Open the file and look for these lines:

```apache
AddType text/html .shtml
AddHandler server-parsed .shtml
```

Now add these lines:

```apache
AddType text/html .html
AddHandler server-parsed .html
```

Server-side includes also need the Includes option to be enabled for the directory, as it is in the Options line we looked at earlier.
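To see what the server-parsed handler does with an include virtual directive like the navbar example above, here is a toy expander in Python. It handles only one level of include virtual directives and takes a caller-supplied function to fetch the included file; real mod_include also supports nested includes and other directives such as echo and config, so treat this as an illustration of the idea rather than a description of Apache's implementation. The names expand_includes and read_file are hypothetical.

```python
import re
from typing import Callable

# Matches directives such as: <!--#include virtual="/navbar.shtml" -->
INCLUDE_RE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')


def expand_includes(page: str, read_file: Callable[[str], str]) -> str:
    """Replace each include directive with the text of the file it names."""
    # When re.sub is given a function, its return value is inserted literally
    return INCLUDE_RE.sub(lambda m: read_file(m.group(1)), page)
```

Because Apache has to scan each server-parsed page like this on every request, mapping .html to server-parsed means every .html page gets scanned, not just the ones that actually contain includes.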
Alias directories

Some applications that run under Linux use the Apache webserver to
display their content. There are systems that display man pages
in the browser, and some Linux distributions use Apache to give you
a web-based help and documentation system. These applications place their
documents outside of the webserver's document root. To make this
"outside" content reachable from a web browser, we need to create "Alias" lines in httpd.conf. In the following
example, I'll show you what I added to httpd.conf so that
visitors can see my Mailman mailing lists'
public archives. I found the following lines in httpd.conf:

```apache
#
# Aliases: Add here as many aliases as you need (with no limit). The format is
# Alias fakename realname
```

Then I added these lines:

```apache
# Aliases for mailman
Alias /pipermail/ /var/lib/mailman/archives/public/
Alias /images/ /usr/share/doc/mailman/images/
```
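The effect of those Alias lines can be modeled as a prefix substitution on the requested URL path. The Python sketch below is a simplification that simply takes the first matching fakename and ignores everything else mod_alias does; the function name resolve_alias is made up for illustration.

```python
from typing import List, Optional, Tuple

# (fakename, realname) pairs copied from the Alias lines above
ALIASES: List[Tuple[str, str]] = [
    ("/pipermail/", "/var/lib/mailman/archives/public/"),
    ("/images/", "/usr/share/doc/mailman/images/"),
]


def resolve_alias(url_path: str,
                  aliases: List[Tuple[str, str]] = ALIASES) -> Optional[str]:
    """Return the filesystem path an aliased URL maps to, or None."""
    for fakename, realname in aliases:
        if url_path.startswith(fakename):
            # Swap the URL prefix for the real directory, keep the remainder
            return realname + url_path[len(fakename):]
    return None  # not aliased: the path is looked up under DocumentRoot
```

The sketch also makes one side effect easy to spot: because the second alias claims the whole /images/ prefix, a site that keeps its own pictures in an images directory under its DocumentRoot would find those requests served from the Mailman directory instead.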
With these Alias lines in place, a person only has to type http://www.mydomain.ork/pipermail/
into a browser to see the mailing list archives located in
/var/lib/mailman/archives/public/, and any images on those pages
will also be displayed.

Mailman also lets visitors sign up for the mailing lists. The
whole system is written in Python, and its web programs are not in the
cgi-bin directory that Apache knows about, but in a different place. So we
also need to add these lines so that Apache can find them:

```apache
#ScriptAlias for mailman
ScriptAlias /mailman/ /usr/lib/mailman/cgi-bin/
```

This is known as a ScriptAlias, and you will find this section below the
Alias section in httpd.conf. As you can see, Apache is very versatile, allowing us to configure it
to serve web content from third-party applications with relative ease.

The .htaccess file

To help with website administration, Apache supports an additional configuration
file, called .htaccess (yes, with a dot (.) in front of it),
where you can add more options that affect how your website works. Apache only honors the directives in an .htaccess file if the AllowOverride setting for that directory allows them; with AllowOverride None, the file is ignored.

No more 404s

As a web surfer, nothing annoys me more than a "404 Not Found" page. This
is what Apache will show you by default when you request a page that has
disappeared.

404 is the HTTP status code for a request for a page that does not exist.
Web-savvy people now refer to a missing page as a "404".

```
Not Found
The requested URL /bla.html was not found on this server.
Apache/1.3.26 Server at www.dominio.ork Port 80
```
Since it's frustrating for a user to land on this page, it's my job as a webmaster
to make sure it doesn't appear. There is really no excuse for it appearing because of pages you moved yourself:
the .htaccess file provides a means to redirect users to content you've moved.
Let's say you have a site about a club you have set up. You have
a page dedicated to your August 2002 barbecue in a directory
called /bbq. The club is successful, another year goes by, and you have
another barbecue, this time in August 2003. To make the website
more manageable, you create two
directories, bbq02 and bbq03, with pages about the festivities.
Now a problem arises. People might have bookmarked the page dedicated to the
hilarious food fight at the 2002 shindig: http://www.ourclub.ork/bbq/foodfight.html. Of course, you've now moved it, and I would say that it's your duty as a good
webmaster to provide a redirect. Since /bbq no longer exists, we can create
an .htaccess file in our webserver root directory and add the following entry:

```apache
# redirects
RedirectPermanent /bbq/foodfight.html http://www.ourclub.ork/bbq02/foodfight.html
```
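Behind the scenes, RedirectPermanent answers the old URL with HTTP status 301 (Moved Permanently) and a Location header pointing to the new address; the browser then requests that address instead. The Python sketch below models that decision. Like mod_alias, it treats the old path as a prefix that has to end on a path-segment boundary; the table and function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical redirect table mirroring the RedirectPermanent line above
REDIRECTS: Dict[str, str] = {
    "/bbq/foodfight.html": "http://www.ourclub.ork/bbq02/foodfight.html",
}


def _matches(url_path: str, old: str) -> bool:
    # Whole path segments only, so "/bbq" would not match "/bbqsauce.html"
    return url_path == old or url_path.startswith(old.rstrip("/") + "/")


def redirect_for(url_path: str,
                 table: Dict[str, str] = REDIRECTS
                 ) -> Optional[Tuple[int, Dict[str, str]]]:
    """Return (status, headers) for a moved page, or None if it has not moved."""
    for old, new in table.items():
        if _matches(url_path, old):
            # Anything after the matched prefix is carried over to the new URL
            return 301, {"Location": new + url_path[len(old):]}
    return None
```

Because the match is on a prefix, a single line such as RedirectPermanent /bbq http://www.ourclub.ork/bbq02 would send every page under /bbq to the same file name under /bbq02, which can save you from listing each moved page individually.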
You should add every page that you've moved to /bbq02 to your
.htaccess file as well.

Friendly greetings

If you've done your work diligently in providing redirects for moved pages,
then you can be fairly confident that any 404s that show up in your
web logs are probably the result of things beyond your control. Users will
often type bad URLs into their browsers, and other webmasters may make mistakes
when linking to one of your pages. In these cases, it's probably a good
idea to provide an alternative web page to replace Apache's standard 404
warning. Again, .htaccess gives you this possibility. How elaborate
a substitute page you provide depends on you and your imagination (and perhaps
your good taste!).

It's a good idea to use grep to look for 404s in your
Apache access logs at least once a week or so. You may have redirected
users to other pages, but you may have overlooked the fact that people
may have bookmarked specific images as well. Apart from the ease-of-use issues,
this is also a basic security measure. You may find one IP address generating a
lot of 404s. This could be an individual probing your site as a prelude
to defacing it or some other attack. You may then want to take
steps such as firewalling this IP from your network or, if the situation warrants,
contacting the owner of the netblock.
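grep will find the 404s; if you also want to see which addresses generate them, a short Python script can tally them. This sketch assumes your access log uses Apache's Common or Combined Log Format (client address first, status code right after the quoted request line). The log file's location varies by distribution, so the script takes it as a command-line argument.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# host ident user [time] "request" status bytes ...  (Common/Combined Log Format)
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) ')


def count_404s(lines: Iterable[str]) -> Counter:
    """Count 404 responses per client address."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        if match and match.group(2) == "404":
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 count404.py /path/to/access_log
    with open(sys.argv[1]) as log:
        for address, hits in count_404s(log).most_common(10):
            print(f"{hits:6d}  {address}")
```

A single address near the top of that list, with dozens of hits on paths that never existed, is the kind of probing described above.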
To set this up, first create a directory for administrative needs. Call it what you like, something
meaningful to you. Now you can create an alternative page for your 404s
and place it in this directory. The page normally has a simple greeting,
maybe something like "Oops! We can't find that."
and perhaps a link back to your home page. If you have search capabilities
on the site, you may want to link to those. Then tell Apache to use the page by adding an ErrorDocument line to your .htaccess file, for example ErrorDocument 404 /admin/notfound.html (substituting the directory and file name you chose). Again, it is up to you
as a web administrator to create something that works for you and your site.

Password protection

Apache also provides a means of keeping people out of certain directories.
Again, this depends on some lines placed in .htaccess. Let's go back to
your club's website. You may want to create a members-only section of the
website that's restricted to those to whom you've given a password. To do this,
you would first create the directory and then create an .htaccess file
in that directory containing the following lines:

```apache
AuthUserFile /home/club/.htpasswd
AuthGroupFile /dev/null
AuthName "Our Club - Members Only"
AuthType Basic
<Limit GET>
Require valid-user
</Limit>
```
Wrapping Require in a Limit GET block restricts only the request methods listed (here, GET); if you remove the Limit GET and /Limit lines, Require valid-user applies to every kind of request.

Now you must create the file with the users and passwords in it, called .htpasswd. You will notice that we have placed it outside of the web
directories as a security precaution. Apache can read it just fine there, and
there is no risk of it being read by a nasty spider. Here's how you create
the .htpasswd file:

```shell
htpasswd -c /home/club/.htpasswd joe
```

Here joe is the first user in the
file. That's important, because the -c option creates the file (and overwrites it if it already exists). From now on, for
every user you want to add, leave out the -c option.
htpasswd will ask you for the password twice, as is standard in Unix-type
applications. Now, when you go to http://www.ourclub.ork/members/secret.html,
your web browser will ask you for a user name and password before showing the page.
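It helps to know what actually travels over the network with AuthType Basic. The browser joins the user name and password with a colon, Base64-encodes the result and sends it in an Authorization header on every request to the protected area. The Python sketch below builds and decodes such a header (the function names are made up for illustration) and shows that Base64 is an encoding, not encryption: anyone who can watch the traffic can recover the password, so for anything sensitive, serve the members area over SSL (https).

```python
import base64
from typing import Tuple


def basic_auth_header(user: str, password: str) -> str:
    """Build the Authorization header value a browser sends for Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def decode_basic_auth(header_value: str) -> Tuple[str, str]:
    """Recover the user name and password from a Basic Authorization value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credentials header")
    # Split on the first colon only; the password itself may contain colons
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password
```

For joe with the password secret, the header value is Basic am9lOnNlY3JldA==, and decoding it gives the password straight back.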
Scripts in alternative locations

Another feature we can get via .htaccess is the ability to run scripts
outside our cgi-bin directory. This is another good way to increase
the manageability of your website. Let's say you have a section of
your website for news about your club, in a directory
appropriately called /news. You may have a small Perl script that
pulls news items out of a MySQL database. You could create a subdirectory
of /news called script and then create an .htaccess file in it with the
following lines:

```apache
Options +ExecCGI
AddHandler cgi-script .cgi
```
Now any file with the .cgi (dot-cgi) extension in that directory can be executed as
a script. Normally Apache wouldn't allow that, but these two lines
override that behavior. Of course, there is a good reason for
this not being allowed by default: it is a potential security risk.
Most websites place their cgi-bin directory outside
of the web content directories, and for good reason: any script in it can be executed.
It's much more difficult for someone to tamper with the cgi-bin directory
if it's somewhere else, but if we place scripts inside a website's content
directories, the possibility of someone manipulating them increases. If
you do choose to use this feature, make sure that the scripts are
well-written and free from exploitable bugs, such as cross-site scripting
vulnerabilities, and that as few people as possible have upload
privileges.

robots.txt

Search engines like Google exist because they are able to make
inventories of websites. Yahoo started out with a few individuals
creating a directory of the limited number of pages that existed in
the early 1990s. At the time of this writing, there are literally
billions of pages on the WWW, so it would be too costly to have
humans do this manually. Instead, Google and other search engines employ
automated robots. But you as a website maintainer may not want
parts of your site to be inventoried by search engines, or you may
not even want your site inventoried at all. To make sure that your
wishes are respected, popular search engines have their robots
read a file called robots.txt, placed in the root
directory of the website. robots.txt contains instructions for
web crawlers, spiders and robots as to which directories are off limits.
A robots.txt file that keeps out all prying robot eyes looks
like this:

```
User-agent: *
Disallow: /
```
The asterisk means any user agent, and the slash (/)
means the root directory and everything in it, including subdirectories.
In other words, the whole site is off limits to any robot. This is a bit
strict, and it would definitely not do for a website maintainer who was
looking to improve search engine ranking. You probably want to be a bit
more lenient:

```
User-agent: *
Disallow: /admin
Disallow: /reports
```

This would allow robots to make an inventory of your site except for
the two directories /admin and /reports, which you have chosen to
keep them out of. Disallow values are matched as prefixes of the URL path, so /admin would also cover a page such as /admin.html; write /admin/ if you mean only the directory. You can also specify which robots you want kept off the site by
naming them after User-agent:. You can even have
several sections in your robots.txt file for different circumstances, separated by blank lines:

```
User-agent: webcrawler
Disallow: /managers
Disallow: /docs

User-agent: lycos
Disallow: /managers
Disallow: /docs
Disallow: /how-to

User-agent: evilrobot
Disallow: /

User-agent: *
Disallow: /managers
```
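You can check how a cooperating robot would read the file above with urllib.robotparser from Python's standard library. This only models robots that choose to obey robots.txt; nothing here is enforced by Apache. The helper name allowed and the sample paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The multi-section robots.txt from above, as a well-behaved robot would read it
ROBOTS_TXT = """\
User-agent: webcrawler
Disallow: /managers
Disallow: /docs

User-agent: lycos
Disallow: /managers
Disallow: /docs
Disallow: /how-to

User-agent: evilrobot
Disallow: /

User-agent: *
Disallow: /managers
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """True if a robot identifying itself as `agent` may fetch `path`."""
    return parser.can_fetch(agent, path)
```

With this file, lycos is kept out of /how-to while webcrawler is not, evilrobot is refused everywhere, and any robot not named in the file falls through to the * section, which only blocks /managers.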
What you exclude is up to you (or your organization's policy-making body).

A couple of caveats about robots.txt:

robots.txt is not intended as an access restriction for humans. The
directories above remain viewable without restriction
by anyone who knows about them; they just won't end up in any
search engine that knows how to play nice. In fact, the simple act of
listing them in robots.txt signals that there is something
there that you don't really want the whole World Wide Web to know
about. Since robots.txt is available for public viewing,
anyone can look at it to see what you consider
somewhat private, and curious people, of course, will immediately
start exploring. If you have anything moderately sensitive there,
protect it with a password in the way I explained above. If it's
really sensitive, it shouldn't even be on a public webserver,
regardless of password protection.

Some robots don't play nice. There are robots that go looking
for email addresses on pages to sell to spammers. Some even go
looking for physical addresses that you might have listed somewhere to
sell to telemarketers. Your robots.txt means nothing to them; they
laugh in your face. The best policy is to obfuscate email addresses
on pages (bill\*\*AT\*\*domain.ork) and not to put individuals' personal
info on pages.
Virtual hosting of websites

The hosting service boom of the mid-to-late 1990s was due in large part to Apache and its ability to create
"virtual" hosts. This allowed multiple websites to be hosted on one
server, as long as bandwidth and server load allowed for it. For
sites on the public web, the most common way to do this is to take
advantage of Apache's NameVirtualHost directive, which lets several site names share one IP address. If you go to the end of the httpd.conf file, you'll find the
section for virtual hosting. It begins with these lines:

```apache
# If you want to use name-based virtual hosts you need to define at
# least one IP address (and port number) for them.
#
#NameVirtualHost 12.34.56.78:80
NameVirtualHost 192.168.0.25
```
What I've done here is choose the IP address of the machine that will host
the websites and declare it. Next, we need to set up space to house
the various websites. I normally choose /home/[website], where [website]
is the name of the "user" who will be administering the site. This
may or may not be a real user. If you set up a website to sell trinkets,
say, www.trinkets.bis, you might create a user called 'trinkets'. This
is basically up to you. Either way, you would have a directory called
/home/trinkets. In this directory, you should create a directory for
web content; call it something meaningful like www, html or web.
Then create a directory for your Apache access and error logs; logs will
be fine. Then create a directory for your CGI scripts; the name cgi-bin is
pretty much standard. Next, you need to create the
virtual host section for Apache. If you look under the NameVirtualHost
directive, you will find these lines:

```apache
# VirtualHost example:
# Almost any Apache directive may go into a VirtualHost container.
#
#<VirtualHost ip.address.of.host.some_domain.com>
# ServerAdmin webmaster@host.some_domain.com
```

You can put your virtual hosts right here inside httpd.conf if you want,
but I personally don't. Apache provides the option to use "includes"; in other words, it lets you tack other files onto your httpd.conf. I prefer to do this
as it helps me maintain different sites better. So I would create a file
in the same directory as my httpd.conf called trinkets.conf. At the end of
httpd.conf I would add the line:

```apache
Include /etc/apache/trinkets.conf
```
Apache will now read that file as well when it starts up. In trinkets.conf
you need to place the following:

```apache
##############################################
# VIRTUAL HOST WWW.TRINKETS.BIS #
##############################################
<VirtualHost 192.168.0.25>
ServerAdmin trinkets@trinkets.bis
DocumentRoot /home/trinkets/www/
ServerName www.trinkets.bis
ErrorLog /home/trinkets/logs/error
TransferLog /home/trinkets/logs/access
ScriptLog /home/trinkets/logs/script
ScriptAlias /cgi-bin/ /home/trinkets/cgi-bin/
<Directory /home/trinkets/www/>
Options Includes ExecCGI FollowSymLinks
AllowOverride All
</Directory>
<Directory /home/trinkets/cgi-bin/>
AllowOverride None
Options ExecCGI FollowSymLinks
</Directory>
</VirtualHost>
##############################################
# END HOST WWW.TRINKETS.BIS #
##############################################
```
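Name-based virtual hosting works because HTTP/1.1 browsers send a Host header naming the site they want. The Python sketch below models how Apache picks among the VirtualHost sections that share 192.168.0.25: it compares the Host header against each ServerName and, when nothing matches (or an old browser sends no Host header at all), falls back to the first virtual host listed for that address. The second site, www.gadgets.bis, is hypothetical, and ServerAlias is ignored.

```python
from typing import List, Tuple

# Name-based hosts sharing one IP, in the order they appear in the config.
# The second entry is a hypothetical extra site.
VHOSTS: List[Tuple[str, str]] = [
    ("www.trinkets.bis", "/home/trinkets/www/"),
    ("www.gadgets.bis", "/home/gadgets/www/"),
]


def pick_document_root(host_header: str,
                       vhosts: List[Tuple[str, str]] = VHOSTS) -> str:
    """Choose a DocumentRoot from the HTTP Host header, name-based style."""
    # Host may carry a port ("www.trinkets.bis:80"); names compare case-insensitively
    name = host_header.split(":", 1)[0].strip().lower()
    for server_name, docroot in vhosts:
        if server_name.lower() == name:
            return docroot
    # No ServerName matched: the first vhost listed for this address is used
    return vhosts[0][1]
```

That fallback is worth remembering: whichever site you include first becomes the one visitors see when they reach the server by IP address or with a name you haven't configured.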
The trinkets.conf example supposes that I have a machine on my local network to which
web requests are forwarded from a router. This machine is at IP
192.168.0.25 locally. I have created either a user 'trinkets' or
just a directory /home/trinkets. In this directory I have created
www for web content, cgi-bin for scripts and logs for
log files. Once you restart Apache so that it reads the new configuration (for example with apachectl restart, or your distribution's init script), you're pretty much ready to go. You can add more of
these if you want; that's the whole point of virtual hosting. Of course,
you need to be able to handle the load.

A head start

As I stated at the beginning, this wasn't meant to be an exhaustive guide
to using the Apache webserver. There are plenty of good books out there
that give you much more information about it. But this should give you
a head start in understanding some of the basic concepts behind Apache.