Compressing all HTML pages with Apache2 on AWS

Two mods can be used to compress the data Apache2 sends to the client (i.e. the browser): mod_deflate, which ships with Apache2, and mod_gzip, a third-party module originally written for Apache 1.3. The gzip mod is more versatile but more challenging to set up. For simple compression of HTML, CSS and JavaScript files, the deflate mod works just fine.

Compression is particularly important on Amazon Web Services (AWS) because:

  • HTML is very redundant and bulky
  • Smaller files are sent to the client faster
  • AWS charges you based upon OUTPUT bandwidth; smaller files = less bandwidth usage per file
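
To get a feel for how redundant HTML is, here is a minimal sketch that builds a repetitive page and compares its raw size with its gzip size. The file name and the table content are made up for illustration, and the gzip command line tool is only a stand-in for mod_deflate (both use the same deflate algorithm, but the exact numbers from Apache may differ).

```shell
# Build a deliberately repetitive HTML page in a temporary directory.
workdir=$(mktemp -d)
page="$workdir/sample.html"
{
  echo "<html><body><table>"
  i=0
  while [ "$i" -lt 500 ]; do
    echo "  <tr><td class=\"cell\">row $i</td><td class=\"cell\">value</td></tr>"
    i=$((i + 1))
  done
  echo "</table></body></html>"
} > "$page"

# Compare the raw size with the gzip-compressed size, both in bytes.
raw_bytes=$(wc -c < "$page" | tr -d ' ')
gzip_bytes=$(gzip -c "$page" | wc -c | tr -d ' ')
echo "raw: $raw_bytes bytes, gzip: $gzip_bytes bytes"
```

Running the same comparison against one of your own pages gives a rough idea of how much output bandwidth compression could save.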

Simple activation of mod_deflate

These instructions assume you have already set up an AWS instance and have an SSH client (like PuTTY) and an SCP client (like WinSCP) available for editing the configuration files.

  1. Log in to your instance via the SCP client then open the apache2 virtual hosts configuration file (“/etc/httpd/conf.d/vhosts.conf” for the default setup mentioned in other instructions here).
  2. Add the “AddOutputFilterByType DEFLATE text/html text/plain text/xml” filter to each virtual host (virtual hosts are the groupings starting with “<VirtualHost ”). You should enclose the filter in a conditional module statement (“<IfModule mod_deflate.c>”) to make sure your web server keeps running even if you happen to remove the deflate module.
  3. Save the virtual hosts configuration file.
  4. Open the SSH client and transfer to the root user (“sudo su”).
  5. Restart the apache2 service (“service httpd restart”).

The changes to the virtual hosts configuration file

  • <VirtualHost *:80>
  • ….
  • <IfModule mod_deflate.c>
  • AddOutputFilterByType DEFLATE text/html text/plain text/xml
  • </IfModule>
  • </VirtualHost>

Summary of command line inputs

  • $ sudo su
  • $ service httpd restart
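
After the restart it is worth checking that responses really are compressed. The sketch below is one possible way to do that: a small helper (the name is_compressed is my own, not part of Apache or curl) that reads response headers and looks for a Content-Encoding header. The curl line is left commented out because it needs your own domain in place of example.com.

```shell
# Reads HTTP response headers on stdin and succeeds only if the response
# was sent with gzip or deflate content encoding.
is_compressed() {
  grep -E -i -q '^content-encoding:.*(gzip|deflate)'
}

# Example use against a live server (replace example.com with your domain):
#   curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip' http://example.com/ \
#     | is_compressed && echo "compressed" || echo "not compressed"
```

The request has to include an Accept-Encoding header, because the server only compresses for clients that say they can handle it.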

Installing and Configuring Apache2 on AWS Amazon Linux AMI

Apache2 is the standard Linux web server. It handles all of the HTTP and HTTPS requests sent to the server. Apache2 modules are also used to run PHP scripts.

Installing Apache2

These instructions assume you have already set up an AWS instance and have an SSH client (like PuTTY) available.

  1. Log in to your instance via the SSH client. Transfer to the root user.
  2. Use YUM to install httpd (the apache2 web server application).
  3. Press “Y” when it asks if you want to install Apache.
  4. Verify the installation occurred correctly by starting the httpd service.

Summary of command line inputs

  • $ sudo su
  • $ yum install httpd
  • …..
  • Is this ok [y/N]: y
  • $ service httpd start

Configuring Apache2

Configuring Apache2 is easiest with a visual text editor, like the one included in WinSCP, rather than through the command line and vi. You will need to restart the httpd daemon after changing the configuration files for the settings to take effect.

Example settings

  • Apache system user: webserv
  • System group: webcln
  • Domain 1: example.com
  • Domain 1 subdomain: sub.example.com

Basic Configuration

These settings will need to be changed whether you use a single domain or virtual domains.

  1. Open the file “/etc/httpd/conf/httpd.conf”. Httpd uses shell-style comments, so any line starting with a “#” is commented out and not used in configuring apache2.
  2. Make sure “Listen 80” is uncommented.
  3. Change “User” to the Linux user that you want apache to run as. The example user is “webserv”.
  4. Change “Group” to the Linux group that you want apache to run as. The example group is “webcln”.
  5. Set the “ServerAdmin” to the server admin’s email address.
  6. Add any other index files to the “DirectoryIndex” list. Apache will search for the files in the order they are listed. Separate multiple file names with spaces.
  7. Finish the configuration via the Single Domain Configuration OR the Virtual Domains Configuration. I recommend using the Virtual Domains Configuration model, because it easily allows for adding subdomains or redirecting other domains.
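
A quick way to double-check the basic settings above is to list only the uncommented directives. This is a minimal sketch: the helper name show_basic_directives is my own, and it runs against a small sample file (with “index.php” added purely as an example of an extra index file) rather than the live configuration.

```shell
# Prints the active (uncommented) lines for the directives edited above.
show_basic_directives() {
  grep -E '^[[:space:]]*(Listen|User|Group|ServerAdmin|DirectoryIndex)[[:space:]]' "$1"
}

# Demonstration against a sample file that uses the example settings.
sample=$(mktemp)
cat > "$sample" <<'EOF'
#Listen 12.34.56.78:80
Listen 80
User webserv
Group webcln
ServerAdmin webmaster@example.com
DirectoryIndex index.html index.php
EOF

show_basic_directives "$sample"
```

On the server itself, the same function could be pointed at “/etc/httpd/conf/httpd.conf”. Lines starting with “#” are skipped, so a directive that is still commented out simply will not show up.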

Single Domain Configuration

  1. Open the file “/etc/httpd/conf/httpd.conf”. A single domain is set up fully within the core configuration file.
  2. Uncomment and make the appropriate changes to the following directives.
  3. Log in to your instance via the SSH client. Transfer to the root user (“sudo su”).
  4. Verify the installation occurred correctly by starting the httpd service.
  5. Log in to your domain hosting account and change the DNS records to point to the correct IP address.

Virtual Domains Configuration

  1. Open the directory “/etc/httpd/conf.d/” and create a new file called “vhosts.conf”
  2. Copy the below configurations and exchange the example values for your server’s values. You should leave a copy of the ‘default’ server at the top of the vhosts file. The first listing of either port (80 for http and 443 for https) will be used when a request does not match any other server name or server alias.
    Meaning of each parameter
    • NameVirtualHost – Indicates that the particular IP:PORT combination is a virtual host. You need this to enable the VirtualHost tags later. The value should be structured as IP:PORT. The wildcard “*” can be used to identify any IP address. Port 80 is used for http connections while port 443 is used for https (secure) connections.
    • IfModule – Checks to see if a module is installed and usable. Anything within the tags will be processed only if the module indicated in the open tag is installed and usable.
    • VirtualHost – This tag identifies a particular virtual host. The contents of the tag must include the parameters ServerName and DocumentRoot in order to work. The IP:PORT combination listed in the opening tag must first be declared using the NameVirtualHost parameter.
    • ServerName – The name of the webserver, which is normally the web address, in quotes. Apache will be asked for the ServerName by the user’s browser. Note: I use the value “default:80” as a catchall for incorrect inquiries to the server. If a user queries your server, on port 80, for a ServerName which doesn’t exist, the first VirtualHost will be returned as a default. A DNS error can create this situation, but a user can also create it intentionally by directly accessing the server IP address and spoofing the HTTP Host header with a different web address. You can actually test your own settings this way.
    • UseCanonicalName – This is a name allocation directive for self-referential URLs. Setting it to “on” forces Apache to use the hostname and port specified by ServerName, whereas setting it to “off” allows it to first try the hostname and port supplied by the user and then fall back to the server values. Setting it to “off” can be a slight security issue, but will generally allow for faster processing of complex situations, especially those involving intranets.
    • ServerAdmin – This is the email address of the admin for the particular server, in quotes. It is not essential, but Apache includes it in the error pages it returns to clients, so choose an address you are willing to expose to spam.
    • DocumentRoot – This is the directory in which apache will look for the appropriate web files.
    • ErrorLog – This is the error log file to be used for errors occurring with this virtual host.
    • SSLEngine – This runs the Apache mod_ssl engine which allows for secure connection and encryption of the information sent to the user. You have to use this if you want to use the https protocol.
    • SSLVerifyClient – This forces the client to provide the certificate confirmation before receiving any information. This is impractical for most situations, except when using a company intranet. The client must already have the correct certificate in order to authenticate with the server.
    • SSLCertificateFile – The location of the ssl certificate file.
    • SSLCertificateKeyFile – The location of the ssl certificate key file.
  3. Create the directories for each virtual account. The example uses the home directory of “/var/www/vhosts” for all of the virtual hosts. Within this directory there is a directory for each domain and within each of those is a directory for the http files (httpdocs), the https files (httpsdocs) and the server files (var). You also need to create a blank “index.html” file in the http and https directories and an error log in the logs directory.
    • /var/www/vhosts/example.com/httpdocs/
    • /var/www/vhosts/example.com/httpsdocs/
    • /var/www/vhosts/example.com/var/logs/
    • /var/www/vhosts/example.com/var/certificates/
  4. Log in to your instance via the SSH client (PuTTY). Transfer to the root user (“sudo su”).
  5. Verify the installation occurred correctly by starting the httpd service (“service httpd start”).
  6. Log in to your domain hosting account and change the DNS records to point to the correct IP address.
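
The directory layout from step 3 can be created in one go. This is a minimal sketch, not the only way to do it: VHOST_BASE and DOMAIN are variable names I made up, and VHOST_BASE falls back to a temporary directory so the commands can be tried safely before running them as root against “/var/www/vhosts”.

```shell
# VHOST_BASE and DOMAIN are illustrative names. On the server you would run
# this as root with VHOST_BASE=/var/www/vhosts.
VHOST_BASE="${VHOST_BASE:-$(mktemp -d)}"
DOMAIN="example.com"
site="$VHOST_BASE/$DOMAIN"

# Create the http, https, log and certificate directories for the domain.
mkdir -p "$site/httpdocs" "$site/httpsdocs" "$site/var/logs" "$site/var/certificates"

# Blank index pages and an empty error log, as described in step 3.
touch "$site/httpdocs/index.html" "$site/httpsdocs/index.html" "$site/var/logs/error_log"

# On the real server you would probably also hand the tree to the example
# Apache user and group (needs root, so it is left commented out here):
#   chown -R webserv:webcln "$site"

echo "created vhost tree under $site"
```

Repeat with a different DOMAIN value (for example “default” or “sub.example.com”) for each virtual host.
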
Example vhosts.conf file
  • NameVirtualHost *:80
  • <IfModule mod_ssl.c>
    • NameVirtualHost *:443
  • </IfModule>
  • <VirtualHost *:80>
    • ServerName "default:80"
    • UseCanonicalName off
    • ServerAdmin "webmaster@example.com"
    • DocumentRoot "/var/www/vhosts/default/httpdocs"
    • ErrorLog "/var/www/vhosts/default/var/logs/error_log"
    • <IfModule mod_ssl.c>
      • SSLEngine off
    • </IfModule>
  • </VirtualHost>
  • <IfModule mod_ssl.c>
    • <VirtualHost *:443>
      • ServerName "default:443"
      • UseCanonicalName off
      • ServerAdmin "webmaster@example.com"
      • DocumentRoot "/var/www/vhosts/default/httpsdocs"
      • ErrorLog "/var/www/vhosts/default/var/logs/error_log"
      • SSLEngine on
      • SSLVerifyClient none
      • SSLCertificateFile /var/www/vhosts/default/var/certificates/default.crt
      • SSLCertificateKeyFile /var/www/vhosts/default/var/certificates/default.key
    • </VirtualHost>
  • </IfModule>
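
A mistyped or unclosed tag in the vhosts file will stop httpd from starting, so a rough pre-check before restarting can save some grief. The sketch below counts opening and closing tags in a file. The function name check_balance is my own and the sample file is a cut-down illustration. It only catches mismatched counts, nothing more subtle.

```shell
# Succeeds if the file has as many opening as closing lines for a tag.
# This is a rough pre-check, not a replacement for Apache's own parser.
check_balance() {
  _file=$1
  _tag=$2
  _opens=$(grep -c -i "^[[:space:]]*<$_tag[[:space:]>]" "$_file" || true)
  _closes=$(grep -c -i "^[[:space:]]*</$_tag>" "$_file" || true)
  [ "$_opens" -eq "$_closes" ]
}

# Cut-down sample configuration used only for the demonstration.
conf=$(mktemp)
cat > "$conf" <<'EOF'
NameVirtualHost *:80
<VirtualHost *:80>
  ServerName "default:80"
  DocumentRoot "/var/www/vhosts/default/httpdocs"
  <IfModule mod_ssl.c>
    SSLEngine off
  </IfModule>
</VirtualHost>
EOF

for tag in VirtualHost IfModule; do
  if check_balance "$conf" "$tag"; then
    echo "$tag: balanced"
  else
    echo "$tag: MISMATCH"
  fi
done
```

For the real thing, running “apachectl configtest” (or “httpd -t”) as root asks Apache itself to parse the configuration and report syntax errors without restarting the service.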

Setting up WinSCP for AWS access

I am assuming you have already set up PuTTY for AWS access. If you haven’t yet, please follow the instructions at “Setting up PuTTY for AWS access”. Also, obviously, you need to have an AWS Instance set up. If you haven’t set up an AWS Instance, you can find help at “Setting up a Free Tier Amazon EC2 Instance”.

These instructions assume you have already installed WinSCP on your computer. If you need WinSCP, it can be found at www.winscp.net. It is really easy to install on Windows machines.

Configuration for AWS Instance access

You need to access your AWS dashboard as well as WinSCP.

  1. Open your AWS Console (go to http://aws.amazon.com and log in).
  2. Go to “EC2” under “Compute and Networking”.
  3. Click on “Instances” under the “Instances” section of the Navigation pane. This will display all of the instances you currently have running. Clicking on the name of the instance will show the details of that instance below. Select the instance you want to configure WinSCP for then find the “Key Pair Name” and “Security Groups” values under the “Description” tab. If you haven’t already done so for PuTTY, you will need to edit the security group in order to allow an SSH client (WinSCP in this case) to access your instance then confirm the security key with the key pair name.
  4. Find the value for “Public DNS” under the “Description” tab then highlight it (shift+ left click while selecting the text) and press CTRL+C to copy the text. You will need this value when setting up WinSCP and I find copy & pasting a whole lot easier than retyping something.
  5. Click on “Security Groups” under the “Networking & Security” section of the Navigation pane. This will show your security groups for this region. Click on the instance’s security group to see the details of that group.
  6. Click on the “Inbound” tab to edit the firewall associated with this security group.
  7. SSH clients use port 22 for access, so you will need to verify that TCP port 22 (SSH) is listed in the table to the right. If it is not listed, or there is no table, select “SSH” under “Create a new rule” then add your computer’s IP address to the source line followed by “/32”. AWS security groups use CIDR notation for IP address ranges. Simply put, “/32” limits the range to a single IP address. Click “Add Rule” then click “Apply Rule Changes”.
  8. Click on “Key Pairs” under the “Networking & Security” section of the Navigation pane. The “Fingerprint” for the “Key Pair Name” will be needed later to confirm your connection to the AWS Instance.
  9. Open WinSCP.
  10. Click on “New” to add a new session. Note, if this is the first time you’ve used WinSCP, you will automatically be prompted for a new session.
  11. Choose “SCP” as the “File protocol”.
  12. Choose “22” for “Port number”. Note, you can actually use a different port than the default 22 to connect with the AWS Instance. You would have to make the appropriate adjustments to the ssh shell and the AWS Security Group. This can be good from a security standpoint, but is extremely risky from a setup standpoint. If you mess up the settings you will be permanently locked out of SSH access to the instance, generally making it worthless.
  13. Paste your instance’s “Public DNS” value in the “Host name” box.
  14. Enter “ec2-user” as the “User name” and leave the “Password” box blank.
  15. Click on the “…” button in the “Private key file” box and open the private key that corresponds to the “Key Pair Name” you generated when setting up the instance. This was the same file you opened in the PuTTYGen program earlier.
  16. Click “Save”. There’s no point in reentering this info every time you want to login.
  17. The first time you log in you will get a security fingerprint confirmation. This value should be the same as the one provided through the AWS console.
  18. Click “Login”. This will log you in as the ec2-user user. This is fine for some stuff, but you won’t be able to change to the root user without completing the last few steps.
  19. Open the file “/etc/sudoers”
  20. Find the line “Defaults requiretty” and add “Defaults:ec2-user !requiretty” as the next line. This will allow WinSCP to transfer itself to the root user after logging on by using sudo su, just like in PuTTY.
  21. Disconnect. The disconnect option can be found under the “Sessions” menu.
  22. Click on the session you just created then click “Edit”.
  23. Click on “SCP/Shell” on the left options. Note, if “SCP/Shell” isn’t listed under “Environment”, check the “Advanced options” box at the bottom to display the option.
  24. For “Shell:” select “sudo su -” as the option. Make sure “Return code variable” is set to “Autodetect”.
  25. Click “Save”.
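
The “/32” from step 7 is easier to trust once you see what the number after the slash does. This is a minimal sketch of the arithmetic for IPv4. The function name cidr_size is my own.

```shell
# Number of IPv4 addresses covered by a CIDR prefix length (0 to 32).
# Each bit not fixed by the prefix doubles the size of the range.
cidr_size() {
  echo $(( 1 << (32 - $1) ))
}

for prefix in 32 24 16; do
  echo "/$prefix covers $(cidr_size "$prefix") address(es)"
done
```

So a source like 203.0.113.25/32 (a made-up address from the documentation range) lets in exactly one machine, while the same address with “/24” would open port 22 to 256 addresses.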

When you log in, your shell access will automatically be changed to the root user, allowing complete access to all files. For most web development activities root access isn’t needed; however, it makes life easier AND is essential for installing and configuring most of the software.

Installing the necessary software on an AWS Amazon Linux AMI server

There is a variety of software you will need to get your new AWS web server up and running. You probably already have the desktop clients if you ever did any server work previously. The core server software, however, will need to be installed, depending on your purposes for the server.

This page will be updated from time to time as new installation and configuration guides are added.

Desktop Clients

  • PuTTY – Free SSH client. Utilizes a basic command line style interface. Available at: www.putty.org. Documentation: TXT version | HTML version
  • WinSCP – Free SCP/SFTP/FTP client for Windows. Offers a graphical user interface to move and edit files. Available at: www.winscp.net. Documentation: HTML version

I am biased toward Windows software. All of these programs run on Windows XP and Windows 7 (32-bit & 64-bit systems). If you are running a Linux or Mac system….well…they may work. The program’s name link will go to instructions on configuring the software to access your AWS Instance.

Core Server Software

  • Apache2 – Website hosting. The basic web server which deals with internet (http/https) traffic to the server. Documentation: http://httpd.apache.org/
  • PHP – Dynamic websites (optional). Requires: Apache2. Scripting language for creating dynamic webpages. Used by most CMS, Wiki & Blog systems to manage content. Documentation: http://www.php.net/
  • MySQL – Database. The basic free SQL database server. Used by many CMS, Wiki & Blog systems to store content. Documentation: http://www.mysql.com/
  • phpMyAdmin – Database administration (optional). Requires: Apache2, PHP & MySQL. Graphical, HTML based admin tool for accessing and managing MySQL databases. Documentation: http://www.phpMyAdmin.net/
  • Postfix – Mail-Transfer-Agent (ie: email server). Accepts and sends email. Versatile and can be used with a variety of database structures. Documentation: http://www.postfix.com/
  • Courier – Email client portal (optional). Requires: Postfix. Offers a portal to access email via any client, including MS Outlook, Thunderbird & smart phones. Offers IMAP and POP3 systems. Documentation: http://www.courier-mta.org/
  • Spamassassin – Email spam filter (optional). Requires: Postfix. Works with MTAs to prevent spam from arriving on the server. Documentation: http://spamassassin.apache.org/
  • BIND9 – DNS server (optional). DNS server which allows you to create your own dns records. Documentation: http://www.bind9.net

Note that all of these programs are free, and most are open source. All of the installation instructions are specific to the Amazon Linux AMI. This stripped down version of Linux is a special Amazon derivative of Fedora. When I was originally setting up our servers, some of the differences between RedHat, Ubuntu, Debian and this version of Linux drove me crazy, so all of these instructions were tested on the newest Amazon Linux AMI version (currently 2012.03).

Amazon Cloud Hosting

Amazon is a huge player in the cloud hosting space. Cloud hosting is basically where a company fills a server farm with racks upon racks of physical computers, hard drives and routers. The company then uses software to combine the individual computers into a super computer which is then partitioned off into a series of virtual servers of varying sizes and types. The company then resells usage of these virtual servers to their clients.

Amazon Web Services (the division which provides the service) offers a variety of different types of virtual servers, but the basic, and most flexible, is called Elastic Compute Cloud (EC2).

Amazon EC2

Instances

Instances can be thought of as the virtual processor, motherboard and RAM of the virtual server. Amazon offers three different types of Instances (On-Demand, Reserved, and Spot) in a variety of sizes.

On-Demand Instances

On-demand Instances are those you intend on using on a temporary basis. You are paying only for the amount of time you actually use the instance, so they are excellent for short-term projects and to get settings worked out.

Reserved Instances

Reserved Instances are instances which are dedicated to your account. They do not go away if you stop, or terminate them. Well, that is not quite correct. You are actually reserving usage of a particular type of instance, rather than a particular instance. The different levels of Reserved Instances are basically usage structures. You prepay to reserve an instance and in exchange get a discount on the hourly rate. Reserved Instances are ideal for long-term server applications, like website hosts, email servers, etc.

Reserved Instances Utilization Rates
  • Heavy Utilization – These instances are used 80%+ of the month. These are the core website and email servers.
  • Medium Utilization – These instances are used for 40-79% utilization rates. If you run a few heavy traffic websites, then these instances would be the load-balanced servers to support demand during peak times like the evenings and weekends.
  • Light Utilization – These instances are used for 17-30% utilization rates. This time frame corresponds really well with development servers that are started in the morning, run for 7-8 hours then turned off in the evening.
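
To see which band a server falls into, you can turn a usage pattern into a percentage of the month. This is a rough sketch under my own assumptions: a month of about 730 hours (8760 hours in a year divided by 12) and 22 working days for a development server. The function name utilization_pct is made up.

```shell
# Percentage of a month an instance runs, from hours per day and days per
# month. 730 is an approximate number of hours in a month (8760 / 12).
# Integer arithmetic, so the result is rounded down.
utilization_pct() {
  echo $(( $1 * $2 * 100 / 730 ))
}

echo "24 hours x 30 days: $(utilization_pct 24 30)%"
echo "8 hours x 22 working days: $(utilization_pct 8 22)%"
echo "7 hours x 22 working days: $(utilization_pct 7 22)%"
```

With these assumptions an always-on server comes out at 98% (heavy utilization), while a development server running 7 to 8 hours on working days lands at 21% to 24%, inside the light utilization band.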

Spot Instances

Spot Instances are similar to on-demand instances, but are designed for special project type circumstances. Amazon obviously wants to keep all of their servers running all the time (i.e. 100% utilization), however with the on-demand type structure, there are times when some servers are not being used. During these slow times, Amazon would rather sell time on them temporarily at a discount than let them run empty. These temporary discounted servers are the spot instances. Spot instances work really well for periodic maintenance activities.

To use a spot instance, you indicate the size of instance and the maximum price you will bid for usage of that instance. Once the price for that size of instance goes below the bid price, the instance starts up and you get it until the price goes back over your max bid price. Note you are only charged the actual price, not your bid price, so you can often pay less per hour than your bid price for spot instances.
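
The bidding rule is easiest to follow with numbers. The sketch below walks through six hours of made-up spot prices (whole numbers in tenths of a cent, so the shell can do the arithmetic) against a made-up maximum bid. It is a simplified hour-by-hour model of the description above, not Amazon's actual billing logic.

```shell
# Hypothetical hourly spot prices and maximum bid, in tenths of a cent.
prices="30 28 45 33 60 31"
max_bid=35

hours_run=0
total_cost=0
for price in $prices; do
  # The instance only runs while the spot price is at or below the bid,
  # and each hour is charged at the spot price, not at the bid.
  if [ "$price" -le "$max_bid" ]; then
    hours_run=$((hours_run + 1))
    total_cost=$((total_cost + price))
  fi
done

bid_cost=$((hours_run * max_bid))
echo "ran $hours_run of 6 hours, charged $total_cost instead of $bid_cost at the bid price"
```

In this example the instance sits out the two hours priced at 45 and 60, runs for the other four, and is charged 122 rather than the 140 it would cost at the full bid price.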

EC2 Resources

Elastic Block Store Volumes

Elastic Block Store (EBS) volumes are the virtual hard drives of the virtual server. There are two types of EBS Volumes, Standard and Provisioned IOPS (Input/output Operations Per Second).

Standard EBS Volumes

Standard EBS volumes correspond most closely to physical hard disks. They read and write at average rates and deliver about 100 IOPS. Unless you need high read/write performance, a standard EBS volume is what you’d use.

Provisioned IOPS Volumes

Provisioned IOPS volumes are for high read/write type situations. The most common example is a database server. These volumes are very powerful, but also very expensive (relatively).

There are other AWS services offered, like S3, SES and RDS, but I currently don’t use them, so I will avoid going into detail on those services until I do.