The Greased Turkey Document [1]
or
How to set up a load-sharing server
Release History: 0.01alpha - Rob Thomas - rob@rpi.net.au
[Bootstrap of the documentation]
This document was written with [homepage link] ippvs version 0.5
and Linux Kernel [kernel.org link] 2.0.35 in mind.
1: Overview
This document covers the basics of what ippvs does, how it
works, and how to set it up. I expect it to expand to cover a
decent man(8) page, and a FAQ.
2: What does it do?
ippvs is a kernel modification that offers NAT-style load
sharing across multiple virtual servers. What we mean by this is
that you have one 'listening' machine that transparently (and
incredibly quickly) redirects client connection requests to other
machines. The advantage of doing this is that it allows you to
have huge arrays of redundant, load-sharing servers.
A good example of this (and the example that we will be following
through this entire document) is the setting up of a cluster of
load-sharing proxy servers, at a very, very, low cost-per-tps
rate. It's also perfectly suited to serving normal web traffic,
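or almost anything that can be served over TCP or UDP. The only
caveat is that it will NOT work with ftp services, because ftp
services are too smart for their own good: ftpd tells the client,
inside the control connection, which IP address and port to use
for the data connection. The ippvs server only rewrites packet
headers, so a clustered ftpd would hand out its private 10.x
address, which the client can't reach, and that breaks the NAT.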
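To make the ftp caveat concrete, here's a minimal sketch that decodes the address an ftp server puts in its passive-mode ("227") reply. The reply format (four address bytes, then two port bytes, port = p1*256 + p2) comes from the FTP spec; the function name decode_pasv and the sample reply are made up for illustration, and nothing here is part of ippvs.

```shell
#!/bin/sh
# Decode the data-connection address carried INSIDE an ftp "227" reply.
# The NAT only rewrites packet headers, so this payload goes out untouched.
decode_pasv() {   # usage: decode_pasv "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)"
    # Keep only the comma-separated numbers between the parentheses.
    fields=$(printf '%s\n' "$1" | sed -n 's/.*(\([0-9,]*\)).*/\1/p')
    [ -n "$fields" ] || return 1
    old_ifs=$IFS
    IFS=,
    set -- $fields          # split into $1..$6 on the commas
    IFS=$old_ifs
    [ $# -eq 6 ] || return 1
    echo "$1.$2.$3.$4 $(( $5 * 256 + $6 ))"
}

# Hypothetical reply from a clustered machine on the private LAN:
decode_pasv '227 Entering Passive Mode (10,1,1,2,19,137)'
```

The address that comes out is the private 10.1.1.2, which a client on the other side of the ippvs server has no route to.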
3: How does it work?
In this document, as mentioned above, we will be going through
how to set up an array of proxy servers, that appear to the
clients as one physical machine. The first thing you should
realise is how the machines should be wired together. [2]
                                  [ --- HUB --- ]
[proxy server 1]<-eth0------------+ | | | | | | +--------eth0->[proxy server 4]
[proxy server 2]<-eth0--------------+ | | | | +----------eth0->[proxy server 5]
[proxy server 3]<-eth0----------------+ | | +------------eth0->[proxy server 6]
                                        | |
                                        | |
                                        | +--eth1->[ippvs server 0]<-eth0-------...local network...
                                        +----eth1->[ippvs server 1]<-eth0-------...local network...
[I realise that I use a -very- wide screen, so that'll probably
look like crap on an 80x24 display - looks good on a 128x24 8)]
You should have a look at this map, and take notice of a few
things:
1: The proxy servers are -not- connected to your LAN - they're on
their own separate LAN.
2: The machines are connected to the rest of the network THROUGH
the ippvs server. Make sure their default route is set up that
way.
In this demonstration, the IP addresses of the machines are:
ippvs server 0:
eth0: 203.1.1.2 [Machine's IP address]
eth0:0 203.1.1.10 [Permanent load-sharing IP address]
eth0:1 203.1.1.11 [Only up if ippvs1 dies - usually DOWN]
eth1: 10.1.1.254 [Private LAN IP address - non-routable, as only the proxy servers see it]
eth1:0 10.1.1.253 [Only up if ippvs1 dies - usually DOWN]
ippvs server 1:
eth0: 203.1.1.3 [Machine's IP address]
eth0:0 203.1.1.11 [Permanent load-sharing IP address]
eth0:1 203.1.1.10 [Only up if ippvs0 dies - usually DOWN]
eth1: 10.1.1.253 [Private LAN IP address - non-routable, as only the proxy servers see it]
eth1:0 10.1.1.254 [Only up if ippvs0 dies - usually DOWN]
proxy server 1:
eth0: 10.1.1.1
default route to 10.1.1.254
proxy server 2:
eth0: 10.1.1.2
default route to 10.1.1.254
proxy server 3:
eth0: 10.1.1.3
default route to 10.1.1.254
proxy server 4:
eth0: 10.1.1.4
default route to 10.1.1.253
proxy server 5:
eth0: 10.1.1.5
default route to 10.1.1.253
proxy server 6:
eth0: 10.1.1.6
default route to 10.1.1.253
This looks a bit complex, but if you're not interested in setting
up a fault-tolerant network you don't need the second ippvs
server, or to have half the servers talking to one machine, and
the other half talking to the other machine.
[XXX - I'm aware that no auto-failover exists, but it'll only be
a few 'ping' scripts to make it work. - XXX]
[XXX - Should I take out the redundancy stuff until I write some
more documentation for it? - XXX]
Let's track a packet that's coming from a client machine, to port
8080 on 203.1.1.10.
Header: Request connection to port 8080 on 203.1.1.10 from
203.2.3.4 port 9999
The first thing that happens is that ippvs0, which owns
203.1.1.10, looks at the headers and realises that the
destination is set up as a load-sharing port. ippvs0 picks a
machine to send it to, and scribbles over the headers, changing
the DESTINATION address (the SOURCE address stays the same) of
the packet, and fires it out.
Header: Request connection to port 8080 on 10.1.1.2 from
203.2.3.4 port 9999
The machine 10.1.1.2 accepts the connection, and sends the data
back:
Header: Connection accept, 203.2.3.4 port 9999, and here's the
data, love from 10.1.1.2 port 8080.
The packet then heads back along the wire to the default route,
which is ippvs0. The machine then glues the original headers back
on and sends the packet on its merry way.
Header: Connection accept, 203.2.3.4 port 9999, and here's the
data, love from 203.1.1.10 port 8080.
All the client sees is a normal connection to 203.1.1.10:8080, as
though nothing magic was going on behind the scenes.
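The walk-through above can be played back as a toy model. This is only a sketch of the idea, not how the kernel code is written: a "packet" here is just the string "SRC_IP:PORT DST_IP:PORT", and the names VIRTUAL, rewrite_inbound and rewrite_outbound are invented for the example.

```shell
#!/bin/sh
# Toy model of the header rewriting. Nothing here touches the network.
VIRTUAL=203.1.1.10:8080    # the address the client thinks it is talking to

# Inbound: keep the SOURCE, swap the DESTINATION for the chosen machine.
rewrite_inbound() {    # usage: rewrite_inbound "SRC DST" REAL_IP:PORT
    src=${1%% *}       # everything before the first space
    echo "$src $2"
}

# Outbound: keep the DESTINATION, put the virtual address back as SOURCE.
rewrite_outbound() {   # usage: rewrite_outbound "REAL_IP:PORT CLIENT_IP:PORT"
    dst=${1##* }       # everything after the last space
    echo "$VIRTUAL $dst"
}

# The request from the example, sent on to 10.1.1.2:
rewrite_inbound '203.2.3.4:9999 203.1.1.10:8080' 10.1.1.2:8080
# The reply from 10.1.1.2, as it leaves ippvs0:
rewrite_outbound '10.1.1.2:8080 203.2.3.4:9999'
```

The real thing also has to remember which client connection went to which machine so that later packets follow the same path; the sketch leaves that out.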
4: Wow. This rocks. How do I set it up?
The only 'setting up' is done on the actual ippvs server(s) - You
need to pick out your IP addresses for your private LAN,
obviously, and configure the machines. This document will pretend
that you're using the IP addresses specified above - and there's
no reason at all why you shouldn't. This is exactly what 10.x.x.x
and 192.168.x.x are set aside for.
On ippvs0:
ipfwadm -F -a m -S 10.1.1.0/24 -D 0.0.0.0/0   # masquerade traffic from the private LAN ('-a m' appends a rule with the masquerade policy)
ippfvsadm -A -t 203.1.1.10:8080 -R 10.1.1.1:8080   # redirect _T_CP connections to 203.1.1.10:8080 to 10.1.1.1:8080
ippfvsadm -A -t 203.1.1.10:8080 -R 10.1.1.2:8080   # and 10.1.1.2:8080
ippfvsadm -A -t 203.1.1.10:8080 -R 10.1.1.3:8080   # and 10.1.1.3:8080
On ippvs1:
ipfwadm -F -a m -S 10.1.1.0/24 -D 0.0.0.0/0   # masquerade traffic from the private LAN ('-a m' appends a rule with the masquerade policy)
ippfvsadm -A -t 203.1.1.11:8080 -R 10.1.1.4:8080   # redirect _T_CP connections to 203.1.1.11:8080 to 10.1.1.4:8080
ippfvsadm -A -t 203.1.1.11:8080 -R 10.1.1.5:8080   # and 10.1.1.5:8080
ippfvsadm -A -t 203.1.1.11:8080 -R 10.1.1.6:8080   # and 10.1.1.6:8080
That's all you have to do. Now, when you try to make a connection
to 203.1.1.10 or .11 on port 8080, it will be automatically, and
invisibly, redirected to a random machine. There are various
algorithms that are used to balance the load, which are out of
the scope of this document at this stage of play.
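If you have more than a handful of machines, typing one line per machine gets old. Here's a small sketch that prints the redirect lines for you; it only uses the -A, -t and -R options shown above, and the function name vs_rules is made up. It prints the commands rather than running them, so you can eyeball the output first.

```shell
#!/bin/sh
# Print (not run) one ippfvsadm line per machine behind a virtual address.
vs_rules() {   # usage: vs_rules VIRTUAL_IP:PORT REAL_IP:PORT...
    virtual=$1
    shift
    for real in "$@"; do
        echo "ippfvsadm -A -t $virtual -R $real"
    done
}

# The three rules for ippvs0 in this document's example:
vs_rules 203.1.1.10:8080 10.1.1.1:8080 10.1.1.2:8080 10.1.1.3:8080
```

Once the output looks right, piping it to sh as root on the ippvs server should apply it.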
5: Things you should be aware of that will bite you if you're not
careful.
Always make sure that the default route of the clustered machines
(the proxy servers here) points to the ippvs server.
0.5 supports tunneling, which I haven't played with, so therefore
I don't know how it works yet 8-)
Always make sure that the default route of the clustered machines
points to the ippvs server. (Yes, twice. Don't forget!)
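Since the default route is the thing most likely to bite, here's a minimal sketch of a check you could run on each proxy server. It reads 'route -n' style output on stdin and assumes the usual column layout (Destination, Gateway, Genmask, Flags, ...); the function names default_gateway and check_gateway are invented for the example.

```shell
#!/bin/sh
# Pull the default gateway out of `route -n` style output read on stdin.
default_gateway() {
    # The default route has destination 0.0.0.0 and a G (gateway) flag.
    awk '!found && $1 == "0.0.0.0" && $4 ~ /G/ { print $2; found = 1 }'
}

# Compare it with the ippvs server this machine is supposed to use.
check_gateway() {   # usage: route -n | check_gateway 10.1.1.254
    gw=$(default_gateway)
    if [ "$gw" = "$1" ]; then
        echo "ok: default route via $gw"
    else
        echo "WRONG: default route via ${gw:-nothing}, expected $1"
        return 1
    fi
}

# Demo with canned output, as proxy server 1 might show it:
check_gateway 10.1.1.254 <<'EOF'
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
0.0.0.0         10.1.1.254      0.0.0.0         UG    0      0        0 eth0
EOF
```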
6: That hi-av thing looks cool. How does that work?
Hi-av isn't all that hard. When I get some time I'm going to
whack together a couple of scripts and a database that can keep
track of machines and automatically remove them from the
redirection list, and have another machine (ala ippvs1) take over
from a failed other ippvs. It's easy to do it manually. Switch
ippvs0 off, run 'ifup eth0:1' and 'ifup eth1:0' on the other
machine (if you have it set up that way) and then run the
ippfvsadm commands that the other machine used to do, and it'll
take over invisibly. Go look at the IP addresses above if you
don't understand what I mean.
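The manual takeover can be sketched as a script for ippvs1. Treat it as a rough outline under a few assumptions: the eth0:1 and eth1:0 aliases are already configured so that 'ifup' knows about them, the addresses are the ones from this document, and a single ping is a good enough liveness test (it probably isn't for production). The names takeover_commands and peer_alive are made up.

```shell
#!/bin/sh
# Print the commands ippvs1 would run to take over from a dead ippvs0.
takeover_commands() {
    echo "ifup eth0:1"    # brings up 203.1.1.10, the load-sharing address
    echo "ifup eth1:0"    # brings up 10.1.1.254, the proxies' default route
    for real in 10.1.1.1 10.1.1.2 10.1.1.3; do
        echo "ippfvsadm -A -t 203.1.1.10:8080 -R $real:8080"
    done
}

# Hypothetical liveness probe: one ping to the other machine's own address.
peer_alive() {
    ping -c 1 "$1" >/dev/null 2>&1
}

# By hand on ippvs1, once you trust the output:
#   peer_alive 203.1.1.2 || takeover_commands | sh
takeover_commands
```

Printing the commands instead of running them keeps the sketch harmless; an automatic version would loop on the probe and also have to deal with ippvs0 coming back.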
Please send questions, comments and suggestions about this
document to rob@rpi.net.au
The Virtual Server mailing list is currently hosted at
linux-virtualserver@iinchina.net - to subscribe to the mailing
list, send a message to 'majordomo@iinchina.net' with the message
BODY (not subject) of 'subscribe' - it'll all be taken care of
from there. Any messages sent to the list saying 'I'm not
subscribed to this list, so can you email the reply to me
privately' will be ignored, as it's very, very bad manners.
--Robert Thomas - 28/11/98
[1] - Kernel versions 2.1.129 and 2.1.130 have earned themselves
the names of 'Greased Weasel' and 'Basted Turkey', due to some
light-hearted banter of Linus Torvalds in the kernel release
notes. This document was prepared over these two kernel
revisions!
[2] - This is an 'optimal' diagram. There's no -physical- reason
why the ippvs server, the clustered machines, and the clients
can't be on the same segment. It's just nicer this way. Go buy a
$50 hub. Trust us. It's better.