Availability and
efficiency
Efficiency:
I have just read the
following page, look at the the USL
(Universal Law of Computational
Scalability) of Dr. Gunther,
he wrote this: (See: http://en.wikipedia.org/wiki/Neil_J._Gunther )
--------------
The relative capacity C(N) of a
computational platform is given by:
C(N) =
N
------------------------------------------
1 + α (N - 1) + ß N
(N - 1)
where N represents either the
number of physical processors
in the hardware configuration or the number of users driving the
software application. The parameters α and ß represent
respectively
the levels of contention (e.g., queueing for shared resources)
and
coherency delay (i.e., latency for data to become consistent) in
the
system. The ß parameter also quantifies the retrograde
throughput
seen in many stress tests but not accounted for in either
Amdahl's law
or event-based simulations.
----------
His website: http://www.perfdynamics.com/
If you read carefully , you will
see that Dr. Gunther is using this
model to predict scalability after he simulates a relatively
small
number of vusers in LoadRunner ( because of licensing costs, it's
cost-effective) and after that he finds the coefficients of
the
2nd-degree polynomial (quadratic equation) and then transform
those coefficients back to the USL parameters using the
α = b - a
and ß = a.
And then he is extrapolating with
the USL model to higher loads
to predict scalability.
He is also applying the model to
webservers with heterogeneous
workloads. like in the following page:
http://perfdynamics.blogspot.com/2009/04/assessing-usl-scalability-wi...
Now my question follows:
Suppose we have obtained a small
number of measured load-points
with Loadrunner or others tools, and we calculated the USL
equation
to predict scalability of a webserver , how the USL model can
predict
if the scalability/performance is limited by the network
bandwidth
and not the server ? I think USL can not predict this.
When we are modeling webservers ,
we have to include
the network node in our network queuig model (that includes
the computer server queue) , and since the service in the
computer server is comprised of multiple services (when we
are using htmls , databases etc.) the network queue will not
be markovian in its service , and we have to model the network
queue as an M/G/1 and this will complicate the mathematical
analytical modeling...
So, i think the best way is
to use a webserver stress tool
like http://fwptt.sourceforge.net/
You can even test the webserver
with an heterogeneous
workloads by starting multiple fwtpp processes, and
you should increase the number of threads to 5 and after
that to 10 threads, 15 ... and so on until the webserver
applications stops responding propperly(and this will inform
on the maximum number of users that you can have in the same
time...)
and as you are stress testing you can even see (by
testing/measuring
it) if the network bandwidth is not the bottleneck... and this
can
not be done by the USL model.
We already know that to satisfy a
Poisson process we must
have that N(t1)- N(t0), N(t2)- N(t1) etc. must be independant
that means the counting increments must be independant.
We have the following relation
between the Poisson law
and Exponential law:
the expected value E(X exponential)
= 1 / E(X poisson)
and if the arrival is poissonian
then the interarrivals are
exponential..
Now what about a webserver ?
I have read the following paper:
http://docs.google.com/viewer?a=v&q=cache:JFYCp_dSPP4J:citeseerx.ist....
And it says that a simple model
with M/M/1/k with FCFS discipline
can predict webserver performance quite well..
Hence, i think we can model
our webserver over internet
with 3 queues connected as a Jackson Network like this
A -> M/M/1 Server Queue
-> M/M/1 Network queue -> M/M/1 Client -> A
A: is the arrival rate
and we have the following:
Ni: number of jobs in each queue
Ui: utilization of each queue
Ni = Ui / (1-Ui)
Adding all the Ni in each
individual queue will give the
average number of jobs in the entire queuing network.
After that we apply the Little
formula:
A: network arrival rate
T: average response time
N = A*T => T = N / A
And after that from the
mathematical analytical equation
we can simulate the jackson queuing network that model our
webservers...
Now there is still an important
question that i have:
Our analytical jackson network
model can give us insight
on the webservers behavivior.. but the difficulty that i
find in
practice is that: suppose we have found the right parametters
that we want to choose, like for example the service rate of
the M/M/1 Server Queue , how , from this service rate, can
we buy the right computer that satisfies the service rate
that we want ?
We say in OR that:
"Understanding the behavior of
a system is what Queueing Theory
and Littles Law is all about. But, managing a Queue
requires not
just understanding the behavior of a system, but also in-depth
knowledge of how to improve a system - improving both aspects
of Queueing will mean a better, more efficient and cost-effective
system and, more importantly, a much better customer
experience."
I wrote before that:
---
"It says that a simple model
with M/M/1/k with FCFS discipline
can predict webserver performance quite well..
Hence, i think we can model our webserver over internet
with 3 queues connected as a Jackson Network like this
A -> M/M/1 Server Queue -> M/M/1 Network queue -> M/M/1
Client -> A
A: the arrival rate
and we have the following:
Ni: number of jobs in each queue
Ui: utilization of each queue
Ni = Ui / (1-Ui)
Adding all the Ni in each
individual queue will give the
average number of jobs in the entire queuing network.
After that we apply the Little formula:
A: network arrival rate
T: average response time
N = A*T => T = N / A
And after that from the
mathematical analytical equation
we can simulate the jackson queuing"
--
As you have noticed , this
mathematical model of
this jackson network does in fact take into account
the M/M/1 Network queue node , the USL model can not
do this... and with this performance data from the mathematical
analytical model simulation we can for example validate
the performance data of the fwptt stress webserver simulation..
But you have to take into account
worst cases and the
peak traffic loads...
Let for example we have a a
webserver hosting html pages
and it is receiving 1000000 HTTP operations
per day with an average file size of 10 KB.
What would be the network bandwidth
required for this website
considering peak traffic if the peak traffic load from past
observations was four times greater than average loads?
Required bandwidth is solved by the
following equation:
HTTP op/sec x average file size or
1000000 HTTP ops per day
=1000000/24 = 41,667 op/hour =
41,667/3600= 11.6 HTTP ops/sec
The needed bandwidth is
11.6 HTTP ops/sec X 10 KB/HTTP op =
116 KB/sec = 928 Kbps.
If we assume a protocol overhead of
20% then the actual throughput
required is 928 Kbps X 1.2 = 1,114 Kbps.
However if peak loads, as i
said before, is as much as
4 times greater, the bandwidth required to handle spikes
would be 4 X 1,114 Kbps = 4.456 Mbps.
So you have to think also about the
cost of this line...
I will add the following:
As you have noticed i said that:
"As you have noticed , this
mathematical model of
this jackson network does in fact take into account
the M/M/1 Network queue node , the USL model can not
do this... and with this performance data from the mathematical
analytical model simulation we can for example validate
the performance data of the fwptt stress webserver
simulation.."
and i said that:
"Hence, i think we can
model our webserver over internet
with 3 queues connected as a Jackson Network like this
An M/M/1 Server Queue -> M/M/1 Network queue -> M/M/1
Client queue"
And of course on Capacity Planning
for Enterprise Datacenters
and Websites , you can mirror many computer servers and load
balance between them with a software... to make the system much
FASTER, and this will be modeled as a jackson network like this:
A -> M/M/n Server Queue ->
M/M/1 Network queue -> M/M/1 Client -> A
A: the arrival rate to the system
But there is still an important
thing , as i have showed before
on my calculations:
"However if peak loads, as i
said before, is as much as
4 times greater, the bandwidth required to handle spikes
it would be 4 X 1,114 Kbps = 4.456 Mbps.
So you have to think also about the
cost of this line..."
I think that you have also to take
into account the knee utilisation
of your M/M/n Servers Queues, if for example the number of
computer
servers is 8 and the Knee is 74% that means that in our
previous
example the bandwidth must equal to:
74% X 4.456 Mbps = 3.297
Mbps.
Cause as you know, above this Knee
of 74% for 8 servers
the curve of the waiting time does grow quickly ..
And we have to take into account
the cost of the line ...
Let us take another example:
In the network node of the Jackson
network, there is three
main parameters that are responsable for its performance:
latency, bandwidth and utilization.
Now, if for example you take a look
at my provider
http://www.videotron.com/service/internet-services/internet-access/hi...
My upload network speed is clearly
820 Kbs => 102.5 Kbytes/sec
Hence, if the average file size on
my web site is for
example 25 Kbyte and the protocol overhead is 20%, that
means that the file size will be in average:
25 * 120% = 30 Kbyte
And if the Knee of the M/M/1
queuing node in our Jackson network
is 50%, then the bandwidth available will be reduced to
102.5 Kbytes/sec * 50% = 51.25 Kbytes/sec
If the download speed of the client
is for example 7.5 Mbps,
that means that the file can be retrieved in no more than
30 / 51.25 = 0.585 second = 585 milliseconds.
But if we assume for example that a
typical web visitor
to my web site is spending 585 milliseconds retrieving the
page and 60 seconds reading the HTML page THEN each client
on average only requires:
0.585 / (0.585 + 60) = 9.6% of the
bandwidth that is allocated to
it..
but in practice i think we have to
consider the worst case
scenarios..
So, since we can support a
constraint of no more than 5 seconds
in the T (average response time), that means:
585 ms * X = 5000 milliseconds
=> X = 8.54
Hence, in worst case , my network
node can not support more
than 8.54 users in the same time...
So, on *average* my network node
can support:
(8.54 * 86400) / 5 =
147571 users per day
I wrote before that:
--
And of course on Capacity Planning for Enterprise Datacenters
and Websites , you can mirror many computer servers and load
balance between them with a software... to make the system much
FASTER, and this will be modeled as a Jackson network like this:
(1)
A -> M/M/n Server Queue ->
M/M/1 Network queue ->
-> M/M/1 Client queue -> A
A: the arrival rate to the
system"
--
We know also from an operational
law of queuing theory that:
(2) the rate of job leaving any
stable node must equal
its arrival rate.
(2) is from the Equivalence
property.
Equivalence property: Assume that a
service facility with
s servers and infinite queue has a Poisson input with
parameter lambda and the same exponential service-time
distribution with parameter Mu for each server (the M/M/s model),
where s * Mu > lambda. Then the steady-state output of
this service facility is also poisson process with parameter
lambda.
We know for an M/M/c queue that:
Note: try to add servers with
almost the same hardware
configuration...
C(c, U) = Erlang formula = P(c) /
(1 - Utilization)
note: c the number of
servers..
P(c): means the probability that
all the servers are busy
P(0): means the probability that
there is no waiting time in the
queue, that means also: AT LEAST one server among the C servers
are not busy...
The average waiting time in the
'queue' =
C(c,U) / (service rate x c x (1 - Utilization)) (3)
It's approximatly equal to:
Utilization^C/(service rate x (1 -
Utilization^C)
Note: ^ means power
This approximation is exact for the
M/M/1 and M/M/2 models,
but 'slightly' lower than the result in (3) if c > 2
and
Utilization = Density of
circulation / C (number of servers)
Note: ^ means power()
and C means the number of servers
Response time = The average waiting
time in the 'queue' +
(1 /
service rate)
average numbers of users in the
system = service rate x response time
average number of users in queue = service rate x average waiting
time
in the 'queue'
Now as i said before:
--
So the equation for
Ni: number of jobs in each queue
Ui: utilization of each queue
Ni = Ui / (1-Ui)
Adding all the Ni in each
individual queue will give the
average number of jobs in the entire queuing network.
After that we apply the Little
formula:
A: network arrival rate
T: average response time
N = A*T => T = N / A
And after that from the
mathematical analytical equation
we can simulate the jackson queuing network that model our
webservers...
---
If we try to calculate the Ni =
Ui / (1-Ui) in
the Jackson network, and from the operational law (2) above
this will give us:
Ns for the M/M/n Server queue is:
(DC / n) / (1 - (DC/n))
and DC = Ss / A => Ns =
((Ss/A)/n) / (1 -((Ss/A)/n))
Ss: service rate at the queuing
server.
A: Arrival rate to the jackson network
DC: is the Density of circulation
n: number of servers
And Nn for the M/M/1 Network queue
is:
and Utilization in the M/M/1
network queue = Sn / A
this imply that => Nn = (Sn/A) / (1 -(Sn/A))
Nn: number of jobs in the M/M/1
network queue node
Un: Utilization in the network queue node
Sn: service rate at the queuiNg server.
A: Arrival rate to the jackson network
And Nc for the M/M/1 Client queue
node is:
and Uc= Sc / (F/5)
this imply that => Nc = (Sc/(F/5)) / (1 -(Sc/(F/5)))
Note: F/5, if for example the rate is one file every 5 seconds.
Nc: number of jobs in the M/M/1
client queue node
Uc: Utilization in the M/M/1 client queue node
Sc: service rate at the queuiNg server.
A: Arrival rate to the Jackson network
F: Average file size
Adding all the Ni in each
individual queue will
give the average number of jobs in the entire
queuing network that is equal to:
Ni = Nn + Ns + Nc
= (((Ss/A)/n) / (1 -((Ss/A)/n)) + (Sn/A) / (1
-(Sn/A))
+ (Sc/(F/5)) / (1 -(Sc/(F/5)))
After that we apply the Little
formula:
A: network arrival rate
T: average response time
N = A*T => T = N / A
this imply that the T(the average
response time in the Jackson
network)
is:
Ni = Nn + Ns + Nc
= (((Ss/A)/n) / (1 -((Ss/A)/n)) + (Sn/A) / (1
-(Sn/A))
+ (Sc/(F/5)) / (1 -(Sc/(F/5)))
Now finally we have our
mathematical analytic equation
and we can begin to simulate our Enteprise webserver
and also validate the performance data with the fwptt stress
webserver simulation tool...
And don't forget what i have said
about USL:
"Suppose we have obtained a
small number of measured
load-points with Loadrunner or others tools, and we
calculated the USL equation to predict scalability of
a webserver , how the USL model can predict if the
scalability/performance is limited by the network bandwidth
and not the server ? I think USL can not predict
this."
Also i wrote about USL that:
"As you have noticed , this
mathematical model of
this jackson network does in fact take into account
the M/M/1 Network queue node , the USL model can not
do this... and with this performance data from the mathematical
analytical model simulation we can for example validate
the performance data of the fwptt stress webserver
simulation.."
If you have noticed i have tried to
show also that one
of the principal objectives is to control and manage
the *Quality* of the system or process, i wrote in my
post the following:
---
this imply that the T(the average
response time in the Jackson
network)
is:
Ni = Nn + Ns + Nc
= (((Ss/A)/n) / (1 -((Ss/A)/n)) + (Sn/A) / (1
-(Sn/A))
+ (Sc/(F/5)) / (1 -(Sc/(F/5)))
Now finally we have our
mathematical analytic equation
and we can begin to simulate our Enteprise webserver
and also validate the performance data with the fwptt stress
webserver simulation tool...
---
So, as you have noticed you can
simulate our enterprise
webserver with this mathematical analytical equation and
validate the data collected by fwptt stress webserrer tool
and try to keep T( the average response time of your HTTP
requests)
bellow or equal to 5 seconds, and see also if your webserver
is scaling, and we call this also Quality control using OR
(operational research).
In quality management, quality can
also be maintained and
improved through the detection and reduction of process
variation. Where statistical techniques as ANOVA and others
can be used..
The ANOVA method , for example,
analyses the variation in
the data set to discover the amount of variation that can
be attributed to each factor.. etc.
There is still an important thing..
As you have noticed, i have finally
arrived to the following
mathematical model of our Jackson network that model an
Enterprise
website:
Ni = Nn + Ns + Nc
= (((Ss/A)/n) / (1 -((Ss/A)/n)) + (Sn/A) / (1
-(Sn/A))
+ (Sc/(F/5)) / (1 -(Sc/(F/5)))
T: is the average response time..
So read it carefully an you will
notice that this mathematical
model also can simulate and predict an intranet website behavior
or an internet website..
First look at the first factor
(((Ss/A)/n) / (1 -((Ss/A)/n))
in our mathematical model, as you have noticed as
n (number of servers) grows this will make the denominator
grow rapidly and this will make (((Ss/A)/n) / (1 -((Ss/A)/n))
smaller, that's good for the response time...
Now let's look at the other factors
in this mathematical
model:
IF the system is an Intranet, and
for example the local
area network Sn is faster than the multiserver, that means:
We have Sn >> Ss, and that
means => Ss /A >> Ss /A => that the
Knee in the server side can be reached more quickly than the
Knee in the network node of our Jackson network, so that
the bottleneck becomes the server node.
In the other hand if:
Ss >> Sn => Ss /A >>
Sn / A => the Knee in the M/M/1 network
node can be reached more quickly, so that the network node in
the Jackson network becomes the bottleneck.
Availability:
The demand for high availability and reliability of computer and
software systems has led to a formal verification of such
systems.
There are two principal approaches
to formal verification:
model checking and theorem proving.
And i wrote about Petri Nets and
how to model parallel
programs and how to exploit verification tools as Tina to verify
the properties such as liveleness of the system.
Please take a look at my other
article about Petri Nets:
http://pages.videotron.com/aminer/PetriNet/formal.htm
And by using Petri Net and Tina you
can make your system
more reliable , and if it's more reliable then the system
will have higher availability.
There are important factors to take
into consideration when
building systems, factors such us availability and
efficiency(scalability etc....).
System availability is also
calculated by modeling the system
as an interconnection of parts in series and parallel.
The availability A of a
system is calculated like this
A = MTBF / (MTBF + MTTR)
MTBF: Mean time between failure
MTTR :Mean time to repair
and the MTTR is composed of:
time to respond to the failure +
time to find
the problem + time to correct the problem + time
to verify that the system is working corretly.
Now if the interconnected parts are
parallel , the equation
that we use to calculate the availability is:
A = 1- (U1 * U2 * U3 * ... * Un)
(1)
Ui: is the non-availability of the
component part of the system...
The availability of a system
composed of interconnected parts in
series is:
A = A1 * A2 * ... * An
(2)
Now as you have noticed i have
modeled an enterprise website
as a Jackson network, and i have said before that:
--
And of course on Capacity Planning
for Enterprise Datacenters
and Websites , you can mirror many computer servers and load
balance between them with a software... to make the system much
FASTER, and this will be modeled as a jackson network like this:
A -> M/M/n Server Queue ->
M/M/1 Network queue -> M/M/1 Client -> A
A: the arrival rate to the system
--
As you have noticed i wrote
"FASTER" but i didn't spook
about the AVAILABILITY, you will notice that you can not
just
make your system more EFFICIENT and FASTER , but also
you can higher the AVAILABILITY of your system by mirroring
many servers...
I have also wrote before that:
---
"If you have noticed i have
tried to show also that one
of the principal objectives is to control and manage
the *Quality* of the system or process, i wrote in my
previous post the following:
this imply that the T(the average
response time in the Jackson
network)
is:
T = Ni /A = (Nn + Ns + Nc) /A
= (((Ss/A)/n) / (1 -((Ss/A)/n)) + (Sn/A) / (1 -(Sn/A))
+ (Sc/A) / (1 -(Sn/A))) / A
Now finally we have our
mathematical analytic equation
and we can begin to simulate our Enteprise webserver
and also validate the performance data with the fwptt stress
webserver simulation tool...
So, as you have noticed you can
simulate our enterprise
webserver with this mathematical analytical equation and
validate the data collected by fwptt stress webserrer tool
and try to keep T( the average response time of your HTTP
requests)
bellow or equal to 5 seconds, and see also if your webserver
is scaling, and we call this also Quality control using OR
(operational research).
In quality management, quality can
also be maintained and
improved through the detection and reduction of process
variation. Where statistical techniques as ANOVA and others
can be used.. "
---
We have many mathematical tools
that we can use in Quality control,
example: Correlation, Regression , ANOVA etc.
ANOVA is a powerfull and important
mathematic tool in
Quality control , by calculating the p-value in Excel or other
statistical tools , this will give us an important information
like
if one or more variables have a significant effect on other
variables. If for example the p-value <= 0.05 (or
if F > critical F value)
this imply that the null hypothesis is rejected , and the factor
(or factors)
has in fact an effect..
Please click here to return to my homepage: homepage.
Regards,
Amine Moulay Ramdane.
http://pages.videotron.com/aminer/