STEAM GROUP
TF2 Outpost by Fanbyte
Membership by invitation only
Founded: 7 August, 2011
Language: English
What are those % bars at the bottom?
Thought I would attempt to clear up what 'CPU load' is, since it is quite an interesting topic.
CPU load is the average number of processes that are running or waiting to run, measured over 5 minutes; to express it as a percentage, divide by the number of cores.
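To make the definition concrete, here is a tiny sketch of that division (the function name and the 24-core figure for web-1 come from this post, not from the site's actual code):

```python
def load_percent(load_avg, cores):
    """Convert a raw load average (processes running or waiting)
    into a percentage of the machine's capacity."""
    return load_avg / cores * 100

# web-1 has 24 cores; a 5-minute load average of 12 runnable
# processes means the machine is at half capacity.
print(load_percent(12, 24))  # 50.0
```

On Unix systems, Python's `os.getloadavg()` returns the raw 1, 5, and 15-minute figures you would feed into a function like this.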
Take this analogy: TF2OP is a famous fast-food chain. This branch has 24 stands (we're just imagining web-1, which has 24 cores). The users come to get their burgers, or in our case, hats. If you look over and see 12 people at the stands, with nobody queueing behind them, that is 50% load.
This time, you decide to take a more accurate sample, because you're a bit weird. You stand there for 5 minutes and, every 30 seconds, note the number of people being served or waiting to be served. Say you count 5 people in one 30-second window: divide that by 30 (assuming each stand serves one person per second) and by 24 (the number of stands), then multiply by one hundred, and you have a 30-second load average as a percentage of the total number of people who could have been served if none of them had to wait. Total those figures over the 5 minutes and divide by the number of samples (60 times five, divided by 30, which is 10) to get the 5-minute load average.
If each stand (CPU core) can serve one person a second, then in the busiest case that still works out for us and our customers, each stand has 30 people served or queueing over the 30-second window: 24 stands times 30 people = 720 people, so the load calculation is (720 / 30) / 24 = 1, or 100%. Do that for every window and average over 5 minutes to get the 5-minute load average.
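The window arithmetic above can be sketched in a few lines (this is just the burger-stand analogy in code, with made-up sample numbers, not a real measurement):

```python
def window_load(people, window_seconds, stands):
    """Load over one sampling window: people served or queueing,
    divided by the window length (one person served per second
    per stand) and by the number of stands.  Returns a percentage."""
    return people / window_seconds / stands * 100

# The fully-busy case: 24 stands x 30 people each over 30 seconds.
print(window_load(24 * 30, 30, 24))  # 100.0

# Averaging ten such 30-second windows gives the 5-minute figure.
windows = [window_load(720, 30, 24)] * 10
print(sum(windows) / len(windows))   # 100.0
```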
It gets more complex than this, because the things being served by the CPU are not the customers themselves, but worker threads, which hand the CPU their sums to do, and one worker thread may take several users at once. I'd have to look at how nginx handles connections to check.
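As a rough illustration of one worker taking several users at once, here is a toy round-robin loop (this is a guess at the general idea, not how nginx actually multiplexes connections):

```python
def worker(connections):
    """One worker interleaving many users: each pass does one unit
    of work per open connection instead of finishing them one at a
    time.  Returns the connection ids in completion order."""
    done = []
    while connections:
        conn = connections.pop(0)
        conn["remaining"] -= 1        # do one unit of work for this user
        if conn["remaining"] == 0:
            done.append(conn["id"])   # this user is fully served
        else:
            connections.append(conn)  # put it back in the rotation
    return done

# Two users share one worker; the shorter request finishes first.
print(worker([{"id": "a", "remaining": 2}, {"id": "b", "remaining": 1}]))
# ['b', 'a']
```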
In addition, if you examine this system, the server can always take new users; they just have to wait, causing slowness, but not 503s.
The 503s only happen when the server simply cannot take any more users, which can happen for various reasons. I think the most recent one was that the server could not output results as fast as it processed them, so results would build up internally over time, rather like filling a bucket that has a very small hole in it. The bucket would fill until it reached the top, and the server would then start refusing new connections, because it couldn't remember any more results, causing 503s.
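The bucket behaviour can be sketched as a bounded buffer that refuses new work once full (the capacity of 5 here is made up for illustration; the real server's buffer and status handling are more involved):

```python
from collections import deque

class ResultBucket:
    """Toy version of the leaky bucket: results drain out slowly,
    and once the bucket is full, new requests get a 503."""
    def __init__(self, capacity):
        self.results = deque()
        self.capacity = capacity

    def handle_request(self, result):
        if len(self.results) >= self.capacity:
            return 503               # can't remember any more results
        self.results.append(result)
        return 200

    def drain_one(self):
        if self.results:
            self.results.popleft()   # the "very small hole" in the bucket

bucket = ResultBucket(5)
statuses = [bucket.handle_request(n) for n in range(7)]
print(statuses)  # [200, 200, 200, 200, 200, 503, 503]
```

Draining (calling `drain_one`) frees a slot, which is why the slowness clears up on its own once load drops, as long as nothing else breaks in the meantime.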
Normally this would be fine, just a nuisance, except that with the requests spread over two other servers, those servers would start to go out of sync if the high load continued, effectively pouring more water in even though the main server said stop (the other servers frantically trying to tell it that they had processed the query), which would break things. Those things took a while to fix, causing some of the downtime.
Luckily, it was Sneeza, not me, who dealt with the problems, because man, I would explode with stress trying to get it back up again.

If you didn't read the above, I don't blame you. I actually started to get a headache trying to explain it all effectively. There is a tl;dr below.

tl;dr - <100%: everyone gets served immediately; >100%: some people have to wait.
If the value were 120%, 20% of people would have to wait.

Further reading on server load: Wikipedia's page on Load (computing).