Diving into Statistics: Standard Deviation

I have a pretty damning admission: I know very, very little about statistics. I remember (or can quickly recall with the help of Wikipedia) what the mean, median, and mode are, and I've dealt with some distributions and statistical physics as a result of past coursework, but were I presented with a data set (say, sales as a function of time), I'd be hard pressed to give you much else. 

Which leads me to Standard Deviation. So, first of all, the mean. Let's say I have a dataset of seven points: [1, 1.5, 4, 5, 6, 6, 9]. The mean is just the sum of the values divided by how many there are - in other words, 32.5/7, or about 4.64. That's your classic Algebra II (?) mean, and it's nifty.
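
Just to sanity-check that arithmetic, here's a quick sketch in Python using the standard library's statistics module (the variable name is just mine):

```python
from statistics import mean

data = [1, 1.5, 4, 5, 6, 6, 9]

# Sum of the values divided by the count: 32.5 / 7
print(mean(data))             # ~4.642857142857143
print(sum(data) / len(data))  # same thing, done by hand
```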

But let's say you have two datasets - [50, 50, 50] and [0, 50, 100]. They share the same mean, 50, but they look way different. One dataset is all 50's, while the other ranges from 0 to 100; were these temperatures over the course of a day, for instance, one could be a survivable, hot day, while the other would swing from freezing to boiling. Way, way different. So - and this part I vaguely recalled - is there a way to quantify this spread? There are a few, and standard deviation is one option.
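
A two-line check (again, the names are just mine) shows that the mean alone can't tell these two apart:

```python
from statistics import mean

flat = [50, 50, 50]
swingy = [0, 50, 100]

# Same mean, wildly different spread
print(mean(flat))    # 50
print(mean(swingy))  # 50
```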

And apparently, it's not too wild a concept. First, determine the mean - we'll use the simple sets above, where it's 50. Check. Next, things get spicier - we basically want a sense of how far the data points sit from this average. You could just average the absolute values of these distances - for the second set, that'd be (50 + 0 + 50)/3, or about 33 - but this method doesn't "point out" or amplify more egregious spreads; that is, it gives the same weight to small deviations from the mean as it does to massive ones.
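
That averaged-absolute-distance idea (it has a name, coming up in a second) is a one-liner in code - a sketch, hard-coded to the second set:

```python
from statistics import mean

data = [0, 50, 100]
m = mean(data)

# Average of |x - mean| across all points
mad = sum(abs(x - m) for x in data) / len(data)
print(mad)  # (50 + 0 + 50) / 3 = ~33.33
```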

So, how do we "pick out" these more extreme outliers? One simple way is to square the differences, which disproportionately "amplifies the signal" from far-out points. So, instead of what is apparently called the MAD, or Mean Absolute Deviation around the mean (very literal), where we do as above and get a MAD of ~33, we square those differences and get what's called the Variance: (50^2 + 0^2 + 50^2)/3, or ~1,667. Finally, in an apparent effort to get this unwieldy, large number back into both the units and the scale of the original set, we simply take the square root, yielding a Standard Deviation of ~41 (right?).
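
Putting the whole chain together in code - one caveat I ran into: this mirrors the "population" flavor of the formula above (divide by n), while the statistics module's plain variance/stdev divide by n - 1, the "sample" version, so I'm using the p-prefixed functions here:

```python
from statistics import mean, pstdev, pvariance

data = [0, 50, 100]
m = mean(data)

# Variance: average of the *squared* distances from the mean,
# so far-out points get amplified instead of weighted equally
variance = sum((x - m) ** 2 for x in data) / len(data)

# Standard deviation: square root of the variance, which brings
# the number back to the units and scale of the original data
std_dev = variance ** 0.5

print(variance)         # ~1666.67
print(std_dev)          # ~40.82
print(pvariance(data))  # stdlib agrees
print(pstdev(data))     # ~40.82
```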

So, yeah, that's pretty cool. One question I'm left with: is there a way to normalize that number, to get a sense of spread that applies across all sets - like a number between 0 and 1? Word on the street is there's this thing called R^2 that attempts something like that, but that's for another post.