Given \(\sigma_a\) and \(\sigma_b\), Ask for \(\sigma\)

1. 簡化 \(\sigma\)

  • 將 \(\sigma\) 乘開
    • \(\sigma = \sqrt{\frac{\sum{(x_i-\bar x)^2}}{n}}\)
    • \(\sigma = \sqrt{\frac{\sum{x_i^2-2\bar x\sum{x_i}+n\bar x^2}}{n}}\)
  • 平均等於總和除以個數 \(\frac{\sum x_i}{n}=\bar x\),故
    • \(\sigma = \sqrt{\frac{\sum x_i^2}{n}-\frac{2\bar x\sum x_i}{n}+\frac{n\bar x^2}{n}}\)
    • \(\sigma = \sqrt{\frac{\sum x_i^2}{n}-2\bar x^2+\bar x^2}\)
    • 得 \(\boxed{\sigma = \sqrt{\frac{\sum x_i^2}{n}-\bar x^2}}-(1)\)

2. 求個別平方和

  • 由\((1)\)式可推得各別的標準差為
    • \(\boxed{\sigma_a = \sqrt{\frac{\sum x_{ai}^2}{n_a}-\bar x_a^2}}-(2)\)
    • \(\boxed{n = n_a+n_b}-(3)\)
    • \(\boxed{\sum x_i^2=\sum x_{ai}^2+\sum x_{bi}^2}-(4)\)
  • 欲求 \(\sum x_{ai}^2\),我們將\((2)\)式展開
    • \(\sigma_a^2 = \frac{\sum x_{ai}^2}{n_a}-\bar x_a^2\)
    • \(\sigma_a^2+\bar x_a^2= \frac{\sum x_{ai}^2}{n_a}\)
    • 得\(\boxed{\sum x_{ai}^2=n_a(\sigma_a^2+\bar x_a^2)}-(5)\)

3. 求總體標準差

  • 由\((1)\)式展開
    • 得 \(\boxed{\sigma = \sqrt{\frac{(\sum x_{ai}^2+\sum x_{bi}^2)}{n}-\bar x^2}}-(6)\)
  • 將\((5)\)代入\((6)\)
    • \(\boxed{\sigma=\sqrt{\frac{n_a(\sigma_a^2+\bar x_a^2)+n_b(\sigma_b^2+\bar x_n^2)}{n}-\bar x^2}}-(7)\)
  • 其中 \(\boxed{\bar x=\frac{n_a\bar x_a + n_b\bar x_b}{n}}-(8)\)
  • 故我們可以從上式輾轉得通式:
    • \(\boxed{\sigma=\sqrt{\frac{\sum(n_i(\sigma_i^2+\bar x_i^2))}{n}-\bar x^2}}-(9)\)
    • 或寫成
    • \(\boxed{\sigma=\sqrt{\frac{\sum(n_i(\sigma_i^2+\bar x_i^2))-\sum n_i\bar x_i}{n}}}-(9)\)

summary

  • 個數
    • \(\boxed{n=n_a+n_b=\sum n_i}\)
  • 平均數
    • \(\boxed{\bar x=\frac{n_a\bar x_a+n_b\bar x_b}{n_a+n_b}=\frac{\sum{n_i\bar x_i}}{\sum{n_i}}}\)
  • 標準差
    • \(\boxed{\sigma=\sqrt{\frac{n_{ai}(\sigma_{ai}^2+\bar x_{ai}^2)+n_{bi}(\sigma_{bi}^2+\bar x_{bi}^2)-(n_a\bar x_a+n_b\bar x_b)}{n_a+n_b}}=\sqrt{\frac{\sum(n_i(\sigma_i^2+\bar x_i^2))-\sum n_i\bar x_i}{\sum n_i}}}\)

4. sql

  • 現有一 table 存有
  • avg_value
  • std_value
  • site_count
with stats as (
    select
        ...
        sum(site_count*avg_value)/sum(site_count) as avg_value,
        sqrt((sum(site_count*(square(std_value)+square(avg_value)))-sum(site_count*avg_value))/sum(site_count)) as std_value,
        sum(site_count) as site_count
    from data
    where ...
    group by ...
)
select * from stats