Although @Leo Li Shiting's method solves the problem, it is not efficient and does not make full use of numpy's vectorized array operations.
Below is a more concise and efficient method, which also shows how elegant numpy's array operations can be.
This assumes you already know how to calculate the standard deviation of a set of numbers; if not, see https://zh.wikipedia.org/zh-h...
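For reference, the quantities computed in the code that follows can be written as the formulas below. This is a sketch of the population (not sample) form; the symbols x_i for the values and w_i for their counts are notation assumed here, not taken from the original answer.

```latex
\[
n = \sum_i w_i, \qquad
m = \frac{1}{n} \sum_i w_i x_i, \qquad
\sigma = \sqrt{\frac{1}{n} \sum_i w_i \,(x_i - m)^2}
\]
```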
For a set of values [100, 200, 300] and their corresponding counts [1, 2, 3]:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'a': [100, 200, 300],
    'b': [1, 2, 3],  # count of each value in a
})

# n: total count, m: weighted mean, sd: weighted standard deviation
n = df.b.sum()
m = (df.a * df.b).sum() / n
sd = ((df.b * ((df.a - m) ** 2)).sum() / n) ** 0.5

# draw the histogram of a, weighted by the counts in b
plt.hist(df.a, weights=df.b)
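As a sanity check on the weighted formula above, the same result can probably be obtained with np.average and compared against np.std on the expanded data. This is only a minimal sketch: the array names a, b, expanded and sd_check are illustrative and not part of the original answer, and it uses plain numpy arrays instead of the DataFrame.

```python
import numpy as np

a = np.array([100, 200, 300])  # values
b = np.array([1, 2, 3])        # count of each value

# Weighted mean and weighted (population) standard deviation
m = np.average(a, weights=b)
sd = np.sqrt(np.average((a - m) ** 2, weights=b))

# Cross-check: repeat each value by its count, then use np.std (ddof=0 by default)
expanded = np.repeat(a, b)
sd_check = np.std(expanded)

print(m, sd, sd_check)
```

Both sd and sd_check should agree up to floating-point error, since repeating each value by its count is equivalent to weighting it.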
Data of the form
A    B
100  2
200  3
300  4
...
can be seen as a list in which each value of A is repeated B times, for example [100, 100, 200, 200, 200, 300, 300, 300, 300].
To calculate the standard deviation, you can use numpy's std(), or you can write your own formula. For example:
import pandas as pd
df = pd.DataFrame({'A':[100,200,300],'B':[2,3,4]})
"""
df
A B
0 100 2
1 200 3
2 300 4
"""
# expand the data: repeat each value of A as many times as B says
l = []
for i, j in zip(df['A'], df['B']):
    tmp = [i] * j
    l.extend(tmp)
"""
l
[100, 100, 200, 200, 200, 300, 300, 300, 300]
"""
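The example above stops once the list is built. A possible way to finish it, sketched here under the assumption that the population standard deviation is wanted (np.std uses ddof=0 by default; pass ddof=1 for the sample version), is shown below. The names sd_numpy and sd_manual are illustrative, and plain lists stand in for the DataFrame columns.

```python
import numpy as np

A = [100, 200, 300]
B = [2, 3, 4]

# Same expansion as above: repeat each value of A as many times as B says
l = []
for i, j in zip(A, B):
    l.extend([i] * j)

# Option 1: numpy's std() (population standard deviation, ddof=0)
sd_numpy = np.std(l)

# Option 2: a hand-written formula
mean = sum(l) / len(l)
sd_manual = (sum((x - mean) ** 2 for x in l) / len(l)) ** 0.5

print(sd_numpy, sd_manual)  # both approximately 78.57
```

Expanding the list is easy to follow but costs memory proportional to the sum of the counts, so for large counts a weighted formula that works directly on the two columns is likely the better choice.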