平方根指数型最大値分布による確率雨量計算

最近ある方から、昔作った文書http://civilyarou.web.fc2.com/WANtaroHP_html5_win/f90_ENGI/dir_HFA/suimon.pdfの平方根指数型最大値分布に関連して、メールを頂いた。そのころは、C あるいは Fortran90 を使ってプログラムを作っていたのだが、最近使っている Python で書いてみたくなった。このため、計算理論は焼き直しであるが、あたらしく Python でプログラムを作ってみた。２変数確率分布であるが、未知パラメータを算定するには工夫が必要である。

プログラムとしては、観測データからプロッティングポジションを通さずに直接未知パラメータを計算する従来法と、プロッティングポジションを通して確率分布関数に含まれる未知パラメータを、Scipy の非線形最小二乗法で求める方法の、２種類を紹介している。

計算理論

SQRT exponential-type distribution of maximum (SQRT-ET)

Probabulity Density Function

$\begin{equation} f(x)=\cfrac{a b}{2}\cdot \exp\left\{-\sqrt{b x}-a\left(1+\sqrt{b x}\right)\cdot\exp\left(-\sqrt{b x}\right)\right\} \qquad (x \geqq 0) \end{equation}$

Cumulative Distribution Function

$\begin{equation} F(x)=\exp\left\{-a\left(1+\sqrt{b x}\right)\cdot\exp\left(-\sqrt{b x}\right)\right\} \qquad (x \geqq 0) \end{equation}$

Quantile $x_p$ for non-exceedance probability $p$

$\begin{gather} p=\exp\left\{-a\left(1+\sqrt{b x}\right)\cdot\exp\left(-\sqrt{b x}\right)\right\}=\exp\left[-a(1+t)\cdot\exp(-t)\right] \qquad \left(t=\sqrt{b x}\right) \\ \rightarrow \quad x=\cfrac{t{}^2}{b} \qquad \ln(1+t)-t=\ln\left[-\cfrac{1}{a}\ln(p)\right] \end{gather}$

$\begin{equation} x_p=\cfrac{t_p{}^2}{b} \qquad \ln(1+t_p)-t_p=\ln\left[-\cfrac{1}{a}\ln(p)\right] \end{equation}$

In above, the conditions of $a \gt 0$ and $b \gt 0$ shall be satisfied, because the terms in logarithmic function and square-root have to be positive values.

Calculation Method of $t_p$

Since the relationship between $t$ and $p$ is non-linear, Newton-Raphson Method is used to obtain the value of $t$ at non-exceedance probability of $p$ . Following functions are considered, where $g'$ is the differential of function $g$ .

$\begin{align} &g(t)=\ln(1+t)-t-\ln\left[-\cfrac{1}{a}\ln(p)\right] \\ &g'(t)=\cfrac{1}{1+t}-1 \end{align}$

To obtain the value of $t$ at $g(t)=0$ , iterative calculation will be carried out using following equation.

$\begin{equation} t_{i+1}=t_{i}-\cfrac{g(t_{i})}{g'(t_{i})} \qquad i \text{ : counter for iterative calculation} \end{equation}$

As an initial value of $t$ , $t_0=\sqrt{b\cdot x_{max}}$ can be used.

Estimation of Parameters (Maximum Likelihood Method)

The parameters $a$ and $b$ are determined with a condition that a log-likelihoog function $L$ shown below has the maximum value.

$\begin{align} L(a,b)=&\sum_{j=0}^N \ln f(x_j) \\ =&N\ln a + N\ln b - N\ln 2 - \sum_{j=1}^N \sqrt{b x_j} - a \left\{\sum_{j=1}^N \exp\left(-\sqrt{b x_j}\right) + \sum_{j=1}^N \left[\sqrt{b x_j}\cdot \exp\left(-\sqrt{b x_j}\right)\right]\right\} \end{align}$

From the conditions that a partial differential of $L$ by $b$ becomes $0$ and a partial differential of $L$ by $a$ becomes $0$ , following relations can be obtained.

$\begin{align} &\cfrac{\partial L}{\partial b}=0 & \rightarrow & a=\cfrac{\sum_{j=1}^N \sqrt{b x_j} - 2 N}{\sum_{j=1}^N \left[(b x_j)\cdot \exp\left(-\sqrt{b x_j}\right)\right]} &= a_1 \\ &\cfrac{\partial L}{\partial a}=0 & \rightarrow & a=\cfrac{N}{\sum_{j=1}^N \exp\left(-\sqrt{b x_j}\right) + \sum_{j=1}^N \left[\sqrt{b x_j}\cdot \exp\left(-\sqrt{b x_j}\right)\right]} &= a_2 \end{align}$

Since a function $L$ has the maximum value at the condition of $a_1=a_2$ , the parameter $b$ can be obtained using Bisection Method considering following equation.

$\begin{equation} h(b)=a_1(b)-a_2(b) \end{equation}$

It shall be noted that a relation of $a_2 \gt 0$ is everytime true, but $b$ has to satisfy the following relation to ensure the condition of $a_1 \gt 0$ .

$\begin{equation} a_1 \gt 0 \quad \rightarrow \quad b \gt \left(\cfrac{2 N}{\sum_{j=1}^N \sqrt{x_j}}\right)^2 \end{equation}$

Above relation can be used as the lowest limit of $b$ in Bisection Method. Regarding the highest limit of $b$ for Bisection Method, it is no information. Therefore, following method in which searching range of solution is moved with an interval of 0.5 is applied for the calculation.

# Sample Code for Bisection method by Python
# (xx : vector of observed values)
# (bb : unknown scalar parameter to be solved)
        def func(bb,xx):
            # function for Bisection Method
            nn=len(xx)
            a1=(np.sum(np.sqrt(bb*xx))-2*nn)/np.sum((bb*xx)*np.exp(-np.sqrt(bb*xx)))
            a2=nn/(np.sum(np.exp(-np.sqrt(bb*xx)))+np.sum(np.sqrt(bb*xx)*np.exp(-np.sqrt(bb*xx))))
            return a1-a2
        imax=100    # maximum number of iterations
        err=1.0e-10 # allowable error of the zero value
        x1=(2*len(xx)/np.sum(np.sqrt(xx)))**2
        x2=x1+0.5
        for i in range(0,imax):
            xi=0.5*(x1+x2)
            f1=func(x1,xx)
            f2=func(x2,xx)
            fi=func(xi,xx)
            if 0<f1*f2:
                x1=x2
                x2=x1+0.5
            elif abs(x2-x1) < err:
                break # converged
            else:
                if f1*fi < 0.0: x2=xi
                if fi*f2 < 0.0: x1=xi
        return xi

結果出力

用いたデータは、「設計確率雨量の推定（３）」で用いた１つである、宮崎の日最大雨量データである。

従来法	Scipy 非線形回帰

Return Period and Maximum Daily Raimfall (unit: mm/day)
Year	2	5	10	20	50	100	200	500	1000	2000	5000	10000
Conventional	166.7	230.3	277.4	326.2	394.8	450.0	508.4	590.4	656.1	725.1	821.2	897.6
Scipy leasrsq	166.7	237.5	290.4	345.6	423.5	486.5	553.3	647.4	723.1	802.6	913.7	1002.2

従来法によるプログラム

上述した、二分法により未知パラメータ $a$ および $b$ を計算する方法によるプログラムを以下に示す。プロッティング・ポジション公式は、Hazen 公式としている。

import numpy as np
import matplotlib.pyplot as plt


def qqplot(xx,ye,ss):
    fsz=12
    xmin=0; xmax=700; dx=100
    ymin=0; ymax=700; dy=100
    plt.figure(figsize=(4,4),facecolor='w')
    plt.rcParams['font.size']=fsz
    plt.rcParams['font.family']='sans-serif'
    plt.xlim([xmin,xmax])
    plt.ylim([ymin,ymax])
    plt.xlabel('Observed (mm/day)')
    plt.ylabel('Predicted (mm/day)')
    plt.xticks(np.arange(xmin,xmax+dx,dx))
    plt.yticks(np.arange(ymin,ymax+dy,dy))
    plt.grid(color='#999999',linestyle='solid')
    plt.gca().set_aspect('equal',adjustable='box')
    plt.plot(xx,ye,'o',clip_on=False)
    plt.plot([xmin,xmax],[ymin,ymax],'-',color='#ff0000',lw=1)
    xs=xmin+0.05*(xmax-xmin)
    ys=ymin+0.95*(ymax-ymin)
    plt.text(xs,ys,ss,va='top',ha='left',fontsize=fsz,fontname='Ricty Diminished')
    plt.tight_layout()
    fnameF='fig_qq_M_org.jpg'
    plt.savefig(fnameF, dpi=100, bbox_inches="tight", pad_inches=0.1)
    plt.show()    

    
def sqrtet(xx,pp,pyear):
    def est_sqrtet(aa,bb,pp,x0):
        # estimation of quantile by Newton-Raphson Method
        def gx(x,aa,p):
            ggx=np.log(1+x)-x-np.log(-1/aa*np.log(p))
            gdx=1/(1+x)-1
            return ggx,gdx
        imax=100
        eps=1.0e-10
        tp=np.zeros(len(pp),dtype=np.float64)
        x1=x0
        for i,p in enumerate(pp):
            for j in range(imax):
                ggx,gdx=gx(x1,aa,p)
                x2=x1-ggx/gdx
                if np.abs(x2-x1)<eps: break
                x1=x2
            tp[i]=x2
        return tp**2/bb
    
    def bisection(xx):
        # estimation of coefficient bb by Bisection Method
        def func(bb,xx):
            # function for Bisection Method
            nn=len(xx)
            a1=(np.sum(np.sqrt(bb*xx))-2*nn)/np.sum((bb*xx)*np.exp(-np.sqrt(bb*xx)))
            a2=nn/(np.sum(np.exp(-np.sqrt(bb*xx)))+np.sum(np.sqrt(bb*xx)*np.exp(-np.sqrt(bb*xx))))
            return a1-a2
        imax=100    # maximum number of iterations
        err=1.0e-10 # allowable error of the zero value
        x1=(2*len(xx)/np.sum(np.sqrt(xx)))**2
        x2=x1+0.5
        for i in range(0,imax):
            xi=0.5*(x1+x2)
            f1=func(x1,xx)
            f2=func(x2,xx)
            fi=func(xi,xx)
            if 0<f1*f2:
                x1=x2
                x2=x1+0.5
            elif abs(x2-x1) < err:
                break # converged
            else:
                if f1*fi < 0.0: x2=xi
                if fi*f2 < 0.0: x1=xi
        return xi    
    
    bb=bisection(xx)
    aa=(np.sum(np.sqrt(bb*xx))-2*len(xx))/np.sum((bb*xx)*np.exp(-np.sqrt(bb*xx)))
    # estimation
    x0=np.sqrt(bb*xx[-1])
    ye=est_sqrtet(aa,bb,pp,x0)
    rr=np.corrcoef(xx, ye)[0][1]
    # probable value
    prob=1.0-1.0/pyear
    yp=est_sqrtet(aa,bb,prob,x0)
    res=np.array([aa,bb,rr])
    ss='SQRT-ET\na={0:8.3f}\nb={1:8.3f}\nr={2:8.3f}\nn={3:4.0f}'.format(res[0],res[1],res[2],len(xx))
    return res,ye,yp,ss


def main():
    pyear=np.array([2,5,10,20,50,100,200,500,1000,2000,5000,10000])
    xx=np.array([490.2, 175.7, 285.6, 152.9, 199.1,  98.4, 135.3, 188.3, 143.1, 157.2,
                 165.1, 119.5, 175.9, 198.7, 293.0, 122.4, 104.3, 154.8, 112.0, 305.8,
                 108.6, 178.8, 135.3, 136.5, 142.9, 175.4, 197.2, 165.0, 232.5,  97.4,
                 328.9, 316.7, 125.1, 194.9, 232.2, 182.8, 298.1, 115.9, 175.4, 180.1,
                  99.8, 321.7, 146.7, 156.7, 191.5, 163.2,  83.6, 217.9, 430.4, 143.7,
                 137.6, 132.8, 202.9, 587.2, 157.5, 170.1, 213.4, 301.9, 112.9, 205.2,
                 168.0, 129.5, 175.8, 155.6, 221.4, 341.9, 171.5, 181.1, 173.0, 151.5,
                 116.3, 145.4, 124.6, 172.7, 135.0, 197.0, 124.8, 172.6, 207.7, 131.1,
                 387.3,  98.9, 242.0, 112.5, 100.0, 200.0, 186.5, 108.0, 132.5, 162.5,
                 121.0, 155.5,  93.5, 286.5, 167.0, 103.5, 141.5, 315.5, 102.5, 207.5,
                  86.0, 233.5, 146.5, 228.5, 437.5, 108.5, 120.0, 317.5, 139.5, 203.0,
                 132.5, 234.0, 144.5, 217.5, 133.0, 217.0,  91.0, 135.5, 188.0, 310.0,
                 155.0, 199.5, 280.5, 137.5, 125.5, 150.0, 247.5, 119.0, 200.0, 124.5,
                 222.0, 268.0]) # Miyazaki

    # coefficient for plotting position formula
    #alpha=0.0 # Weibull
    #alpha=0.4 # Cunnane
    alpha=0.5 # Hazen
    xx=np.sort(xx)
    # setting protting position
    n=len(xx)
    ii=np.arange(n)+1
    pp=(ii-alpha)/(n+1-2*alpha)
    res,ye,yp,ss=sqrtet(xx,pp,pyear)    
    qqplot(xx,ye,ss)

    s0='{0:4s}'.format('Year')
    s1='{0:4s}'.format('SQRT')
    for i in range(len(pyear)):
        s0=s0+',{0:6.0f}'.format(pyear[i])
        s1=s1+',{0:6.1f}'.format(yp[i])
    print(s0)
    print(s1)
        
    
#==============
# Execution
#==============
if __name__ == '__main__': main()

Scipyの非線形最小二乗法による回帰プログラム

未知パラメータ $a$ および $b$ を、Scipy の非線形最小二乗法による回帰計算で求めるプログラムを示す。ここで、非線形回帰を行うために $a$ および $b$ の初期値を入力する必要があるが、ここでは以下のように初期値を定めた。

$b$ の初期値

$b$ の初期値としては小さめの値を与えたいため、二分法の解の検索区間の最小値である以下の値を $b$ の初期値とした。

$\begin{equation} b_0=\left(\cfrac{2 N}{\sum_{j=1}^N \sqrt{x_j}}\right)^2 \end{equation}$

$a$ の初期値

クオンタイル $x_p$ を求めるための $t$ と $p$ および $a$ の関係式より、以下の関係を導ける。

$\begin{equation} \ln(1+t_p) - t_p = \ln\left[ -\cfrac{1}{a}\ln(p) \right] \quad \rightarrow \quad a = -\cfrac{\ln(p)}{\exp[\ln(1+t)-t]} \end{equation}$

上式において、大き目の $a$ を初期値として与えたいため、 $p=1\times 10^{-6}$ 、 $t=\sqrt{b_0 \cdot x_{max}}$ としたときの $a$ を初期値とした。すなわち、

$\begin{equation} a_0 = -\cfrac{\ln(1\times 10^{-6})}{\exp[\ln(1+t_0)-t_0]} \qquad t_0=\sqrt{b_0 \cdot x_{max}} \end{equation}$

プログラム全文

プログラム全文を以下に示す。プロッティング・ポジション公式は、Hazen 公式としている。

import numpy as np
from scipy import optimize # for use of leastsq
import matplotlib.pyplot as plt


def qqplot(xx,ye,ss):
    fsz=12
    xmin=0; xmax=700; dx=100
    ymin=0; ymax=700; dy=100
    plt.figure(figsize=(4,4),facecolor='w')
    plt.rcParams['font.size']=fsz
    plt.rcParams['font.family']='sans-serif'
    plt.xlim([xmin,xmax])
    plt.ylim([ymin,ymax])
    plt.xlabel('Observed (mm/day)')
    plt.ylabel('Predicted (mm/day)')
    plt.xticks(np.arange(xmin,xmax+dx,dx))
    plt.yticks(np.arange(ymin,ymax+dy,dy))
    plt.grid(color='#999999',linestyle='solid')
    plt.gca().set_aspect('equal',adjustable='box')
    plt.plot(xx,ye,'o',clip_on=False)
    plt.plot([xmin,xmax],[ymin,ymax],'-',color='#ff0000',lw=1)
    xs=xmin+0.05*(xmax-xmin)
    ys=ymin+0.95*(ymax-ymin)
    plt.text(xs,ys,ss,va='top',ha='left',fontsize=fsz,fontname='Ricty Diminished')
    plt.tight_layout()
    fnameF='fig_qq_M_reg.jpg'
    plt.savefig(fnameF, dpi=100, bbox_inches="tight", pad_inches=0.1)
    plt.show()    

    
def sqrtet(xx,pp,pyear):
    def est_sqrtet(aa,bb,pp,x0):
        # estimation of quantile by Newton-Raphson Method
        def fx(tp,aa,p):
            return np.log(1+tp)-tp-np.log(-1/aa*np.log(p))
        tp=np.zeros(len(pp),dtype=np.float64)
        for i,p in enumerate(pp):
            root = optimize.newton(fx,x0,args=(aa,p))
            tp[i]=root
        return tp**2/bb

    def func_sqrtet(parameter,xx,pp):
        # function for leastsq
        aa=parameter[0]
        bb=parameter[1]
        x0=np.sqrt(bb*xx[-1])
        est=est_sqrtet(aa,bb,pp,x0)
        residual=xx-est
        return residual
    
    # parameter calculation by scipy.optimize.leastsq
    b0=(2*len(xx)/np.sum(np.sqrt(xx)))**2
    t0=np.sqrt(b0*xx[-1])
    a0=-np.log(1e-6)/np.exp(np.log(1+t0)-t0)
    npara=2; parameter0 = np.array([a0,b0])
    result = optimize.leastsq(func_sqrtet,parameter0,args=(xx,pp))
    aa=result[0][0]
    bb=result[0][1]
    # estimation
    x0=np.sqrt(bb*xx[-1])
    ye=est_sqrtet(aa,bb,pp,x0)
    rr=np.corrcoef(xx, ye)[0][1]
    # probable value
    prob=1.0-1.0/pyear
    yp=est_sqrtet(aa,bb,prob,x0)
    res=np.array([aa,bb,rr])
    ss='SQRT-ET\na={0:8.3f}\nb={1:8.3f}\nr={2:8.3f}\nn={3:4.0f}'.format(res[0],res[1],res[2],len(xx))
    return res,ye,yp,ss


def main():
    pyear=np.array([2,5,10,20,50,100,200,500,1000,2000,5000,10000])
    xx=np.array([490.2, 175.7, 285.6, 152.9, 199.1,  98.4, 135.3, 188.3, 143.1, 157.2,
                 165.1, 119.5, 175.9, 198.7, 293.0, 122.4, 104.3, 154.8, 112.0, 305.8,
                 108.6, 178.8, 135.3, 136.5, 142.9, 175.4, 197.2, 165.0, 232.5,  97.4,
                 328.9, 316.7, 125.1, 194.9, 232.2, 182.8, 298.1, 115.9, 175.4, 180.1,
                  99.8, 321.7, 146.7, 156.7, 191.5, 163.2,  83.6, 217.9, 430.4, 143.7,
                 137.6, 132.8, 202.9, 587.2, 157.5, 170.1, 213.4, 301.9, 112.9, 205.2,
                 168.0, 129.5, 175.8, 155.6, 221.4, 341.9, 171.5, 181.1, 173.0, 151.5,
                 116.3, 145.4, 124.6, 172.7, 135.0, 197.0, 124.8, 172.6, 207.7, 131.1,
                 387.3,  98.9, 242.0, 112.5, 100.0, 200.0, 186.5, 108.0, 132.5, 162.5,
                 121.0, 155.5,  93.5, 286.5, 167.0, 103.5, 141.5, 315.5, 102.5, 207.5,
                  86.0, 233.5, 146.5, 228.5, 437.5, 108.5, 120.0, 317.5, 139.5, 203.0,
                 132.5, 234.0, 144.5, 217.5, 133.0, 217.0,  91.0, 135.5, 188.0, 310.0,
                 155.0, 199.5, 280.5, 137.5, 125.5, 150.0, 247.5, 119.0, 200.0, 124.5,
                 222.0, 268.0]) # Miyazaki

    # coefficient for plotting position formula
    #alpha=0.0 # Weibull
    #alpha=0.4 # Cunnane
    alpha=0.5 # Hazen
    xx=np.sort(xx)
    # setting protting position
    n=len(xx)
    ii=np.arange(n)+1
    pp=(ii-alpha)/(n+1-2*alpha)
    res,ye,yp,ss=sqrtet(xx,pp,pyear)    
    qqplot(xx,ye,ss)

    s0='{0:4s}'.format('Year')
    s1='{0:4s}'.format('SQRT')
    for i in range(len(pyear)):
        s0=s0+',{0:6.0f}'.format(pyear[i])
        s1=s1+',{0:6.1f}'.format(yp[i])
    print(s0)
    print(s1)
        
    
#==============
# Execution
#==============
if __name__ == '__main__': main()

Thank you.

記事の先頭に行く

damyarou

python, GMT などのプログラム

設計確率雨量の算定（４）追補

平方根指数型最大値分布による確率雨量計算

計算理論

SQRT exponential-type distribution of maximum (SQRT-ET)

Probabulity Density Function

Cumulative Distribution Function

Quantile $x_p$ for non-exceedance probability $p$

Calculation Method of $t_p$

Estimation of Parameters (Maximum Likelihood Method)

結果出力

従来法によるプログラム

Scipyの非線形最小二乗法による回帰プログラム

$b$ の初期値

$a$ の初期値

プログラム全文

Thank you.

平方根指数型最大値分布による確率雨量計算

計算理論

SQRT exponential-type distribution of maximum (SQRT-ET)

Probabulity Density Function

Cumulative Distribution Function

Quantile for non-exceedance probability

Calculation Method of

Estimation of Parameters (Maximum Likelihood Method)

結果出力

従来法によるプログラム

Scipyの非線形最小二乗法による回帰プログラム

の初期値

の初期値

プログラム全文

Thank you.

Quantile $x_p$ for non-exceedance probability $p$

Calculation Method of $t_p$

$b$ の初期値

$a$ の初期値