Pattern Recognition & Artificial Intelligence
Lecture 10: Clustering Algorithms (6)

Model-based Clustering

Expectation Maximization (EM):
- Jensen's inequality
- Single Gaussian Model (SGM)
- Maximum Likelihood (ML) estimation
- EM estimation
- Gaussian Mixture Model (GMM)
- Difference between K-means and EM
- Applications
Jensen's Inequality: Mathematical Foundation (1)

Convex functions
Definition: Let f be a real-valued function defined on an interval I = [a, b]. f is said to be convex on I if, for all x1, x2 in I and any λ ∈ [0, 1],
    f(λx1 + (1 − λ)x2) ≤ λ f(x1) + (1 − λ) f(x2).
Theorem: If f(x) is twice differentiable on [a, b] and f''(x) ≥ 0 on [a, b], then f(x) is convex on [a, b]. (f'(x) is non-decreasing, which is what f''(x) ≥ 0 means.)

Concave functions
Definition: Let f be a real-valued function defined on an interval I = [a, b]. f is said to be concave on I if, for all x1, x2 in I and any λ ∈ [0, 1],
    f(λx1 + (1 − λ)x2) ≥ λ f(x1) + (1 − λ) f(x2).
Theorem: If f(x) is twice differentiable on [a, b] and f''(x) < 0 on [a, b], then f(x) is concave on [a, b]. (f'(x) is non-increasing, which is what f''(x) ≤ 0 means.)

Jensen's Inequality: Mathematical Foundation (2)

Expectation of a function
Theorem: If X is a random variable and Y = g(X), then
    E[Y] = Σx g(x) p(x)        (discrete case)
    E[Y] = ∫ g(x) f(x) dx      (continuous case)
where p(x) is the probability mass function and f(x) is the probability density of X.

Jensen's inequality
For a convex function f:   f(E[X]) ≤ E[f(X)]
For a concave function f:  f(E[X]) ≥ E[f(X)]
Generalized form: for weights λi ≥ 0 with Σi λi = 1,
    convex f:   f(Σi λi xi) ≤ Σi λi f(xi)
    concave f:  f(Σi λi xi) ≥ Σi λi f(xi)
Equality holds if and only if X is constant (all xi are equal) or f is linear.
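As a quick numerical sanity check (my own illustration, not from the slides, assuming NumPy is available), the snippet below verifies Jensen's inequality on a random sample for a convex function (exp) and a concave one (log).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.5, 3.0, size=10_000)        # positive sample so log is defined

# Convex f(x) = exp(x):  f(E[X]) <= E[f(X)]
print(np.exp(x.mean()) <= np.exp(x).mean())   # True

# Concave f(x) = log(x): f(E[X]) >= E[f(X)]
print(np.log(x.mean()) >= np.log(x).mean())   # True
```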
Single Gaussian Model (SGM)

Sampling: suppose x1, …, xn are drawn independently from a single Gaussian
    p(x | μ, σ²) = (1 / √(2πσ²)) exp(−(x − μ)² / (2σ²)).
Given the observed x's, the joint density
    p(x1, …, xn | μ, σ²) = Πi p(xi | μ, σ²)
is a function of μ and σ². We want to maximize it with respect to the parameters; read this way it is called the LIKELIHOOD FUNCTION.

Maximum Likelihood Estimation: Likelihood Function

- X1, …, Xn: i.i.d. observations
- θ: an unknown parameter
- Θ: the parameter space, i.e. the set of all possible values of θ
- f(x1, …, xn; θ): the joint p.d.f. or p.m.f. of X1, …, Xn

Likelihood function of θ for observed x1, …, xn:
    L(θ) = f(x1, …, xn; θ) = Πi f(xi; θ).
The joint p.d.f. or p.m.f. is a function of x1, …, xn for given θ; the likelihood function is the same expression read as a function of θ for given x1, …, xn.
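To make the "function of θ for fixed data" reading concrete, here is a small Python sketch (my own illustration, not from the slides) that evaluates the Gaussian log-likelihood of one fixed sample at several candidate means.

```python
import numpy as np

def gaussian_log_likelihood(data, mu, sigma2):
    """Log-likelihood of (mu, sigma2) for an i.i.d. Gaussian sample."""
    n = data.size
    return (-0.5 * n * np.log(2 * np.pi * sigma2)
            - np.sum((data - mu) ** 2) / (2 * sigma2))

rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=1.0, size=200)   # fixed observations

# Same data, different parameter values: L is a function of theta.
for mu in (0.0, 1.0, 2.0, 3.0):
    print(mu, gaussian_log_likelihood(data, mu, sigma2=1.0))
```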
Maximum Likelihood Estimation: Calculation of the MLE

Maximum likelihood estimation: find the θ̂ ∈ Θ that maximizes the likelihood function L(θ).

Simple example: two independent Bernoulli trials with success probability θ, where θ is known to be either 1/4 or 1/3, so Θ = {1/4, 1/3}. The probabilities of observing x = 0, 1, 2 successes are P(X = x | θ) = C(2, x) θ^x (1 − θ)^(2−x):

    θ        x = 0      x = 1      x = 2
    1/4      9/16       6/16       1/16
    1/3      4/9        4/9        1/9

The MLE θ̂ is chosen to maximize P(X = x | θ) for the observed x:
- x = 0:      θ̂ = 1/4   (9/16 > 4/9)
- x = 1 or 2: θ̂ = 1/3   (4/9 > 6/16 and 1/9 > 1/16)
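A short Python check of the table above (illustration only), enumerating the two candidate values of θ and picking the maximizer for each observed count:

```python
from math import comb
from fractions import Fraction

candidates = (Fraction(1, 4), Fraction(1, 3))   # the parameter space Theta

def p_successes(x, theta, n=2):
    """P(X = x) for x successes in n Bernoulli(theta) trials."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

for x in range(3):
    probs = {theta: p_successes(x, theta) for theta in candidates}
    mle = max(probs, key=probs.get)
    print(f"x={x}: {probs}  ->  MLE theta = {mle}")
```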
Maximum Likelihood Estimation: Calculation of the MLE

Log-likelihood function: l(θ) = ln L(θ). Since ln is monotonically increasing, maximizing l(θ) is equivalent to maximizing L(θ). Set the derivative of l(θ) equal to zero and solve for θ.
Note: this method can only be used when the likelihood function is differentiable.

Example: Normal Distribution

Suppose x1, …, xn is a random sample from a normal distribution with p.d.f.
    f(x; μ, σ²) = (1 / √(2πσ²)) exp(−(x − μ)² / (2σ²)),
with vector parameter θ = (μ, σ²). The likelihood function is
    L(μ, σ²) = Πi f(xi; μ, σ²) = (2πσ²)^(−n/2) exp(−Σi (xi − μ)² / (2σ²)).
Maximize the log-likelihood instead:
    l(μ, σ²) = −(n/2) ln(2πσ²) − Σi (xi − μ)² / (2σ²).
Setting ∂l/∂μ = 0 and ∂l/∂σ² = 0 gives
    μ̂ = (1/n) Σi xi,        σ̂² = (1/n) Σi (xi − μ̂)².
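A minimal NumPy check of the closed-form result above (my own illustration): the formula-based estimates match numpy's mean and biased variance on a simulated sample.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=1_000)

mu_hat = x.sum() / x.size                        # (1/n) * sum(x_i)
sigma2_hat = ((x - mu_hat) ** 2).sum() / x.size  # (1/n) * sum((x_i - mu_hat)^2)

# Same values via library routines (ddof=0 gives the ML, i.e. biased, variance).
assert np.isclose(mu_hat, x.mean())
assert np.isclose(sigma2_hat, x.var(ddof=0))
print(mu_hat, sigma2_hat)
```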
Problems

- If there is missing data, how do we estimate the model parameters?
- If the model is not a single Gaussian (SGM) but a Gaussian mixture model, how do we estimate the parameters of each component?

What is EM?

In statistics, an expectation-maximization (EM) algorithm is an iterative method for finding maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables.
Motivation: to deal with
- incomplete data;
- data with a latent (hidden) variable. For example, when clustering the people in our lab a latent variable exists: if the observed feature is height, each cluster follows a Gaussian distribution; if it is gender, it follows a Bernoulli distribution.

EM estimation: basic setting

- X is the set of observed data points.
- Θ is a parameter vector.
- EM is a method to find the maximum likelihood estimate
      θ_ML = argmax_θ P(X | θ)   (recall maximum likelihood).
- Calculating P(X | θ) directly is hard.
- Calculating P(X, Y | θ) is much simpler, where Y is the "hidden" (or "missing") data.
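To see why P(X | θ) is the awkward quantity, the hedged sketch below (my own toy example: a two-component 1-D Gaussian mixture with assumed parameter values) computes the incomplete-data log-likelihood, which involves a log of a sum over the hidden component, against the complete-data version where the component of each point is known.

```python
import numpy as np
from scipy.stats import norm

# Toy mixture: weights, means, standard deviations (assumed values).
w = np.array([0.4, 0.6])
mu = np.array([-2.0, 3.0])
sd = np.array([1.0, 1.5])

rng = np.random.default_rng(3)
y = rng.choice(2, size=500, p=w)            # hidden component labels
x = rng.normal(mu[y], sd[y])                # observed data only

# Incomplete-data log-likelihood: log of a sum over the hidden variable.
per_component = w * norm.pdf(x[:, None], mu, sd)     # shape (n, 2)
loglik_incomplete = np.log(per_component.sum(axis=1)).sum()

# Complete-data log-likelihood: no sum inside the log once y is known.
loglik_complete = (np.log(w[y]) + norm.logpdf(x, mu[y], sd[y])).sum()

print(loglik_incomplete, loglik_complete)
```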
EM estimation: incomplete and complete data

- Incomplete data: the observed data X alone.
- Complete data: (X, Y), the observed data together with the latent variables (missing data) Y.

EM estimation: complete-data likelihood

Log-likelihood of the observed data:
    l(Θ) = Σi ln P(xi; Θ) = Σi ln Σy P(xi, y; Θ).
In many cases we cannot maximize this directly. An alternative is to construct a sequence of estimates Θ(1), Θ(2), … such that
    l(Θ(1)) ≤ l(Θ(2)) ≤ …

Define Q(y): a probability distribution over y (Q(y) ≥ 0, Σy Q(y) = 1), and let g(y) = P(xi, y; Θ) / Q(y), a function of y with g(y) > 0. Since ln(x) is strictly concave on (0, ∞), Jensen's inequality gives a lower bound:
    l(Θ) = Σi ln Σy Q(y) · [P(xi, y; Θ) / Q(y)]
         ≥ Σi Σy Q(y) ln [P(xi, y; Θ) / Q(y)].

We want to choose the maximum (tightest) lower bound. How to choose Q(y)? If Θ is given, Q(y) and P(xi, y; Θ) determine the value of the bound on l(Θ). This suggests a two-step scheme:
- Step 1: find Q(y)                                        (Expectation)
- Step 2: given Q(y), maximize the bound on l(Θ) by adjusting Θ   (Maximization)

EM estimation: Expectation step, how to choose Q(y)

By Jensen's inequality, the lower bound is tight (equality holds) when g(y) = P(xi, y; Θ) / Q(y) is constant in y. Since Q(y) is a probability distribution over y, it is easily obtained that
    Q(y) = P(xi, y; Θ) / Σy' P(xi, y'; Θ) = P(y | xi; Θ),
i.e. the posterior probability of the latent variable given the data and the current parameters.
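Continuing the toy mixture from the earlier sketch, the E-step choice Q(y) = P(y | x; Θ) is just a normalized joint probability per point. A hedged NumPy illustration with my own variable names:

```python
import numpy as np
from scipy.stats import norm

# Current parameter guess Theta (assumed values for illustration).
w = np.array([0.5, 0.5])
mu = np.array([-1.0, 2.0])
sd = np.array([1.0, 1.0])

x = np.array([-2.1, -0.3, 1.8, 2.5, 3.0])           # observed points

# Joint P(x_i, y; Theta) for each point i and each component y.
joint = w * norm.pdf(x[:, None], mu, sd)            # shape (n, 2)

# E-step: Q_i(y) = P(y | x_i; Theta) = joint / sum_y joint.
Q = joint / joint.sum(axis=1, keepdims=True)

print(Q)                    # each row is a posterior over the two components
print(Q.sum(axis=1))        # rows sum to 1
```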
EM estimation: Maximization step, maximize l(Θ) by adjusting Θ

The EM algorithm is an iterative procedure for maximizing l(Θ). Assume that after the n-th iteration the current estimate of Θ is Θn. Since the objective is to maximize l(Θ), we wish to compute an updated estimate Θ such that
    l(Θ) > l(Θn);
equivalently, we want to maximize the difference l(Θ) − l(Θn).

How do we guarantee that the likelihood L(θ) increases at each step? Denote the hidden random vector by Z; by the law of total probability,
    P(X; Θ) = Σz P(X | z; Θ) P(z; Θ),
so
    l(Θ) − l(Θn) = ln Σz P(X | z; Θ) P(z; Θ) − ln P(X; Θn)
                 ≥ Σz P(z | X; Θn) ln [ P(X | z; Θ) P(z; Θ) / (P(z | X; Θn) P(X; Θn)) ]   (Jensen's inequality)
                 =: Δ(Θ | Θn).
The right-hand side Δ(Θ | Θn) is a lower bound on the improvement and satisfies Δ(Θn | Θn) = 0. Our objective is therefore to choose the value of Θ so that this lower bound is maximized; any Θ that increases the bound also increases l(Θ), which guarantees the likelihood never decreases.
Gaussian Mixture Model

Single Gaussian: all points come from one Gaussian; the solution is MLE, maximizing Πi N(xi | μ, Σ).
Multiple Gaussians: which component does each point belong to? The component label is hidden, so plain MLE no longer applies directly; the solution is EM.

EM for GMM

The aim is to estimate the unknown parameters: the mixing weights between the Gaussians and the mean and covariance of each component,
    Θ = (τ1, …, τK, μ1, …, μK, Σ1, …, ΣK),   with τj ≥ 0 and Σj τj = 1.
Let x = (x1, x2, …, xn) be a sample of n independent observations from a mixture of K d-dimensional normal distributions, and let z = (z1, z2, …, zn) be the latent variables that determine the component from which each observation originates:
    P(zi = j) = τj,    xi | (zi = j) ~ N(μj, Σj).

The incomplete-data likelihood function is
    L(Θ; x) = Πi Σj τj f(xi; μj, Σj),
and the complete-data likelihood function is
    L(Θ; x, z) = Πi Πj [ τj f(xi; μj, Σj) ]^𝕀(zi = j),
where 𝕀 is an indicator function and f is the probability density function of a multivariate normal. This may be rewritten in log-likelihood form:
    log L(Θ; x, z) = Σi Σj 𝕀(zi = j) [ ln τj + ln f(xi; μj, Σj) ].

E-step: with current parameters Θ(t), compute the membership probabilities (responsibilities)
    γij = P(zi = j | xi; Θ(t)) = τj(t) f(xi; μj(t), Σj(t)) / Σl τl(t) f(xi; μl(t), Σl(t)).

M-step: use MLE to solve for the new parameters from the expected complete-data log-likelihood. The updates are the single-Gaussian MLE formulas with each point weighted by its responsibility. For example (SGM vs. GMM):
    SGM:  μ̂ = (1/n) Σi xi                         GMM:  μj(t+1) = Σi γij xi / Σi γij
    SGM:  Σ̂ = (1/n) Σi (xi − μ̂)(xi − μ̂)ᵀ          GMM:  Σj(t+1) = [ Σi γij (xi − μj(t+1))(xi − μj(t+1))ᵀ ] / Σi γij
and, maximizing over the weights subject to Σj τj = 1,
    τj(t+1) = (1/n) Σi γij.
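Below is a minimal, self-contained NumPy/SciPy sketch of these E- and M-step updates (an illustration under my own initialization choices, not the lecture's code), run on synthetic 2-D data.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=100, seed=0):
    """EM for a Gaussian mixture: returns weights tau, means mu, covariances Sigma."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    tau = np.full(K, 1.0 / K)                       # mixing weights
    mu = X[rng.choice(n, K, replace=False)]         # random points as initial means
    Sigma = np.array([np.eye(d)] * K)               # identity covariances

    for _ in range(n_iter):
        # E-step: responsibilities gamma[i, j] = P(z_i = j | x_i; Theta)
        joint = np.column_stack([
            tau[j] * multivariate_normal.pdf(X, mu[j], Sigma[j]) for j in range(K)
        ])
        gamma = joint / joint.sum(axis=1, keepdims=True)

        # M-step: responsibility-weighted MLE updates
        Nj = gamma.sum(axis=0)                      # effective counts per component
        tau = Nj / n
        mu = (gamma.T @ X) / Nj[:, None]
        for j in range(K):
            diff = X - mu[j]
            Sigma[j] = (gamma[:, j, None] * diff).T @ diff / Nj[j]
    return tau, mu, Sigma

# Synthetic 2-D data from two well-separated Gaussians.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 1.0, (300, 2)),
               rng.normal([5, 5], 1.0, (200, 2))])
tau, mu, Sigma = em_gmm(X, K=2)
print(tau, mu, sep="\n")
```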
Difference between K-means and EM: K-means

The K-means classifier maps each data point to a hard cluster label and maintains one parameter per cluster:
- Classification results: x1 → C(x1), x2 → C(x2), …, xi → C(xi), …
- Cluster parameters: mean m1 for C1, m2 for C2, …, mk for Ck.

Data flow (here each point is an RGB feature vector, xi = {ri, gi, bi}):
- Input (known): the data points x1, x2, …, xi, … and an initial guess of the cluster parameters m1, m2, …, mk.
- Output (unknown): the algorithm alternates between the two unknowns,
  Classification Results (1) → Cluster Parameters (1) → Classification Results (2) → Cluster Parameters (2) → … → Classification Results (ic) → Cluster Parameters (ic),
  until the assignments stop changing.

K-means algorithm (see the sketch below):
- Boot step: initialize K clusters C1, …, CK; each cluster Cj is represented by its mean mj.
- Iteration step: estimate the cluster of each data point (assign it to the nearest mean); re-estimate the cluster parameters (recompute each mean).
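A compact NumPy sketch of that boot/iteration loop (illustrative only, with my own convergence test):

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Hard-assignment K-means: returns labels and cluster means."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), K, replace=False)]   # boot step
    for _ in range(n_iter):
        # Estimate the cluster of each point: nearest mean.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Re-estimate the cluster parameters: mean of the assigned points.
        new_means = np.array([X[labels == j].mean(axis=0) for j in range(K)])
        if np.allclose(new_means, means):
            break
        means = new_means
    return labels, means

rng = np.random.default_rng(4)
X = np.vstack([rng.normal([0, 0], 0.5, (100, 2)),
               rng.normal([4, 4], 0.5, (100, 2))])
labels, means = kmeans(X, K=2)
print(means)
```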
Difference between K-means and EM: from K-means to EM

Replacing hard assignments with soft ones turns the K-means loop into EM:
- Boot step: initialize K clusters C1, …, CK, with (μj, Σj) and P(Cj) for each cluster j.
- Iteration step: estimate the cluster memberships of each data point (Expectation); re-estimate the parameters of each cluster j (Maximization).

The EM classifier outputs soft assignments instead of hard labels:
- Classification results: p(C1 | x1), p(Cj | x2), …, p(Cj | xi), …
- Cluster parameters: (μ1, Σ1), p(C1) for C1; (μ2, Σ2), p(C2) for C2; …; (μk, Σk), p(Ck) for Ck.

Data flow (again with xi = {ri, gi, bi}):
- EM as a whole: input (known) is the data x1, x2, …, xi, …; output (unknown) is both the cluster parameters and the classification results p(Cj | xi).
- E-step: input (known) is the data plus the current cluster parameters; output is the estimated classification results p(Cj | xi).
- M-step: input is the data (known) plus the estimated classification results (estimation); output is the updated cluster parameters.
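For comparison in practice, the snippet below (my own usage example, assuming scikit-learn is installed) runs K-means and a Gaussian mixture fitted by EM on the same data; KMeans returns hard labels, while GaussianMixture additionally exposes the soft posteriors p(Cj | xi).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
X = np.vstack([rng.normal([0, 0], 0.7, (150, 2)),
               rng.normal([3, 3], 0.7, (150, 2))])

# K-means: hard assignments only.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# GMM via EM: soft posteriors p(C_j | x_i), plus hard labels if wanted.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
posteriors = gmm.predict_proba(X)        # shape (n, 2), rows sum to 1
gmm_labels = gmm.predict(X)

print(km_labels[:5], gmm_labels[:5], posteriors[:5], sep="\n")
```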
Difference between K-means and EM: EM summary

- Boot step: initialize K clusters C1, …, CK, i.e. (μj, Σj) and P(Cj) for each cluster j.
- Iteration step: Expectation step, then Maximization step.

Case Study (1)

We have a data set of 1000 2-D data points forming three clusters (k = 3). How can we cluster it?

Case Study (2)

- Step 1: Initialize the partition: randomly select 3 points as initial class centers; partition the data using these 3 mean points.
- Step 2: E-step.
- Step 3: EM algorithm: do { M-step; E-step } until convergence.

A sketch of this procedure follows below.
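A hedged reproduction of the case study on synthetic data (the original 1000-point data set is not available, so the three clusters below are simulated, and scikit-learn's GaussianMixture is used in place of the lecture's own implementation):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Simulate 1000 2-D points in three clusters (stand-in for the lecture's data).
rng = np.random.default_rng(6)
X = np.vstack([rng.normal([0, 0], 0.8, (400, 2)),
               rng.normal([5, 1], 0.8, (300, 2)),
               rng.normal([2, 5], 0.8, (300, 2))])

# k = 3; 'random' init mirrors "randomly select 3 points as initial centers".
gmm = GaussianMixture(n_components=3, init_params='random', random_state=0,
                      max_iter=200)
labels = gmm.fit_predict(X)

print("converged:", gmm.converged_, "iterations:", gmm.n_iter_)
print("mixing weights:", gmm.weights_)
print("means:\n", gmm.means_)
```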
(Slides 55-58: figures showing the initial partition and the subsequent EM iterations on the case-study data.)

Image Segmentation using EM

- Step 1: Feature extraction.
- Step 2: Image segmentation using EM.

Symbols
- The feature vector for pixel i is called xi.
- There are going to be K segments; K is given.
- The j-th segment has a Gaussian distribution with parameters θj = (μj, Σj).
- The αj's are the weights (which sum to 1) of the Gaussians.
- Θ is the collection of parameters: Θ = (α1, …, αK, θ1, …, θK).

Initialization
- Each of the K Gaussians has parameters θj = (μj, Σj), where μj is the mean and Σj is the covariance matrix of the j-th Gaussian.
- The covariance matrices are initialized to the identity matrix.
- The means can be initialized by the average feature vector in each of K windows in the image; this is data-driven initialization.

E-step and M-step: as in EM for GMM, compute per-pixel membership probabilities and re-estimate (αj, μj, Σj). Sample segmentation results were shown on the slides.

Key points: maximum likelihood estimation, GMM, EM.
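A minimal illustration of this pipeline (my own sketch: per-pixel color features, a GMM fitted by EM with scikit-learn, and hard segment labels taken from the posteriors; a synthetic two-region image stands in for a real one):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic "image": two color regions plus noise, shape (H, W, 3).
H, W, K = 60, 80, 2
rng = np.random.default_rng(7)
image = np.zeros((H, W, 3))
image[:, :W // 2] = [0.8, 0.2, 0.2]          # reddish left half
image[:, W // 2:] = [0.2, 0.2, 0.8]          # bluish right half
image += rng.normal(0.0, 0.05, image.shape)

# Step 1: feature extraction -- each pixel's RGB vector is x_i.
features = image.reshape(-1, 3)

# Step 2: segmentation with EM; posteriors give soft memberships per pixel.
gmm = GaussianMixture(n_components=K, covariance_type='full',
                      random_state=0).fit(features)
segments = gmm.predict(features).reshape(H, W)

print(segments[0, :5], segments[0, -5:])     # labels differ across the two halves
```

In a real pipeline the features would also include texture or position, and the means would be initialized from K image windows as described above; the structure of the code stays the same.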