
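Cloud Computing and Cloud Data Management
Jiaheng Lu (陸嘉恒), Renmin University of China, www.jiahenglu.net
"Advanced Data Management" frontier workshop

Outline
- Overview of cloud computing
- Google cloud computing techniques: GFS, Bigtable, and MapReduce
- Yahoo cloud computing techniques and Hadoop
- Challenges of cloud data management

New course at Renmin University: "Distributed Systems and Cloud Computing"
- Overview of distributed systems
- Survey of distributed cloud computing technologies
- Distributed cloud computing platforms
- Distributed cloud computing application development

Part 1: Overview of Distributed Systems
- Chapter 1: Introduction to distributed systems
- Chapter 2: Client-server architecture
- Chapter 3: Distributed objects
- Chapter 4: Common Object Request Broker Architecture (CORBA)

Part 2: Survey of Cloud Computing
- Chapter 5: Introduction to cloud computing
- Chapter 6: Cloud services
- Chapter 7: Comparison of cloud-related technologies
  - 7.1 Grid computing and cloud computing
  - 7.2 Utility computing and cloud computing
  - 7.3 Parallel and distributed computing and cloud computing
  - 7.4 Cluster computing and cloud computing

Part 3: Cloud Computing Platforms
- Chapter 8: The three core technologies of the Google cloud platform
- Chapter 9: Technologies of the Yahoo cloud platform
- Chapter 10: Technologies of the Aneka cloud platform
- Chapter 11: Technologies of the Greenplum cloud platform
- Chapter 12: Technologies of the Amazon Dynamo cloud platform

Part 4: Development on Cloud Platforms
- Chapter 13: Development on Hadoop
- Chapter 14: Development on HBase
- Chapter 15: Development on Google Apps
- Chapter 16: Development on MS Azure
- Chapter 17: Development on Amazon EC2

Cloud computing

Why do we use cloud computing?
- Case 1:
  - Write a file and save it; the computer goes down and the file is lost
  - Files are always stored in the cloud, never lost
- Case 2:
  - Use IE: download, install, use
  - Use QQ: download, install, use
  - Use C++: download, install, use
  - ……
  - Get the service from the cloud

What is cloud and cloud computing?
- Cloud:
  - On-demand resources or services over the Internet
  - Scale and reliability of a data center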

What is cloud and cloud computing? (continued)
- Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet.
- Users need not have knowledge of, expertise in, or control over the technology infrastructure in the "cloud" that supports them.

Characteristics of cloud computing
- Virtual: software, databases, Web servers, operating systems, storage, and networking are provided as virtual servers.
- On demand: add and subtract processors, memory, network bandwidth, and storage.

Types of cloud service
- IaaS: Infrastructure as a Service
- PaaS: Platform as a Service
- SaaS: Software as a Service

SaaS
- Software delivery model
- No hardware or software to manage
- Service delivered through a browser
- Customers use the service on demand
- Instant scalability

SaaS: Examples
- Your current CRM package is not managing the load or you simply don’t want to host it in-house. Use a SaaS provider such as Salesforce.com.
- Your email is hosted on an Exchange server in your office and it is very slow. Outsource this using Hosted Exchange.

PaaS
- Platform delivery model
- Platforms are built upon infrastructure, which is expensive
- Estimating demand is not a science!
- Platform management is not fun!

PaaS: Examples
- You need to host a large file (5 MB) on your website and make it available to 35,000 users for only two months. Use CloudFront from Amazon.
- You want to start storage services on your network for a large number of files and you do not have the storage capacity. Use Amazon S3.

IaaS
- Computer infrastructure delivery model
- A platform virtualization environment
- Computing resources, such as storage and processing capacity
- Virtualization taken a step further

IaaS: Examples
- You want to run a batch job but you don’t have the infrastructure necessary to run it in a timely manner. Use Amazon EC2.
- You want to host a website, but only for a few days. Use Flexiscale.

Cloud computing and other computing techniques

The 21st Century Vision Of Computing
Leonard Kleinrock, one of the chief scientists of the original Advanced Research Projects Agency Network (ARPANET) project, which seeded the Internet, said: “As of now, computer networks are still in their infancy, but as they grow up and become sophisticated, we will probably see the spread of ‘computer utilities’ which, like present electric and telephone utilities, will service individual homes and offices across the country.”

Sun Microsystems co-founder Bill Joy also indicated: “It would take time until these markets mature to generate this kind of value.
Predicting now which companies will capture the value is impossible. Many of them have not even been created yet.”

Definitions
- Utility computing: the packaging of computing resources, such as computation and storage, as a metered service similar to a traditional public utility.
- Cluster: a computer cluster is a group of linked computers, working together closely so that in many respects they form a single computer.
- Grid computing: the application of several computers to a single problem at the same time, usually a scientific or technical problem that requires a great number of computer processing cycles or access to large amounts of data.
- Cloud computing: a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet.

Grid Computing & Cloud Computing
- They share a lot in common: intention, architecture, and technology
- They differ in: programming model, business model, compute model, applications, and virtualization
- The problems are mostly the same:
  - manage large facilities;
  - define methods by which consumers discover, request, and use resources provided by the central facilities;
  - implement the often highly parallel computations that execute on those resources.
- Virtualization
  - Grid: does not rely on virtualization as much as Clouds do; each individual organization maintains full control of its resources
  - Cloud: virtualization is an indispensable ingredient for almost every Cloud

2024/2/28
Any questions or comments?

Google Cloud computing techniques

The Google File System (GFS)
- A scalable distributed file system for large, distributed, data-intensive applications
- Multiple GFS clusters are currently deployed. The largest ones have:
  - 1000+ storage nodes
  - 300+ terabytes of disk storage
  - heavy access by hundreds of clients on distinct machines

Introduction
- Shares many of the same goals as previous distributed file systems: performance, scalability, reliability, etc.
- The GFS design has been driven by four key observations of Google's application workloads and technological environment

Intro: Observations 1
1. Component failures are the norm
   - Constant monitoring, error detection, fault tolerance, and automatic recovery are integral to the system
2. Huge files (by traditional standards)
   - Multi-GB files are common
   - I/O operations and block sizes must be revisited

Intro: Observations 2
3. Most files are mutated by appending new data
   - This is the focus of performance optimization and atomicity guarantees
4. Co-designing the applications and APIs benefits the overall system by increasing flexibility

The Design
- A cluster consists of a single master and multiple chunkservers and is accessed by multiple clients

The Master
- Maintains all file system metadata: namespace, access control info, file-to-chunk mappings, chunk (including replica) locations, etc.
- Periodically communicates with chunkservers in HeartBeat messages to give instructions and check state
- Makes sophisticated chunk placement and replication decisions using global knowledge
- For reading and writing, the client contacts the master to get chunk locations, then deals directly with chunkservers
- The master is therefore not a bottleneck for reads/writes

Chunkservers
- Files are broken into chunks. Each chunk has an immutable, globally unique 64-bit chunk handle
- The handle is assigned by the master at chunk creation
- Chunk size is 64 MB
- Each chunk is replicated on 3 (default) servers

Clients
- Linked to apps using the file system API
- Communicate with the master and chunkservers for reading and writing
  - Master interactions only for metadata
  - Chunkserver interactions for data
- Cache only metadata information; data is too large to cache
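
The read path described on the Master and Clients slides can be sketched roughly as follows. This is a toy in-memory model, not the GFS API: the class names `Master` and `Client`, the method `plan_read`, and the chunk handles and replica names used with it are all hypothetical. It only illustrates that the client turns a byte offset into a chunk index (using the 64 MB chunk size), asks the master for metadata, caches that metadata, and would then read the bytes from a chunkserver directly.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunk size, as stated on the Chunkservers slide


@dataclass
class Master:
    """Toy stand-in for the GFS master: it holds metadata only, never file data."""
    # (filename, chunk index) -> (chunk handle, list of replica names)
    chunk_table: dict = field(default_factory=dict)
    lookups: int = 0  # counts how often clients had to contact the master

    def locate(self, filename, chunk_index):
        self.lookups += 1
        return self.chunk_table[(filename, chunk_index)]


@dataclass
class Client:
    """Toy client: caches chunk locations (metadata), never chunk contents."""
    master: Master
    location_cache: dict = field(default_factory=dict)

    def plan_read(self, filename, offset):
        # Translate the byte offset within the file into a chunk index.
        chunk_index = offset // CHUNK_SIZE
        key = (filename, chunk_index)
        # Contact the master only when the location is not cached yet.
        if key not in self.location_cache:
            self.location_cache[key] = self.master.locate(filename, chunk_index)
        handle, replicas = self.location_cache[key]
        # A real client would now request the bytes from this chunkserver.
        return handle, replicas[0], offset % CHUNK_SIZE
```

Two reads that fall into the same chunk cost a single master lookup in this sketch, which is one way to see why the master need not become a bottleneck for reads and writes.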

Chunk Locations
- The master does not keep a persistent record of the locations of chunks and replicas
- It polls chunkservers for this at startup and when chunkservers join or leave
- It stays up to date by controlling the placement of new chunks and through HeartBeat messages (when monitoring chunkservers)

Operation Log
- Record of all critical metadata changes
- Stored on the master and replicated on other machines
- Defines the order of concurrent operations
- Also used to recover the file system state

System Interactions: Leases and Mutation Order
- Leases maintain a mutation order across all chunk replicas
- The master grants a lease to one replica, called the primary
- The primary chooses the serial mutation order, and all replicas follow this order
- Minimizes management overhead for the master

Atomic Record Append
- The client specifies the data to write; GFS chooses the offset, appends the data to each replica at least once, and returns the offset to the client
- Heavily used by Google’s distributed applications
- No need for a distributed lock manager
- GFS chooses the offset, not the client

Atomic Record Append: How?
- Follows a control flow similar to that of other mutations
- The primary tells secondary replicas to append at the same offset as the primary
- If the append fails at any replica, the client retries it. Replicas of the same chunk may therefore contain different data, including duplicates, whole or in part, of the same record
- GFS does not guarantee that all replicas are bitwise identical
- It only guarantees that the data is written at least once as an atomic unit
- Data must be written at the same offset on all chunk replicas for success to be reported

Detecting Stale Replicas
- The master keeps a chunk version number to distinguish up-to-date and stale replicas
- The version is increased when a lease is granted
- If a replica is not available, its version is not increased
- The master detects stale replicas when chunkservers report their chunks and versions
- Stale replicas are removed during garbage collection
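
The version-number scheme on the Detecting Stale Replicas slide can be illustrated with a small sketch. It is a simplified model under stated assumptions: `Chunkserver`, `Master`, `grant_lease`, and `stale_replicas` are hypothetical names, versions start at 0, and a "report" is just a read of the replica's in-memory version table.

```python
class Chunkserver:
    """Toy chunkserver that remembers the version of each chunk replica it holds."""

    def __init__(self, name):
        self.name = name
        self.versions = {}  # chunk handle -> version held by this replica


class Master:
    """Toy master that tracks the latest version number of every chunk."""

    def __init__(self):
        self.versions = {}  # chunk handle -> latest version

    def grant_lease(self, handle, reachable):
        # Granting a lease bumps the version on the master and on every
        # replica that is reachable right now; unreachable replicas keep
        # their old version and thereby become detectably stale.
        new_version = self.versions.get(handle, 0) + 1
        self.versions[handle] = new_version
        for server in reachable:
            server.versions[handle] = new_version
        return new_version

    def stale_replicas(self, handle, servers):
        # Compare each reported version against the master's latest version.
        current = self.versions.get(handle, 0)
        return [s.name for s in servers if s.versions.get(handle, 0) < current]
```

In this model a chunkserver that was down while a lease was granted reports an older version once it is back, so the master can flag that replica for garbage collection.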

Garbage collection
- When a client deletes a file, the master logs the deletion like other changes and renames the file to a hidden name
- While scanning the file system namespace, the master removes files that have been hidden for longer than 3 days; their metadata is erased as well
- In HeartBeat messages, each chunkserver sends the master a subset of its chunks, and the master tells it which of them no longer have metadata
- The chunkserver removes these chunks on its own

Fault Tolerance: High Availability
- Fast recovery
  - Master and chunkservers can restart in seconds
- Chunk replication
- Master replication
  - “Shadow” masters provide read-only access when the primary master is down
  - Mutations are not considered done until recorded on all master replicas

Fault Tolerance: Data Integrity
- Chunkservers use checksums to detect corrupt data
  - Since replicas are not bitwise identical, chunkservers maintain their own checksums
- For reads, the chunkserver verifies the checksum before sending the chunk
- Checksums are updated during writes

Introduction to MapReduce

MapReduce: Insight
- “Consider the problem of counting the number of occurrences of each word in a large collection of documents”
- How would you do it in parallel?

MapReduce Programming Model
- Inspired by the map and reduce operations commonly used in functional programming languages like Lisp
- Users implement an interface of two primary methods:
  1. Map: (key1, val1) → (key2, val2)
  2. Reduce: (key2, [val2]) → [val3]

Map operation
- Map, a pure function written by the user, takes an input key/value pair, e.g. (doc-id, doc-content), and produces a set of intermediate key/value pairs
- Drawing an analogy to SQL, map can be visualized as the group-by clause of an aggregate query

Reduce operation
- On completion of the map phase, all the intermediate values for a given output key are combined into a list and given to a reducer
- Can be visualized as an aggregate function (e.g., average) that is computed over all the rows with the same group-by attribute

Pseudo-code
    map(String input_key, String input_value):
      // input_key: document name
      // input_value: document contents
      for each word w in input_value:
        EmitIntermediate(w, "1");

    reduce(String output_key, Iterator intermediate_values):
      // output_key: a word
      // intermediate_values: a list of counts
      int result = 0;
      for each v in intermediate_values:
        result += ParseInt(v);
      Emit(AsString(result));

MapReduce: Execution overview (figure)

MapReduce: Example (figure)

MapReduce in Parallel: Example (figure)
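
The word-count pseudo-code above can be turned into a runnable single-process sketch. The names `map_fn`, `reduce_fn`, and `run_mapreduce` are made up for illustration; a real MapReduce runtime would distribute the map calls, the grouping (shuffle), and the reduce calls across many machines, whereas this version only imitates the data flow in memory.

```python
from collections import defaultdict


def map_fn(doc_name, contents):
    """Map: (document name, document contents) -> stream of (word, 1) pairs."""
    for word in contents.split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: (word, list of counts) -> (word, total count)."""
    return word, sum(counts)


def run_mapreduce(inputs, mapper, reducer):
    """Imitate the three phases on one machine: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # the "shuffle": group by key
    return dict(reducer(key, values) for key, values in groups.items())
```

For example, `run_mapreduce([("d1", "the cat sat"), ("d2", "the dog sat down")], map_fn, reduce_fn)` counts "the" and "sat" twice and every other word once. Because each `map_fn` call touches one document and each `reduce_fn` call touches one word, both phases could run in parallel.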

MapReduce: Fault Tolerance
- Handled via re-execution of tasks
  - Task completion is committed through the master
- What happens if a mapper fails?
  - Re-execute completed and in-progress map tasks
- What happens if a reducer fails?
  - Re-execute in-progress reduce tasks
- What happens if the master fails?
  - Potential trouble!

MapReduce: Walk through of One more Application

MapReduce: PageRank
- PageRank models the behavior of a “random surfer”
- C(t) is the out-degree of t, and (1-d) is a damping factor (random jump)
- The “random surfer” keeps clicking on successive links at random, not taking content into consideration
- A page distributes its rank equally among all pages it links to
- The damping factor models the surfer “getting bored” and typing an arbitrary URL

PageRank: Key Insights
- The effect of each iteration is local: iteration i+1 depends only on iteration i
- At iteration i, the PageRank of individual nodes can be computed independently

PageRank using MapReduce
- Use a sparse matrix representation (M)
- Map each row of M to a list of PageRank “credit” to assign to out-link neighbours
- These prestige scores are reduced to a single PageRank value for a page by aggregating over them
- Source of image: Lin 2008

Phase 1: Process HTML
- The map task takes (URL, page-content) pairs and maps them to (URL, (PRinit, list-of-urls))
  - PRinit is the “seed” PageRank for URL
  - list-of-urls contains all pages pointed to by URL
- The reduce task is just the identity function

Phase 2: PageRank Distribution
- The reduce task gets (URL, url_list) and many (URL, val) values
  - Sum the vals and fix up with d to get the new PR
  - Emit (URL, (new_rank, url_list))
- Check for convergence using a non-parallel component

MapReduce: Some More Apps
- Distributed grep
- Count of URL access frequency
- Clustering (K-means)
- Graph algorithms
- Indexing systems

MapReduce Programs In Google Source Tree (figure)

MapReduce: Extensions and similar apps
- PIG (Yahoo)
- Hadoop (Apache)
- DryadLinq (Microsoft)
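
Returning to the PageRank slides, one iteration of the distribution phase can be sketched as a map step and a reduce step. The slides' formula did not survive in the text, so this sketch assumes the commonly cited form PR(p) = (1-d) + d * Σ PR(t)/C(t) over pages t linking to p, and assumes d = 0.85; both are assumptions, not taken from the source. The function names are hypothetical, and the sketch runs in one process rather than on a cluster.

```python
from collections import defaultdict

D = 0.85  # assumed value of d; the slides do not give one


def map_page(url, state):
    """Map: emit the link list itself plus one rank 'credit' per out-link."""
    rank, out_links = state
    yield url, ("links", out_links)  # pass the graph structure along
    for target in out_links:
        # A page distributes its rank equally among the pages it links to.
        yield target, ("credit", rank / len(out_links))


def reduce_page(url, values):
    """Reduce: sum the credits and 'fix up with d' to get the new rank."""
    out_links, credit = [], 0.0
    for kind, payload in values:
        if kind == "links":
            out_links = payload
        else:
            credit += payload
    return url, ((1 - D) + D * credit, out_links)


def iterate(graph):
    """One map, group-by-key, and reduce pass over {url: (rank, out_links)}."""
    groups = defaultdict(list)
    for url, state in graph.items():
        for key, value in map_page(url, state):
            groups[key].append(value)
    return dict(reduce_page(url, values) for url, values in groups.items())


def pagerank(graph, tol=1e-9, max_iter=100):
    """Non-parallel driver: repeat iterations until the ranks stop changing."""
    for _ in range(max_iter):
        new_graph = iterate(graph)
        delta = max(abs(new_graph[u][0] - graph[u][0]) for u in graph)
        graph = new_graph
        if delta < tol:
            break
    return graph
```

The driver loop plays the role of the "non-parallel component" that checks for convergence, while `map_page` and `reduce_page` depend only on the previous iteration's ranks and so could run independently per page.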

Large Scale Systems Architecture using MapReduce (figure)

BigTable: A Distributed Storage System for Structured Data

Introduction
- BigTable is a distributed storage system for managing structured data
- Designed to scale to a very large size
  - Petabytes of data across thousands of servers
- Used for many Google projects
  - Web indexing, Personalized Search, Google Earth, Google Analytics, Google Finance, …
- Flexible, high-performance solution for all of Google’s products

Motivation
- Lots of (semi-)structured data at Google
  - URLs: contents, crawl metadata, links, anchors, pagerank, …
  - Per-user data: user preference settings, recent queries/search results, …
  - Geographic locations: physical entities (shops, restaurants, etc.), roads, satellite image data, user annotations, …
- Scale is large
  - Billions of URLs, many versions/page (~20K/version)
  - Hundreds of millions of users, thousands of q/sec
  - 100TB+ of satellite image data

Why not just use commercial DB?
- Scale is too large for most commercial databases
- Even if it weren’t, cost would be very high
  - Building internally means the system can be applied across many projects for low incremental cost
- Low-level
