軟件數(shù)據(jù)庫的面向?qū)ο蟮囊暯峭馕奈墨I(xiàn)翻譯_第1頁
已閱讀1頁,還剩16頁未讀, 繼續(xù)免費(fèi)閱讀

下載本文檔

版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進(jìn)行舉報或認(rèn)領(lǐng)

文檔簡介

1、<p><b> ?。?lt;/b></p><p>  二〇一三 年 六 月</p><p>  A HISTORICAL PERSPECTIVE</p><p>  From the earliest days of computers, storing and manipulating data have been a major a

2、pplication focus. The first general-purpose DBMS was designed by Charles Bachman at General Electric in the early 1960s and was called the Integrated Data Store. It formed the basis for the network data model, which was

3、standardized by the Conference on Data Systems Languages (CODASYL) and strongly influenced database systems through the 1960s. Bachman was the first recipient of ACM’s Turing Award (the computer</p><p>  In

4、1970, Edgar Codd, at IBM’s San Jose Research Laboratory, proposed a new data representation framework called the relational data model. This proved to be a watershed in the development of database systems: it sparked rap

5、id development of several DBMSs based on the relational model, along with a rich body of theoretical results that placed the field on a firm foundation. Codd won the 1981 Turing Award for his seminal work. Database syste

6、ms matured as an academic discipline, and the popularity </p><p>  In the 1980s, the relational model consolidated its position as the dominant DBMS paradigm, and database systems continued to gain widesprea

7、d use. The SQL query language for relational databases, developed as part of IBM’s System R project, is now the standard query language. SQL was standardized in the late 1980s, and the current standard, SQL-92, was adopt

8、ed by the American National Standards Institute (ANSI) and International Standards Organization (ISO). Arguably, the most widely used form o</p><p>  In the late 1980s and the 1990s, advances have been made

9、in many areas of database systems. Considerable research has been carried out into more powerful query languages and richer data models, and there has been a big emphasis on supporting complex analysis of data from all p

10、arts of an enterprise. Several vendors (e.g., IBM’s DB2, Oracle 8, Informix UDS) have extended their systems with the ability to store new data types such as images and text, and with the ability to ask more complex quer

11、ies</p><p>  An interesting phenomenon is the emergence of several enterprise resource planning(ERP) and management resource planning (MRP) packages, which add a substantial layer of application-oriented fea

12、tures on top of a DBMS. Widely used packages include systems from Baan, Oracle, PeopleSoft, SAP, and Siebel. These packages identify a set of common tasks (e.g., inventory management, human resources planning, financial

13、analysis) encountered by a large number of organizations and provide a general applica</p><p>  INTRODUCTION TO PHYSICAL DATABASE DESIGN</p><p>  Like all other aspects of database design, physi

14、cal design must be guided by the nature of the data and its intended use. In particular, it is important to understand the typical workload that the database must support; the workload consists of a mix of queries and up

15、dates. Users also have certain requirements about how fast certain queries or updates must run or how many transactions must be processed per second. The workload description and users’ performance requirements are the b

16、asis on whic</p><p>  To create a good physical database design and to tune the system for performance in response to evolving user requirements, the designer needs to understand the workings of a DBMS, espe

17、cially the indexing and query processing techniques supported by the DBMS. If the database is expected to be accessed concurrently by many users, or is a distributed database, the task becomes more complicated, and other

18、 features of a DBMS come into play. </p><p>  DATABASE WORKLOADS</p><p>  The key to good physical design is arriving at an accurate description of the expected workload. A workload description

19、includes the following elements:</p><p>  1. A list of queries and their frequencies, as a fraction of all queries and updates.</p><p>  2. A list of updates and their frequencies.</p>&l

20、t;p>  3. Performance goals for each type of query and update.</p><p>  For each query in the workload, we must identify:</p><p>  Which relations are accessed.</p><p>  Which att

21、ributes are retained (in the SELECT clause).</p><p>  Which attributes have selection or join conditions expressed on them (in the WHERE clause) and how selective these conditions are likely to be. Similarly

22、, for each update in the workload, we must identify:</p><p>  Which attributes have selection or join conditions expressed on them (in the WHERE clause) and how selective these conditions are likely to be.&l

23、t;/p><p>  The type of update (INSERT, DELETE, or UPDATE) and the updated relation.</p><p>  For UPDATE commands, the fields that are modified by the update.</p><p>  Remember that que

24、ries and updates typically have parameters, for example, a debit or credit operation involves a particular account number. The values of these parameters determine selectivity of selection and join conditions.</p>

25、<p>  Updates have a query component that is used to find the target tuples. This component can benefit from a good physical design and the presence of indexes. On the other hand, updates typically require addition

26、al work to maintain indexes on the attributes that they modify. Thus, while queries can only benefit from the presence of an index, an index may either speed up or slow down a given update. Designers should keep this tra

27、de-offer in mind when creating indexes.</p><p>  NEED FOR DATABASE TUNING</p><p>  Accurate, detailed workload information may be hard to come by while doing the initial design of the system. Co

28、nsequently, tuning a database after it has been designed and deployed is important—we must refine the initial design in the light of actual usage patterns to obtain the best possible performance.</p><p>  Th

29、e distinction between database design and database tuning is somewhat arbitrary.</p><p>  We could consider the design process to be over once an initial conceptual schema is designed and a set of indexing a

30、nd clustering decisions is made. Any subsequent changes to the conceptual schema or the indexes, say, would then be regarded as a tuning activity. Alternatively, we could consider some refinement of the conceptual schema

31、 (and physical design decisions affected by this refinement) to be part of the physical design process.</p><p>  Where we draw the line between design and tuning is not very important.</p><p>  

32、OVERVIEW OF DATABASE TUNING</p><p>  After the initial phase of database design, actual use of the database provides a valuable source of detailed information that can be used to refine the initial design. M

33、any of the original assumptions about the expected workload can be replaced by observed usage patterns; in general, some of the initial workload specification will be validated, and some of it will turn out to be wrong.

34、Initial guesses about the size of data can be replaced with actual statistics from the system catalogs (althou</p><p>  Continued database tuning is important to get the best possible performance. </p>

35、<p>  TUNING THE CONCEPTUAL SCHEMA</p><p>  In the course of database design, we may realize that our current choice of relation schemas does not enable us meet our performance objectives for the give

36、n workload with any (feasible) set of physical design choices. If so, we may have to redesign our conceptual schema (and re-examine physical design decisions that are affected by the changes that we make).</p><

37、;p>  We may realize that a redesign is necessary during the initial design process or later, after the system has been in use for a while. Once a database has been designed and populated with data, changing the concep

38、tual schema requires a significant effort in terms of mapping the contents of relations that are affected. Nonetheless, it may sometimes be necessary to revise the conceptual schema in light of experience with the system

39、. We now consider the issues involved in conceptual schema (re)desig</p><p>  Several options must be considered while tuning the conceptual schema:</p><p>  We may decide to settle for a 3NF de

40、sign instead of a BCNF design.</p><p>  If there are two ways to decompose a given schema into 3NF or BCNF, our choice should be guided by the workload.</p><p>  Sometimes we might decide to fur

41、ther decompose a relation that is already in BCNF.</p><p>  In other situations we might denormalize. That is, we might choose to replace a collection of relations obtained by a decomposition from a larger r

42、elation with the original (larger) relation, even though it suffers from some redundancy problems. Alternatively, we might choose to add some fields to certain relations to speed up some important queries, even if this l

43、eads to a redundant storage of some information (and consequently, a schema that is in neither 3NF nor BCNF).</p><p>  This discussion of normalization has concentrated on the technique of decomposition, whi

44、ch amounts to vertical partitioning of a relation. Another technique to consider is horizontal partitioning of a relation, which would lead to our having two relations with identical schemas. Note that we are not talking

45、 about physically partitioning the cuples of a single relation; rather, we want to create two distinct relations (possibly with different constraints and indexes on each).</p><p>  Incidentally, when we rede

46、sign the conceptual schema, especially if we are tuning an existing database schema, it is worth considering whether we should create views to mask these changes from users for whom the original schema is more natural. &

47、lt;/p><p>  TUNING QUERIES AND VIEWS</p><p>  If we notice that a query is running much slower than we expected, we have to examine the query carefully to end the problem. Some rewriting of the que

48、ry, perhaps in conjunction with some index tuning, can often ?x the problem. Similar tuning may be called for if queries on some view run slower than expected. </p><p>  When tuning a query, the first thing

49、to verify is that the system is using the plan that you expect it to use. It may be that the system is not finding the best plan for a variety of reasons. Some common situations that are not handled efficiently by many o

50、ptimizers follow:</p><p>  A selection condition involving null values.</p><p>  Selection conditions involving arithmetic or string expressions or conditions using the or connective. For examp

51、le, if we have a condition E.age = 2*D.age in the WHERE clause, the optimizer may correctly utilize an available index on E.age but fail to utilize an available index on D.age. Replacing the condition by E.age/2=D.age wo

52、uld reverse the situation.</p><p>  Inability to recognize a sophisticated plan such as an index-only scan for an aggregation query involving a GROUP BY clause. </p><p>  If the optimizer is not

53、 smart enough to and the best plan (using access methods and evaluation strategies supported by the DBMS), some systems allow users to guide the choice of a plan by providing hints to the optimizer; for example, users mi

54、ght be able to force the use of a particular index or choose the join order and join method. A user who wishes to guide optimization in this manner should have a thorough understanding of both optimization and the capabi

55、lities of the given DBMS.</p><p>  (8)OTHER TOPICS</p><p>  MOBILE DATABASES</p><p>  The availability of portable computers and wireless communications has created a new breed of n

56、omadic database users. At one level these users are simply accessing a database through a network, which is similar to distributed DBMSs. At another level the network as well as data and user characteristics now have sev

57、eral novel properties, which affect basic assumptions in many components of a DBMS, including the query engine, transaction manager, and recovery manager.</p><p>  Users are connected through a wireless link

58、 whose bandwidth is ten times less than Ethernet and 100 times less than ATM networks. Communication costs are therefore significantly higher in proportion to I/O and CPU costs.</p><p>  Users’ locations are

59、 constantly changing, and mobile computers have a limited battery life. Therefore, the true communication costs is connection time and battery usage in addition to bytes transferred, and change constantly depending on lo

60、cation. Data is frequently replicated to minimize the cost of accessing it from different locations.</p><p>  As a user moves around, data could be accessed from multiple database servers within a single tra

61、nsaction. The likelihood of losing connections is also much greater than in a traditional network. Centralized transaction management may therefore be impractical, especially if some data is resident at the mobile comput

62、ers. We may in fact have to give up on ACID transactions and develop alternative notions of consistency for user programs.</p><p>  MAIN MEMORY DATABASES</p><p>  The price of main memory is now

63、 low enough that we can buy enough main memory to hold the entire database for many applications; with 64-bit addressing, modern CPUs also have very large address spaces. Some commercial systems now have several gigabyte

64、s of main memory. This shift prompts a reexamination of some basic DBMS design decisions, since disk accesses no longer dominate processing time for a memory-resident database:</p><p>  Main memory does not

65、survive system crashes, and so we still have to implement logging and recovery to ensure transaction atomicity and durability. Log records must be written to stable storage at commit time, and this process could become a

66、 bottleneck. To minimize this problem, rather than commit each transaction as it completes, we can collect completed transactions and commit them in batches; this is called group commit. Recovery algorithms can also be o

67、ptimized since pages rarely have to be </p><p>  The implementation of in-memory operations has to be optimized carefully since disk accesses are no longer the limiting factor for performance.</p><

68、;p>  A new criterion must be considered while optimizing queries, namely the amount of space required to execute a plan. It is important to minimize the space overhead because exceeding available physical memory would

69、 lead to swapping pages to disk (through the operating system’s virtual memory mechanisms), greatly slowing down execution.</p><p>  Page-oriented data structures become less important (since pages are no lo

70、nger the unit of data retrieval), and clustering is not important (since the cost of accessing any region of main memory is uniform).</p><p> ?。ㄒ唬臍v史的角度回顧</p><p>  從數(shù)據(jù)庫的早期開始,存儲和操縱數(shù)據(jù)就一直是主要的應(yīng)用焦點(diǎn)。第

71、一個通用的DBMS是由Charles Bechman于20世紀(jì)60年代早期在通用電器公司設(shè)計的,稱為集成數(shù)據(jù)存儲(Integrated Data Store).它奠定了網(wǎng)狀數(shù)據(jù)模型的基礎(chǔ)。網(wǎng)狀數(shù)據(jù)模型由數(shù)據(jù)系統(tǒng)語言協(xié)會(CODASYL)標(biāo)準(zhǔn)化,并在整個20世紀(jì)60年代對數(shù)據(jù)庫系統(tǒng)產(chǎn)生了巨大的影響。由于Bachman在數(shù)據(jù)庫領(lǐng)域的貢獻(xiàn),他成為第一個ACM圖靈獎(相當(dāng)于計算機(jī)科學(xué)界的諾貝爾獎)的獲得者,并于1973年接受了這一獎勵。<

72、/p><p>  20世紀(jì)60年代末期,IBM成功開發(fā)了信息管理系統(tǒng)(IMS)DBMS。直至今天,它還在許多系統(tǒng)中使用。IMS奠定了另一個數(shù)據(jù)表達(dá)框架——層次數(shù)據(jù)模型的基礎(chǔ)。同時,美國航空公司和IBM聯(lián)合開發(fā)出用于飛機(jī)訂票的SABRE系統(tǒng),它允許多個用戶通過計算機(jī)網(wǎng)絡(luò)存取相同數(shù)據(jù)。有趣的是,今天SABRE系統(tǒng)被用于支持廣為流行的基于Web的旅游服務(wù),如Travelocity。</p><p>

73、  1970年,Edgar Codd在IBM的San Jose研究實(shí)驗(yàn)室推出了一種新的,稱為關(guān)系數(shù)據(jù)模型的數(shù)據(jù)表達(dá)框架。這后來被證明是數(shù)據(jù)庫系統(tǒng)開發(fā)中的分水嶺:它推進(jìn)了幾個基于關(guān)系模型的數(shù)據(jù)庫管理系統(tǒng)的快速開發(fā),并取得大量的理論成果,從而為數(shù)據(jù)庫領(lǐng)域奠定了堅實(shí)的基礎(chǔ)。Coff因?yàn)槠浣艹龅墓ぷ鞫@得了1981年圖靈獎。數(shù)據(jù)庫系統(tǒng)作為學(xué)術(shù)學(xué)科已經(jīng)成熟了,而且關(guān)系型DBMS的普及改變了商業(yè)應(yīng)用前景。其益處被廣泛認(rèn)同,使用DBMS來管理公司數(shù)據(jù)

74、變得很普遍。</p><p>  在20世紀(jì)80年代,關(guān)系模型鞏固了它作為主導(dǎo)DBMS模式的地位,而數(shù)據(jù)庫系統(tǒng)繼續(xù)被廣泛使用。作為IBM的 System R項目的一部分而開發(fā)的關(guān)系數(shù)據(jù)庫SQL查詢語言,現(xiàn)在成為了標(biāo)準(zhǔn)查詢語言。SQL于20世紀(jì)80年代末期得到標(biāo)準(zhǔn)化,目前的標(biāo)準(zhǔn)SQL:1999被美國國家標(biāo)準(zhǔn)協(xié)會(ANSI)和國際標(biāo)準(zhǔn)組織(ISO)接受。并發(fā)編程使用最廣的形式就是數(shù)據(jù)庫程序(稱為事務(wù))的并發(fā)執(zhí)行。用戶

75、編寫程序時不用考慮其他程序的運(yùn)行,并發(fā)執(zhí)行操作由DBMS管理。James Gray因他對DBMS事務(wù)處理領(lǐng)域的貢獻(xiàn)而獲得了1999圖靈獎。</p><p>  在20世紀(jì)80年代末期和90年代,數(shù)據(jù)庫系統(tǒng)在很多方面得到發(fā)掌。相當(dāng)多的研究側(cè)重于功能強(qiáng)大的查詢語言和更豐富的數(shù)據(jù)模型,其重點(diǎn)也放在了支持對企業(yè)各部分?jǐn)?shù)據(jù)的復(fù)雜分析上。很多數(shù)據(jù)庫提供商(如IBM的DB2,Oracle 8,Informix UDS)櫻井?dāng)U展

76、了它們的系統(tǒng),使之具有存儲諸如圖像,文本等新數(shù)據(jù)類型的能力,以及回答更復(fù)雜查詢的能力。大量的廠商已經(jīng)為創(chuàng)建數(shù)據(jù)倉庫,繼承多個數(shù)據(jù)庫的數(shù)據(jù)以及實(shí)現(xiàn)專業(yè)化分析而開發(fā)了專用系統(tǒng)。</p><p>  一個有趣的現(xiàn)象是隨著一些企業(yè)資源規(guī)劃(ERP)和管理自愿規(guī)劃(MRP)軟件包的出現(xiàn),他們在DBMS之上增加了一層面向應(yīng)用的特征。廣泛使用的軟件包有Baan,Oracle,PeopleSoft,SAP和Siebel等系統(tǒng),它

77、們先確定大多數(shù)組織機(jī)構(gòu)所遇到的共同任務(wù)(例如,庫存管理,人力資源規(guī)劃,財務(wù)分析等),并提供一個通用的應(yīng)用層以完成這些任務(wù)。數(shù)據(jù)存儲在關(guān)系型DBMS中,可以為不同公司分別定制應(yīng)用層。與從頭開始創(chuàng)建應(yīng)用層的開銷相比,這樣可以降低公司的總體開銷。</p><p>  也許,在DBMS的發(fā)展中,最重要的事是DBMS已經(jīng)進(jìn)入了因特網(wǎng)時代。第一代Web站點(diǎn)是把數(shù)據(jù)存儲在操作系統(tǒng)的文件中,而現(xiàn)在,使用DBMS存儲數(shù)據(jù)并通過We

78、b瀏覽器瀏覽數(shù)據(jù)已變得越來越普遍。通過Web可存取的表單界面來產(chǎn)生查詢請求,并使用諸如HTML的標(biāo)記語言將查詢結(jié)果格式化,從而便于在瀏覽器中顯示。所有數(shù)據(jù)庫提供商都在增加它們的DBMS功能,使之更適于在因特網(wǎng)上部署。</p><p>  隨著越來越多在線數(shù)據(jù)的產(chǎn)生,并且通過計算機(jī)網(wǎng)絡(luò)越來越容易獲得,數(shù)據(jù)庫也變得更加重要了。今天,眾多領(lǐng)域的發(fā)展需求,例如,多媒體數(shù)據(jù)庫,互動視頻,流數(shù)據(jù),數(shù)字圖書館等精彩視頻節(jié)目,人

79、類基因圖和NASA的地球觀測系統(tǒng)等科學(xué)項目,以及公司對鞏固它們的決策支持處理和有用信息挖掘的渴望,正推動著數(shù)據(jù)庫領(lǐng)域的發(fā)展。在商業(yè)上,數(shù)據(jù)庫管理系統(tǒng)代表著最大和最具活力的市場之一。所以,有關(guān)數(shù)據(jù)庫系統(tǒng)的研究回報豐厚!</p><p>  (二)物理數(shù)據(jù)庫設(shè)計簡介</p><p>  與數(shù)據(jù)庫設(shè)計的其他方面一樣,我們要根據(jù)數(shù)據(jù)的性質(zhì)和用途來進(jìn)行物理數(shù)據(jù)庫設(shè)計。特別是,我們必須了解數(shù)據(jù)庫所必須

80、支持的典型的工作負(fù)載,工作負(fù)載是查詢和更新的混合體。用戶有一些特定的要求,如,某些查詢或更新的執(zhí)行速度應(yīng)該有多快,或者每秒鐘必須處理多少個事務(wù)等。在物理數(shù)據(jù)庫設(shè)計過程中,工作負(fù)載的描述和用戶的需求是作出許多決策的基礎(chǔ)。</p><p>  為了獲得一個好的物理數(shù)據(jù)庫設(shè)計,我們還要調(diào)整系統(tǒng)的性能以滿足用戶的需求。設(shè)計者需要明白DBMS工作的細(xì)節(jié),特別是DBMS所支持的索引和查詢處理技術(shù)。如果數(shù)據(jù)庫允許多個用戶并發(fā)訪

81、問,或者是分布式數(shù)據(jù)庫,那么這是設(shè)計任務(wù)就變得更復(fù)雜了,還需要考慮DBMS的其他特點(diǎn)。</p><p><b> ?。ㄈ?shù)據(jù)庫負(fù)載</b></p><p>  一個好的數(shù)據(jù)庫設(shè)計的關(guān)鍵是對所希望的負(fù)載有準(zhǔn)確的描述。一個工作負(fù)載的描述包括以下幾個部分:</p><p>  1.一個查詢及其出現(xiàn)的頻率的列表,一個查詢的頻率指該查詢在所有的查詢和更

82、新中所占的比例。</p><p>  2.更新及其出現(xiàn)的頻率列表。</p><p>  3.每一種查詢和更新類型所對應(yīng)的性能目標(biāo)。</p><p>  對于在工作負(fù)載中的每個查詢,我們必須確定:</p><p><b>  需要訪問哪些關(guān)系。</b></p><p>  需要保留那些屬性(在SEL

83、ECT子句中)。</p><p>  在那些屬性上有選擇或連接條件(在WHERE子句中),以及這些條件具有多大的選擇性。</p><p>  類似地,對工作負(fù)載中每個更新,我們必須確定:</p><p>  在哪些屬性上有選擇或連接條件(在WHERE語句中),以及有多大的選擇性。</p><p>  更新的類型(INSERT,DELETE,U

84、PDATE)以及所要更新的關(guān)系。</p><p>  對于UPDATE命令,要更新哪些字段。</p><p>  典型的查詢和更新都帶有參數(shù),例如,借款或存款操作都涉及某個特定的帳號。這些參數(shù)的值決定了選擇和連接條件的選擇性。</p><p>  更新中包括一個查詢部分,用來找到目標(biāo)元組。這個部分可以得益于一個好的物理設(shè)計和索引。另一方面,更新操作一般還要做一些額外

85、的工作,以維護(hù)所修改的屬性上的索引。這樣,盡管查詢總可以從索引受益,但是索引也可能使一個給定的更新加快或變慢。在生成索引時,設(shè)計者應(yīng)該在頭腦中進(jìn)行一下權(quán)衡。</p><p> ?。ㄋ模?shù)據(jù)庫調(diào)整的必要性</p><p>  準(zhǔn)確地講,在系統(tǒng)設(shè)計的初始階段,我們很難得到工作負(fù)載的詳細(xì)信息。所以在系統(tǒng)設(shè)計完以后,對數(shù)據(jù)庫的調(diào)整就變得很重要,我們必須按照實(shí)際的使用模式來對初始的設(shè)計進(jìn)行求精,以便

86、獲得好的性能。</p><p>  對于如何區(qū)別數(shù)據(jù)庫設(shè)計和數(shù)據(jù)庫調(diào)整,人們有不同的看法。一種看法認(rèn)為,一旦初始模式、索引和聚簇決策已經(jīng)確定,那么設(shè)計過程也就結(jié)束了。接下去對概念模式或索引的任何改變,都被認(rèn)為是對數(shù)據(jù)庫進(jìn)行調(diào)整的活動。另一種看法是,對于概念模式的進(jìn)一步求精(和受這些改進(jìn)影響的物理設(shè)計決策)也應(yīng)該是物理設(shè)計過程的一部分。</p><p>  如何區(qū)分設(shè)計和調(diào)整并不是很重要的&

87、lt;/p><p> ?。ㄎ澹?shù)據(jù)庫調(diào)整簡介</p><p>  當(dāng)數(shù)據(jù)庫初始設(shè)計完成后,數(shù)據(jù)庫的實(shí)際使用提供了一些有用的詳細(xì)信息,它們可以用來對初始設(shè)計進(jìn)行進(jìn)一步求精。先前對工作負(fù)載的很多假設(shè)都可以用觀察到的模式來代替;一般來講,一些初始的關(guān)于工作負(fù)載的說明將得到驗(yàn)證,其中有一些可能是錯誤的。關(guān)于數(shù)據(jù)大小的初始猜測可以用實(shí)際的數(shù)據(jù)庫的統(tǒng)計數(shù)字來代替(盡管這個信息會隨著系統(tǒng)的不斷進(jìn)化而變化)。

88、對于查詢的仔細(xì)監(jiān)測可龕發(fā)現(xiàn)一些預(yù)測不到的問題,例如,優(yōu)化器可能不使用某些索引,盡管這些索引可以產(chǎn)生好的計劃。 </p><p>  為了獲得可能的最好的性能,對數(shù)據(jù)庫進(jìn)行連續(xù)的調(diào)整是很重要的。</p><p><b> ?。┱{(diào)整概念模式</b></p><p>  在數(shù)據(jù)庫設(shè)計期間,我們也許會意識到,在給定工作

89、負(fù)載和任何一組可行的物理設(shè)計選擇的情況下,當(dāng)前選擇的關(guān)系模式并不能滿足性能目標(biāo)。如果是這樣,我們也許必須重新設(shè)計概念模式(而且還要重新檢查那些受到影響的物理設(shè)計決策)。</p><p>  在系統(tǒng)已經(jīng)運(yùn)行了一段時間后,我們也許會認(rèn)識到在初始設(shè)計期間或之后重新設(shè)計的必要性。一旦數(shù)據(jù)庫設(shè)計完成并且已經(jīng)被裝載數(shù)據(jù)了,如果要改變概念模式,就需要做出很大的努力去映射受到影響的關(guān)系的內(nèi)容。然而,有時需要根據(jù)使用系統(tǒng)的經(jīng)驗(yàn)來對

90、概念模式進(jìn)行修正?,F(xiàn)在,我們從性能的角度來考慮概念模式(重新)設(shè)計中的一些問題。</p><p>  在對概念模式進(jìn)行調(diào)整時我們必須考慮以下幾點(diǎn):</p><p>  我們也許應(yīng)當(dāng)采用3NF設(shè)計來代替BCNF設(shè)計。</p><p>  如果將一個關(guān)系分解為3NF或BCNF有兩種方式,那么應(yīng)該根據(jù)工作負(fù)載來進(jìn)行選擇。</p><p>  有時我

91、們需要對一個應(yīng)景是BCNF的關(guān)系進(jìn)一步分解。</p><p>  在某些情況下可能進(jìn)行反規(guī)范化。也就是,可能將一組由一個大關(guān)系分解而得到的關(guān)系用它們的原大關(guān)系代替,盡管這樣會引起一些冗余的問題。而且,我們可能在特定的關(guān)系上加上一些字段來加速一些重要的查詢,即使這樣會導(dǎo)致對一些信息的冗余存儲(從而使得模式既不是3NF也不是BCNF)。</p><p>  關(guān)于規(guī)范化的討論集中在分解技術(shù)上,實(shí)

92、際上就是對關(guān)系的垂直劃分。另一個技術(shù)是對關(guān)系進(jìn)行水平劃分,這將導(dǎo)致兩個具有相同模式的關(guān)系。這里需要注意的是,這里討論的不是一個關(guān)系元組的物理劃分;而是想創(chuàng)建兩個不同的關(guān)系(可能具有不同的約束和索引)。</p><p>  另外,在重新設(shè)計概念模式時,特別是如果調(diào)整一個現(xiàn)存數(shù)據(jù)庫的模式時,我們需要考慮是否定義視圖來向用戶隱藏這些改變,因?yàn)閷τ谟脩魜碚f原始的模式可能更自然一些。</p><p>

93、; ?。ㄆ撸┱{(diào)整查詢和視圖</p><p>  如果一個查詢比預(yù)計的要慢得多,那么我們就必須仔細(xì)檢查并找出問題。通過對查詢進(jìn)行重寫,并且進(jìn)行一些索引的調(diào)整,常常能夠解決問題。如果在視圖上的一些查詢運(yùn)行得很慢,也可以進(jìn)行類似的調(diào)整。</p><p>  當(dāng)調(diào)整一個查詢時,第一件事就是確定系統(tǒng)是否使用了你所希望的執(zhí)行計劃。由于一些原因,有時系統(tǒng)可能沒有找到最好的執(zhí)行計劃。下面是很多優(yōu)化器都不能

94、有效處理的一些情況:</p><p>  含有空值的選擇條件;</p><p>  選擇條件含有算數(shù)或字符串表達(dá)式,或者使用OR進(jìn)行條件連接。例如,如果在WHERE語句中有一個條件E.age=2*D.age,那么優(yōu)化器可能會正確利用現(xiàn)有的在E.age上的索引,但是不能正確利用在D.age上的索引。當(dāng)將條件變?yōu)镋.age/2=D.age時,將會出現(xiàn)相反的情況。</p><

95、p>  不能識別出一個復(fù)雜的執(zhí)行計劃,例如不能發(fā)現(xiàn)在GROUP BY語句中含有的聚集操作的查詢中的只讀索引掃描計劃。</p><p>  如果優(yōu)化器不太聰明,不能發(fā)現(xiàn)最好的執(zhí)行計劃(使用DBMS支持的一些訪問方法和執(zhí)行策略),一些系統(tǒng)允許用戶通過提供給優(yōu)化器一些提示來指導(dǎo)計劃的選擇。例如,用戶可能強(qiáng)迫系統(tǒng)使用特定的索引或連接的方法和順序。如果一個用戶希望以這種方式來指導(dǎo)優(yōu)化,那么他應(yīng)該對優(yōu)化和給定的DBM

96、S的能力有一個全面的理解。</p><p><b> ?。ò耍┢渌麑n}</b></p><p><b>  移動數(shù)據(jù)庫</b></p><p>  便攜計算機(jī)和無線通信的應(yīng)用創(chuàng)建了一批移動數(shù)據(jù)庫用戶。一方面,這些用戶只是簡單地通過網(wǎng)絡(luò)來訪問數(shù)據(jù)庫,類似于分布式DBMS。另一方面,不論是網(wǎng)絡(luò),還是數(shù)據(jù)和用戶,都有一些新的特性

97、,這就影響了DBMS中的許多構(gòu)件,包括查詢引擎,事務(wù)管理程序和虎伏管理程序。</p><p>  通過無線連接的用戶帶寬是以太網(wǎng)的1/10左右,是ATM網(wǎng)的1/100左右。因此通信開銷比I/O和CPU開銷更高。</p><p>  用戶的位置通常是變動的,而且移動計算機(jī)的電池壽命是有限的。因此,除內(nèi)容傳輸開銷以及因位置的頻繁變動而產(chǎn)生的開銷外,真正的通信開銷體現(xiàn)在連接時間和電池的使用上。通

98、常將數(shù)據(jù)生成多個副本,以使從不同位置的訪問開銷最小化。</p><p>  當(dāng)一個用戶移動的時候,一個事務(wù)可能需要從多個數(shù)據(jù)庫服務(wù)器中訪問數(shù)據(jù),此時丟失連接的可能性比傳統(tǒng)的網(wǎng)絡(luò)要大。因此,集中式的事務(wù)管理也是不實(shí)際的,尤其是有些數(shù)據(jù)存儲在移動計算機(jī)上更是這樣。事實(shí)上,我們可能不得不放棄具有ACID特性的事務(wù),而是為用戶程序開發(fā)其他的一致性方法。</p><p><b>  主存數(shù)

99、據(jù)庫</b></p><p>  由于主存的價格已經(jīng)很便宜了,許多應(yīng)有可以購買足夠的主存來保存整個數(shù)據(jù)庫。而且,現(xiàn)代的CPU使用64位的尋址,有了很大的尋址空間。一些商用的數(shù)據(jù)庫已有幾個GB的主存。這促使重新考慮一些基本的DBMS設(shè)計決策,因?yàn)閷τ隈v留內(nèi)存的數(shù)據(jù)庫,磁盤訪問不再是主要的處理時間。</p><p>  主存也不能幸免于系統(tǒng)崩潰,所以仍然需要實(shí)現(xiàn)日志和恢復(fù)機(jī)制,以保

100、證事務(wù)的原子性和持久性。在提交事務(wù)的時候,日志記錄必須寫到固定的存儲中。這個處理可能會變成一個瓶頸。為了使該問題最小化,不是每完成一個事務(wù)就進(jìn)行提交,而是收集已完成的事務(wù),然后批量提交它們,這稱為組提交?;謴?fù)算法同樣也需要優(yōu)化,因?yàn)楹苌龠@種情況:某頁因需要為其他頁省出空間而不得不被移出。</p><p>  應(yīng)認(rèn)真優(yōu)化主存中的操作實(shí)現(xiàn),因?yàn)榇疟P存取不再是限制性能的要素。</p><p> 

溫馨提示

  • 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 眾賞文庫僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負(fù)責(zé)。
  • 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時也不承擔(dān)用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。

評論

0/150

提交評論