計(jì)算機(jī)專業(yè)畢業(yè)設(shè)計(jì)外文翻譯資料--sql 查詢中的語意錯(cuò)誤_第1頁
已閱讀1頁,還剩28頁未讀, 繼續(xù)免費(fèi)閱讀

下載本文檔

版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進(jìn)行舉報(bào)或認(rèn)領(lǐng)

文檔簡介

1、<p><b>  專業(yè)外文翻譯</b></p><p> 題 目Semantic errors in SQL queries:</p><p> A quite complete list</p><p> 系 (院)計(jì)算機(jī)科學(xué)技術(shù)系</p><p> 專 業(yè)計(jì)算機(jī)科學(xué)與技術(shù)</p>

2、<p> 班 級</p><p> 學(xué)生姓名</p><p> 學(xué) 號</p><p> 指導(dǎo)教師</p><p> 職 稱副教授</p><p>  Semantic errors in SQL queries:</p><p>  A quite comple

3、te list </p><p><b>  Abstract </b></p><p>  We investigate classes of SQL queries which are syntactically correct, but certainly not intended, no matter for which task the query was

4、written. For instance, queries that are contradictory, i.e. always return the empty set, are obviously not intended. However, current database management systems (DBMS) execute such queries without any warning. In this p

5、aper, we give an extensive list of conditions that are strong indications of semantic errors. Of course, questions like the satisfiability are</p><p>  Keywords: Databases; SQL; Queries; Bugs; Errors; Semant

6、ic errors; Logical errors; </p><p>  1. Introduction </p><p>  SQL is today the standard language for relational and object-relational databases. Application programs typically contain a relativ

7、ely large number of SQL queries and updates, which are sent to the DBMS for execution. As any program code, SQL queries can contain errors (Updates are not considered in this paper, but they are usually much simpler than

8、 queries.). </p><p>  Errors in SQL queries can be classified into syntactic errors and semantic errors. A syntactic error means that the entered character string is not valid SQL. </p><p>  The

9、n any DBMS will print an error message because it cannot execute the query. Thus, the error is certainly detected and usually easy to correct. </p><p>  A semantic error means that a legal SQL query was ente

10、red, but the query does not or not always produce the intended results, and is therefore incorrect for the given task. Semantic errors can be further classified into cases where the task must be known in order to detect

11、that the query is incorrect, and cases where there is sufficient evidence that the query is incorrect no matter what the task is. Our focus in this paper is on this latter class, since there is often no independent speci

12、ficati</p><p>  SELECT * </p><p><b>  FROM EMP </b></p><p>  WHERE JOB = ‘CLERK’ AND JOB = ‘MANAGER’</p><p>  This is a legal SQL query, and it is executed

13、 e.g. in Oracle 9i and DB2 V8.1 without any warning. But the condition is inconsistent: The query result will be always empty. Since nobody would use a database in order to geta certainly empty result, we can state that

14、this query is incorrect without actually knowing the task of the query. Such cases do happen, e.g. in one exam exercise that we analyzed, 10 out of 70 students wrote an inconsistent condition. </p><p>  It i

15、s well known that the consistency of formulas is undecidable, and that this applies also to database queries. However, although the task is in general undecidable, many cases that occur in practice can be detected with r

16、elatively simple algorithms. </p><p>  Our work is also inspired by the program lint, which is or was a semantic checker for the ‘‘C’’ programming language. Today C compilers do most of the checks that lint

17、was developed for, but in earlier times, C compilers checked just enough so that theycould generate machine code. We are still at this development stage with SQL today. Printing warnings for strange SQL queries is very u

18、ncommon in current database management systems. </p><p>  We currently develop a similar tool for SQL queries (called sqllint). We believe that such a tool would be useful not only in teaching, but also in a

19、pplication software development. At least, a good error message could speed up the debugging process. Furthermore, runtime errors are possible in SQL, e.g., in some contexts, SQL queries or subqueries must return not mor

20、e than one row. The occurrence of this error depends on the database state (the data), therefore it is not necessarily found during</p><p>  While the title of this paper is ‘‘Semantic Errors’’, people from

21、compiler construction have advised us that what we compute are really ‘‘warnings’’, since the queries are still executable, andin some cases only ‘‘notices’’ about bad style. Nevertheless, this information would help to

22、improve the software quality of database application programs, which are a significant subset of the software developed today. </p><p>  The main contribution of this paper is a list of semantic errors that

23、represents years of experience while correctin hundreds of exams that contained SQL queries. However, we have also tried to explain the general principles from which these errors can be derived (as far as possible). Ther

24、efore, it is not simply by chance whether an error appears on our list, but the list has a certain degree of completeness (except possibly in Section 4). </p><p>  While our experience so far has only been w

25、ith errors made by students, not professional programmers, most of the students will become programmers, and they will not immediately make fewer errors. In the exam solutions that we analyzed, 24% contained a syntax err

26、or, 18% contained a semantic error of the type we consider in this paper (detectable without knowledge of the task), 11% contained both, and 10% contained a semantic error that was only detectable if the task was known (

27、for more details,</p><p>  The paper is structured by reasons why SQL queries can be suspicious: Unnecessary complications (Section 2), inefficient formulations (Section 3), violationsof standard patterns (S

28、ection 4), many duplicates (Section 5), andthe possibilityof runtime errors (Section 6). Furthermore we suggest some style checksin Section 7.While we do not have space to give algorithms for all errors (and many are qui

29、te obvious), we show how to detect two representative errors in Section 8. Section 9 contains some s</p><p>  In the examples, we use a database schema for storing information about employees and departments

30、: </p><p>  EMP(EMPNO, ENAME, JOB, SAL, COMM, MGR ->EMP, DEPTNO->DEPT) </p><p>  DEPT(DEPTNO, DNAME, LOC) </p><p>  This is a slightly modified version of an example schema that com

31、es with the Oracle DBMS. The column MGR can be null. </p><p>  2. Unnecessary complications </p><p>  Queries can be considered as ‘‘probably not intended’’ when they are unnecessarily complica

32、ted. Suppose the user wrote a query Q, and there is an equivalent query that is significantly simpler, and can be derived from Q by deleting certain parts. There might be the following reasons why the user did not write

33、 :</p><p>  The user knew that is not a correct formulation of the task at hand. In this case, Q is of course also not correct, but the error might be hidden in the more complicated query, so that the user d

34、id not realize this. A warning would certainly be helpful in this case. </p><p>  The user did not know that is equivalent. Since is not a completely different query, but results from Q by deleting certain

35、 parts, this shows that the user does not yet master SQL. Again, a warning would be helpful. Often, the simpler query will actually run faster (at least one widely used commercial DBMS does not remove unnecessary joins).

36、 </p><p>  The user knew that is equivalent, but he or she believed that Q would run faster. Since SQL is a declarative language this should only be the last resort.With modern optimizers, this should not h

37、appen often in practice. If it is necessary, there probably should be some comment, and this could also be used to shut off the warning. Although we know at least one case where a more complicated query actually does run

38、 faster on Oracle 9i, SQL does not make any guarantees about how a query is evaluate</p><p>  Note that in this point SQL differs from usual programming languages: Of course, a C program of 100 lines might b

39、e better than a program of 10 lines for the same task, because the longer program runs significantly faster. But as just explained, for SQL the relative speed is not forseeable, and any such code optimizations are at lea

40、st fragile. </p><p>  The user knew that is equivalent, but thought that Q would be clearer for the human reader and easier to maintain. One must be careful to define the possible transformations from Q to

41、 such that this does not happen. For instance, it might be clearer to use explicit tuple variables in attribute references, even if the attribute name is unique. Removing the tuple variable in this case cannot be conside

42、red as producing a different, shorter query. Obviously, we would also not require that meaningf</p><p>  Actually, ‘‘equivalence’’ in the sense of requiring exactly the same query result in all database stat

43、es would make the condition still too strict. </p><p>  (1) First, we not only want to minimize the query, but also the query result. The following query is quite typical for beginning SQL programmers: </

44、p><p>  SELECT EMPNO, ENAME, JOB </p><p><b>  FROM EMP </b></p><p>  WHERE JOB = ‘MANAGER’ </p><p>  The last column in the query result is superfluous, we kno

45、w that it must always be ‘‘MANAGER’’. Therefore, no information is lost when this column is removed. Of course, trying to minimize the query result without loss of information does not mean that we apply compression algo

46、rithms or difficult encodings. The important requirementis that from the shorter query result, the user can reconstruct the original query result with ‘‘very little intellectual effort’’—less than what would be required

47、fo</p><p>  OLD_RESULT = </p><p>  (2) Furthermore, it is betterto exclude certain unusual states when we require that the result of both queries(Q and )is the same. For example, it happens som

48、etimes that students declare a tuple variable, and then do not use it and forget to delete it: </p><p>  SELECT DISTINCT DNAME </p><p>  FROM DEPT, EMP </p><p>  The ‘‘DISTINCT’’ is

49、 also a typical example where the wrong patch was applied to a problem noticed by the student (many duplicates). The above query returns always the same result as this one, except when EMP = ; </p><p>  SELE

50、CT DISTINCT DNAME </p><p>  FROM DEPT </p><p>  Therefore, we will require the equivalence onlyfor states in which all relations are non-empty. It might even be possible to assume that all colum

51、ns contain at least two different values. </p><p>  (3) Some types of errors produce many duplicates. More powerful query simplifications can be used if these duplicates are not considered as important for t

52、he equivalence (at least if the simpler query produces less duplicates than the more difficult query Q). E.g. in the above example, we would want to delete the unused tuple variable even if DISTINCT were not specified.

53、Duplicates are further considered in Section 5. </p><p>  Now we give a list of all cases in which a query can be obviouslysimplified under this slightlyweakened notion of equivalence. In each of these cases

54、, a warning should be given to the user. </p><p>  2.1. Entire query unnecessary </p><p>  Error 1. Inconsistent condition </p><p>  Nobody would pose a query if he or she knew befo

55、rehand that the query result is empty, no matter what the database state is. An example was given in the introduction. </p><p>  Note that it is also possible that the WHERE-clause itself is consistent, but

56、it contradicts a constraint in the database schema. For instance, since SQL has no enumeration types, one typcially restricts columns of type ‘‘CHAR’’ with constraints like </p><p>  CHECK(SEX = ‘M’ OR SEX =

57、 ‘F’) </p><p>  Now the condition SEX = ‘W’ as part of a large conjunction in the WHERE-clause would immediately make the query inconsistent. Yet, no DBMS we are aware of would give a warning. </p>&l

58、t;p>  As mentioned before, the problem is in general undecidable (Abiteboul et al., 1994). But e.g. for SQL queries that contain only one level of subqueries and no aggregations, it is decidable (see, e.g., Brass and

59、Goldberg, 2005). In general, one could apply heuristic assumptions (e.g., that a certain number of tuples per relation suffices) to generate warnings that are not always accurate, but still useful. Thereisa large bodyof

60、literature about satisfiability tests and model construction, alsoin </p><p>  In general, one could also construct other queries that have a constant result for all database states (under the assumption tha

61、t relations are not empty). But this is at least very uncommon (it did not occur in the analyzed exams). </p><p>  2.2. SELECT clause </p><p>  Error 2. Unnecessary DISTINCT </p><p>

62、;  One should use an explicit duplicate elimination only if necessary. It sometimes can be proven that a query cannot return duplicates, e.g. </p><p>  SELECT DISTINCT EMPNO, ENAME, JOB </p><p>

63、<b>  FROM EMP </b></p><p>  Because EMPNO isa key of EMP, the query could not return duplicates, even without DISTINCT. Then DISTINCT should not be used, because the query then will run slower (m

64、any optimizers of current DBMS do not remove the unnecessary duplicate elimination). </p><p>  Theoretitions sometimes recommend to write always ‘‘DISTINCT’’, but that shadows possible errors:Whena query doe

65、s produce duplicates, it is often helpful to understand why. </p><p>  Algorithms for this error are discussed in Section 8. </p><p>  Error 3. Constant output column </p><p>  An o

66、utput column is unnecessary if it contains a single value that is constant and can be derivedfrom the query without any knowledge about the database state. This was already illustrated at the beginning of this section. &

67、lt;/p><p>  Of course, if a constant (datatype literal) is written as a term in the SELECT-list, this is obviously intended, and no warning should be given. Also SELECT * might yield a constant column, yet is s

68、horter than listing the remaining columns explicitly, and again, no warning should be printed. </p><p>  However, at least the followingtypical case should be caught: The conjunctive normal form of the WHERE

69、-clause contains A = c with an attribute A and a constant c as one part of the conjunction, and A also appears as a term in the SELECT-list. Of course, if A = B appears in addition to A = c, also the value of B is obviou

70、s, and using B in the SELECT-list should generate a warning. And so on. </p><p>  With a higher warning level, one should run a model generator, and for each output column A run the model generator again wit

71、h the additional condition A <> c, where c is the value of A in the first model. This would ensure that every output column can generate at least two different values (in possibly different database states). </p

72、><p>  Error 4. Duplicate output column </p><p>  An output column is also unnecessary if it is always identical to another output column. </p><p>  2.3. FROM clause: unnecessary tuple

73、 variables </p><p>  The next three errors are cases where tuple variables are declared under FROM that are not really necessary. </p><p>  Error 5. Unused tuple variable </p><p>  

74、It is a simple syntax check to ensure that all tuple variables declared under FROM are really accessed somewhere in the query. If this condition is violated, one would typically also get Error 27 (missing join condition)

75、, but since ‘‘forgetting’’ a declared tuple variable happens quite often (it is Number 5 on our Top 10 list), a more specific error message might be preferable. (See also the discussion about equivalence at the beginning

76、 of this section.) </p><p>  Error 6. Unnecessary join </p><p>  If only the key attributes of a tuple variable X are accessed, and thiskeyis equated with the foreignkey of another tuple variabl

77、e Y, X is not needed. </p><p>  Since joins are an expensive operation, there is a large body of literature about join elimination in query optimization (see, e.g., Aho et al., 1979; Popa et al., 2000). Howe

78、ver, at least one widely used commercial system does not seem to do a join elimination. Furthermore, one could argue that this is not the purpose of the query optimizer, at least if the join explicitly appears in the que

79、ry (and not as part of a view definition that is used in the query). An optimizer might well work with th</p><p>  But even when the query optimizer solves the performance problem, the query will look simple

80、r with one fewer tuple variable, therefore that formulation is preferable. This error is number 4 on our Top 10 list. </p><p>  Another case of unnecessary join is that two tuple variables are declared over

81、the same relation, but one can merge them without changing the semantics of the query. In this case, the unnecessaryjoin is very likely an indication for a real error. An extreme case of this is the next error type. <

82、/p><p>  Error 7. Tuple variables are always identical </p><p>  If the key attributes of two tuple variables X and Y over the same relation are equated, the two tuple variables must always point t

83、o the same tuple. Then the two tuple variables can be merged. In a higher warning level, one could use a model generator to check for every pair of distinct tuple variables over the same relation that there are solutions

84、 with different values for the primary key. </p><p>  Of course, this is formally a special case of Error 6 (unnecessary join), and it leads often to Error 1 (inconsistent condition). However, since it still

85、 appears quite often, it deserves a more specific warning. </p><p>  2.4. WHERE clause </p><p>  Error 8. Implied, tautological, or inconsistent sub-condition </p><p>  The WHERE-co

86、ndition is unnecessarily complicated if a subcondition (some node in the operator tree) can be replaced by TRUE or FALSE and the condition is still equivalent. E.g. it happens sometimes that a condition is tested under W

87、HERE that is actually a constraint on the relation. </p><p>  This condition becomes even more strict if it is applied not to the given formula, but to the DNF of the formula. Then the check for unnecessary

88、logical complications can be easily reduced to a series of consistency tests. Let the DNF of the WHERE condition be with, Then no part of the condition is unnecessary if the following formulas are all consistent: </p&

89、gt;<p>  (1) , the negation of the entire formula (otherwise the entire formula could be replaced by TRUE), </p><p>  (2) , for (otherwise could be replaced by FALSE), </p><p>  (3) fo

90、r , , (otherwise Ai,j could be replaced by TRUE). </p><p>  Let us explain this a bit. There are really six cases to consider: </p><p>  (a) The entire formula can be replaced by FALSE, i.e. it

91、 is inconsistent. This is Error 1 above. </p><p>  (b) The entire formula can be replaced by TRUE, i.e.itisa tautology. Thisis checked with condition (1) above: The negation of the formula must have a model

92、. </p><p>  (c) One of the can be replaced by FALSE, i.e. removed from the disjunction. For example, </p><p>  SAL > 500 OR SAL > 700 is equivalent to SAL > 500. This is checked with c

93、ondition (2) above: For each Ci, there must be a model in which only is true, but all other, are false. </p><p>  (d) One of the can be replaced by TRUE. In this case, the entire disjunction can be replace

94、d by TRUE, thus condition (1) checks also this case. </p><p>  (e) One of thecan be replaced by FALSE. In this case, the entire conjunctioncan be replaced by FALSE. Thus condition (2) also checks this case.

95、</p><p>  (f) One of the can be replaced by TRUE, i.e. it can be removed from the conjunction. For instance, consider (SAL < 500 AND COMM > 1000)OR SAL >= 500. This can be simplified to ‘‘COMM >

96、1000 OR SAL >= 500’’, i.e. the underlined condition can be replaced by TRUE. Checking for such cases is the purpose of condition (3): It must be possible that makes the conjunction false, when the rest of the conjun

97、ction is true, and the rest of the disjunction is false. This ensures that the truth value ‘‘FALSE</p><p>  Error 9. Comparison with NULL </p><p>  At least in Oracle, it is syntactically valid

98、to write A = NULL, but this condition has a constant truth value (null/unknown). In other systems, this would be a syntax error. </p><p>  Error 10. NULL value in IN/ANY/ALL subquery </p><p>  I

99、Nconditions and quantified comparisons with subqueries (ANY, ALL) can have a possibly surprisingbehaviour if the subquery returns a null value (among other values). An IN and ANY condition is then null/ unknown, when it

100、otherwise would be false, and an ALL condition is null/unknown, when it otherwise would be true (ALL is treated like a large conjunction, and IN/ ANYlike a large disjunction). For instance, the result of the following qu

溫馨提示

  • 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 眾賞文庫僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負(fù)責(zé)。
  • 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。

最新文檔

評論

0/150

提交評論