Side Effects and Sequence Points in C - 主要步驟's Blog

Side Effects and Sequence Points

int a = i++;

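The variable a receives the value i had before being incremented. The main effect of the expression i++ is to yield the value of i (an rvalue, not an lvalue), and its side effect is to guarantee that i is incremented by 1. Everybody knows that. What not everybody knows is how it may look at the machine level, shown schematically in x86 assembly:

mov eax, [esp-12]         ; load the value of i
mov [esp-16], eax         ; store that value into a
inc dword ptr [esp-12]    ; increment i by 1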
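The split between the value of i++ and its side effect can be mimicked in C with a short sketch. The helper names below are hypothetical, and the local variable eax merely stands in for the register. Each step is its own statement, so a sequence point separates them, and swapping the last two steps still leaves a with the old value and i incremented.

```c
#include <assert.h>

/* Hypothetical helper: "load i; store into a; increment i". */
void store_then_increment(int *a, int *i) {
    assert(a != i);      /* a and i must be distinct objects */
    int eax = *i;        /* load the value of i */
    *a = eax;            /* store it into a: this is the value of i++ */
    *i = eax + 1;        /* side effect: increment i */
}

/* Hypothetical helper: the same steps with the store and the increment swapped. */
void increment_then_store(int *a, int *i) {
    assert(a != i);      /* a and i must be distinct objects */
    int eax = *i;        /* load the value of i */
    *i = eax + 1;        /* side effect first */
    *a = eax;            /* a still receives the value loaded before the increment */
}
```

Starting from i == 5, both helpers leave a == 5 and i == 6.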

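The third instruction could just as well be swapped with the second. In a simple statement like this, that is nothing special. Now consider:

int i = 1;
int a = (i++) + (i++);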

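I did not write something as perverse as i+++++i, and I even added redundant parentheses to make the grouping explicit. The statement still hides a trap: the C standard does not pin it down to a single value. In practice a compiler may produce 2 or 3, because the standard does not say when the side effects take place here and the compiler is free to choose. (Strictly speaking, as discussed below, modifying i twice between two sequence points makes the behavior undefined, so not even 2 or 3 is guaranteed.) One possible translation:

mov eax, [esp-12]         ; load the value of i
inc dword ptr [esp-12]    ; increment i by 1
mov ebx, [esp-12]         ; load the value of i
inc dword ptr [esp-12]    ; increment i by 1
add eax, ebx              ; add
mov [esp-16], eax         ; store the result into a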

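This yields 3. Another equally valid translation:

mov eax, [esp-12]         ; load the value of i
mov ebx, [esp-12]         ; load the value of i
inc dword ptr [esp-12]    ; increment i by 1 (first i++)
inc dword ptr [esp-12]    ; increment i by 1 (second i++)
add eax, ebx              ; add
mov [esp-16], eax         ; store the result into a

This yields 2.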
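The two orderings can be replayed in well-defined C by giving each assembly instruction its own statement, so that a sequence point separates every step. This is only a simulation that assumes the compiler emits one of these two sequences. The function names are hypothetical, and the locals eax and ebx stand in for registers.

```c
#include <assert.h>

/* First ordering: load, increment, load, increment, add. */
int first_order(int *i) {
    int eax = *i;        /* mov eax, [i] */
    *i = *i + 1;         /* inc [i] */
    int ebx = *i;        /* mov ebx, [i] */
    *i = *i + 1;         /* inc [i] */
    return eax + ebx;    /* add eax, ebx: the value stored into a */
}

/* Second ordering: load, load, increment, increment, add. */
int second_order(int *i) {
    int eax = *i;        /* mov eax, [i] */
    int ebx = *i;        /* mov ebx, [i] */
    *i = *i + 1;         /* inc [i] */
    *i = *i + 1;         /* inc [i] */
    return eax + ebx;    /* add eax, ebx */
}

void check_orders(void) {
    int i = 1;
    int a = first_order(&i);
    assert(a == 3 && i == 3);    /* the "3" outcome */
    i = 1;
    a = second_order(&i);
    assert(a == 2 && i == 3);    /* the "2" outcome; i is incremented twice either way */
}
```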

The C99 standard says this (5.1.2.3, paragraph 2):

"Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place. (A summary of the sequence points is given in annex C.)"
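A minimal sketch of the four kinds of side effect named in the quote. The counter calls and the helper note_call are hypothetical. tmpfile is the standard library function, and the file write is skipped if no temporary file can be created.

```c
#include <assert.h>
#include <stdio.h>

static int calls = 0;                /* hypothetical counter for illustration */

/* Calling a function that modifies an object is itself a side effect. */
static void note_call(void) { calls++; }

int side_effect_kinds(void) {
    volatile int v = 7;
    int x;
    x = v;                           /* accessing volatile v and modifying x: side effects */
    note_call();                     /* calling a function that modifies an object */
    FILE *f = tmpfile();             /* writing to a file modifies it: a side effect */
    if (f != NULL) {
        fputc('x', f);
        fclose(f);
    }
    /* Each statement above ends a full expression, i.e. a sequence point,
       so all of these side effects are complete by the time we return. */
    return x + calls;
}

void check_side_effect_kinds(void) {
    int before = calls;
    int r = side_effect_kinds();     /* keep the call outside assert() */
    assert(r == 8 + before);         /* 7 from v, plus the incremented counter */
    assert(calls == before + 1);
}
```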

The sequence points themselves are summarized in Annex C:

Annex C: Sequence points

The following are the sequence points described in 5.1.2.3:

- The call to a function, after the arguments have been evaluated (6.5.2.2).
- The end of the first operand of the following operators: logical AND && (6.5.13); logical OR || (6.5.14); conditional ? (6.5.15); comma , (6.5.17).
- The end of a full declarator: declarators (6.7.5);
- The end of a full expression: an initializer (6.7.8); the expression in an expression statement (6.8.3); the controlling expression of a selection statement (if or switch) (6.8.4); the controlling expression of a while or do statement (6.8.5); each of the expressions of a for statement (6.8.5.3); the expression in a return statement (6.8.6.4).
- Immediately before a library function returns (7.1.4).
- After the actions associated with each formatted input/output function conversion specifier (7.19.6, 7.24.2).
- Immediately before and immediately after each call to a comparison function, and also between any call to a comparison function and any movement of the objects passed as arguments to that call (7.20.5).
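Each of the sequence points listed above makes an expression well-defined even though it both modifies and reads i. This is a minimal sketch: the functions and the global g are hypothetical, chosen only to exercise &&, the comma operator, ?: and a function call.

```c
#include <assert.h>

static int g;                        /* hypothetical global used by call_demo */

/* &&: sequence point after the left operand. */
int and_demo(void) {
    int i = 1;
    return i++ && i == 2;            /* left operand sees 1, right operand sees 2 */
}

/* Comma operator: sequence point after the left operand. */
int comma_demo(void) {
    int i = 1;
    return (i++, i);                 /* yields the incremented value, 2 */
}

/* ?: sequence point after the condition. */
int cond_demo(void) {
    int i = 0;
    return i++ ? -1 : i;             /* condition sees 0, then i is already 1 */
}

/* Function call: sequence point after the arguments, before the call. */
static int read_g(int arg) { return g * 10 + arg; }

int call_demo(void) {
    g = 1;
    return read_g(g++);              /* arg is 1, and g is already 2 inside read_g */
}

void check_sequence_points(void) {
    assert(and_demo() == 1);
    assert(comma_demo() == 2);
    assert(cond_demo() == 1);
    assert(call_demo() == 21);
}
```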

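Why is it called a sequence point? My guess is that the term was chosen for the following reason. The machine that ultimately executes the program (or the abstract machine the C standard assumes) has to implement each C statement with more primitive operations. These operations do not necessarily map one-to-one onto C's high-level statements. So certain special points in the stream of primitive operations must be identified: when execution reaches such a point, a C statement or expression has just completed all of its semantics, including its side effects. At these points, C expressions are semantically complete. Take int a = i++; again:

mov eax, [esp-12]         ; load the value of i
mov [esp-16], eax         ; store that value into a
inc dword ptr [esp-12]    ; increment i by 1

The sequence point cannot lie right after the first or the second instruction, because at that moment i++ has not finished: i has not yet been incremented. Only after the third instruction are both the value and the side effect of i++ complete.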

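The precise definition of sequence points fixes the range within which repeated side effects on the same object lead to a result the standard does not define. The standard states (6.5, paragraph 2) that between two sequence points, an object shall have its stored value modified at most once.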

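This "modified at most once" might be read in either of the following ways:

If a side effect modifies the same object twice between two sequence points, the program will modify it only once at run time.

The programmer may modify it only once. If it is modified more than once, (1) will the compiler report an error, or (2) what happens at run time?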

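The correct reading is that if an object is modified more than once between two sequence points, the behavior is undefined. So neither guess holds: the program is not guaranteed to perform just one modification, and because this is undefined behavior rather than a constraint violation, the compiler is not required to diagnose it (although GCC, for example, warns about it with -Wsequence-point, which -Wall enables). The list of undefined behaviors in C99 Annex J.2 includes:

Between two sequence points, an object is modified more than once, or is modified and the prior value is read other than to determine the value to be stored (6.5).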

This covers cases such as a = i + i++;

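The rule says that between two sequence points, an object is modified more than once, or (even if it is modified only once) it is modified and also read, where that read is not for the purpose of computing the value to be stored. Suppose i++ is implemented by (1) loading the value of i into a register, (2) adding 1 to the register, and (3) storing the register back into the memory location of i. Then the read in step (1) is the read "used to determine the value to be stored". The sentence in the standard means that if there is any other read of the object between the two sequence points besides this one, the behavior is undefined even though the object is modified only once. In i + i++, fetching the value of i as the first operand of + requires a read, and that read is not the one used to modify i.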
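Because the behavior of a = i + i++ is undefined, the fix is to split it into statements that say what was meant. Which meaning was intended is an assumption here, so two well-defined rewrites are sketched. They appear alongside i = i + 1, which is fine because its only read of i is used to compute the value stored back into i. The function names are hypothetical.

```c
#include <assert.h>

/* Fine: i is read only to compute the value stored back into i. */
int add_one(int i) {
    i = i + 1;
    return i;
}

/* Undefined: a = i + i++;  (i is read for '+' and modified by '++'
   with no sequence point in between). Two well-defined rewrites: */

/* (a) Both operands use the old value, then i is incremented. */
int sum_old_old(int *i) {
    int a = *i + *i;
    (*i)++;
    return a;
}

/* (b) The right operand uses the old value, the left operand the new one. */
int sum_new_old(int *i) {
    int old = (*i)++;    /* full expression ends here: sequence point */
    return *i + old;
}

void check_rewrites(void) {
    int i = 1;
    assert(add_one(4) == 5);
    int a = sum_old_old(&i);
    assert(a == 2 && i == 2);
    i = 1;
    a = sum_new_old(&i);
    assert(a == 3 && i == 2);
}
```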

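I believe that, even for people who already know about sequence points, this rule is more insidious than modifying the same object more than once between two adjacent sequence points.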

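Note that implementing i++ with steps (1), (2) and (3) above is entirely realistic. Not every instruction set can increment a memory location directly the way Intel's can. On many machines any modification of memory must go through a register, as on a typical RISC instruction set such as MIPS.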
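On such a load/store machine, the extra read of i in a = i + i++ can land on either side of step (3). The sketch below replays both placements for i == 1 as well-defined C, with the local reg standing in for a register. The function names are hypothetical, and since the real expression is undefined, these are merely two plausible outcomes, not the only ones.

```c
#include <assert.h>

/* Extra read of i (for the left operand of '+') happens before the store. */
int extra_read_before_store(int *i) {
    int reg = *i;        /* (1) load i into a register */
    int left = *i;       /* extra read: still sees the old value */
    int old = reg;       /* the value of i++ */
    reg = reg + 1;       /* (2) add 1 in the register */
    *i = reg;            /* (3) store back into i */
    return left + old;
}

/* Extra read of i happens after the store. */
int extra_read_after_store(int *i) {
    int reg = *i;        /* (1) load */
    int old = reg;       /* the value of i++ */
    reg = reg + 1;       /* (2) add */
    *i = reg;            /* (3) store */
    int left = *i;       /* extra read: now sees the incremented value */
    return left + old;
}

void check_interleavings(void) {
    int i = 1;
    int a = extra_read_before_store(&i);
    assert(a == 2 && i == 2);
    i = 1;
    a = extra_read_after_store(&i);
    assert(a == 3 && i == 2);
}
```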