numactl usage guide - 主要步驟的部落格

Name
numactl - Control NUMA policy for processes or shared memory
NUMA stands for Non-Uniform Memory Access, an architecture in which the cost of a memory access depends on which node the memory belongs to.
Synopsis
numactl [--interleave nodes] [--preferred node] [--membind nodes]
[--cpunodebind nodes] [--physcpubind cpus] [--localalloc] [--] {arguments ...}
numactl --show
numactl --hardware
numactl [--huge] [--offset offset] [--shmmode shmmode] [--length length] [--strict]
[--shmid id] --shm shmkeyfile | --file tmpfsfile
[--touch] [--dump] [--dump-nodes] memory policy
Description
numactl runs processes with a specific NUMA scheduling or memory placement policy.
The policy is set for command and inherited by all of its children.
In addition, it can set a persistent policy for shared memory segments or files.
/ In other words, the policy is attached to the persistent object itself (a shared memory segment or a file), so every mapping of that object is allocated according to that policy.
Use -- before command if using command options that could be confused with numactl options.
/ Everything after -- is passed to command unchanged, so numactl does not try to interpret the command's own options.
Policy settings are:
--interleave=nodes, -i nodes
Set a memory interleave policy. Memory will be allocated using round robin on nodes.
When memory cannot be allocated on the current interleave target, fall back to other nodes.
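The round-robin placement described for --interleave can be pictured with a small shell sketch. It is an idealized model only: interleave_plan is a hypothetical helper, it does not call numactl, and it ignores the fallback to other nodes when the target node is full.

```shell
# interleave_plan PAGES NODE...: print which node each page would come from
# under an idealized round-robin interleave (hypothetical helper).
interleave_plan() {
    pages=$1
    shift
    i=0
    while [ "$i" -lt "$pages" ]; do
        node=$1
        shift
        set -- "$@" "$node"    # rotate: move the node just used to the back
        echo "page $i -> node $node"
        i=$((i + 1))
    done
}

interleave_plan 5 0 1 2
# prints:
# page 0 -> node 0
# page 1 -> node 1
# page 2 -> node 2
# page 3 -> node 0
# page 4 -> node 1
```

With three nodes the fourth page wraps around to the first node again, which is the behaviour --interleave=0,1,2 asks the kernel for.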
Multiple nodes may be specified on --interleave, --membind and --cpunodebind.
You may specify "all", which means all nodes in the current cpuset. nodes may be specified as N,N,N or N-N or N,N-N or N-N,N-N and so forth.
/ For example, 1,2,0 selects nodes 1, 2 and 0,
/ and 1-7,12 selects nodes 1 through 7 plus node 12;
/ the remaining forms combine single nodes and ranges in the same way.
Relative nodes may be specified as +N,N,N or +N-N or +N,N-N and so forth.
The + indicates that the node numbers are relative to the process' set of allowed nodes in its current cpuset.
/ That is, +0 means the first node the process is allowed to use, which is not necessarily physical node 0.
A !N-N notation indicates the inverse of N-N, in other words all nodes except N-N.
If used with + notation, specify !+N-N.
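To make the N,N-N and !N-N notation concrete, here is a minimal sketch that expands a node specification into a list of node numbers. expand_nodes is a hypothetical helper written for illustration, not part of numactl; it needs the highest node number as a second argument and does not handle the relative (+) notation.

```shell
# expand_nodes SPEC MAX: print the node numbers selected by SPEC.
# MAX is the highest node number on the machine (needed for "all" and "!").
# Hypothetical helper for illustration; relative (+) notation is not handled.
expand_nodes() {
    spec=$1
    max=$2
    invert=0
    case $spec in
        '!'*) invert=1; spec=${spec#'!'} ;;
    esac
    if [ "$spec" = all ]; then
        spec="0-$max"
    fi
    selected=" "
    old_ifs=$IFS
    IFS=,
    for part in $spec; do
        case $part in
            *-*) lo=${part%-*}; hi=${part#*-} ;;   # a range such as 1-3
            *)   lo=$part; hi=$part ;;             # a single node
        esac
        n=$lo
        while [ "$n" -le "$hi" ]; do
            selected="$selected$n "
            n=$((n + 1))
        done
    done
    IFS=$old_ifs
    out=""
    n=0
    while [ "$n" -le "$max" ]; do
        case $selected in
            *" $n "*) hit=1 ;;
            *)        hit=0 ;;
        esac
        # keep listed nodes normally, unlisted nodes when inverted
        if [ "$hit" -ne "$invert" ]; then
            out="$out${out:+ }$n"
        fi
        n=$((n + 1))
    done
    echo "$out"
}

expand_nodes 1-3,6 7     # prints: 1 2 3 6
expand_nodes '!1-3' 7    # prints: 0 4 5 6 7
```

The ! form is quoted in the second call so that an interactive shell does not treat it as history expansion.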
--membind=nodes, -m nodes
Only allocate memory from nodes. Allocation will fail when there is not enough memory available on these nodes.
nodes may be specified as noted above.
--cpunodebind=nodes, -N nodes
Only execute command on the CPUs of nodes. Note that nodes may consist of multiple CPUs.
--physcpubind=cpus, -C cpus
Only execute process on cpus. This accepts cpu numbers as shown in the processor fields of /proc/cpuinfo, or relative cpus as in relative to the current cpuset.
You may specify "all", which means all cpus in the current cpuset.
Physical cpus may be specified as N,N,N or N-N or N,N-N or N-N,N-N and so forth.
Relative cpus may be specified as +N,N,N or +N-N or +N,N-N and so forth.
The + indicates that the cpu numbers are relative to the process' set of allowed cpus in its current cpuset.
A !N-N notation indicates the inverse of N-N, in other words all cpus except N-N. If used with + notation, specify !+N-N.
--localalloc, -l
Always allocate on the current node.
--preferred=node
Preferably allocate memory on node, but if memory cannot be allocated there fall back to other nodes.
This option takes only a single node number. Relative notation may be used.
--show, -s
Show NUMA policy setting of the current process.
--hardware, -H
Show inventory of available nodes on the system.
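The inventory printed by --hardware is plain text, so it can be filtered with ordinary tools. The sketch below extracts the free memory per node. node_free_mb is a hypothetical helper, and the sample input only imitates the usual layout of the output with made-up numbers; the exact layout may differ between numactl versions.

```shell
# node_free_mb: read `numactl --hardware` output on stdin and print
# "<node> <free MB>" for each node (hypothetical helper).
node_free_mb() {
    awk '$1 == "node" && $3 == "free:" { print $2, $4 }'
}

# Sample input imitating the usual output layout (numbers are made up).
node_free_mb <<'EOF'
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 16384 MB
node 0 free: 12000 MB
node 1 cpus: 4 5 6 7
node 1 size: 16384 MB
node 1 free: 9000 MB
EOF
# prints:
# 0 12000
# 1 9000

# With numactl installed: numactl --hardware | node_free_mb
```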
numactl can set up policy for a SYSV shared memory segment or a file in shmfs/hugetlbfs.
This policy is persistent and will be used by all mappings from that shared memory.
The order of options matters here.
The specification must at least include either of --shm, --shmid, --file to specify the shared memory segment or file and a memory policy like described above (--interleave, --localalloc, --preferred, --membind).

--huge
When creating a SYSV shared memory segment use huge pages. Only valid before --shmid or --shm.
--offset
Specify offset into the shared memory segment. Default 0. Valid units are m (for MB), g (for GB), k (for KB), otherwise it specifies bytes.
--strict
Give an error when a page in the policied area in the shared memory segment already was faulted in with a conflicting policy.
Default is to silently ignore this.
--shmmode shmmode
Only valid before --shmid or --shm. When creating a shared memory segment set it to numeric mode shmmode.
--length length
Apply policy to length range in the shared memory segment or make the segment length long. Default is to use the remaining length. Required when a shared memory segment is created and specifies the length of the new segment then.
Valid units are m (for MB), g (for GB), k (for KB), otherwise it specifies bytes.
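The unit suffixes accepted by --offset and --length can be turned into byte counts with a short sketch. size_to_bytes is a hypothetical helper, and it assumes binary multiples (k = 1024 bytes), which is an assumption rather than something stated in the text above.

```shell
# size_to_bytes SIZE: convert a value such as 1G, 512m or 4096 to bytes.
# Assumes binary multiples (k = 1024 bytes); hypothetical helper.
size_to_bytes() {
    value=$1
    case $value in
        *[kK]) echo $(( ${value%?} * 1024 )) ;;
        *[mM]) echo $(( ${value%?} * 1024 * 1024 )) ;;
        *[gG]) echo $(( ${value%?} * 1024 * 1024 * 1024 )) ;;
        *)     echo "$value" ;;    # no suffix: already in bytes
    esac
}

offset=$(size_to_bytes 1G)
length=$(size_to_bytes 1G)
echo "policy covers bytes $offset to $((offset + length - 1))"
# prints: policy covers bytes 1073741824 to 2147483647
```

An offset of 1G combined with a length of 1G therefore addresses the second gigabyte of a segment or file.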
--shmid id
Create or use a shared memory segment with numeric ID id.
/ If a segment with that ID already exists it is used; otherwise a new one is created.
--shm shmkeyfile
Create or use a shared memory segment, with the ID generated using ftok(3) from shmkeyfile.
/ shmkeyfile does not store the ID; ftok(3) derives a key from the identity of that file.
--file tmpfsfile
Set policy for a file in tmpfs or hugetlbfs.
--touch
Touch pages to enforce policy early. Default is to not touch them; the policy is applied when an application maps and accesses a page.
--dump
Dump policy in the specified range.
/ --dump only prints the policy that is in effect for the range; it does not remove or change it.
Valid node specifiers
all                  All nodes
number               Node number
number1{,number2}    Node number1 and Node number2
number1-number2      Nodes from number1 to number2
!nodes               Invert selection of the following specification.
/ These specifiers apply to every option that takes nodes; they are not specific to --dump.
Examples
numactl --physcpubind=+0-4,8-12 myapplic arguments
Run myapplic on cpus 0-4 and 8-12 of the current cpuset.
numactl --interleave=all bigdatabase arguments
Run big database with its memory interleaved on all nodes.
numactl --cpubind=0 --membind=0,1 process
Run process on node 0 with memory allocated on node 0 and 1.
numactl --cpubind=0 --membind=0,1 -- process -l
Run process as above, but with an option (-l) that would be confused with a numactl option.
numactl --preferred=1 numactl --show
Set preferred node 1 and show the resulting state.
/ The preferred node applies to memory allocation. The inner numactl --show inherits the policy set by the outer numactl, so it prints that policy.
numactl --interleave=all --shm /tmp/shmkey
Interleave all of the sysv shared memory region specified by /tmp/shmkey over all nodes.
/ /tmp/shmkey is the key file passed to ftok(3), not a directory that contains segments.
numactl --offset=1G --length=1G --membind=1 --file /dev/shm/A --touch
Bind the second gigabyte in the tmpfs file /dev/shm/A to node 1.
/ Offset 1G with length 1G covers the range from 1 GB to 2 GB of the file; pages in that range are allocated from the memory of node 1.
numactl --localalloc /dev/shm/file
Reset the policy for the shared memory file to the default localalloc policy.
Notes
Requires a NUMA policy aware kernel.
Command is not executed using a shell. If you want to use shell metacharacters in the child use sh -c as wrapper.
/ For example, pipes and redirections inside the command only work when the command is wrapped as sh -c '...'.
Setting policy for a hugetlbfs file does currently not work because it cannot be extended by truncate.
/ truncate here refers to changing the size of the file, as with truncate(2), not to a mode for opening the file.
Shared memory segments larger than numactl's address space cannot be completely policied.
This could be a problem on 32bit architectures.
Changing it piece by piece may work.
/ That is, apply the policy to one --offset/--length window at a time.
The old --cpubind which accepts node numbers, not cpu numbers, is deprecated and replaced with the new --cpunodebind and --physcpubind options.
/ --cpunodebind takes node numbers and --physcpubind takes cpu numbers.
Files
/proc/cpuinfo for the listing of active CPUs.
/sys/devices/system/node/node*/numastat for NUMA memory hit statistics.
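As a rough way to read the numastat counters, the sketch below computes what share of a node's allocations were intended for that node. numa_hit_percent is a hypothetical helper. It assumes the file consists of "name value" lines that include numa_hit and numa_miss, and the sample numbers are made up.

```shell
# numa_hit_percent: read numastat-style "name value" lines on stdin and print
# numa_hit as a percentage of numa_hit + numa_miss (hypothetical helper).
numa_hit_percent() {
    awk '
        $1 == "numa_hit"  { hit = $2 }
        $1 == "numa_miss" { miss = $2 }
        END {
            total = hit + miss
            if (total > 0) printf "%.1f\n", 100 * hit / total
            else print "n/a"
        }'
}

# Sample input in the assumed layout of a numastat file (numbers are made up).
numa_hit_percent <<'EOF'
numa_hit 900
numa_miss 100
numa_foreign 0
interleave_hit 10
local_node 890
other_node 110
EOF
# prints: 90.0

# On a NUMA machine the same function can read the real files.
for f in /sys/devices/system/node/node*/numastat; do
    if [ -r "$f" ]; then
        echo "$f: $(numa_hit_percent < "$f")%"
    fi
done
```

A falling percentage on a node after changing a policy suggests that more allocations intended for other nodes are spilling onto it.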