Linux has a variety of virtual block devices, and some of them overlap in functionality.
This time I played with loopback block devices, which expose a file as a block device.
First, the traditional loop module.
To use it on Arch I had to run modprobe loop
first.
After that, you attach a file with losetup /dev/loop0 /tmp/disk
and you're done.
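A minimal sketch of the whole loop workflow, assuming a 1G sparse file at /tmp/disk:

```shell
modprobe loop                    # load the loop driver if it is not built in
truncate -s 1G /tmp/disk         # create a sparse backing file (size is arbitrary)
losetup /dev/loop0 /tmp/disk     # expose the file as /dev/loop0
losetup -a                       # list attachments to confirm
losetup -d /dev/loop0            # detach when done
```

losetup -f --show /tmp/disk picks the first free loop device for you instead of hard-coding loop0.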
The other approach uses lio.
lio is best known as an iSCSI target implementation, but internally it separates where the data is stored, the backstore,
from the I/O interface it is exposed through, the fabric module
(also called the frontend).
A backstore
can be either a block device or a file.
The fabric module
is normally iSCSI, exporting the backstore as an iSCSI target, but with tcm_loop the backstore can instead be presented to the local machine as an ordinary SCSI device.
So by using a file as the backstore and tcm_loop as the frontend, we get a loopback block device.
To use lio on Arch, install targetcli-fb
from the AUR.
Then start targetcli as root and work through it interactively:
cd /backstores/fileio
create name=disk file_or_dev=/tmp/disk
cd /loopback
create
cd ***
cd luns
create /backstores/fileio/disk
exit
With that, a new device such as /dev/sdb shows up.
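The same setup can also be scripted, since targetcli-fb accepts one command per invocation. A sketch (the WWN here is an arbitrary example; if you omit it, targetcli generates one for you, which is what the *** in the session above stands for):

```shell
# Create a fileio backstore from the backing file
targetcli /backstores/fileio create name=disk file_or_dev=/tmp/disk

# Create a tcm_loop target with an explicit WWN so the next command can address it
targetcli /loopback create naa.50014055e5f25aa0

# Map the backstore as a LUN of that target
targetcli /loopback/naa.50014055e5f25aa0/luns create /backstores/fileio/disk

lsblk   # the new SCSI disk should now be visible
```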
Now, I ran a quick fio test against the two implementations.
The job file:
[rw]
size=250M
readwrite=randrw
nrfiles=8
ioengine=libaio
iodepth=16
direct=1
overwrite=1
runtime=180
time_based
numjobs=4
group_reporting
The backing file is 1.5G and sits on tmpfs, so what we measure should be purely the overhead of each implementation.
The filesystem on the device was xfs.
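For reference, the preparation before running fio looked roughly like this (the mount point and job-file name are assumptions; on Arch, /tmp is tmpfs by default):

```shell
truncate -s 1500M /tmp/disk       # 1.5G backing file, living in RAM via tmpfs
losetup /dev/loop0 /tmp/disk      # loop case; for lio, use the /dev/sdX from tcm_loop instead
mkfs.xfs /dev/loop0               # the filesystem under test
mkdir -p /mnt/test
mount /dev/loop0 /mnt/test
cd /mnt/test && fio /root/job.fio # the job file shown above
```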
The results (loop first, then lio):
rw: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-2.20
Starting 4 processes
rw: Laying out IO files (8 files / total 250MiB)
rw: Laying out IO files (8 files / total 250MiB)
rw: Laying out IO files (8 files / total 250MiB)
rw: Laying out IO files (8 files / total 250MiB)
Jobs: 4 (f=32): [m(4)][100.0%][r=442MiB/s,w=442MiB/s][r=113k,w=113k IOPS][eta 00m:00s]
rw: (groupid=0, jobs=4): err= 0: pid=2200: Sun Jun 4 01:16:32 2017
read: IOPS=122k, BW=478MiB/s (501MB/s)(83.0GiB/180001msec)
slat (usec): min=1, max=20022, avg= 4.76, stdev=50.93
clat (usec): min=0, max=26792, avg=248.51, stdev=254.66
lat (usec): min=4, max=26795, avg=253.46, stdev=259.78
clat percentiles (usec):
| 1.00th=[ 41], 5.00th=[ 88], 10.00th=[ 151], 20.00th=[ 203],
| 30.00th=[ 241], 40.00th=[ 255], 50.00th=[ 262], 60.00th=[ 270],
| 70.00th=[ 278], 80.00th=[ 286], 90.00th=[ 294], 95.00th=[ 302],
| 99.00th=[ 322], 99.50th=[ 330], 99.90th=[ 510], 99.95th=[10048],
| 99.99th=[10176]
bw ( KiB/s): min=105520, max=206152, per=0.02%, avg=122299.23, stdev=17817.78
write: IOPS=122k, BW=478MiB/s (501MB/s)(83.0GiB/180001msec)
slat (usec): min=1, max=20050, avg= 6.49, stdev=53.88
clat (usec): min=0, max=29928, avg=260.77, stdev=274.08
lat (usec): min=10, max=29937, avg=267.45, stdev=279.42
clat percentiles (usec):
| 1.00th=[ 87], 5.00th=[ 98], 10.00th=[ 187], 20.00th=[ 213],
| 30.00th=[ 245], 40.00th=[ 258], 50.00th=[ 266], 60.00th=[ 274],
| 70.00th=[ 278], 80.00th=[ 286], 90.00th=[ 298], 95.00th=[ 310],
| 99.00th=[ 498], 99.50th=[ 596], 99.90th=[ 1480], 99.95th=[10048],
| 99.99th=[10176]
bw ( KiB/s): min=107472, max=206120, per=0.02%, avg=122354.26, stdev=17781.92
lat (usec) : 2=0.01%, 4=0.01%, 10=0.05%, 20=0.11%, 50=0.54%
lat (usec) : 100=5.80%, 250=28.75%, 500=64.21%, 750=0.41%, 1000=0.04%
lat (msec) : 2=0.02%, 4=0.01%, 10=0.01%, 20=0.05%, 50=0.01%
cpu : usr=12.74%, sys=40.54%, ctx=18171368, majf=0, minf=43
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=22007682,22017735,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
Run status group 0 (all jobs):
READ: bw=478MiB/s (501MB/s), 478MiB/s-478MiB/s (501MB/s-501MB/s), io=83.0GiB (90.1GB), run=180001-180001msec
WRITE: bw=478MiB/s (501MB/s), 478MiB/s-478MiB/s (501MB/s-501MB/s), io=83.0GiB (90.2GB), run=180001-180001msec
Disk stats (read/write):
loop0: ios=21995434/22005502, merge=0/0, ticks=5134440/5127490, in_queue=10299120, util=100.00%
rw: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-2.20
Starting 4 processes
rw: Laying out IO files (8 files / total 250MiB)
rw: Laying out IO files (8 files / total 250MiB)
rw: Laying out IO files (8 files / total 250MiB)
rw: Laying out IO files (8 files / total 250MiB)
Jobs: 4 (f=32): [m(4)][100.0%][r=179MiB/s,w=179MiB/s][r=45.9k,w=45.8k IOPS][eta 00m:00s]
rw: (groupid=0, jobs=4): err= 0: pid=7267: Sun Jun 4 01:31:19 2017
read: IOPS=53.6k, BW=210MiB/s (220MB/s)(36.8GiB/180001msec)
slat (usec): min=1, max=19641, avg=32.46, stdev=21.93
clat (usec): min=3, max=873729, avg=561.32, stdev=1187.46
lat (usec): min=12, max=873734, avg=594.14, stdev=1187.82
clat percentiles (usec):
| 1.00th=[ 438], 5.00th=[ 454], 10.00th=[ 462], 20.00th=[ 486],
| 30.00th=[ 556], 40.00th=[ 564], 50.00th=[ 572], 60.00th=[ 580],
| 70.00th=[ 588], 80.00th=[ 596], 90.00th=[ 620], 95.00th=[ 652],
| 99.00th=[ 692], 99.50th=[ 708], 99.90th=[ 1048], 99.95th=[ 1400],
| 99.99th=[ 5088]
bw ( KiB/s): min=19544, max=114648, per=0.02%, avg=53734.31, stdev=6185.94
write: IOPS=53.6k, BW=210MiB/s (220MB/s)(36.8GiB/180001msec)
slat (usec): min=2, max=18805, avg=35.23, stdev=26.63
clat (usec): min=2, max=873700, avg=559.72, stdev=1189.48
lat (usec): min=22, max=873706, avg=595.30, stdev=1189.95
clat percentiles (usec):
| 1.00th=[ 438], 5.00th=[ 454], 10.00th=[ 462], 20.00th=[ 486],
| 30.00th=[ 548], 40.00th=[ 564], 50.00th=[ 572], 60.00th=[ 572],
| 70.00th=[ 588], 80.00th=[ 596], 90.00th=[ 620], 95.00th=[ 652],
| 99.00th=[ 692], 99.50th=[ 708], 99.90th=[ 1048], 99.95th=[ 1400],
| 99.99th=[ 4960]
bw ( KiB/s): min=20088, max=112888, per=0.02%, avg=53746.07, stdev=6192.19
lat (usec) : 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.02%
lat (usec) : 250=0.15%, 500=22.32%, 750=77.30%, 1000=0.09%
lat (msec) : 2=0.09%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
lat (msec) : 100=0.01%, 250=0.01%, 1000=0.01%
cpu : usr=7.29%, sys=45.40%, ctx=19322396, majf=0, minf=52
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=9654368,9656263,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
Run status group 0 (all jobs):
READ: bw=210MiB/s (220MB/s), 210MiB/s-210MiB/s (220MB/s-220MB/s), io=36.8GiB (39.5GB), run=180001-180001msec
WRITE: bw=210MiB/s (220MB/s), 210MiB/s-210MiB/s (220MB/s-220MB/s), io=36.8GiB (39.6GB), run=180001-180001msec
Disk stats (read/write):
sdb: ios=9647282/9649318, merge=2/9, ticks=221083/231767, in_queue=428717, util=96.83%
I haven't tracked down the reason yet, but interestingly, loop delivers more than double the IOPS of lio.
The machine was a Sony Corporation SVP1121A1J / VAIO, BIOS R1045V7.
The kernel was 4.11.3-1-ARCH.
Addendum
I tried the ramdisk backstore.
The numbers are almost identical to fileio, so the bottleneck appears to be lio itself.
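Creating the ramdisk backstore is just a different create call; a sketch (the size and the target WWN are assumptions, substitute your own):

```shell
targetcli /backstores/ramdisk create name=rd size=1500M
targetcli /loopback/naa.50014055e5f25aa0/luns create /backstores/ramdisk/rd
```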
rw: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-2.20
Starting 4 processes
rw: Laying out IO files (8 files / total 250MiB)
rw: Laying out IO files (8 files / total 250MiB)
rw: Laying out IO files (8 files / total 250MiB)
rw: Laying out IO files (8 files / total 250MiB)
Jobs: 4 (f=32): [m(4)][100.0%][r=191MiB/s,w=193MiB/s][r=48.9k,w=49.3k IOPS][eta 00m:00s]
rw: (groupid=0, jobs=4): err= 0: pid=627: Sun Jun 4 03:37:05 2017
read: IOPS=57.7k, BW=225MiB/s (236MB/s)(39.6GiB/180001msec)
slat (usec): min=3, max=12337, avg=30.54, stdev=19.99
clat (usec): min=3, max=12778, avg=521.58, stdev=98.27
lat (usec): min=38, max=12805, avg=552.46, stdev=101.85
clat percentiles (usec):
| 1.00th=[ 426], 5.00th=[ 430], 10.00th=[ 434], 20.00th=[ 442],
| 30.00th=[ 524], 40.00th=[ 532], 50.00th=[ 532], 60.00th=[ 532],
| 70.00th=[ 540], 80.00th=[ 556], 90.00th=[ 572], 95.00th=[ 604],
| 99.00th=[ 628], 99.50th=[ 636], 99.90th=[ 732], 99.95th=[ 1096],
| 99.99th=[ 2320]
bw ( KiB/s): min=47520, max=70648, per=0.02%, avg=57715.31, stdev=5761.57
write: IOPS=57.7k, BW=225MiB/s (236MB/s)(39.6GiB/180001msec)
slat (usec): min=3, max=10906, avg=32.38, stdev=23.06
clat (usec): min=3, max=12780, avg=520.53, stdev=101.82
lat (usec): min=39, max=12804, avg=553.25, stdev=106.00
clat percentiles (usec):
| 1.00th=[ 426], 5.00th=[ 430], 10.00th=[ 434], 20.00th=[ 442],
| 30.00th=[ 524], 40.00th=[ 532], 50.00th=[ 532], 60.00th=[ 532],
| 70.00th=[ 540], 80.00th=[ 556], 90.00th=[ 564], 95.00th=[ 604],
| 99.00th=[ 628], 99.50th=[ 636], 99.90th=[ 732], 99.95th=[ 1112],
| 99.99th=[ 3120]
bw ( KiB/s): min=47432, max=70792, per=0.02%, avg=57724.14, stdev=5769.25
lat (usec) : 4=0.01%, 50=0.01%, 100=0.01%, 250=0.01%, 500=23.08%
lat (usec) : 750=76.82%, 1000=0.04%
lat (msec) : 2=0.05%, 4=0.01%, 10=0.01%, 20=0.01%
cpu : usr=7.49%, sys=46.22%, ctx=20795324, majf=0, minf=52
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=10383664,10385405,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
Run status group 0 (all jobs):
READ: bw=225MiB/s (236MB/s), 225MiB/s-225MiB/s (236MB/s-236MB/s), io=39.6GiB (42.5GB), run=180001-180001msec
WRITE: bw=225MiB/s (236MB/s), 225MiB/s-225MiB/s (236MB/s-236MB/s), io=39.6GiB (42.5GB), run=180001-180001msec
Disk stats (read/write):
sdb: ios=10377752/10379508, merge=0/7, ticks=196560/194066, in_queue=371003, util=100.00%
I took a look with perf top (the loop profile first, then lio).
Given that native_queued_spin_lock_slowpath dominates the lio profile, lock contention appears to be what limits its performance.
Samples: 838K of event 'cycles', Event count (approx.): 110568441611
Overhead Shared Object Symbol
3.79% [kernel] [k] memcpy_erms
2.08% fio [.] get_io_u
1.79% [kernel] [k] __radix_tree_lookup
1.74% [kernel] [k] _raw_spin_lock
1.25% [kernel] [k] __sched_text_start
1.20% [kernel] [k] _raw_spin_lock_irqsave
1.16% [kernel] [k] kmem_cache_alloc
1.03% [kernel] [k] sbitmap_get
1.00% fio [.] io_u_queued_complete
1.00% [loop] [k] loop_queue_work
0.89% fio [.] fio_gettime
0.86% [kernel] [k] iomap_dio_bio_end_io
0.85% [kernel] [k] do_io_submit
0.84% [kernel] [k] aio_complete
0.83% [kernel] [k] __slab_free
0.76% [kernel] [k] enqueue_entity
0.76% [kernel] [k] find_get_entry
0.75% [kernel] [k] __fget
0.75% [kernel] [k] gup_pte_range
0.74% [kernel] [k] __switch_to
0.70% [kernel] [k] iomap_dio_actor
0.69% [kernel] [k] dequeue_entity
0.68% [kernel] [k] _raw_spin_unlock_irqrestore
0.67% [kernel] [k] set_next_entity
Samples: 850K of event 'cycles', Event count (approx.): 183184369456
Overhead Shared Object Symbol
18.53% [kernel] [k] native_queued_spin_lock_slowpath
2.10% [kernel] [k] _raw_spin_lock_irqsave
1.90% [kernel] [k] memcpy_erms
1.22% [kernel] [k] _raw_spin_unlock_irqrestore
0.96% [kernel] [k] __sched_text_start
0.88% [kernel] [k] read_tsc
0.88% [kernel] [k] _raw_spin_lock
0.73% [kernel] [k] __radix_tree_lookup
0.72% [kernel] [k] kmem_cache_alloc
0.68% zsh [.] execbuiltin
0.67% [kernel] [k] _raw_spin_lock_irq
0.65% [kernel] [k] enqueue_entity
0.63% [kernel] [k] __percpu_counter_add
0.63% [kernel] [k] set_next_entity
0.63% [kernel] [k] ktime_get
0.61% [kernel] [k] dequeue_entity
0.60% [kernel] [k] __switch_to
0.59% zsh [.] bin_read
0.59% zsh [.] bin_print
0.59% [kernel] [k] update_curr
0.57% [kernel] [k] _mix_pool_bytes
0.55% [kernel] [k] kmem_cache_free
0.55% [kernel] [k] cfq_completed_request
0.53% [kernel] [k] pick_next_task_fair
Enabling blk-mq improved things somewhat.
loop, it turns out, was already using mq by default:
cat /sys/block/loop0/queue/scheduler
[mq-deadline] none
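The lio disk is an ordinary SCSI device, so moving it to blk-mq meant enabling scsi-mq. On a 4.11 kernel the knob I believe is relevant is the scsi_mod.use_blk_mq module parameter (exact paths are assumptions):

```shell
# Check whether the SCSI midlayer is currently using blk-mq
cat /sys/module/scsi_mod/parameters/use_blk_mq

# Enable it by booting with this on the kernel command line:
#   scsi_mod.use_blk_mq=1

# Afterwards the device should show an mq scheduler:
cat /sys/block/sdb/queue/scheduler
```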
rw: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-2.20
Starting 4 processes
rw: Laying out IO files (8 files / total 250MiB)
rw: Laying out IO files (8 files / total 250MiB)
rw: Laying out IO files (8 files / total 250MiB)
rw: Laying out IO files (8 files / total 250MiB)
Jobs: 4 (f=32): [m(4)][100.0%][r=320MiB/s,w=320MiB/s][r=81.8k,w=81.8k IOPS][eta 00m:00s]
rw: (groupid=0, jobs=4): err= 0: pid=685: Sun Jun 4 05:06:29 2017
read: IOPS=90.7k, BW=354MiB/s (372MB/s)(62.3GiB/180002msec)
slat (usec): min=2, max=10340, avg=17.83, stdev=18.18
clat (usec): min=1, max=10751, avg=331.24, stdev=82.57
lat (usec): min=13, max=10770, avg=349.35, stdev=85.77
clat percentiles (usec):
| 1.00th=[ 239], 5.00th=[ 251], 10.00th=[ 258], 20.00th=[ 278],
| 30.00th=[ 318], 40.00th=[ 330], 50.00th=[ 342], 60.00th=[ 354],
| 70.00th=[ 362], 80.00th=[ 370], 90.00th=[ 382], 95.00th=[ 390],
| 99.00th=[ 406], 99.50th=[ 414], 99.90th=[ 442], 99.95th=[ 532],
| 99.99th=[ 1544]
bw ( KiB/s): min=77056, max=118824, per=0.02%, avg=90792.63, stdev=12202.74
write: IOPS=90.8k, BW=355MiB/s (372MB/s)(62.3GiB/180002msec)
slat (usec): min=3, max=10087, avg=20.06, stdev=16.31
clat (usec): min=3, max=10749, avg=331.87, stdev=84.35
lat (usec): min=14, max=10773, avg=352.20, stdev=87.32
clat percentiles (usec):
| 1.00th=[ 241], 5.00th=[ 251], 10.00th=[ 258], 20.00th=[ 278],
| 30.00th=[ 318], 40.00th=[ 330], 50.00th=[ 342], 60.00th=[ 354],
| 70.00th=[ 362], 80.00th=[ 370], 90.00th=[ 382], 95.00th=[ 390],
| 99.00th=[ 406], 99.50th=[ 414], 99.90th=[ 442], 99.95th=[ 548],
| 99.99th=[ 1544]
bw ( KiB/s): min=76792, max=118288, per=0.02%, avg=90829.20, stdev=12165.43
lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
lat (usec) : 100=0.01%, 250=4.06%, 500=95.88%, 750=0.03%, 1000=0.01%
lat (msec) : 2=0.02%, 4=0.01%, 10=0.01%, 20=0.01%
cpu : usr=12.31%, sys=45.85%, ctx=31964284, majf=0, minf=46
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=16331936,16338526,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
Run status group 0 (all jobs):
READ: bw=354MiB/s (372MB/s), 354MiB/s-354MiB/s (372MB/s-372MB/s), io=62.3GiB (66.9GB), run=180002-180002msec
WRITE: bw=355MiB/s (372MB/s), 355MiB/s-355MiB/s (372MB/s-372MB/s), io=62.3GiB (66.9GB), run=180002-180002msec
Disk stats (read/write):
sdb: ios=16331311/16337886, merge=0/8, ticks=180867/176570, in_queue=413730, util=93.08%