Ext4 journal on NVRAM card


Recently I came into possession of two old NVRAM cards, the Micro Memory MM-5425CN/256. This is a PCI-X card consisting of 256MB of ECC DRAM backed by two rechargeable 1950mAh batteries. Since the cards use PCI-X they are already a couple of years old; placing them in my old workstation left me with only one fully working card with charging batteries. The second card works, but its batteries are dead, so it has to be reinitialised on every boot.

In this post I will show how you can use the NVRAM card as the journal for an ext4 filesystem that will be created on a new disk, but first some basic information about these cards:

History and usage:
This type of card was used back in the days when SSDs and (especially ECC) memory were still quite expensive. They were primarily aimed at file servers and other I/O-intensive workloads: the filesystem journal could be placed on the card, offloading the journal I/O from the disks to the card.

Nowadays:
Cards of this type are still being produced, but in a more modern form: they now use a PCIe x4 connector and are available in 512MB, 1GB and 2GB capacities.

Information and transfer speeds:
According to the specifications, the card's read/write speed should top out at 533MB/sec, but that is the theoretical bus speed for PCI-X. In my tests the read and write speeds were around 310MB/sec, as the hdparm run below shows:

dsbf # hdparm -Tt /dev/umema

/dev/umema:
Timing cached reads: 12560 MB in 2.00 seconds = 6286.35 MB/sec
Timing buffered disk reads: 256 MB in 0.87 seconds = 310.89 MB/sec
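
As a quick cross-check without hdparm, a plain sequential read with dd should give a similar figure (a sketch; the 200 1MB blocks are just an arbitrary sample size and iflag=direct bypasses the page cache):

dsbf # dd if=/dev/umema of=/dev/null bs=1M count=200 iflag=direct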

As you can see, the card is identified as a umem device, in this case /dev/umema. The kernel detects the card at boot like this:

v2.3 : Micro Memory(tm) PCI memory board block driver
umem 0000:05:01.0: Micro Memory(tm) controller found (PCI Mem Module (Battery Backup))
umem 0000:05:01.0: CSR 0xc0214000 -> 0xffffc9000314a000 (0x100)
umem 0000:05:01.0: Size 262144 KB, Battery 1 Enabled (OK), Battery 2 Enabled (OK)
umem 0000:05:01.0: Window size 16777216 bytes, IRQ 24
umem 0000:05:01.0: memory already initialized
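
To double-check that the block device is actually available, something like this will do (device names may differ on your system):

dsbf # ls -l /dev/umem*
dsbf # grep umem /proc/partitions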

Note that the card will produce a lot of I/O errors as long as it is not initialised. The initialisation happens automatically once both batteries are fully charged; you can tell by the orange status LED, which stops flickering, and dmesg reports it as well:

umem 0000:05:01.0: Battery 1 now good
umem 0000:05:01.0: Battery 2 now good

Charging took a couple of hours; I left the card connected to the motherboard so the batteries could charge overnight. The discharge rate according to my voltmeter was around 0.03V per night (I left the computer off for the night): fully charged the batteries measured 4.18V, and after one night 4.15V. In theory you should be able to hold the NVRAM contents for a month at minimum if you let the batteries discharge. Possibly the data can be held for a longer period, but I don't know at what voltage the contents are actually lost.
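
As a very rough back-of-the-envelope estimate, purely assuming a hypothetical cut-off somewhere around 3.0V (again, I don't know the real value):

dsbf # echo "(4.18 - 3.0) / 0.03" | bc -l
39.33333333333333333333

So roughly 39 days at that assumed cut-off, which is in line with the month-at-minimum estimate.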

Using the NVRAM card:
Now that the card is fully charged and initialised, we can create the journal on it. For this I partitioned the device with parted, using one partition spanning all available space, and then created the journal with the following command:

dsbf # mkfs.ext4 -b 4096 -O journal_dev /dev/umema1
mke2fs 1.42.9 (4-Feb-2014)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
0 inodes, 65280 blocks
0 blocks (0.00%) reserved for the super user
First data block=0
0 block group
32768 blocks per group, 32768 fragments per group
0 inodes per group
Superblock backups stored on blocks:

Zeroing journal device:

In this case I specified the block size manually as 4K; the default for a device of this size is 1K blocks. The filesystem we create later has to use the same block size as its journal device, and 1K blocks is too small for a regular disk.
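
If you want to verify the block size on the journal device afterwards, dumpe2fs can show it; the relevant line should look something like this:

dsbf # dumpe2fs -h /dev/umema1 | grep -i 'block size'
Block size:               4096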

Now that the journal is created with 4K blocks, let's format the disk I have hooked up to my system, in this case a 146GB 15K RPM SAS disk seen as /dev/sdb. I created one partition spanning all available space and created the ext4 filesystem on it:

dsbf # mkfs.ext4 -b 4096 -J device=/dev/umema1 /dev/sdb1
mke2fs 1.42.9 (4-Feb-2014)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
8962048 inodes, 35843429 blocks
1792171 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
1094 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872

Allocating group tables: done
Writing inode tables: done
Adding journal to device /dev/umema1: done
Writing superblocks and filesystem accounting information: done

Mounting the disk:
Now that the filesystem is created, we can mount the disk, but in order to use the external journal we have to use a special mount flag. I’ve created a mountpoint in /media/disk/ so the mount command looks like:

mount -o journal_async_commit /dev/sdb1 /media/disk

The “journal_async_commit” option is mandatory in order to use the external journal. If you forget to mount the filesystem with this flag, it will eventually be remounted read-only and will need a filesystem check before you can mount it again! This happens because the external journal cannot commit in synchronous mode.
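
If you want the filesystem mounted automatically at boot, the matching /etc/fstab line would look something like this (a sketch, using my mountpoint):

/dev/sdb1    /media/disk    ext4    journal_async_commit    0    2

Keep in mind that the journal device has to be available before the filesystem is mounted, so the umem driver needs to be loaded early enough in the boot process.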

I/O benchmarking:
The next step was to see whether the external journal resulted in a performance boost. I ran the following iozone test twice: once with the external journal setup, and once more with the journal on the same disk:

dsbf disk # iozone -M -o -c -e -t3 -T -I -s 4g -r 256k -i0 -i2 -i6 -i8 -i9 -i11

External journal test:

Children see throughput for 3 initial writers = 57157.45 KB/sec
Parent sees throughput for 3 initial writers = 57125.04 KB/sec
Min throughput per thread = 19034.27 KB/sec
Max throughput per thread = 19062.55 KB/sec
Avg throughput per thread = 19052.48 KB/sec
Min xfer = 4188160.00 KB

Children see throughput for 3 rewriters = 59525.32 KB/sec
Parent sees throughput for 3 rewriters = 59523.80 KB/sec
Min throughput per thread = 17972.00 KB/sec
Max throughput per thread = 20788.24 KB/sec
Avg throughput per thread = 19841.77 KB/sec
Min xfer = 3626240.00 KB

Children see throughput for 3 random readers = 63546.42 KB/sec
Parent sees throughput for 3 random readers = 63544.64 KB/sec
Min throughput per thread = 21021.94 KB/sec
Max throughput per thread = 21378.14 KB/sec
Avg throughput per thread = 21182.14 KB/sec
Min xfer = 4124672.00 KB

Children see throughput for 3 mixed workload = 61882.63 KB/sec
Parent sees throughput for 3 mixed workload = 56041.96 KB/sec
Min throughput per thread = 17321.78 KB/sec
Max throughput per thread = 22430.49 KB/sec
Avg throughput per thread = 20627.54 KB/sec
Min xfer = 3239168.00 KB

Children see throughput for 3 random writers = 59007.07 KB/sec
Parent sees throughput for 3 random writers = 58830.27 KB/sec
Min throughput per thread = 19625.10 KB/sec
Max throughput per thread = 19724.30 KB/sec
Avg throughput per thread = 19669.02 KB/sec
Min xfer = 4173312.00 KB

Children see throughput for 3 pwrite writers = 59760.40 KB/sec
Parent sees throughput for 3 pwrite writers = 59586.57 KB/sec
Min throughput per thread = 19881.11 KB/sec
Max throughput per thread = 19983.16 KB/sec
Avg throughput per thread = 19920.13 KB/sec
Min xfer = 4173056.00 KB

Children see throughput for 3 fwriters = 183240.45 KB/sec
Parent sees throughput for 3 fwriters = 174999.93 KB/sec
Min throughput per thread = 58333.40 KB/sec
Max throughput per thread = 63064.84 KB/sec
Avg throughput per thread = 61080.15 KB/sec
Min xfer = 4194304.00 KB

Internal journal test:
This is the test with the journal placed on the same disk the filesystem resides on (the usual setup).
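
Recreating the filesystem with a regular internal journal for this run looks roughly like this (a sketch, same disk and block size as before, mounted without the special flag):

dsbf # mkfs.ext4 -b 4096 /dev/sdb1
dsbf # mount /dev/sdb1 /media/disk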

Children see throughput for 3 initial writers = 20755.95 KB/sec
Parent sees throughput for 3 initial writers = 20708.86 KB/sec
Min throughput per thread = 6896.20 KB/sec
Max throughput per thread = 6930.13 KB/sec
Avg throughput per thread = 6918.65 KB/sec
Min xfer = 4173824.00 KB

Children see throughput for 3 rewriters = 31132.26 KB/sec
Parent sees throughput for 3 rewriters = 31131.43 KB/sec
Min throughput per thread = 10376.50 KB/sec
Max throughput per thread = 10378.40 KB/sec
Avg throughput per thread = 10377.42 KB/sec
Min xfer = 4193536.00 KB

Children see throughput for 3 random readers = 63207.00 KB/sec
Parent sees throughput for 3 random readers = 63205.37 KB/sec
Min throughput per thread = 21055.82 KB/sec
Max throughput per thread = 21080.58 KB/sec
Avg throughput per thread = 21069.00 KB/sec
Min xfer = 4189440.00 KB

Children see throughput for 3 mixed workload = 55335.17 KB/sec
Parent sees throughput for 3 mixed workload = 20447.98 KB/sec
Min throughput per thread = 1611.65 KB/sec
Max throughput per thread = 27112.88 KB/sec
Avg throughput per thread = 18445.06 KB/sec
Min xfer = 249344.00 KB

Children see throughput for 3 random writers = 30185.71 KB/sec
Parent sees throughput for 3 random writers = 30157.88 KB/sec
Min throughput per thread = 10053.92 KB/sec
Max throughput per thread = 10066.82 KB/sec
Avg throughput per thread = 10061.90 KB/sec
Min xfer = 4188928.00 KB

Children see throughput for 3 pwrite writers = 23411.11 KB/sec
Parent sees throughput for 3 pwrite writers = 23372.91 KB/sec
Min throughput per thread = 7793.33 KB/sec
Max throughput per thread = 7813.48 KB/sec
Avg throughput per thread = 7803.70 KB/sec
Min xfer = 4183552.00 KB

Children see throughput for 3 fwriters = 181371.43 KB/sec
Parent sees throughput for 3 fwriters = 172422.00 KB/sec
Min throughput per thread = 57474.07 KB/sec
Max throughput per thread = 66419.88 KB/sec
Avg throughput per thread = 60457.14 KB/sec
Min xfer = 4194304.00 KB

Conclusion:
Comparing the two tests, it is clear that write operations are in general two to three times faster with the NVRAM card, and the mixed workload test in particular shows the positive effect the card has on filesystem performance. Read throughput is almost on par in both setups, which was expected: for a read the data is already on disk, whereas a write also has to be committed to the journal, so only writes benefit from the faster journal device.

Journal on other devices?
In theory you can place the journal on any device that Linux exposes as a block device. You could use two mechanical disks, one for the filesystem itself and the other for the journal, and a small partition on an SSD as the journal for a mechanical disk should work as well. The difference between an SSD and NVRAM is that an SSD has limited write cycles and therefore a shorter lifespan than an NVRAM card. I will soon test the combination of an SSD with a regular hard disk, as well as migrating an existing installation from an internal journal to an external journal on either the NVRAM card or SSD storage.
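
The rough idea for that migration (untested on my side for now, so treat it as a sketch) is to drop the internal journal with tune2fs on the unmounted filesystem and then attach the external one:

dsbf # umount /media/disk
dsbf # tune2fs -O ^has_journal /dev/sdb1
dsbf # tune2fs -J device=/dev/umema1 /dev/sdb1
dsbf # mount -o journal_async_commit /dev/sdb1 /media/disk

If tune2fs complains that the filesystem is not clean, run e2fsck -f on it first.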

Other usages?
Yes, there are other scenarios where this type of card can be used, mostly related to caching. ZFS can use the card as a ZIL (intent log) device to absorb synchronous write I/O, and NFS can use the card to speed up write operations on the server it is attached to. I have not performed any tests with NFS or ZFS, so that remains untested for now, but I will experiment further with the card to see how it can be used with NFS, and a follow-up post will cover this once I have figured it out!
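
For the ZFS case, adding the card as a dedicated log (ZIL/SLOG) device to an existing pool would look something like this (a sketch; "tank" is just a placeholder pool name):

dsbf # zpool add tank log /dev/umema1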

Author: Jeffrey Langerak
