The Linux Kernel Series: Every Article

Configuring 16

In this article, we will discuss the input/output ports.

First, the "i8042 PC Keyboard controller" driver is needed for PS/2 mice and AT keyboards. Before USB, mice and keyboards used PS/2 ports which are circular ports. The AT keyboard is an 84-key IBM keyboard that uses the AT port. The AT port has five pins while the PS/2 port has six pins.

Input devices that use the COM port (sometimes called the RS-232 serial port) will need this driver (Serial port line discipline). The COM port is a serial port, meaning that one bit at a time is transferred.

The TravelMate notebooks need this special driver to use a mouse attached to the QuickPort (ct82c710 Aux port controller).

Parallel port adapters for PS/2 mice, AT keyboards, and XT keyboards use this driver (Parallel port keyboard adapter).

The "PS/2 driver library" is for PS/2 mice and AT keyboards.

"Raw access to serio ports" can be enabled to allow device files to be used as character devices.

Next, there is a driver for the "Altera UP PS/2 controller".

The PS/2 multiplexer also needs a driver (TQC PS/2 multiplexer).

The ARC FPGA platform needs a special driver for PS/2 controllers (ARC PS/2 support).

NOTE: I want to make it clear that the PS/2 controllers that are discussed in this article are not Sony's game controllers for their PlayStation. This article is discussing the 6-pin mouse/keyboard ports. The controller is the card that holds the PS/2 ports.

The "Gameport support" driver offers support for the 15-pin gameport. Gameport was the 15-pin port used by many input gaming devices until the invention of the USB port.

The next driver is for gameports on ISA and PnP bus cards (Classic ISA and PnP gameport support). ISA stands for Industry Standard Architecture and was a parallel bus standard before PCI. PnP stands for Plug-and-Play; ISA PnP was a later extension of ISA that allowed cards to be configured automatically.

"PDPI Lightning 4 gamecard support" provides a driver for a proprietary gamecard with gameports.

The SoundBlaster Live! and Audigy sound cards have a gameport on the card (SB Live and Audigy gameport support).

The ForteMedia FM801 PCI audio controller has a gameport on the card (ForteMedia FM801 gameport support). This driver only supports the gameport.

Next, we can move on to "Character devices". Character devices transfer data character by character.

First, TTY can be enabled or disabled (Enable TTY). Removing TTY will save a lot of space, but TTY is needed for terminals and such. Unless you know what you are doing, do not disable TTY.

NOTE TO MY FANS: If you know of a reason for disabling TTY, could you post the answer below and share it with us? Mahalo!

Next, support for "Virtual terminals" can be enabled/disabled. Again, a lot of space can be saved, but virtual terminals are very important.

This next driver supports font mapping and Unicode translation (Enable character translations in console). This can be used to convert ASCII to Unicode.

Virtual terminals can be used as system consoles with this driver (Support for console on virtual terminal). A system console manages the logins and kernel messages/warnings.

Virtual terminals must channel through a console driver to interact with the physical terminal (Support for binding and unbinding console drivers). Before the virtual terminal can do so, the console driver must be loaded. When the virtual terminal is closed, the console driver must be unloaded.

The next driver provides support for Unix98 PTY (Unix98 PTY support). This is Unix98 pseudo terminal.
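To make the idea of a pseudo terminal a little more concrete, here is a minimal Python sketch, assuming a Linux system where pseudo terminals are available. The function name pty_roundtrip is made up for this illustration. It opens a master/slave pair, writes a few bytes to the slave side (the side a program would see as its terminal), and reads them back from the master side (the side a terminal emulator or SSH server would hold).

```python
import os


def pty_roundtrip(message):
    """Open a pseudo-terminal pair, write to the slave, read from the master.

    Returns the slave's device name and the bytes seen on the master side.
    """
    master_fd, slave_fd = os.openpty()
    try:
        slave_name = os.ttyname(slave_fd)  # typically /dev/pts/N on Linux
        os.write(slave_fd, message)        # what a program "prints" to its terminal
        data = os.read(master_fd, len(message))  # what the terminal emulator receives
        return slave_name, data
    finally:
        os.close(slave_fd)
        os.close(master_fd)


if __name__ == "__main__":
    name, data = pty_roundtrip(b"hello")
    print(name, data)
```

The message deliberately has no newline, because the terminal layer may rewrite line endings on output.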

FUN FACT: The Linux kernel allows a filesystem to be mounted many times in many places at once.

Next, "Support multiple instances of devpts" can be supported. The devpts filesystem is for pseudo-terminal slaves.

Legacy support for PTY can also be enabled (Legacy (BSD) PTY support).

The max amount of legacy PTYs in use can be set (Maximum number of legacy PTY in use).

The next driver can be used to offer support to serial boards that the other drivers fail to support (Non-standard serial port support).

Next, there are some drivers for specific boards and cards.

The GSM MUX protocol is supported with this driver (GSM MUX line discipline support (EXPERIMENTAL)).

The next driver enables the kmem device file (/dev/kmem virtual device support). kmem is usually used for kernel debugging. kmem can be used to read certain kernel variables and states.

The Stallion cards have many serial ports on them (Stallion multiport serial support). This driver specifically supports this card.

Next, we can move on to drivers for serial devices. As stated before, serial devices transfer one bit at a time.

The first driver is for standard serial port support (8250/16550 and compatible serial support).

Plug-and-Play also exists for serial ports with this driver (8250/16550 PNP device support).

The following driver allows the serial ports to be used for connecting a terminal to be used as a console (Console on 8250/16550 and compatible serial port).

Some UART controllers support Direct Memory Access (DMA support for 16550 compatible UART controllers). UART stands for Universal Asynchronous Receiver/Transmitter. UART controllers convert serial to parallel and vice versa.
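The serial-to-parallel conversion can be sketched in a few lines of Python. This is only an illustration, not kernel code: it assumes the common "8N1" framing (one start bit, eight data bits sent least significant bit first, no parity, one stop bit), and the function names are invented for the example.

```python
def uart_frame(byte):
    """Parallel to serial: turn one byte into a list of ten line bits (8N1)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("a UART data word here is one byte (0-255)")
    data_bits = [(byte >> i) & 1 for i in range(8)]  # least significant bit first
    return [0] + data_bits + [1]                     # start bit, data, stop bit


def uart_decode(bits):
    """Serial to parallel: rebuild the byte from a ten-bit 8N1 frame."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


if __name__ == "__main__":
    frame = uart_frame(0x41)  # the letter "A"
    print(frame)
    print(hex(uart_decode(frame)))
```

Each entry in the list stands for one bit time on the wire, which is what "one bit at a time" means in practice.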

Next, this driver offers support for standard PCI serial devices (8250/16550 PCI device support).

16-bit PCMCIA serial devices are supported by this driver (8250/16550 PCMCIA device support). Remember, PCMCIA is a PC-card that is usually used in laptops.

The maximum number of supported serial ports can be set (Maximum number of 8250/16550 serial ports) and then the maximum that are registered during boot-up (Number of 8250/16550 serial ports to register at runtime).

For extended serial abilities like HUB6 support, enable this driver (Extended 8250/16550 serial driver options).

A special driver is needed to support more than four legacy serial ports (Support more than 4 legacy serial ports).

Serial interrupts can be shared when this driver is used (Support for sharing serial interrupts).

Serial port IRQs can be autodetected using this driver (Autodetect IRQ on standard ports).

RSA serial ports are also supported by the Linux kernel (Support RSA serial ports). Despite what the name may suggest, the kernel's help text for this option refers to the IODATA RSA-DV II/S ISA serial card and its faster-than-115 kbaud speeds.

Next, there are various vendor/device specific drivers.

This is a TTY driver that uses printk to output user messages (TTY driver to output user messages via printk). Printk (print kernel) is a special piece of software that usually prints the boot-up messages. Any string that is displayed by printk is usually put in the /var/log/messages file. The shell command "dmesg" displays all strings that were used by printk.

Next, we can enable/disable support for parallel printers (Parallel printer support).

The next driver allows a printer to be used as a console (Support for console on line printer). This means kernel messages will be literally printed at the printer. Normally when the word "print" was used in this article series, it meant putting data on the screen. This time, this literally means putting the data on paper.

The following driver makes the parallel port device files under /dev/, such as /dev/parport0 (Support for user-space parallel port device drivers). This allows user-space processes to access the parallel port directly.
 


Configuring 17

This article will discuss various drivers.

First, the "virtio console" is a virtual console that is used with hypervisors.

The "IPMI top-level message handler" is a message manager for the IPMI system. IPMI stands for Intelligent Platform Management Interface. IPMI is an interface for managing the system via network without using a shell.

"/dev/nvram support" permits the system to read and write memory in the real time clock's memory. Generally, this feature is used for saving data during a power loss.

The next driver supports the Siemens R3964 packet protocol (Siemens R3964 line discipline). This is a device-to-device protocol.

Now, we can move on to PCMCIA character devices. However, most of the drivers here are vendor/device specific.

The RAW driver allows block devices to be bound to the device files /dev/raw/rawN (RAW driver (/dev/raw/rawN)). The advantage to this is efficient zero-copy. However, most software will still prefer to access the storage through /dev/sd** or /dev/hd**.

Next, the maximum number of RAW devices that can be supported is set.

The following driver makes the device file /dev/hpet (HPET - High Precision Event Timer).

NOTE: Many of you may be wondering why enabling these device files matters. Well, these device files serve as an interface between the software and hardware.

The HPET timers can be mapped with this driver (Allow mmap of HPET). Memory mapping makes a device or file appear at a range of addresses in a process's memory. Software can then read the timer by reading those addresses directly instead of making a system call for each access.
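As a rough illustration of memory mapping in general (this is not the HPET interface itself), the Python sketch below maps an ordinary temporary file and reads it through the mapping as if it were plain memory. The function name read_via_mmap is invented for the example.

```python
import mmap
import tempfile


def read_via_mmap(payload):
    """Write payload to a temporary file, map the file, and read it back."""
    if not payload:
        raise ValueError("an empty file cannot be mapped")
    with tempfile.TemporaryFile() as f:
        f.write(payload)
        f.flush()
        # Length 0 means "map the whole file"; the mapping is read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Slicing the mapping reads the bytes without calling read().
            return mapped[:len(payload)]


if __name__ == "__main__":
    print(read_via_mmap(b"counter=42"))
```

A program using a mapped device works the same way, except that the bytes behind the mapping are the device's registers rather than a file's contents.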

The "Hangcheck timer" is used to detect whether or not the system has locked up.
This driver sets a timer and, when the timer goes off, checks how much time actually passed. If far more time passed than expected, then the system was stalled, and the driver can print a warning or reboot the machine.

Linus Torvalds Quote: Portability is for people who cannot write new programs.

The TPM security chip that uses Trusted Computing Group's specification will need this driver (TPM Hardware Support).

Now, we can move on to I2C devices. I2C stands for Inter-Integrated Circuit and is spoken as "eye two see". However, some people say "eye squared see". I2C is a serial bus standard.

Some old software used I2C adapters as class devices, but software now does not do that (Enable compatibility bits for old user-space). So, this driver will offer backwards compatibility for older software.

Next, the I2C device files can be made (I2C device interface).

I2C can support multiplexing with this driver (I2C bus multiplexing support).

I2C can support GPIO-controlled multiplexing with this driver (GPIO-based I2C multiplexer).

Various tests can be performed on I2C and SMBus with this driver for developers (I2C/SMBus Test Stub).

The I2C system will produce debugging messages with this feature enabled (I2C Core debugging messages).

The next driver produces additional I2C debugging messages (I2C Algorithm debugging messages).

Linus Torvalds Quote: The main reason there are no raw devices [in Linux] is that I personally think that raw devices are a stupid idea.

The following driver will cause the I2C drivers to produce debugging messages (I2C Bus debugging messages).

Next, we have Serial Peripheral Interface support (SPI support). SPI is a synchronous serial protocol used on SPI buses.

After that, there is a driver for High speed synchronous Serial Interface support (HSI support). HSI is a synchronous serial protocol.

PPS (Pulse Per Second) signals, which are used for precise timekeeping, can also be supported by the Linux kernel (PPS support).

The "IP-over-InfiniBand" driver allows IP packets to be transported over InfiniBand.

After that, there is a debugging driver for IP-over-InfiniBand (IP-over-InfiniBand debugging).

SCSI's RDMA protocol can also travel over InfiniBand (InfiniBand SCSI RDMA Protocol).

There is also an extension for the iSCSI protocol to transmit over InfiniBand (iSCSI Extensions for RDMA (iSER)).

Sometimes, errors occur in the core system that the whole system must know (EDAC (Error Detection And Correction) reporting). This driver sends the core errors to the system. Generally, such low-level errors are reported in the processor and then seen by this driver to let other system processes know about or handle the error.

This driver provides legacy support for EDAC to use older versions of sysfs (EDAC legacy sysfs).

EDAC can be set to send debugging information to the logging system of Linux (Debugging).

Linus Torvalds Quote: Nobody actually creates perfect code the first time around, except me.

The Machine Check Exceptions (MCEs) are converted to a readable form via this driver (Decode MCEs in human-readable form (only on AMD for now)).
MCEs are hardware errors detected by the CPU. MCEs usually trigger kernel panics.

MCEs can be injected through sysfs to test the decoding and error handling (Simple MCE injection interface over /sysfs).

The next driver allows errors to be detected in memory and then corrected (Main Memory EDAC (Error Detection And Correction) reporting).

Next, there are many drivers that detect and correct errors on specific hardware sets.

Linus Torvalds Quote: Theory and practice sometimes clash. And when that happens, theory loses. Every single time.

Now, we can move on to the "Real Time Clock". This is commonly abbreviated "RTC". The RTC keeps track of time.


The next setting allows us to make the Linux system use the time from the RTC as the time on the "wall clock" (Set system time from RTC on startup and resume). The wall clock is the clock on the desktop or the time seen using the "date" command.

Alternately, the wall clock can get the time from an NTP server and then sync with the RTC (Set the RTC time based on NTP synchronization).

Some systems have more than one RTC, so the user must set which is the default (RTC used to set the system time).
It is best to make the first one (/dev/rtc0) the primary clock.

Debugging abilities can be set for the RTC system (RTC debug support).

The RTC can use various interfaces for giving the operating system the current time. Using sysfs will require this driver (/sys/class/rtc/rtcN (sysfs)) while using proc will require this driver (/proc/driver/rtc (procfs for rtcN)). Special RTC character devices can be made and used (/dev/rtcN (character devices)). The shell command "hwclock" uses /dev/rtc, so the RTC character devices are needed.
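For example, with the sysfs interface enabled, the clock can be read with nothing more than a file read. The Python sketch below assumes the first clock is at /sys/class/rtc/rtc0 and reads its since_epoch attribute (seconds since 1970). The function name read_rtc_epoch is made up for this example, and it simply returns None on systems without an RTC.

```python
from pathlib import Path


def read_rtc_epoch(rtc_dir="/sys/class/rtc/rtc0"):
    """Return the RTC time in seconds since the epoch, or None if unavailable."""
    path = Path(rtc_dir) / "since_epoch"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # No such clock, no permission, or unexpected file contents.
        return None


if __name__ == "__main__":
    print(read_rtc_epoch())
```

Tools like "hwclock" go through the character device instead, which is why both interfaces are usually enabled.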

The next driver allows interrupts of the RTC to be emulated on the /dev/ interface (RTC UIE emulation on dev interface). This driver reads the clock time and allows the new time to be retrieved from /dev/.

The RTC system can be tested with the test driver (Test driver/device).

Next, we will discuss the Direct Memory Access system.
DMA is the process of hardware accessing the memory independently of the processor. DMA increases system performance because the processor will have less to do if the hardware is performing more tasks for itself. Otherwise, the hardware would be waiting for the processor to complete the task.

The debugging engine is for debugging the DMA system (DMA Engine debugging).

Next, there are many vendor/device specific drivers for DMA support.

Some DMA controllers support big endian reading and writing with this driver (Use big endian I/O register access).

Big endian refers to the order in which the bytes of a multi-byte value are stored. The number system used in English-speaking countries places the largest end of the number on the left. For example, in the number 17, the leftmost place is the tens place, which is larger than the ones place. Big endian works the same way with bytes: the most significant byte is stored first (at the lowest address), while little endian stores the least significant byte first. A byte is eight bits. Example: 10110100. Each place has a value of 128, 64, 32, 16, 8, 4, 2, and 1 respectively. So the byte mentioned converts to the decimal number 180 (128 + 32 + 16 + 4).
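A quick way to see byte order in action is Python's built-in int.to_bytes. The sketch below is only an illustration; the value 0x12345678 and the function name byte_order_demo are arbitrary choices for the example.

```python
def byte_order_demo(value, width=4):
    """Return the same integer laid out as big-endian and little-endian bytes."""
    big = value.to_bytes(width, "big")        # most significant byte first
    little = value.to_bytes(width, "little")  # least significant byte first
    return big, little


if __name__ == "__main__":
    big, little = byte_order_demo(0x12345678)
    print(big.hex(), little.hex())
    # The bit pattern from the text, read as a binary number.
    print(int("10110100", 2))
```

The two byte strings hold the same number; only the order of the bytes differs, which is exactly what a driver must agree on with the hardware's registers.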

The network stack can use the DMA engine to offload the copying of received TCP data, which reduces CPU usage (Network: TCP receive copy offload).

The "DMA Test Client" is used for testing the DMA system.

REFERENCE: The quotes from Linus Torvalds came from this site: http://en.wikiquote.org/wiki/Linus_Torvalds
 
Configuring 18

In this article, we will discuss the auxiliary-screen. The auxiliary displays are small LCD screens; most are equal to or less than 128x64. Then, we will discuss Userspace IO drivers, some virtual drivers, Hyper-V, staging drivers, IOMMU, and other kernel features.

The first driver to configure for the auxiliary display is the "KS0108 LCD Controller" driver. The KS0108 LCD Controller is a graphics controller made by Samsung.

Next, the parallel port address for the LCD can be set (Parallel port where the LCD is connected). The first port address is 0x378, the next is 0x278 and the third is 0x3BC. These are not the only choices of addresses. The majority of people will not need to change this. The shell command "cat /proc/ioports" will list the available parallel ports and the addresses.
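If you would rather not scan the whole list by eye, the output can be filtered with a short script. The Python sketch below assumes the usual "start-end : name" layout of /proc/ioports. The sample text in the demo is a made-up excerpt, and find_parports is a name invented for this example.

```python
def find_parports(ioports_text):
    """Return (name, start, end) for each parallel port listed in the text."""
    ports = []
    for line in ioports_text.splitlines():
        range_part, sep, name = line.partition(" : ")
        name = name.strip()
        if not sep or not name.startswith("parport"):
            continue
        start, _, end = range_part.strip().partition("-")
        ports.append((name, int(start, 16), int(end, 16)))
    return ports


if __name__ == "__main__":
    sample = (
        "0000-0cf7 : PCI Bus 0000:00\n"
        "  0378-037a : parport0\n"
        "  03f8-03ff : serial\n"
    )
    for name, start, end in find_parports(sample):
        print(name, hex(start), hex(end))
```

On a real system, pass in the contents of /proc/ioports instead of the sample string.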

The kernel developer can set the writing delay of the KS0108 LCD Controller to the parallel port (Delay between each control writing (microseconds)). The default value is almost always correct, so this typically does not need to be changed.

The "CFAG12864B LCD" screen is a 128x64, two-color LCD screen. This screen relies on the KS0108 LCD Controller.

The refresh rate of these LCD screens can be changed (Refresh rate (hertz)). Generally, a higher refresh rate causes more CPU activity. This means slower systems will need a smaller refresh rate.

After the auxiliary displays are configured, the "Userspace I/O drivers" are then set. The userspace system allows the user's applications and processes to access kernel interrupts and memory addresses. With this enabled, some drivers will be placed in the userspace.

The "generic Hilscher CIF Card driver" is a userspace driver for Profibus cards and Hilscher CIF Cards.

The "Userspace I/O platform driver" creates a general system for drivers to be in the userspace.

The next driver is the same as above, but adds IRQ handling (Userspace I/O platform driver with generic IRQ handling).

The following driver is again like the one before, but with dynamic memory abilities added (Userspace platform driver with generic irq and dynamic memory).

Next, some vendor/device specific drivers are available.

Then, there is a generic PCI/PCIe card driver (Generic driver for PCI 2.3 and PCI Express cards).

The following driver is for "VFIO support for PCI devices". VFIO stands for Virtual Function Input/Output. VFIO allows userspace drivers to directly access devices in a secure fashion.

The "VFIO PCI support for VGA devices" allows VGA to be supported by PCI through VFIO.

Next are the virtio drivers. Virtio is an I/O virtualization platform. This virtual software is for operating system virtualization. This is required for running an operating system in a virtual machine on the Linux system.

The first virtio driver we can configure is the "PCI driver for virtio devices". This allows virtual access to PCI.

The "Virtio balloon driver" allows the memory owned by a virtual system to be expanded or decreased as needed. Generally, no one wants a virtual system to reserve memory it may never use when the host operating system needs the memory.

The following driver supports memory mapped virtio devices (Platform bus driver for memory mapped virtio devices).

If the Linux kernel being configured is intended to run on a Microsoft Hyper-V system, then enable this driver (Microsoft Hyper-V client drivers). This would allow Linux to be the guest/client system on Hyper-V.


Next, we have the staging drivers. These are drivers that are under development, may change soon, or are not up to the standard quality for the Linux kernel. The only group of drivers in this category (in this kernel version 3.9.4) are the Android drivers. Yes, Android uses the Linux kernel which would make Android a Linux system. However, this is still debated. If the kernel is intended for Android, then it may be wise to enable all of the drivers.

The "Android Binder IPC Driver" provides support for Binder which is a system that allows processes to communicate with each other on Android systems.

The ashmem driver can be enabled next (Enable the Anonymous Shared Memory Subsystem). Ashmem stands for "Anonymous SHared MEMory" or "Android SHared MEMory". This supports a file-based memory system for userspace.

The "Android log driver" offers the complete Android logging system.

The "Timed output class driver" and "Android timed gpio driver" allow the Android system to manipulate GPIO pins and undo the manipulations after the timeout.

The "Android Low Memory Killer" closes processes when more memory is needed. This feature kills the tasks that are not used or inactive.

The "Android alarm driver" makes the kernel wake up at set intervals.

After the staging drivers are configured, the next set of drivers are for the X86 platform. These drivers are vendor/device specific for X86 (32-bit) hardware.

The next driver is for "Mailbox Hardware Support". This framework controls mailbox queues and interrupt signals for hardware mailbox systems.

"IOMMU Hardware Support" links the memory to devices that are able to use DMA. IOMMU enhances DMA. The IOMMU maps addresses and blocks faulty devices from accessing the memory. IOMMU also allows hardware to access more memory than it could without IOMMU.

The "AMD IOMMU support" driver offers better IOMMU support for AMD devices.

Debugging abilities exist for the AMD IOMMU support (Export AMD IOMMU statistics to debugfs).

A newer version of the IOMMU driver exists for AMD hardware (AMD IOMMU Version 2 driver).

The Linux kernel also provides an IOMMU driver specifically for Intel devices (Support for Intel IOMMU using DMA Remapping Devices).

Some devices may be able to accept a variety of voltages and clock frequencies. This driver allows the operating system to control the device's voltage output and clock rate (Generic Dynamic Voltage and Frequency Scaling (DVFS) support). With this driver enabled, other kernel features can be enabled for power/performance management as seen below.

"Simple Ondemand" is like above, but specifically changes the clock rate based on the device's activity. Generally, more activity means the device needs a faster clock speed to accommodate for the larger resource demand.

"Performance" allows the system to set the clock speed to the maximum supported amount for best performance. This increases power consumption.

"Powersave" sets the clock rate to the lowest value to save power.

"Userspace" allows the userspace to set the clock speed.

"External Connector Class (extcon) support" provides the userspace with a way to watch external connectors like USB and AC ports. This allows applications to know if a cable was plugged into a port. Users will almost always want this enabled. If anyone has purposely disabled this for a legitimate reason, please share with us why that would be needed.

The "GPIO extcon support" driver is just like the above driver, but is made specifically for GPIO pins.

Next, there is a list of various vendor/device specific controllers for memory (Memory Controller drivers). Memory chip controllers may be separate devices or built inside the memory chips. These controllers manage the incoming and outgoing data flow.

The "Industrial I/O support" driver provides a standard interface for sensors regardless of the bus type they are on (that is, PCIe, SPI, GPIO, etc.). IIO is a common abbreviation for Industrial Input/Output.

The Linux kernel offers support for a large variety of accelerometers, amplifiers, analog to digital converters, inertial measurement units, light sensors, magnetometer sensors, and many other sensors and converters.

The "Intel Non-Transparent Bridge support" driver supports PCIe hardware bridges which connect two systems. All writes to mapped memory will be mirrored on both systems.

"VME bridge support" is the same as above except the bridge uses VME which is a different bus standard.

"Pulse-Width Modulation (PWM) Support" controls the back-light and fan speed by regulating the average power received by such devices.

"IndustryPack bus support" offers drivers for the IndustryPack bus standards.
 
Configuring 19

In this article, we will discuss firmware drivers and then the filesystem drivers.

The first driver in this category is for finding the boot-disk (BIOS Enhanced Disk Drive calls determine boot disk). Sometimes, Linux does not know which drive is the bootable drive. This driver allows the kernel to ask the BIOS. Linux then stores the information on sysfs. Linux needs to know this for setting up bootloaders.

Even if BIOS EDD services are compiled in the kernel, this option can set such services to be inactive by default (Sets default behavior for EDD detection to off). EDD stands for Enhanced Disk Drive.

When using kexec to load a different kernel, performance can be increased by having the firmware provide a memory map (Add firmware-provided memory map to sysfs).

The "Dell Systems Management Base Driver" gives the Linux kernel better control of the Dell hardware via the sysfs interface.

The hardware's information can be accessed by the software via /sys/class/dmi/id/ with this driver enabled (Export DMI identification via sysfs to userspace). DMI stands for Desktop Management Interface. The DMI manages the components of the hardware and can access the hardware's data. The structure of the data in the BIOS and hardware is regulated by the System Management BIOS (SMBIOS) specification.
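Since each piece of DMI information is exposed as a small text file, reading it takes very little code. The Python sketch below assumes the directory /sys/class/dmi/id and three of its usual attribute files (sys_vendor, product_name, bios_version). The function name read_dmi_ids is invented for this example, and any file that is missing or unreadable is reported as None.

```python
from pathlib import Path


def read_dmi_ids(dmi_dir="/sys/class/dmi/id",
                 fields=("sys_vendor", "product_name", "bios_version")):
    """Return a dict mapping each DMI field name to its text, or None."""
    result = {}
    for field in fields:
        try:
            result[field] = (Path(dmi_dir) / field).read_text().strip()
        except OSError:
            # Field not present on this machine, or not readable by this user.
            result[field] = None
    return result


if __name__ == "__main__":
    for key, value in read_dmi_ids().items():
        print(key, "=", value)
```

Some attributes, such as serial numbers, are normally readable only by root, which is one reason the sketch tolerates read errors.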

The raw data tables from the DMI can be accessed with this driver (DMI table support in sysfs).

To boot from an iSCSI drive, enable this driver (iSCSI Boot Firmware Table Attributes).

The last firmware driver is a set of "Google Firmware Drivers". These are drivers for Google-specific hardware. Do not enable this driver unless you work for Google and need to use Linux on such hardware or if you are making a Linux kernel for a computer you stole from Google.

Next, we can configure the file system support of the kernel.

The "Second extended fs support" driver provides the EXT2 filesystem. http://www.linux.org/threads/ext-file-system.4365/


The "Ext2 extended attributes" offers the ability to use extra metadata not natively supported by the filesystem.

The "Ext2 POSIX Access Control Lists" driver adds additional permission schemes not native to EXT2.

The "Ext2 Security Labels" driver enhances the security provided by SELinux.

Enabling "Ext2 execute in place support" allows executables to be executed in the current location without being executed using the page cache.

The EXT3 filesystem is offered by this driver (Ext3 journaling file system support). http://www.linux.org/threads/ext-file-system.4365/

The "Default to 'data=ordered' in ext3" driver sets the data ordering mode to "Ordered". This deals with the way the journaling and writing work. Data ordering is explained in this article - http://www.linux.org/threads/journal-file-system.4136/

The "Ext3 extended attributes" offers the ability to use extra metadata not natively supported by the filesystem. Again, the following EXT3 drivers/features are the same as for EXT2 - "Ext3 POSIX Access Control Lists" and "Ext3 Security Labels". Also, the same is true for the following EXT4 drivers/features - "Ext4 POSIX Access Control Lists", "Ext4 Security Labels", and "EXT4 debugging support".

Journal Block Device debugging is supported by EXT3 (JBD debugging support) and EXT4 (JBD2 debugging support).

The next driver offers the Reiser filesystem (Reiserfs support). http://www.linux.org/threads/reiser-file-system-reiser3-and-reiser4.4403/

Debugging exists for the Reiser filesystem (Enable reiserfs debug mode).

The kernel can store ReiserFS statistics in /proc/fs/reiserfs (Stats in /proc/fs/reiserfs).

The following Reiser drivers/features are the same as the ones for EXT2/3/4 - "ReiserFS extended attributes", "ReiserFS POSIX Access Control Lists", and "ReiserFS Security Labels".

JFS is also supported by the Linux kernel and includes various features - "JFS filesystem support", "JFS POSIX Access Control Lists", "JFS Security Labels", "JFS debugging", and "JFS statistics". http://www.linux.org/threads/journaled-file-system-jfs.4404/

Again, XFS is supported with drivers/features that can be enabled - "XFS filesystem support", "XFS Quota support", "XFS POSIX ACL support", "XFS Realtime subvolume support", and "XFS Debugging support". http://www.linux.org/threads/xfs-file-system.4364/

The Global FileSystem 2 is supported by the Linux kernel (GFS2 file system support). This filesystem is used to share storage in a cluster.

The "GFS2 DLM locking" driver offers a distributed lock manager (DLM) for GFS2.

The Oracle Cluster FileSystem 2 is supported by the Linux kernel (OCFS2 file system support). This filesystem is used to share storage in a cluster.

The "O2CB Kernelspace Clustering" driver offers various services for the OCFS2 filesystem.

The "OCFS2 Userspace Clustering" driver allows the cluster stack to execute in userspace.

The "OCFS2 statistics" driver allows the user to get statistics concerning the filesystem.

Like with most of the Linux kernel, the OCFS2 offers logging (OCFS2 logging support). This may be used to watch for errors or for debugging purposes.

The "OCFS2 expensive checks" driver offers storage consistency checks at the cost of performance. Some Linux users recommend only enabling this feature for debugging purposes.

The kernel also contains the new B-Tree FileSystem; this driver offers the disk formatter (Btrfs filesystem Unstable disk format). BTRFS is still in development and is planned to one day become as popular or more popular than EXT4. http://www.linux.org/threads/b-tree-file-system-btrfs.4430/

The "Btrfs POSIX Access Control Lists" driver adds additional permission schemes not native to BTRFS.

Next, there is a BTRFS check tool (Btrfs with integrity check tool compiled in (DANGEROUS)). Since BTRFS is a newly developing filesystem, most of the software associated with it is unstable.

The NIL-FileSystem is also supported by Linux (NILFS2 file system support). http://www.linux.org/threads/new-implementation-of-a-log-structured-file-system-nilfs.4547/

To support the flock() system call used by some applications and filesystems, enable this driver (Enable POSIX file locking API). Disabling this driver will reduce the kernel size by about eleven kilobytes. The driver provides file-locking. File-locking lets a process claim a file so that other cooperating processes wait their turn instead of reading or writing the file at the same time. This is commonly used with network filesystems like NFS.
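Here is a small Python sketch of what file-locking looks like from an application's point of view, assuming a Linux system with this option enabled. The function name second_lock_blocked is made up for the example. It opens the same file twice, takes an exclusive flock() lock through the first handle, and checks that a non-blocking attempt through the second handle is refused.

```python
import fcntl
import tempfile


def second_lock_blocked(path):
    """Return True if a second exclusive flock() on path is refused."""
    with open(path, "a") as first, open(path, "a") as second:
        # Take the lock through the first open file.
        fcntl.flock(first, fcntl.LOCK_EX | fcntl.LOCK_NB)
        try:
            # LOCK_NB makes this fail immediately instead of waiting.
            fcntl.flock(second, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True
        finally:
            fcntl.flock(first, fcntl.LOCK_UN)
        return False


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        print(second_lock_blocked(tmp.name))
```

Note that these locks are advisory: they only protect against other programs that also ask for the lock.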

The "Dnotify support" driver is a legacy filesystem notification system that informs the userspace of events on the file system. One use of this and the successor notifications software is to monitor the filesystem for applications. Certain applications tell this daemon what events to watch. Otherwise, each userspace application would need to complete this task themselves.

Remember, Dnotify is a legacy system, so what is the new notification system? It is Inotify which is provided by this driver (Inotify support for userspace).

An alternative notification system is fanotify (Filesystem wide access notification). Fanotify is the same as Inotify, but fanotify relays more information to the userspace than Inotify.

Fanotify can check permissions with this driver enabled (fanotify permissions checking).

Systems that need to divide the storage space by user will want "Quota support". http://www.linux.org/threads/atomic-disk-quotas.4277/

The following driver allows disk quota warnings and messages to be reported through netlink (Report quota messages through netlink interface). Netlink is a socket interface on the userspace that communicates with the kernel.

Quota messages can also be sent to a console (Print quota warnings to console (OBSOLETE)).

This driver allows the quota system to perform extra sanity checks (Additional quota sanity checks). In computer technology, a sanity check is the process of checking for errors that may be due to poor programming. The files and output are inspected to ensure the data is what it should be and not structured in some odd fashion.

Some old systems use the old quota system and want to retain it when upgrading to a newer kernel. This is easily solved by enabling this driver (Old quota format support). Many readers may be wondering why someone would want to keep the old quota system instead of upgrading. Well, imagine being the manager of the IT department of a very large corporation that has many servers running very important tasks. Would you want to create and configure a new (and possibly large) quota system when you can continue using the one that works well? Generally, with computers, follow the principle - If it is not broken or will not cause security issues, do not fix it.

The newer quota system supports 32-bit UIDs and GIDs with this driver (Quota format vfsv0 and vfsv1 support).

To automatically mount remote storage units, enable this driver (Kernel automounter version 4 support).

FUSE filesystems are supported by this driver (FUSE (Filesystem in Userspace) support). Filesystem in Userspace (FUSE) allows any user to create their own filesystem and utilize it in userspace.

A special extension for FUSE can be used to utilize character devices in userspace (Character device in Userspace support).
 
Configuring 20

In this article, we will continue configuring filesystem support.

First, we can enable "General filesystem local caching manager" which allows the kernel to keep a local cache of filesystem data. This can enhance performance at the cost of storage space.

The caching system can be monitored with statistical information used for debugging purposes (Gather statistical information on local caching). Generally, this feature should only be enabled if you plan to debug the caching system.

This next feature is a lot like the above, but this feature stores latency information (Gather latency information on local caching). Again, this is a debugging feature.

The "Debug FS-Cache" driver offers many other debugging abilities for the cache system.

The next cache debugging tool keeps a global list (any process can access the list) of filesystem cache objects (Maintain global object list for debugging purposes).

To enhance the speed of network filesystems, enable this next driver (Filesystem caching on files). This feature allows a whole local filesystem to be used as cache for remote filesystems and storage units. The Linux kernel will manage this partition.

Two different debugging drivers exist for this local cache system for remote filesystems (Debug CacheFiles) and (Gather latency information on CacheFiles).

The most common optical disc filesystem is ISO-9660 which is ISO standard 9660, hence the name (ISO 9660 CDROM file system support). This driver is needed to read the majority of optical discs.

When reading an optical disc with files using long Unicode filenames, this driver is required (Microsoft Joliet CDROM extensions). This is an extension to the ISO-9660 filesystem.

The "Transparent decompression extension" allows data to be written to a disc in a compressed form and read off the disc and decompressed transparently. This will allow more data to be placed on the disc.

"UDF file system support" allows the kernel to read/write rewritable-optical-discs that are using the UDF filesystem. UDF is designed to manage incremental writes. UDF allows the rewritable optical disc to be used more like flash drives. The system can write and update the optical disc's data more quickly than regular writing on ISO-9660 filesystems. However, this is not faster than using flash drives.

As many of you know, Windows is a very popular system, so many storage units are using the FAT filesystem or NTFS. Thankfully, Linux supports such filesystems. The "MSDOS fs support" driver is a general driver for MS-DOS filesystems. This will increase the kernel size significantly, but since the FAT filesystems are very common, this size increase is usually worth the cost. http://www.linux.org/threads/file-allocation-table-fat.4472/

To support the FAT filesystems, enable this driver (VFAT (Windows-95) fs support). At the time this article was written, this driver did not support FAT64 (commonly called exFAT).

The default codepage can be set here (Default codepage for FAT). This is a codepage number, not a size.

After that, the default character set is configured for the FAT filesystems (Default iocharset for FAT).
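Why do the codepage and character set matter? The same filename turns into different bytes depending on the encoding, so a wrong setting makes accented names appear garbled. The small sketch below shows this in userspace with Python's built-in codecs. The three encodings are only examples (I am assuming codepage 437 and iso8859-1 because they are common defaults for these two options), and encode_filename is a made-up helper, not part of any kernel interface.

```python
def encode_filename(name, charset):
    """Return the bytes a filename would occupy in the given character set."""
    return name.encode(charset)


# The accented letter is stored as a different byte sequence in each encoding.
name = "café.txt"
for charset in ("cp437", "iso8859-1", "utf-8"):
    print(charset, encode_filename(name, charset))
```

If the bytes on disk were written with one encoding and read back with another, the accented letter would decode to a different character.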

The NTFS file system is offered with this driver (NTFS file system support). The driver provides read-only abilities. To write to NTFS, enable this driver (NTFS write support).

The Linux kernel offers debugging tools for the NTFS filesystem (NTFS debugging support).

To have a proc folder in the root, this feature must be enabled (/proc file system support). Some other similar drivers that rely on this one include (/proc/kcore support), (/proc/vmcore support), and (Sysctl support (/proc/sys)). The proc system (short for “process”) uses the proc-filesystem sometimes called procfs. This filesystem is in the hardware's memory and is created when Linux boots up. So, when viewing files in proc, the user is browsing the memory as if it were like other storage units. Proc acts as an interface between userspace and the kernelspace. Proc is in the kernelspace.
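Since proc files are read like ordinary text files, a short sketch may help show how userspace uses this interface. The example parses the layout of /proc/filesystems, which lists the filesystems the running kernel supports and flags virtual ones with "nodev". The helper name and the sample text are assumptions for illustration; the sample is used as a fallback when the real file is not available.

```python
def parse_proc_filesystems(text):
    """Return (name, needs_block_device) pairs from /proc/filesystems text."""
    result = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        # Virtual filesystems are flagged with "nodev" in the first column.
        nodev = fields[0].strip() == "nodev"
        result.append((fields[-1].strip(), not nodev))
    return result


# Illustrative sample in the same layout as the real file.
sample = "nodev\tsysfs\nnodev\tproc\nnodev\ttmpfs\n\text4\n\tvfat\n"

try:
    with open("/proc/filesystems") as handle:
        text = handle.read()
except OSError:
    text = sample  # not on Linux, or proc is not mounted

for name, needs_device in parse_proc_filesystems(text):
    print(name, "(block device)" if needs_device else "(virtual)")
```

This is a handy way to confirm that a filesystem driver enabled during configuration actually made it into the running kernel.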

The "Enable /proc page monitoring" driver offers some proc files that monitor the memory utilization of processes.

The "sysfs file system support" driver creates the /sys/ folder. The sysfs filesystem is in memory and provides an interface to the kernel objects.

The tmp directory is needed by many applications and Linux itself, so it is strongly recommended that this driver be enabled (Tmpfs virtual memory file system support (former shm fs)). The tmp filesystem lives in virtual memory, so it may be held in memory or swapped out to the hard-drive, and is used only to store temporary files.

The "Tmpfs POSIX Access Control Lists" driver offers extra permission features for the files in the tmpfs virtual filesystem.

The "Tmpfs extended attributes" driver provides more attributes to tmpfs files than what they would normally have without the driver.

The "HugeTLB file system support" driver provides the hugetlbfs filesystem, which is ramfs based. This virtual filesystem contains HugeTLB pages.

The configfs filesystem is a kernel object manager in the form of a filesystem (Userspace-driven configuration filesystem). It is highly recommended that this driver be enabled. ConfigFS is a lot like sysfs. However, ConfigFS is used to create and delete kernel objects while sysfs is used to view and modify kernel objects.

Next, we can move back to "real" filesystems. That is, the filesystems users themselves use to store their personal files. Next, the kernel can be given the ability to read ADFS filesystems (ADFS file system support).

The ability to write to ADFS filesystems is provided by a separate and unstable driver (ADFS write support (DANGEROUS)). ADFS stands for Advanced Disc Filing System.

Linux also supports the Amiga Fast FileSystem (Amiga FFS file system support). http://www.linux.org/threads/amiga-fast-file-system-affs.4429/

The "eCrypt filesystem layer support" driver offers a POSIX-compliant cryptographic filesystem layer. This eCrypt can be placed on every and any filesystem no matter what partition table the filesystem resides on. http://www.linux.org/threads/intro-to-encryption.4376/

The eCrypt layer can have a device file if this driver is enabled (Enable notifications for userspace key wrap/unwrap). The device path is /dev/ecryptfs.

Linux also supports HFS and HFS+ (Apple Macintosh file system support) and (Apple Extended HFS file system support). http://www.linux.org/threads/hierarchical-file-system-hfs.4480/ and http://www.linux.org/threads/hierarchical-file-system-plus-hfs.4493/

The BeFS filesystem can be used by Linux as a read-only filesystem (BeOS file system (BeFS) support (read only)). Generally, it is easier to program the reading abilities for a filesystem than the writing features.

Special debugging features exist for BeFS (Debug BeFS).

EFS is another filesystem that Linux can only read, not write (EFS file system support (read only)). http://www.linux.org/threads/extent-file-system-efs.4447/

Some flash drives may use the JFFS2 filesystem (Journalling Flash File System v2 (JFFS2) support). Next, the debugging level can be set (JFFS2 debugging verbosity). http://www.linux.org/threads/journaling-flash-file-system-version-2-jffs2.4495/

To use JFFS2 on NAND and NOR flash drives, this driver is needed (JFFS2 write-buffering support).

This next driver offers better error protection (Verify JFFS2 write-buffer reads).

JFFS2 filesystems can be mounted faster with "JFFS2 summary support" enabled. This driver stores information about the filesystem.

Like the other extended/extra attributes drivers for some filesystems, JFFS2 has such a driver (JFFS2 XATTR support).

The JFFS2 filesystem supports various transparent compression systems. This allows files to be smaller on JFFS2 filesystems and be read without the user needing to perform any special actions. (Advanced compression options for JFFS2), (JFFS2 ZLIB compression support), (JFFS2 LZO compression support), (JFFS2 RTIME compression support), and (JFFS2 RUBIN compression support). The default compression format can be defined in the following option (JFFS2 default compression mode).
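The idea behind transparent compression can be sketched in a few lines of userspace code: data is compressed when stored and decompressed when loaded, and the caller only ever sees the original bytes. This is an analogy using Python's zlib module, not the JFFS2 implementation; the store and load helpers are made-up names.

```python
import zlib


def store(data):
    """Compress data before it is written to the medium."""
    return zlib.compress(data, 9)


def load(stored):
    """Decompress data on the way back, so the reader sees the original."""
    return zlib.decompress(stored)


original = b"kernel configuration " * 200
stored = store(original)
print("original bytes:", len(original))
print("stored bytes:  ", len(stored))
print("round trip ok: ", load(stored) == original)
```

Repetitive data such as this sample shrinks a great deal, while data that is already compressed (images, archives) gains little.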

A successor for JFFS2 exists and is supported by the kernel (UBIFS file system support). The Unsorted Block Image File System (UBIFS) also competes with LogFS.

The Linux kernel also supports LogFS (LogFS file system).

ROM-based embedded systems need support for CramFS (Compressed ROM file system support (cramfs)).

Alternately, embedded systems could use SquashFS which is a read-only compression filesystem (SquashFS 4.0 - Squashed file system support). The Linux kernel also offers extended attributes for SquashFS (Squashfs XATTR support).

There are three different compression formats supported by SquashFS - (Include support for ZLIB compressed file systems), (Include support for LZO compressed file systems), and (Include support for XZ compressed file systems). The block size for SquashFS can be set to four kilobytes (Use 4K device block size?). Also, the cache size can be set (Additional option for memory-constrained systems).

The Linux kernel supports FreeVxFS (FreeVxFS file system support (VERITAS VxFS(TM) compatible)), Minix (Minix file system support), MPEG filesystem (SonicBlue Optimized MPEG File System support), HPFS (OS/2 HPFS file system support), QNX4 (QNX4 file system support (read only)), QNX6 (QNX6 file system support (read only)), and the ROM filesystem (ROM file system support). http://www.linux.org/threads/qnx-file-systems.4577/ and http://www.linux.org/threads/minix-mini-unix-file-system.4545/

"RomFS backing stores (Block device-backed ROM file system support)" offers a list of various ROMfs extra features and abilities.

The "Persistent store support" driver provides support for the pstore filesystem which allows access to platform level persistent storage.

The pstore filesystem can store kernel logs/messages (Log kernel console messages).

When a kernel panic takes place (equivalent to the "Blue-Screen-of-Death" on Windows), the "Log panic/oops to a RAM buffer" driver will store a log in the RAM.

This next single driver offers support for the Xenix, Coherent, Version 7, and System V filesystems (System V/Xenix/V7/Coherent file system support).

The Linux kernel also supports UFS (UFS file system support (read only)), (UFS file system write support (DANGEROUS)), and (UFS debugging).

exofs is also supported by the kernel (exofs: OSD based file system support).

The Flash-Friendly FileSystem is a special filesystem for flash drives (F2FS filesystem support (EXPERIMENTAL)), (F2FS Status Information ), (F2FS extended attributes), and (F2FS Access Control Lists). http://www.linux.org/threads/flash-friendly-file-system-f2fs.4477/
 
Configuring 21

In this article, we will configure network filesystem support for the Linux kernel. A network filesystem is a remote filesystem that computers access via the network.

First, the "NFS client support" driver allows the Linux system to use the NFS network filesystem. There are also four other drivers for different versions of NFS - (NFS client support for NFS version 2), (NFS client support for NFS version 3), (NFS client support for NFS version 4), and (NFS client support for NFSv4.1). If your network uses NFS, either figure out what version of NFS you are using, or enable all of the NFS drivers.

Swap space is not required to be on a local storage unit. This driver allows Linux to use NFS support to use remote swap spaces (Provide swap over NFS support).

The NFS system can be sped up by using a cache system (Provide NFS client caching support). This is local cache.

Enable this driver to allow NFS server host-names to be resolved through DNS (Use the legacy NFS DNS resolver).

"NFS server support" gives the server providing NFS the features it needs to fulfill such a task. Some other NFS drivers include (NFS server support for NFS version 3) and (NFS server support for NFS version 4).

The "NFS server manual fault injection" driver is a debugging tool that allows developers to make the NFS server think an error occurred with NFS. Specifically, this is used to test how the server handles NFS errors.

The "Secure RPC: Kerberos V mechanism" is needed to make the RPC calls secure. NFS cannot be added to the kernel without this feature for security reasons.

There is a special debugging tool for RPC (RPC: Enable dprintk debugging).

The Linux kernel supports the Ceph filesystem (Ceph distributed file system).

CIFS is a virtual filesystem used by Samba and Windows servers (CIFS support (advanced network filesystem, SMBFS successor)). CIFS stands for Common Internet FileSystem.

There are two features that can be used to debug or monitor the CIFS driver (CIFS statistics) and (Extended statistics).

A special driver is needed to support servers with LANMAN security (Support legacy servers which use weaker LANMAN security). LANMAN or LM hash is a special password hashing function that has some weaknesses.
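To see where the weaknesses come from, here is a minimal sketch of the first steps of the LM hash: the password is converted to uppercase, cut or padded to 14 bytes, and split into two independent 7-byte halves. The final step, where each half is used as a DES key, is left out because DES is not in Python's standard library. The function name is made up, and plain ASCII is assumed in place of the OEM codepage a real implementation would use.

```python
def lm_key_halves(password):
    """Return the two 7-byte key halves that LM derives from a password."""
    # LM upper-cases the password, so letter case adds no strength.
    upper = password.upper().encode("ascii", "replace")
    # The result is truncated or NUL-padded to exactly 14 bytes.
    padded = upper[:14].ljust(14, b"\0")
    # Each half is processed on its own, so each can be attacked separately.
    return padded[:7], padded[7:]


for candidate in ("Secret1", "secret1", "averylongpassword"):
    print(candidate, lm_key_halves(candidate))
```

Notice that "Secret1" and "secret1" give the same halves, a short password leaves the second half empty, and anything past fourteen characters is ignored.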

CIFS requires that Kerberos tickets be used before mounting to secure servers (Kerberos/SPNEGO advanced session setup). This driver offers the ability for CIFS to use userspace tools which is needed to provide the tickets.

Like other filesystems, CIFS can have extended abilities (CIFS extended attributes) and (CIFS POSIX Extensions).

This driver gets the Access Control List (ACL) from the CIFS server (Provide CIFS ACL support).

CIFS has two other debugging tools (Enable CIFS debugging routines) and (Enable additional CIFS debugging routines).

CIFS can have "DFS feature support" which allows shares to be accessed even when they are moved. DFS stands for Distributed FileSystem.

SMB2 is an improved alternative to CIFS (SMB2 network file system support). SMB2 stands for Server Message Block version 2.

Clients can store CIFS cache with this driver enabled (Provide CIFS client caching support).

Novell NetWare clients need this driver to access NetWare volumes (NCP file system support (to mount NetWare volumes)). NCP stands for NetWare Core Protocol. NCP is a protocol that allows clients to communicate with the servers hosting NetWare volumes.

NetWare servers can use NFS namespaces if this driver is enabled (Use NFS namespace if available).

NetWare servers can use the OS/2 long namespaces if this driver is enabled (Use LONG (OS/2) namespace if available).

If this driver is enabled, then filenames made by DOS or on storage units owned by a DOS system will be converted to lowercase (Lowercase DOS filenames).

Many filesystems depend on native language support (Use Native Language Support). Specifically, Native Language Support (NLS) allows the different character-sets to be used in filenames.

NCP filesystems can support the execute flag and symbolic links with this driver enabled (Enable symbolic links and execute flags).

The Linux kernel offers support for the Coda filesystem (Coda file system support (advanced network fs)). Coda is one of many network filesystems.

The Linux kernel can support the Andrew Filesystem (Andrew File System support (AFS)). However, the Linux kernel can only read such filesystems in an insecure manner. This driver is intended to allow Linux systems to access AFS. If your network only contains Linux systems, then select a different network filesystem that the kernel can fully support.

The Linux kernel has an experimental driver for accessing Plan 9 resources via the 9P2000 protocol (Plan 9 Resource Sharing Support (9P2000)). The kernel also has cache support (Enable 9P client caching support) and control lists (9P POSIX Access Control Lists) for the previously mentioned Plan 9 feature.

After the network filesystems have been configured, the next part of the kernel to set up is the "Native Language Support". This whole menu contains the drivers for most or all of the character-sets and encodings. Enabling these encodings allows these character sets to be used by the system and applications. UTF-8 is the most commonly used encoding, but it is not the only one. Most applications and drivers need UTF-8, so this encoding is already set to be added to the kernel.

After that menu, the "Distributed Lock Manager (DLM)" can be configured. A DLM is used to keep shared resources in sync and performing well. This driver manages the userspace and kernelspace applications that access or manipulate shared resources (like network filesystems). Clusters strongly depend on this driver.

Now that we have finally finished configuring the filesystems and related features, we can move on to the "Kernel hacking" menu seen on the main (first) screen of the kernel configuration tool. I am using the ncurses interface (seen in the screenshot) which is initiated with the "make menuconfig" command, so other interfaces may be a little different. The kernel hacking menu contains various settings concerning the kernel itself. Some of these features are debugging tools and some control the kernel's behavior.

This first setting adds the printk time stamps to the syslog system call output (Show timing information on printks).
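With time stamps enabled, kernel messages typically begin with the seconds since boot in square brackets. The sketch below shows one way a script might split such a line into the time stamp and the message. The regular expression and the sample lines are my own illustration of that common layout, not output copied from a real system.

```python
import re

# Matches a leading "[   12.345678] " style time stamp.
_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def split_printk_line(line):
    """Return (seconds_since_boot, message); seconds is None if unstamped."""
    match = _LINE.match(line)
    if match is None:
        return None, line
    return float(match.group(1)), match.group(2)


# Illustrative lines only.
for line in (
    "[    0.000000] Linux version 3.9.4",
    "[   12.345678] usb 1-1: new device",
    "message without a time stamp",
):
    print(split_printk_line(line))
```

Subtracting the time stamps of two neighbouring lines is a quick way to spot which driver is slowing down the boot process.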

The next three features control various debugging features (Default message log level (1-7)), (Enable __deprecated logic), and (Enable __must_check logic).

kernel_21_1-png.688


The next feature is a debugging feature that is active at compiling time ((1024) Warn for stack frames larger than (needs gcc 4.4)). If stack frames are larger than the specified amount, then the compiler will warn the user.

The "Magic SysRq key" driver will enable support for the Magic SysRq key. This allows users to send the kernel special commands when Alt+PrintScreen is pressed. This works in most cases regardless of the kernel's state. However, exceptions exist. It is highly recommended that the Magic SysRq Key be enabled.
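On a running system, which SysRq commands are allowed is usually controlled by a number in /proc/sys/kernel/sysrq. The sketch below decodes such a number into the groups of commands it permits. The bit meanings are written from my recollection of the kernel's SysRq documentation, so treat them as an assumption and check them against the documentation in your kernel tree. The function name is made up.

```python
# Bit values as I recall them from the kernel's SysRq documentation.
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def describe_sysrq(value):
    """Translate a sysrq setting into a list of permitted command groups."""
    if value == 0:
        return ["disabled"]
    if value == 1:
        return ["all functions enabled"]
    return [label for bit, label in SYSRQ_BITS.items() if value & bit]


# 176 = 128 + 32 + 16, a value some distributions ship as their default.
print(describe_sysrq(176))
```

A value of 0 turns the key off entirely, 1 allows everything, and any larger number is treated as a bitmask.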

During the build, assembler-generated symbols will be stripped at link time to reduce the output of get_wchan() (Strip assembler-generated symbols during link).

This next feature is for debugging purposes (Generate readable assembler code). If enabled, some kernel optimizations will be disabled so some of the assembly code will be human-readable. This will harm the kernel's speed. Only enable this if you have a specific reason for doing so.

This setting enables/disables commonly unneeded or obsolete symbols (Enable unused/obsolete exported symbols). However, some modules may need such symbols. Enabling this will increase the kernel's size. It is very unlikely that a Linux user will need such symbols. In general, disable this unless you know for a fact the user needs a symbol for an important module.

Sanity checks will be performed on user kernel headers if this setting is enabled (Run 'make headers_check' when building vmlinux).

During compilation, this feature will check for invalid references (Enable full Section mismatch analysis).

The kernel can be configured to detect soft and hard lockups (Detect Hard and Soft Lockups). When the system is frozen for more than twenty seconds and other tasks cannot execute, this is called a soft-lockup. If the CPU is in a loop that lasts for more than ten seconds and interrupts fail to get execution time, then this is called a hard-lockup.

The next two features set the kernel to reboot on hard and soft lockups respectively, (Panic (Reboot) On Hard Lockups) and (Panic (Reboot) On Soft Lockups).

When the kernel experiences major problems, it can be set to start a kernel panic (Panic on Oops). It is highly recommended that this setting be enabled. This will help to prevent the kernel from causing system damage and data loss.

The kernel can be set to detect hung tasks (Detect Hung Tasks). This is when a process or application locks-up or is frozen. Specifically, the application becomes uninterruptable. The following setting allows the user to define how much time must pass before a process is deemed "hung" (Default timeout for hung task detection (in seconds)).

The kernel can be set to restart when a process hangs (Panic (Reboot) On Hung Tasks). Generally, users will not want to enable this. Would you like your computer to restart every time an application becomes frozen?

The "Kernel memory leak detector" finds and logs memory leaks.

The kernel uses frame pointers to help report errors more efficiently and include more information (Compile the kernel with frame pointers). I will skip a lot of the debugging tools because they are self-explanatory.

As many Linux users know, when the system boots up, the boot messages appear too quickly to be read. This feature sets the delay time which will give users more time to read the messages (Delay each boot printk message by N milliseconds).

This is a special developmental feature for testing backtrace code (Self test for the backtrace code). A backtrace lists the chain of function calls that led to a given point in execution.

Block device numbers can be extended (Force extended block device numbers and spread them). However, this may cause booting issues, so use with caution.
 
Configuring 22

In this article, we will continue to configure the kernel hacks and then we will configure the whole security system.

The next feature to configure is needed by Alpha and s390 processors (Force weak per-cpu definitions). This feature offers a fix for an addressing issue commonly seen in such processors. Other processors do not need this feature enabled.

Kernel dumps can be tested with this special debugging tool (Linux Kernel Dump Test Tool Module). This software will allow a kernel developer to trigger a fake error that will cause a kernel dump. The kernel developers can then ensure that the dumps complete successfully.

The kernel offers some different error injection modules that allow kernel developers to test the notifiers (CPU notifier error injection module), (PM notifier error injection module), and (Memory hotplug notifier error injection module). A notifier informs the system that the hardware is present, which is important for hotplugging. These error injection modules trigger an error in this notification system so developers can test the notification system's error handling abilities.

The "Fault-injection framework" driver offers various tools for testing fault-handling.

The "Latency measuring infrastructure" driver provides the LatencyTOP tool used to find the userspace object that is blocking/interfering with a kernel execution/task.

Next, we have a sub-menu titled "Tracers" that contains a list of various tracers. A tracer is a piece of code that watches various kernel functions. Every time a particular function starts, a tracer will be called to watch the function.

This next module tests the performance of the Red-Black tree library (Red-Black tree test). A Red-Black tree is a self-balancing binary search tree that keeps data sorted and allows fast searching.

The next feature is the same except that it tests the interval tree library (Interval tree test).

The kernel can also debug FireWire on other systems while that particular remote system is booting (Remote debugging over FireWire early on boot) and (Remote debugging over FireWire with firewire-ohci).

The printk() function can be made to print various debugging messages if this feature is enabled (Enable dynamic printk() support). “printk()” is a commonly discussed kernel function, so remember that it prints debugging messages about the kernel.

Here is a Direct Memory Access (DMA) debugging driver (Enable debugging of DMA-API usage).

The atomic64 self-test checks whether the system's 64-bit atomic operations work correctly (Perform an atomic64_t self-test at boot). This matters most where a 32-bit system performs a 64-bit operation.
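Why is a 64-bit operation on a 32-bit system delicate? The value has to be handled as two 32-bit words, so an update takes more than one step. The sketch below imitates this in Python by adding to a 64-bit number stored as a low word and a high word. It only illustrates the arithmetic; the function name is made up, and real kernels rely on special instructions or locks rather than code like this.

```python
MASK32 = 0xFFFFFFFF


def add64_in_halves(low, high, increment):
    """Add a 32-bit increment to a 64-bit value held as two 32-bit words."""
    total = low + increment
    new_low = total & MASK32                  # step 1: update the low word
    carry = total >> 32                       # 1 if the low word wrapped around
    new_high = (high + carry) & MASK32        # step 2: update the high word
    return new_low, new_high


low, high = add64_in_halves(0xFFFFFFFF, 0, 1)
print(hex(low), hex(high))
```

Between step 1 and step 2, another CPU could read a low word of 0 and a high word of 0, which is neither the old value nor the new one. Atomic operations exist to make sure nobody ever sees such a half-finished update.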

This driver provides a self-test for all of the possible RAID6 recovery systems (Self test for hardware accelerated raid6 recovery).

NOTE: Self-tests are low-level tests and detection software that executes before most of the system's hardware and software turns on and executes. Self-tests search for hardware, failing devices, etc. A self-test may also be code an application uses to test itself.

In the Kernel Hacking menu (if you are using a menu-like interface such as ncurses), there is a sub-menu titled "Sample kernel code". If you make your own personal modules, this is where you can enable them. In a later article, we will discuss how to implement custom/home-made kernel modules. Just remember this is where you enable your module.

The Kernel GNU DeBugger (KGDB) has many features that can be enabled or disabled (KGDB: kernel debugger). This debugger only works with two Linux systems plugged into each other via serial connection.

This feature provides extra boot-up messages for the decompression of the bzImage (Enable verbose x86 bootup info messages). You set the kernel compression near the beginning of the configuration process.

Printk() prints various information to the boot screen and dmesg, but only after the serial and console drivers load. Enable this driver to make printk start printing messages sooner (Early printk).

This next driver is the same as above, but uses the EHCI port (Early printk via EHCI debug port).

The kernel can be set to watch for stack overflows so the kernel can manage the error better (Check for stack overflows). The kernel will execute more slowly, but overflows will not cause as much damage.

The page-table for the kernel can be seen on debugfs with this enabled (Export kernel pagetable layout to userspace via debugfs). However, this will slow down the kernel. This is needed for debugging purposes.

Mistaken writes by the kernel can be caught with this feature (Write protect kernel read-only data structures). This option marks the kernel's read-only data as write-protected. This debugging tool harms the kernel's speed. It also has a tool to debug itself (Testcase for the DEBUG_RODATA feature).

To prevent the execution of modules with modified code (due to an error), enable this protective feature (Set loadable kernel module data as NX and text as RO). The debugging tool for that feature is provided by this driver (Testcase for the NX non-executable stack feature).

The kernel can be set to flush one of the TLB entries at a time or the whole table using this option (Set upper limit of TLB entries to flush one-by-one).

The next feature is an IOMMU debugging feature (Enable IOMMU debugging). There is another debugging test that disables some IOMMU features to test for extra stability (Enable IOMMU stress-test mode). The IOMMU stands for input/output memory management unit.

Enabling this option will make the kernel perform self-tests on the change_page_attr() function every thirty seconds (CPA self-test code). This function changes page attributes.

Normally, GCC cannot manipulate kernel code marked as "inline" as freely as unmarked code; this option lets GCC decide whether to inline such functions (Allow gcc to uninline functions marked). The GCC compiler adds code that it feels will make the code better (GCC is good at doing so). However, some code is not meant to be manipulated by GCC.

This next driver offers sanity checks for the "copy_from_user()" function (Strict copy size checks). copy_from_user() copies a block of userspace data to kernelspace.

Here is another self-test; this one is for NMI (NMI Selftest).

Now, we can move on to the "Security Options" which is seen as a sub-menu in the main menu if you are using a menu-based interface, like ncurses. The first option allows access keys and authentication tokens to be stored in the kernel (Enable access key retention support). This is used for many reasons like accessing encrypted filesystems.

The following option is for creating and sealing/unsealing keys (TRUSTED KEYS). Encrypted keys are encrypted/decrypted using this driver (ENCRYPTED KEYS).

Keys can be viewed in proc with this feature enabled (Enable the /proc/keys file by which keys may be viewed).

Extra restrictions can be applied to syslog with this security feature (Restrict unprivileged access to the kernel syslog).

If this option is enabled, then the user can select different security models (Enable different security models). Otherwise, the defaults will be used. Disable this if you do not fully understand security or if you are fine with your kernel using the defaults.

The securityfs filesystem is offered by this driver (Enable the securityfs filesystem).

Hooks are added to networking and socket security when this feature is enabled (Socket and Networking Security Hooks). These hooks are access controls.

IPSec networking hooks (also called XFRM networking hooks) are implemented when this option is enabled (XFRM (IPSec) Networking Security Hooks). Security hooks are also available for files (Security hooks for pathname based access control).

The next driver provides support for Intel's Trusted Execution Technology (Enable Intel(R) Trusted Execution Technology (Intel(R) TXT)).

The user can set the range of memory addresses that cannot be reserved for userspace (Low address space for LSM to protect from user allocation). The starting point is 0. The user types the end point for this option. For most platforms, 65536 is a recommended choice.

SELinux (mentioned in the Kernel Security article) is one of the popular Linux-Security-Modules (NSA SELinux Support). Many options and features exist for SELinux. The boot parameter determines whether SELinux is started {1} or not started {0} when the kernel executes (NSA SELinux boot parameter). SELinux can be configured with the ability to be temporarily disabled at times when the Root user needs to do so (NSA SELinux runtime disable). Users can develop and test new policies with this feature enabled (NSA SELinux Development Support). AVC statistics are collected and stored by this feature (NSA SELinux AVC Statistics). A default can be set for the checkreqprot flag; a "1" means SELinux will check the application's requested protection and zero will default to the kernel's protection for mmap and mprotect system calls (NSA SELinux checkreqprot default value). Many SELinux policies exist; the user can set the latest version that they wish SELinux not to exceed (NSA SELinux maximum supported policy format version).

One of the other Linux-Security-Modules (LSM), SMACK, is supported by the kernel (Simplified Mandatory Access Control Kernel Support).

TOMOYO is another supported LSM (TOMOYO Linux Support). The maximum number of entries permitted to be added during learning-mode is set in the following feature (Default maximal count for learning mode). The amount of log entries can also be set (Default maximal count for audit log). Next, this option allows/disallows TOMOYO to be activated without a policy loader (Activate without calling userspace policy loader). The location of the policy loader is configured here ((/sbin/tomoyo-init) Location of userspace policy loader) and the executable that triggers the execution is set here ((/sbin/init) Trigger for calling userspace policy loader).

Again, the kernel supports another LSM - AppArmor (AppArmor support). Like with SELinux, the default boot parameter can be set for AppArmor (AppArmor boot parameter default value).

Yama is another LSM (Yama support). Yama can be used with another LSM if this feature is enabled (Yama stacked with other LSMs).

This driver gives the kernel the ability to use multiple keyrings for verification processes (Digital signature verification using multiple keyrings).

Asymmetric keys are supported with this feature (Enable asymmetric keys support).

The kernel can keep and maintain a list of hashes of important system files (Integrity Measurement Architecture(IMA)). Then, if malware changes an important file, the kernel will know because the hashes are checked before the file or executable is used. It is highly recommended that this feature be enabled.

Extra security attributes are added to files if this feature is enabled (EVM support). The version can be set using this next option (EVM HMAC version). The two choices are version 1 and 2.

Remember all of the different Linux Security Modules (LSMs)? Well, the default can be set here (Default security module (AppArmor)).
 
Configuring 23

In this article, we will configure the Cryptographic API, Virtualization, and Library Routines. Cryptography covers encryption and secure communication between computers. Users may encrypt data to ensure that only the intended recipient can read it, even if an attacker obtains a copy.

The Linux kernel requires the "Cryptographic algorithm manager" to be enabled in the kernel. This feature provides the software needed to operate the cryptographic abilities of the kernel.

The userspace can configure the cryptography features when this driver is enabled (Userspace cryptographic algorithm configuration). NOTE: This configuration is referring to the cryptographic setup during kernel runtime, not the tool for making the kernel.

To enhance performance, enable this feature which stops self-tests on cryptographic algorithms (Disable run-time self tests).

The "GF(2^128) multiplication functions" is a specific algorithm used by some ciphers. GF stands for Galois field, which is a field (a set with addition and multiplication) containing a finite number of elements. Such fields come in a variety of sizes; GF(2^128) has 2^128 elements.

"Null algorithms" are algorithms used in IPsec. Null encryption means no encryption, so this driver allows IPsec to use no encryption.

Arbitrary algorithms can be converted to parallel algorithms (Parallel crypto engine). This feature provides that converter.

Arbitrary algorithms can also be converted to asynchronous algorithms (Software async crypto daemon).

"Authenc support" is needed by IPsec. Authenc support stands for Authenticated Encryption and offers multiple encryptions to IPsec.

CCM stands for "Counter with CBC MAC" and is needed by IPsec (CCM support).

This driver offers "GCM/GMAC support". GCM is "Galois/Counter Mode" and GMAC is "Galois Message Authentication Code".

NOTE: I will not be able to explain specifically the use and details of some of these features. Cryptography is a detailed field of computing, and explaining cryptography is beyond the scope of this article.

The "Sequence Number IV Generator" is a special number generator used by some cryptography software.

The Linux kernel supports various block cipher modes (CBC support), (CTR support), (CTS support), (ECB support), (LRW support), (PCBC support), and (XTS support), as well as message authentication codes (HMAC support), (XCBC support), and (VMAC support).

The "CRC32c CRC algorithm" is used specifically with SPARC64 processors.

"CRC32c INTEL hardware acceleration" is another processor specific algorithm. It works on Intel processors with SSE4.2.

The kernel also offers many digests, ciphers, and other cryptographic software. Generally, allow the defaults unless you have a specific reason for enabling or disabling features.

NOTE: A digest (like MD5) generates a hash (sequence of characters) based on a file. Hashes are then used to check files. For example, if you download the Ubuntu installation ISO file from Canonical's website, you may want to know if the file on your hard-drive is an exact replica of the server's file. Users do this because the ISO may become corrupted during the long download. The hash is used to prove that the file is unchanged.
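
The check described in the note above can be sketched in a few lines of shell. This is only a minimal example, assuming the "sha256sum" tool from GNU coreutils is installed. The function name verify_sha256 is made up for illustration, and the expected hash would be the one published on the download page.

```shell
# verify_sha256 FILE EXPECTED_HASH
# Succeeds only if FILE's SHA-256 digest equals EXPECTED_HASH.
# (verify_sha256 is a hypothetical helper name, not a standard tool.)
verify_sha256() {
    actual=$(sha256sum "$1" | cut -d ' ' -f 1) || return 1
    [ "$actual" = "$2" ]
}
```

For example, verify_sha256 FILE.iso HASH_FROM_WEBSITE && echo "file is intact" prints the message only when the two hashes match.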

NOTE: A cipher is an encryption/decryption algorithm. Encryption is the process of making a file unreadable to anyone other than the intended recipients/owners. Decryption is the process used to view an encrypted file.

The Linux kernel also supports various compression algorithms that users are strongly recommended to enable (Deflate compression algorithm), (Zlib compression algorithm), and (LZO compression algorithm).

The kernel can generate random numbers, which are needed by cryptographic software (Pseudo Random Number Generation for Cryptographic modules). Userspace programs can also be given access to the kernel's algorithms (User-space interface for hash algorithms) and (User-space interface for symmetric key cipher algorithms).

"Hardware crypto devices" is a sub-menu that contains a list of drivers for hardware-based cryptography tools. This is hardware that has the algorithms in the firmware.

Various drivers for asymmetric public-keys exist in the "Asymmetric (public-key cryptographic) key type" menu.

Now, we can move on to the next entry on the main menu of the configuration tool (on menu-based interfaces). Virtualization is the ability to host an operating system. This means Linux (and other systems) can run another operating system as if the guest system is an application.

"Kernel-based Virtual Machine (KVM) support" allows the kernel itself to manage the guest system. Computers with Intel processors need this driver (KVM for Intel processors support) and AMD processors need (KVM for AMD processors support).

The memory management unit (MMU) for the Kernel Virtual Machine (KVM) can have an auditing system.

The guest's network can become faster with this feature enabled (Host kernel accelerator for virtio net).
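
To decide which of the two KVM processor drivers applies, the CPU flags can be inspected. The following is a small sketch, assuming the usual /proc/cpuinfo layout, where "vmx" marks Intel's virtualization extension and "svm" marks AMD's. The function name cpu_virt_support is made up for this example.

```shell
# cpu_virt_support [CPUINFO_FILE]
# Prints "intel" if the vmx flag is present, "amd" if the svm flag is
# present, or "none" otherwise. Reads /proc/cpuinfo by default.
# (cpu_virt_support is a hypothetical helper name.)
cpu_virt_support() {
    file="${1:-/proc/cpuinfo}"
    if grep -qw vmx "$file"; then
        echo intel
    elif grep -qw svm "$file"; then
        echo amd
    else
        echo none
    fi
}
```

A result of "none" may also mean that virtualization is switched off in the firmware settings, so it is worth checking there before giving up.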

After configuring the virtual machine abilities for the kernel, the last menu on the main screen of the configuration tool (when using a menu interface) is the last portion of the kernel to configure. This menu is for the "Library Routines", also called library functions. Parts of the kernel can be used as a linkable library. For example, XZ filters (compression algorithms) can be accessed by external programs. The different libraries are listed below.

NOTE: The CRC functions are mostly the same but with different features or performance. Generally, it is best to allow the defaults.

CRC-CCITT functions - The Cyclic Redundancy Check function tests for changes in raw data

CRC16 functions - The Cyclic Redundancy Check function tests for changes in raw data

CRC calculation for the T10 Data Integrity Field - The Cyclic Redundancy Check function tests for changes in raw data

CRC ITU-T V.41 functions - The Cyclic Redundancy Check function tests for changes in raw data

CRC32/CRC32c functions - The Cyclic Redundancy Check function tests for changes in raw data

CRC32 perform self test on init - The Cyclic Redundancy Check function tests for changes in raw data

CRC32 implementation (Slice by 8 bytes) - The Cyclic Redundancy Check function tests for changes in raw data

CRC7 functions - The Cyclic Redundancy Check function tests for changes in raw data

CRC32c (Castagnoli, et al) Cyclic Redundancy-Check - The Cyclic Redundancy Check function tests for changes in raw data

CRC8 function - The Cyclic Redundancy Check function tests for changes in raw data

* BCJ filter decoder - XZ decoder designed for a specific processor where "*" is the processor. The kernel lists a few different architectures.

XZ decompressor tester - Debug functions for testing the XZ decoder

Averaging functions - Load average as seen in the output of "uptime"

CORDIC algorithm - hyperbolic and trigonometric functions

JEDEC DDR data - JEDEC Double Data Rate SDRAM specification


Now, the configuration is complete!
 
Compiling and Installing

After you have spent a lot of time configuring your kernel to make the kernel you need, you can now compile it. The source code is C code in the form of plain text. Humans can read it, but the processor cannot execute it directly. Compiling the code converts it to a form computers understand called binary (ones [on] and zeros [off]). Compilation also combines all of those source files into a single file: the kernel image.

To compile the kernel, type "make" in a terminal that is in the same directory as the kernel's source code folders. This will take some time. Once done, the modules are compiled by typing "make modules" (on many kernel versions plain "make" already builds them, in which case this step finishes quickly). To do both in one step, type "make && make modules". The "&&" runs the second command only if the first one succeeds, so the user does not need to come back later to type "make modules".
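
Because the build takes some time, many people let "make" run several jobs at once with its "-j" option. Below is a rough sketch of that idea; the function names make_jobs and build_kernel are made up for this example, and one job per CPU is only a common starting point, not a rule.

```shell
# Number of parallel jobs for make: one per online CPU, or 1 as a fallback.
# (make_jobs and build_kernel are hypothetical helper names.)
make_jobs() {
    nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

# Build the kernel image and then the modules; stop at the first failure.
# Run this from the top of the kernel source tree.
build_kernel() {
    n=$(make_jobs)
    make -j"$n" && make -j"$n" modules
}
```

Calling build_kernel in the kernel's source directory then replaces typing the two commands by hand.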


WARNING: Before you install a kernel, back up all of the important data. Be sure to make a copy of /boot/ onto a FAT32 memory card. This helps to repair the system if the installation fails. FAT32 does not store permissions, so it will be easier to use a live disk to copy the files back. Remember to set the original file permissions and executable bits.

Once the compilation has finished successfully, we can then install the kernel to the local system (I will explain how to install the kernel on other systems in a moment [cross-compiling]). In the same terminal, after compilation, type "make install". This will place some files in the /boot/ directory. The "vmlinuz" (or some similar name) is the kernel itself. "initrd" is the temporary RAM-based filesystem that is placed in memory and used during boot-up. The "System.map" contains a list of kernel symbols. These are global variables and functions used in the kernel's code. "config" is the configuration file for the kernel. grub.cfg will automatically be updated. The kernel installer can configure the Grub, LILO, and SysLinux bootloaders automatically, but other bootloaders (like BURG) may need to be manually configured. The modules must also be installed by typing "make modules_install". On systems that build the initrd during "make install", install the modules first so that the initrd can include them.


NOTE: Both the modules and the kernel can be installed using one line - "make modules_install && make install" (installing the modules first lets the initrd be built with them).


Once that process is complete, the user can ensure the kernel was installed by restarting the system and typing "uname -r" in a terminal when the system is back on. If the system fails to boot or uname reports a different version number than expected, there are several possible causes: the bootloader was improperly set up, features in the configuration conflict, the compilation had errors, the kernel was improperly installed, or some other reason. The best way to start finding the source of the issue is to look at the system's logs (if the system boots up enough to produce logs). "dmesg" is a command that prints the kernel's logs to the screen. Look for any errors, warnings, or unexpected results. If the system does not boot or does not boot up enough to produce logs, use a live Linux disc to perform diagnostics and repairs. If all else fails, compile the kernel again and make sure you installed the kernel as Root or used "sudo".
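
The two checks described above can be sketched as small shell helpers. This is only an illustration: the function names are made up, and the search pattern is a guess at what is worth looking for, not a complete list.

```shell
# running_kernel_is EXPECTED_VERSION
# Succeeds if "uname -r" reports exactly EXPECTED_VERSION.
# (running_kernel_is and log_problems are hypothetical helper names.)
running_kernel_is() {
    [ "$(uname -r)" = "$1" ]
}

# log_problems
# Filters kernel log text on standard input down to suspicious lines.
log_problems() {
    grep -iE 'error|warn|fail'
}
```

For instance, running_kernel_is VERSION || echo "wrong kernel booted" reports a mismatch, and dmesg | log_problems shortens the kernel log to the lines worth reading first (on some systems dmesg needs Root privileges).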

NOTE: The best way to repair such a system is to use a live Linux distro to remove the new/broken kernel and then manually fix (or paste a backup) Grub's files.

Some Linux users like to have the documentation installed as well, but this is not required. For those that like to have the documentation installed, type this line, where VERSION is the kernel's version number - "install -d /usr/share/doc/linux-VERSION && cp -r Documentation/* /usr/share/doc/linux-VERSION". Root privileges are required.

To compile a newer kernel with the same features as your current kernel, type this command - "zcat /proc/config.gz > .config". This file may not exist; if so, you may be able to ask the developers of your distro/kernel for the file. The "zcat" command uncompresses the data and the ">" places it in the file ".config". The file is written to the current directory, so run the command from the Linux kernel source directory (or give the full path to ".config") and allow it to replace the current file. Then, compile and install the kernel as you normally would.

Cross-compiling is slightly different. Configure the kernel for the intended system, keeping in mind that it will be cross-compiled. When cross-compiling, there are two terms to be familiar with. The "Host" is the system performing the compilation. The "Target" is the system that will receive the new kernel. Make sure that the host system has the proper compilers. For example, to cross-compile for ARM systems, users will need gcc-arm-linux-gnueabi on the host system. Generally, the developer can do a search in their package manager or Google for the proper/best cross-compiler for their needs. The specific command used to cross-compile for ARM systems is "make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi-". The "ARCH=arm" refers to the target's processor type and "CROSS_COMPILE" declares the cross-compiler. Notice that the value is the package name without the "gcc-" at the beginning and with a dash added at the end. That is the format users must use when using the cross-compiler as a parameter. The modules can be cross-compiled by typing "make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- modules". To install the kernel on the target system, copy the Linux kernel folder to the target system. Once the files are on the target system and a terminal is open in that directory, type "make modules_install && make install". Of course, you must be Root or use "sudo".
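
A missing cross-compiler usually shows up only after "make" has started, so it can help to check for it first. Here is a minimal sketch of that check wrapped around the commands above; the function names have_cross_compiler and cross_build are made up for this example.

```shell
# have_cross_compiler PREFIX
# Succeeds if "${PREFIX}gcc" can be found in PATH.
# (have_cross_compiler and cross_build are hypothetical helper names.)
have_cross_compiler() {
    command -v "${1}gcc" >/dev/null 2>&1
}

# cross_build ARCH PREFIX
# Cross-compiles the kernel and then the modules.
# Run this from the top of the kernel source tree.
cross_build() {
    have_cross_compiler "$2" || {
        echo "cross-compiler ${2}gcc not found" >&2
        return 1
    }
    make ARCH="$1" CROSS_COMPILE="$2" &&
    make ARCH="$1" CROSS_COMPILE="$2" modules
}
```

With the ARM example from above, the call would be cross_build arm arm-linux-gnueabi-.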

INFO: Kernel.org hosts a list of supported cross-compilers (https://www.kernel.org/pub/tools/crosstool/).

FUN FACT: Some Linux distros store the kernel's config file in the /boot/ directory. Developers wanting to compile a kernel with the same settings as their current kernel can copy the file to their build directory. cp /boot/config-$(uname -r) /PATH/TO/.config
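
Since the current configuration may live in either /proc/config.gz or /boot/, the two approaches can be combined. This is a rough sketch only; the function name fetch_running_config is made up, and the second and third arguments exist just so the default paths can be overridden.

```shell
# fetch_running_config DEST [PROC_CONFIG] [BOOT_CONFIG]
# Copies the running kernel's configuration to DEST.
# Tries /proc/config.gz first, then /boot/config-<version>.
# (fetch_running_config is a hypothetical helper name.)
fetch_running_config() {
    dest="$1"
    proc_cfg="${2:-/proc/config.gz}"
    boot_cfg="${3:-/boot/config-$(uname -r)}"
    if [ -r "$proc_cfg" ]; then
        zcat < "$proc_cfg" > "$dest"
    elif [ -r "$boot_cfg" ]; then
        cp "$boot_cfg" "$dest"
    else
        echo "no kernel configuration found" >&2
        return 1
    fi
}
```

Run from the kernel source directory, fetch_running_config .config does the copying. A newer kernel usually has options the old file does not mention; on kernels that provide it, "make olddefconfig" fills those in with their defaults, and "make oldconfig" asks about each one instead.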



Compilation and Installation Summary:

Standard:

Code:
make && make modules && make modules_install && make install

Make a newer version or remix of your kernel:

Code:
zcat /proc/config.gz > .config && make && make modules && make modules_install && make install

Cross-compile:

Code:
make ARCH={TARGET-ARCHITECTURE} CROSS_COMPILE={COMPILER} && make ARCH={TARGET-ARCHITECTURE} CROSS_COMPILE={COMPILER} modules
# then, on the target system, in the copied source directory:
make modules_install && make install
 
Modules

Now that we have our new kernel installed, we may have a module we wish to manipulate. Modules allow users to get extra hardware support without making a new kernel. For example, if a kernel is made without HFS support, a user can still use the HFS filesystem without making a new kernel.

Just in case you do not fully understand modules, Windows' equivalent to Linux's modules are called drivers. As an analogy - Linux is to Windows as module is to driver. However, many people still call modules "drivers". That is fine; people and search engines know what you mean. Sometimes, modules are referred to as Loadable Kernel Modules (LKMs) because they can be loaded without changing the kernel.

Modules are stored under /lib/modules/. This directory has a subdirectory for each installed kernel. The module files themselves end in ".ko" which stands for "Kernel Object".

NOTE: Do not confuse “.ko” files with “.so” files. The “.so” (shared object) files are Linux's analogue of Windows' “.dll” files and are used by userspace programs, not by the kernel.

To access the module directory (via command-line) of the currently active kernel, type this command - cd /lib/modules/$(uname -r)/. The "$(uname -r)" will be replaced by the output of "uname -r", so the user does not need to know or memorize the name/version of the active kernel. This folder is well organized. The ./kernel/ folder contains the following directories.

arch - Architecture specific modules

crypto - Cryptography modules

drivers - Many modules are stored here and are sorted by hardware type. For example, ATA modules are under the "ata" folder.


fs - Filesystem modules are kept here. For instance, the module file for the Minix filesystem is /lib/modules/$(uname -r)/kernel/fs/minix/minix.ko

lib - The library routines are stored under this directory

mm - Modules for managing memory and debugging the kernel are stored here

net - Network modules are stored here

sound - This is an obvious one

Some other folders may exist. For instance, I have a "ubuntu" directory which contains modules specific to Ubuntu or were added by the Ubuntu developers.
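
To get a feel for how the modules are spread across the directories listed above, they can be counted. The sketch below is only an illustration with a made-up function name (count_modules). It matches "*.ko*" because some distros compress their module files.

```shell
# count_modules [KERNEL_MODULE_DIR]
# Prints "<count> <subdirectory>" for each directory under ./kernel/.
# Defaults to the active kernel's module directory.
# (count_modules is a hypothetical helper name.)
count_modules() {
    base="${1:-/lib/modules/$(uname -r)/kernel}"
    for dir in "$base"/*/; do
        [ -d "$dir" ] || continue
        n=$(find "$dir" -type f -name '*.ko*' | wc -l)
        echo "$((n)) $(basename "$dir")"
    done
}
```

Typing count_modules with no arguments lists the directories of the running kernel, one per line.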

When managing modules, Root privileges must be used. So, use "sudo" or log in as Root.

To load a module, use this command (when logged in as Root; otherwise, remember sudo).

Code:
modprobe <MODULE>

For illustration, to use the HFS+ filesystem, load its module.

Code:
modprobe hfsplus

Include the "-v" parameter for verbose messaging and "-s" sends error messages to the syslog. To see what modules the desired module requires, use "--show-depends" as a parameter. To remove a module, use "-r".

To get information about a module, use the "modinfo <MODULE>" command. You may notice information about aliases. Some modules have an alias which can be used to reference the module. The real name and alias each work well. Use whichever you remember better.


The "lsmod" command lists the currently loaded modules. This is helpful when users need to ensure a module is loaded.
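
In a script, the same check can be done without reading the whole list by eye. The kernel exposes the loaded modules in /proc/modules, which is where "lsmod" gets its data. The function below is a small sketch with a made-up name (module_loaded); the optional second argument is only there so another file can be used in place of /proc/modules.

```shell
# module_loaded NAME [MODULES_FILE]
# Succeeds if NAME appears in the first column of /proc/modules.
# (module_loaded is a hypothetical helper name.)
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}
```

For example, module_loaded hfsplus || sudo modprobe hfsplus loads the module only if it is missing. Note that loaded module names are listed with underscores, even when the module's file name contains dashes.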

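Some modules can be given special parameters before they are loaded. To view the configured options and aliases for a specific module, use the command below. The parameters that a module accepts are also listed in the output of "modinfo <MODULE>".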

Code:
modprobe -c | grep <MODULE>

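Parameters are given when a module is loaded; a module that is already loaded must be removed and loaded again to pass new load-time parameters. To load a module from its file with a parameter, use this command (insmod takes the path to the ".ko" file, not the module name)

Code:
insmod /PATH/TO/MODULE.ko PARAM_NAME=VALUE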

If you wish to load a module and also set the parameter at the same time, use this command

Code:
modprobe MODULE PARAM_NAME=VALUE

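Alternately, some parameters can be changed at runtime by writing to files in /sys/module/. For illustration, to change a parameter for a bluetooth module, use something like this (the output is piped through "sudo tee" because a plain "sudo echo ... > FILE" would perform the redirection without Root privileges)

Code:
echo -n "VALUE" | sudo tee /sys/module/bluetooth/parameters/FILENAME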
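
Before changing a value under /sys/module/, it helps to see which parameters a loaded module exposes and what they are currently set to. Here is a minimal sketch; the function name show_module_params is made up, and the optional second argument only exists so a different base directory can be used.

```shell
# show_module_params MODULE [SYSFS_MODULE_DIR]
# Prints "name=value" for each readable parameter of a loaded module.
# (show_module_params is a hypothetical helper name.)
show_module_params() {
    dir="${2:-/sys/module}/$1/parameters"
    [ -d "$dir" ] || {
        echo "no parameters directory for $1" >&2
        return 1
    }
    for f in "$dir"/*; do
        [ -r "$f" ] || continue
        echo "$(basename "$f")=$(cat "$f" 2>/dev/null)"
    done
}
```

Typing show_module_params bluetooth lists the parameters of the bluetooth module, provided it is loaded and has any. Some parameters can only be read as Root.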

If a module has issues loading, like an "unknown symbol error", the module may still be able to be loaded by using the "-f" parameter. Use this parameter with caution.

Code:
modprobe -f <MODULE>

To load modules from different kernel versions, use the same command above after you copy the module to the active kernel's module path. Or, try

Code:
insmod /PATH/TO/MODULE.ko

Now, you can enjoy your kernel even more with module manipulation.
 
Patches

Sometimes, the kernel developers may release a patch for a particular Linux kernel. It helps to know how to apply these patches which will usually fix bugs or enhance performance. Patches can be obtained from "kernel.org". Once a patch is downloaded to your local system, place it in the directory containing the folder of your kernel's source code. Ensure that the kernel and patch are compatible, meaning, they must be the same version. Patches are applied to the uncompressed source code before the kernel is configured.

To apply a patch, type the following where "PATCH" is the patch's file name. Also, use this command while the active directory is the kernel's source code.

Code:
patch -p1 < ../PATCH

If a patch is accidentally applied or the user decides not to use a patch, then the patch can be removed/undone in the following manner,

Code:
patch -R -p1 < ../PATCH

The "-R" reverses the patch's changes.
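
A patch that only partly applies can leave the source code in a confusing state. One way to avoid that is to test the patch first with the "--dry-run" option of GNU patch, which checks everything without changing any files. The function below is just a sketch of that idea, and its name (apply_if_clean) is made up.

```shell
# apply_if_clean PATCH_FILE
# Applies the patch only if a dry run succeeds, so a patch that does
# not fit leaves the source tree untouched.
# Run this from the top of the kernel source tree.
# (apply_if_clean is a hypothetical helper name.)
apply_if_clean() {
    patch -p1 --dry-run < "$1" >/dev/null || return 1
    patch -p1 < "$1"
}
```

It is used the same way as the plain command: apply_if_clean ../PATCH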

Compressed patches can be applied using one of these commands depending on the compression type. Again, you must be in the directory with the kernel's code.

Code:
zcat ../PATCH.gz | patch -p1
xzcat ../PATCH.xz | patch -p1
bzcat ../PATCH.bz2 | patch -p1

However, patches can be applied other ways. Ksplice is an administrator tool that allows users to apply minor security patches to the kernel that is currently running. To apply the security patches, type "uptrack-upgrade -y" in a terminal. "uptrack-show --available" displays available patches and "uptrack-remove --all" removes the patches. Ksplice functions like a package manager. Ksplice searches its repositories for patches and then installs the patches. However, unlike a package manager, Ksplice applies these patches to the kernel in memory. So, if the computer reboots, the updates will not exist.

Kernel Names

When using patches, it is important to understand the name of the kernel. For example, kernel v3.12.5 is not the same as v3.12.5-rc2 or v3.12.5-mm. "-rc#" means release candidate. These are kernels that will soon be released after some final testing. "-mm" indicates the kernel is experimental, so such kernels may be unstable and are not suitable for mainstream use. "-git" is found on the daily snapshots of the kernel taken from the Git repository.
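
The naming rules above are simple enough to check in a script. This is only a sketch: the function name classify_kernel and the labels it prints are made up for this example, and distro kernels often add suffixes of their own that it knows nothing about.

```shell
# classify_kernel VERSION_STRING
# Prints a label based on the suffixes described above.
# (classify_kernel and its labels are hypothetical.)
classify_kernel() {
    case "$1" in
        *-rc[0-9]*) echo "release candidate" ;;
        *-mm*)      echo "experimental (-mm)" ;;
        *-git*)     echo "git snapshot" ;;
        *)          echo "regular release" ;;
    esac
}
```

For instance, classify_kernel v3.12.5-rc2 prints "release candidate", while classify_kernel v3.12.5 prints "regular release".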


Updating the Kernel

For those of you that dislike compiling your own kernel or are happy with the configuration of other developers, there are some options for you in getting a newer kernel.

The number one method of updating the system's kernel is via the package manager. Updating the kernel is just like updating any other software. Open your package manager and refresh the repository list to see the latest updates. If updates exist for your kernel, then select the updates and apply them. If you are using a command-line, then follow the normal procedures for applying updates.

To Update or not to Update

If you are unsure about whether an update is worthwhile, ask yourself these questions -

Why do I want to upgrade?
Does my system have a bug that I want fixed?
Is there some hardware I want supported by my Linux system?
Do I have hardware that is partially supported?
What if the update goes wrong?
What if the update introduces problems or new bugs?
Would updating improve system security?
Will the system continue to function well without the update?

If you need help making a decision, read the changelog to see the changes in the newer version of the kernel. Go to Kernel.org to read the changelogs. Changelogs include detailed information on changes and improvements as well as bug fixes.


Overall, only perform updates/upgrades on important computers when necessary. When working for a large company (or any company), try to refrain from updating the kernel unless the server is at risk from malware or bugs.
 
Types of Kernels

Many Linux users are familiar with Linux being called "GNU/Linux". This means something special. "GNU" refers to the userland and "Linux" refers to the kernel. Now, you may be wondering if these two components can be swapped. Yes, they can. For example, early in Linux's history, the GNU community had a userland but no kernel, and Linux was just a kernel. So, they put the two projects together. Today, the GNU community has a kernel called "Hurd". They then swapped the Linux kernel with the Hurd kernel to make "GNU/Hurd" systems. For instance, Arch exists as "GNU/Linux" and "GNU/Hurd". Not only do various userland and kernel combinations exist, there are different Linux kernels. Also, Linux is not the only kernel that exists (obvious, right?).

Many kernels exist for various operating systems. Two main types of kernels exist - monolithic kernels and microkernels. Linux is a monolithic kernel and Hurd is a microkernel. Microkernels offer the bare essentials to get a system operating. Microkernel systems have small kernelspaces and large userspaces. Monolithic kernels, however, contain much more. Monolithic systems have large kernelspaces. For instance, one difference is the placement of device drivers. Monolithic kernels contain drivers (modules) and place them in kernelspace while microkernels lack drivers. In such systems, the device drivers are offered in another way and placed in the userspace. This means microkernel systems still have drivers, but they are not part of the kernel. In other words, the drivers exist in another part of the operating system. There is a lot more to the definition and more differences, but these are the main defining characteristics.

One other type of kernel is called a hybrid kernel which lies on the boundary between monolithic kernels and microkernels. This means it has qualities of both, but hybrid kernels cannot be classified as a monolithic kernel or microkernel exclusively.

Userland refers to the user-space applications. For instance, Mastodon Linux (FreeBSD/Linux) uses the Linux kernel, but has FreeBSD applications (userland).

Debian is a system that has many variants. All Debian systems use GNU, but may have different kernels. Most people have at least heard of the GNU/Linux form that has many derivatives (like Ubuntu). Some interesting forms of Debian include GNU/Hurd, GNU/NetBSD, and GNU/kFreeBSD (FreeBSD Kernel). Obviously, people need to be more clear on which Debian system they have when they need help fixing an issue.

NOTE: The Hurd kernel contains the GNU-Mach kernel.

Nexenta OS is an OpenSolaris system (GNU/kOpenSolaris). Once installed, it appears to be GNU/Linux with the GNOME user-interface and typical applications seen on Linux systems. However, Nexenta OS is not Linux.

MkLinux is technically not Linux. This system uses the Mach kernel. Even though the userland is specifically RedHat Linux, this does not make MkLinux a Linux distro. The Mach kernel is an example of a microkernel.

StormOS is another example of an operating system that uses the GNU userland, but is not Linux. StormOS is a GNU/Illumos system.

This information may now answer the popular question - "Is Android a Linux system?". Yes, Android is Linux, but not GNU/Linux. Android uses the Linux kernel and the Dalvik userland, thus making Android - "Dalvik/Linux". Android also uses a modified Linux kernel. Yes, there are different Linux kernels, but they are all still Linux. By the way, MeeGo is also Linux.

Food for thought: GNU = "GNU is Not Unix". What does the "G" stand for in "GNU"?

Now, remember, I said there are different Linux kernels. All Linux kernels have come from the Vanilla kernel directly or indirectly. The Vanilla kernel is the Linux kernel that can be downloaded from Kernel.org. The Vanilla kernel is the mainstream, official kernel that is made and managed by Linus Torvalds.

NOTE: The Vanilla kernel is developed using Git; the main repository is hosted on Kernel.org, and a mirror is available on GitHub.com.

RTLinux is a real-time microkernel form of Vanilla. Yes, not all Linux kernels are monolithic, but most are monolithic. (http://www.rtlinuxfree.com/)

μClinux not only refers to the distro, but also the specialized kernel for very small, weak microcontrollers.

Firefox-OS is a mobile operating system that uses the Gonk kernel which contains various needed drivers (modules) for mobile devices like phones. Gonk also has enhancements for the system's software.

DS-Linux is the kernel used by Nintendo for the Nintendo-DS consoles. Sony also has a special Linux kernel for their Playstation. The kernel is called "Runix" or "PSXLinux".

Linux-libre is a Linux kernel that completely lacks proprietary code and modules. Basically, it is the Vanilla Kernel with code removed. This is perfect for people that want to avoid all proprietary software. Beware, some devices will not work with this kernel.

CoLinux is a specialized kernel that was modified to allow Linux to run with Windows at the same time. (http://www.colinux.org/)

Compute-Node-Linux (CNL) is a kernel for the Cray computers. The INK kernel is used in IBM's Blue Gene Supercomputer.

The L4Linux kernel is designed to run on the L4 microkernel.
L4Android is the kernel that Dalvik/Android used for version 2.2-2.3. This is a combination of Google's changes and code with L4Linux. (http://l4android.org/)

NOTE: These kernels are different not because they were configured or compiled differently. Rather, the source code itself was greatly changed.

VServer is a Linux kernel with extra virtualization features (http://linux-vserver.org/Welcome_to_Linux-VServer.org).

Some individual versions of the Linux kernel are given nicknames/codenames, but these are rarely used in practice. (http://en.wikipedia.org/wiki/List_of_Linux_kernel_names)

For some people, it is important to have a deep understanding of some of these different systems.
 
Android?

Now that we have studied the Linux kernel very well and learned how to make our own, we will move on to a slightly different direction in this series. Many of you may be unaware of this, but Android is Linux. True, they are not quite the same, but Android is Linux. For example, Ubuntu is "GNU/Linux" while Android is "Dalvik/Linux". If an operating system uses the Linux kernel, then it is a Linux system. The userland (GNU and Dalvik) does not determine whether an OS is Linux or not. Android uses a modified Linux kernel. As we know, Android runs on phones. As you may remember from configuring the kernel, there were no drivers for phone devices (like small keypads, 3G/4G cards, SIM cards, etc.). The Linux kernel used in Android lacks drivers that would not be in phones and instead has drivers for phone devices. In other words, no Android system uses a Vanilla Kernel.

NOTE: Some people say "Android/Linux" instead of "Dalvik/Linux". However, both are valid.

Linux kernels for Android can be downloaded or viewed here - (https://android.googlesource.com/?format=HTML). Just like the Vanilla kernel, the Android kernel is open-source software (due to the GPL license on the Vanilla kernel). The generic Android kernel can be seen here - (https://android.googlesource.com/kernel/common.git/+/android-3.10) [for kernel v3.10] which looks very similar to the Vanilla kernel.


NOTE: In this article and series, "Android Kernel" will be used interchangeably with "Linux kernel for Android" and other similar phrases.

The ./android/ directory contains configuration files for making the Android kernel. These files tell the configuration tool which features/modules/drivers to enable or disable (https://android.googlesource.com/kernel/common.git/+/android-3.10/android/configs/android-base.cfg).


Some proprietary or special hardware may have a kernel designed specifically for them. For instance, kernels for Samsung phones can be downloaded here (https://android.googlesource.com/kernel/samsung/).

DOWNLOAD: To download an Android kernel, look for the "tgz" hyperlink near "Commit" around the top of the screen. Here is a download link - https://android.googlesource.com/kernel/common.git/+archive/android-3.10.tar.gz

As seen in the ./security/ folder of the kernel's source code, Android supports the same Linux Security Modules (LSMs) that a desktop/server kernel would. Looking at any part of the Android kernel shows that the Android and Vanilla kernels are nearly the same. For illustration, the same filesystems are seen in Android's filesystem driver directory (./fs/) as in the Vanilla kernel.

To configure the kernel, do as you would for a Vanilla kernel - open a terminal in the source code directory and type "make <PREFERRED CONFIG TOOL>". I will use "make menuconfig" for my screenshots. Here is the most important step, load an alternate configuration file such as "./android/configs/android-base.cfg". Now, you can configure the kernel. However, be careful not to undo a feature/option that is important to Android (like ashmem). Also, you may see in the configuration tool (after loading the proper config file) that Android is an embedded Linux system. Keep this in mind when configuring the kernel. When configuring the Android kernel, it is very important to note that it must be cross-compiled.


NOTE: To know which features should not be changed, open the loaded configuration file in a text-editor (such as Gedit, Leafpad, etc.). The listed features should not be manipulated unless you have a specific purpose for doing so.

Once you have finished configuring the system, save the configuration as "./.config". Overwrite the file if needed (make a backup if the original "./.config" is important).
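
After saving, it is worth confirming that none of the options from the base configuration file were lost or changed along the way. The sketch below compares the two files line by line. The function name missing_options is made up, and the comparison is deliberately simple: an option counts as missing unless the exact same line appears in the saved file.

```shell
# missing_options BASE_FRAGMENT FINAL_CONFIG
# Prints every option line from BASE_FRAGMENT that does not appear
# unchanged in FINAL_CONFIG. Prints nothing if all of them survived.
# (missing_options is a hypothetical helper name.)
missing_options() {
    grep -E '^(CONFIG_[A-Za-z0-9_]+=|# CONFIG_[A-Za-z0-9_]+ is not set)' "$1" |
    while IFS= read -r line; do
        grep -qxF -- "$line" "$2" || printf '%s\n' "$line"
    done
}
```

For the files used in this article, the call would be missing_options ./android/configs/android-base.cfg ./.config. Any line it prints (an entry for ashmem, for example) points to a feature that should be looked at again before compiling.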


Before compiling, the host system must have JDK 6, Python 2.7, and various developer libraries installed. Many developers recommend compiling the kernel on a 64-bit system (doing so on a 32-bit system is possible, but more difficult). Also, remember to set these variables before executing the "make" command -

Code:
export ARCH=arm
export SUBARCH=arm
export CROSS_COMPILE=arm-eabi-

The developer libraries I mentioned can be installed like this (for RedHat-based systems, use "yum" instead of "apt-get"; the package names may differ) -

Code:
sudo apt-get install git gnupg flex bison gperf build-essential zip curl libc6-dev libncurses5-dev:i386 x11proto-core-dev libx11-dev:i386 libreadline6-dev:i386 libgl1-mesa-glx:i386 libgl1-mesa-dev g++-multilib mingw32 tofrodos python-markdown libxml2-utils xsltproc zlib1g-dev:i386

Also, make sure that the proper cross-compiler is installed on your system (arm-eabi).

Making a symbolic link for this library may help on some systems -

Code:
sudo ln -s /usr/lib/i386-linux-gnu/mesa/libGL.so.1 /usr/lib/i386-linux-gnu/libGL.so

NOTE: Some other libraries or links may be needed. If so, the compiler will give an error that a library or file is missing. Install or link as needed.


If the configuration and compiling process went well, the Android kernel is ready for a device. Installing an Android kernel requires more knowledge of Android development and many more steps, so I will not take the Android kernel that far.

Patches can be applied to the Android kernel's source code the same way as the Vanilla kernel, except the patches must be Android patches. The patches on Kernel.org will not work. They may work in some instances, but you must know what you are doing and only apply them to drivers/modules that are still Vanilla (unaltered from the original).

There are a few different kinds of Android kernels as seen on the Android kernel's Git page (https://android.googlesource.com/?format=HTML).

The Goldfish kernel (https://android.googlesource.com/kernel/goldfish/) is for emulated platforms like running a virtual Android system within a host system on a desktop computer.

The MSM kernel (https://android.googlesource.com/kernel/msm/) is for Qualcomm MSM chipsets.

The OMAP kernel is for the TI OMAP chipsets (https://android.googlesource.com/kernel/omap/).

Samsung hummingbird chipsets use the Samsung kernel (https://android.googlesource.com/kernel/samsung/).

The NVIDIA Tegra chipsets run on the Tegra chipset kernel (https://android.googlesource.com/kernel/tegra/).

The Exynos kernel is used by Samsung Exynos chipsets.

These various kernels mainly vary in their driver availability. For example, the Exynos kernel will have drivers for Samsung Exynos devices while the OMAP kernel does not.
 
Intro to System Calls

Many GNU/Linux users have probably heard of system calls. A system call is a special function/command that a program uses to communicate with the kernel of the operating system. The Linux kernel has a variety of system calls that it recognizes. Learning these system calls helps people to understand how GNU/Linux works. Even general/mainstream Linux users may find it interesting to know just how complicated the system is even though the user cannot see the complexity.

NOTE: "Kernel call" is another name for a system call, and so is "syscall".

There are about six kinds of system calls (depending on how you want to classify them). These six are process control, information maintenance, communication, file management, memory management, and device management. "Information maintenance" refers to system time, attributes of files and devices, and many other sets of information. "Communication" refers to networking, data transfer, and the attachment/detachment of remote devices.

When the Linux kernel receives a system call, it executes the command in kernel mode (privileged execution mode). On x86 processors, this privileged mode is commonly called ring-0 (pronounced “ring zero”).

NOTE: Some people get interrupts and system calls mixed up. A system call is a command while an interrupt is an event that causes the CPU to stop the current task and tend to the event. Hardware interrupts are called “interrupts” and software interrupts are called “traps” or “exceptions”.

Some of you may be wondering how an application gets the code for the standard system calls when it is programmed. Well, the system calls are reached through wrapper functions in the GNU C Library, which is also called glibc. This is the library used for applications that run on systems using the Linux kernel, the Hurd kernel, or any GNU userland. Some derivatives are used in applications running on other kernels. For instance, after some major tweaking, glibc works on the NetBSD, OpenSolaris, and FreeBSD kernels. FreeBSD and NetBSD typically use their own libc called "BSD libc". The modified glibc mentioned is used in the Debian systems that use the FreeBSD and NetBSD kernels (Debian GNU/kFreeBSD and Debian GNU/NetBSD). Some other glibc derivatives and alternatives include:

μClibc - This libc is used in embedded and mobile devices using the Linux kernel (except Android).
Bionic - Used in the Android OS. Bionic is based on BSD libc.
dietlibc - This is a lightweight libc for embedded systems.
Embedded glibc (EGLIBC) - A tweaked/optimized version of the standard glibc for embedded systems.
klibc (Kernel libc) - A minimal libc used by the early user-space programs (such as those in the initramfs) that run while the system is starting up. The kernel itself does not use a libc. Once the real root filesystem is mounted, programs use glibc. However, not all distros use klibc.
Newlib - Used in embedded systems.

These libraries provide various headers for C/C++ programming. The system call wrappers are made available to the code by including a header as seen below. The system calls are not all declared in one header, so an application includes only the headers for the calls that it needs (although an included header may declare some extra calls that the program does not use).

Code:
#include <HEADER.h>
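
As a small illustration of the paragraph above, here is a sketch (not taken from any particular program) that reaches the same system call two ways: through the normal glibc wrapper getpid() declared in <unistd.h>, and through the generic syscall() function with the call's number. The helper name wrapper_matches_raw_syscall() is made up for this example.

```c
#define _GNU_SOURCE        /* exposes syscall() in <unistd.h> */
#include <sys/syscall.h>   /* SYS_getpid */
#include <sys/types.h>     /* pid_t */
#include <unistd.h>        /* getpid(), syscall() */

/* Hypothetical helper: returns 1 when the glibc wrapper and the raw
 * system call report the same process ID, 0 otherwise. */
int wrapper_matches_raw_syscall(void)
{
    pid_t from_wrapper = getpid();          /* libc wrapper function */
    long from_raw = syscall(SYS_getpid);    /* same call, selected by number */
    return from_raw == (long)from_wrapper;
}
```

Most programs only ever use the wrapper; the syscall() form is mainly useful for calls that the C library does not wrap.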

One reason why applications compiled for one operating system do not work on another is that each operating system offers different system calls. Wine is a compatibility layer (not an emulator) that allows Windows software to work on GNU/Linux and other Unix and Unix-like systems. This works because the Windows API calls are translated into calls that Linux recognizes (there are other mechanisms that make Windows applications work). If all systems used the same system calls, then some applications would be more cross-platform (some or many exceptions would exist). Think about source code. An application can be compiled on Linux, Solaris, and FreeBSD, but each binary would only work on the operating system type on which it was compiled.

Winelib is a set of development libraries (not a libc) used to compile source code that was written only for Windows systems, so that the compiled program works on Unix and Unix-like systems. Beware, Winelib is not perfect and may not work with some programs. Also, Winelib only works with 32-bit software. Usually, to use Winelib, the make-file for the source code needs some tweaking.
 
System Calls A-E

Once a Linux user learns about the different system calls, it becomes clear just how complex Linux can be in completing common tasks. Below, many system calls are listed and explained. Notice the pair of parentheses after each one. These exist because in computer programming (most languages), functions and commands that are to be executed end in "()" with the parameters within the parentheses. All of the system calls are functions that are defined and programmed in the kernel's source code.

NOTE: Most of the system calls will be discussed across a few articles, but not all of them will be covered. Some of the obsolete ones will be listed, but most will not be mentioned.

accept() - This system call accepts an incoming connection on a listening socket and returns a new socket for that connection. The similar system call accept4() supports flags. This syscall supports various protocols such as IPv4/6, Appletalk, IPX, and others (including sockets for communications between processes).

access() - This checks the permissions of a file before the calling process can access the file. This syscall first ensures the specified file exists. If so, then the system call checks if the process/user may read, write, and/or execute the file.
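
To make the two checks concrete, here is a minimal sketch using the access() wrapper from <unistd.h>. The helper names file_exists() and file_readable() are my own and are only for illustration.

```c
#include <unistd.h>   /* access(), F_OK, R_OK */

/* Hypothetical helpers: each returns 1 if the check passes, 0 otherwise. */

/* F_OK only asks whether the path exists. */
int file_exists(const char *path)
{
    return access(path, F_OK) == 0;
}

/* R_OK asks whether the calling process may read the path. */
int file_readable(const char *path)
{
    return access(path, R_OK) == 0;
}
```

Keep in mind that the answer can be out of date by the time the file is actually opened, so the result of open() still needs to be checked.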

acct() - Process accounting is turned on and off using this system call. Process accounting is record keeping for executed commands. This allows admins to be aware of all commands that were executed, who/what executed them, etc.

add_key() - This adds or updates a key in the kernel's key management facility.

adjtimex() - This system call tunes the kernel's clock using an algorithm by David L. Mills. This system call can also get various information like the number of microseconds between ticks, current time, offset, precision, etc.

alarm() - An alarm is set which will send a signal to a process.

alloc_hugepages() - An old system call (no longer used) that allocated and freed huge pages (large chunks of memory).

bind() - Newly created sockets get an address (sometimes called a name) from this system call. Typically, a server uses bind() on its side of the connection so that clients know where to reach it, and the client uses connect() on its side (the initiating side) of the socket.

brk() - Memory can be given to or taken from the calling process using this system call. The "program-break" is the end of a process's data segment in memory. More memory is given to the process by raising the program-break, and memory is deallocated by lowering the program-break.

cacheflush() - Flush the contents of the instruction and/or data cache for the specified range of addresses.

capget() - Get the capabilities of threads.

capset() - Set the capabilities of threads.

NOTE: A thread's capabilities refer to its attributes and permissions such as permissions to access particular network ports, execute Root programs, etc. A complete list of the capabilities may be found here /usr/include/linux/capability.h

chdir() - Yes, the "cd" command that users regularly use to change the current directory relies on this system call.

chmod() - Surprise! Another commonly used command is a system call. This one changes file permissions.

chown() - At this point, you may not be shocked; another system call that changes file ownership.

chroot() - This popular shell command is also a system call. This changes the root directory.

clock_getres() - This retrieves the clock's resolution. Resolution is another term for precision. This syscall only works on POSIX clocks.

clock_nanosleep() - This system call is like the commonly used "sleep" command, but this system call pauses threads at the nanosecond level.

clone() - Like fork(), this creates a child, but not with the same results as fork(). There are several differences between fork() and clone(), but the main one is that clone() can make a child that shares parts of the parent's execution context, such as its memory space (when the CLONE_VM flag is given), while fork() gives the child process its own memory space.

close() - After a program is done writing or reading a file, the file should be closed to release memory and a file descriptor for reuse. The close() system call performs the closing of the file. Some documentation may say close() closes a file descriptor. This is also true. Closing a file descriptor just means a file descriptor is freed.

connect() - Create a connection to a socket.

creat() - This system call creates a file. No, this is not a typo. The system call really is called "creat()" without the second "e".

delete_module() - Kernel modules are unloaded by this system call. If the specified module is being used or is needed by other modules or the kernel itself, then the syscall will leave the module alone.

dup() - File descriptors can be duplicated with this system call. File descriptors would need to be duplicated when a thread viewing a file forks or when a command's output is redirected.
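
Here is a small sketch of what "duplicated" means in practice. It assumes nothing beyond pipe(), dup(), read(), write(), and close(); the function name dup_roundtrip() is made up for the example. The original write end of a pipe is closed, yet the pipe stays usable through the duplicate because both descriptors refer to the same open file.

```c
#include <unistd.h>   /* pipe(), dup(), read(), write(), close() */

/* Hypothetical helper: duplicates the write end of a pipe, closes the
 * original, writes one byte through the copy, and returns the byte that
 * comes out of the read end (or -1 on error). */
int dup_roundtrip(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    int copy = dup(fds[1]);      /* second descriptor for the same pipe end */
    if (copy == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);               /* the pipe is still open through "copy" */

    char out = 'x', in = 0;
    int result = -1;
    if (write(copy, &out, 1) == 1 && read(fds[0], &in, 1) == 1)
        result = in;

    close(copy);
    close(fds[0]);
    return result;
}
```

Shell redirection works in a similar way: the shell duplicates a file's descriptor onto descriptor 1 (standard output) before running the command.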

epoll_create() - Create a new file descriptor for a new instance of epoll.

epoll_ctl() - This system call is used to perform various tasks on an epoll file descriptor.

epoll_wait() - This system call waits until an event is performed on a specified epoll file descriptor. This syscall is important when software should only perform some action after a particular event happens to an epoll file descriptor.

eventfd() - The Event-File-Descriptor system call creates a file descriptor that is used to notify software about events or to make some software wait on some event.

NOTE: An abbreviation for file descriptor is “fd”. So, system calls that end in “fd” may relate to file descriptors.
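
The epoll and eventfd calls above fit together nicely, so here is a minimal sketch that uses all of them in one place. The function name epoll_eventfd_demo() and the one-second timeout are assumptions made for this example only.

```c
#include <stdint.h>
#include <sys/epoll.h>     /* epoll_create(), epoll_ctl(), epoll_wait() */
#include <sys/eventfd.h>   /* eventfd() */
#include <unistd.h>        /* read(), write(), close() */

/* Hypothetical helper: posts "value" (must be nonzero) to an eventfd,
 * waits for epoll to report the eventfd as readable, and returns the
 * counter read back from it. Returns 0 if anything fails. */
uint64_t epoll_eventfd_demo(uint64_t value)
{
    uint64_t got = 0;
    int efd = eventfd(0, 0);        /* event counter starts at 0 */
    int epfd = epoll_create(1);     /* size hint is ignored, must be > 0 */

    if (efd != -1 && epfd != -1) {
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = efd };
        struct epoll_event ready;

        if (epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev) == 0 &&
            write(efd, &value, sizeof value) == (ssize_t)sizeof value &&
            epoll_wait(epfd, &ready, 1, 1000) == 1 &&   /* 1000 ms timeout */
            ready.data.fd == efd) {
            if (read(efd, &got, sizeof got) != (ssize_t)sizeof got)
                got = 0;
        }
    }
    if (epfd != -1)
        close(epfd);
    if (efd != -1)
        close(efd);
    return got;
}
```

In a real program the write() would usually come from another thread or process, and epoll_wait() would be watching many descriptors at once.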

execve() - Have you ever wondered which system call (if any) causes an executable to run? Well, this system call is the one that does so. execve() can also execute scripts that begin with a valid hashbang (#!). If the executable is dynamically linked, the Linux Dynamic Linker (ld.so) is started to load and link the required libraries to the executable.

_exit() - Processes/threads use this system call to close themselves. Yes, there is an underscore at the beginning of this syscall. If a thread calls _exit(), then only that thread closes.

FUN FACT: Programs close in one of three ways – kill signal, fatal error, or calling _exit(). Notice that only one out of three is a graceful way to close. In other words, programs either willingly close (_exit()), crash (fatal error), or they are murdered (kill signal). Wow, software has a harsh life (^u^).

exit_group() - All of a process's threads and associated threads are closed with this syscall. This is a special form of _exit().
 
System Calls F-G

faccessat() - The permissions for the specified file are checked, but a relative path is interpreted relative to a directory file descriptor.

NOTE: In the simplest terms, a file descriptor is a special number used to access a file.

posix_fadvise() - (commonly called fadvise(), although the actual call is posix_fadvise()) This system call is used to optimize data access. Specifically, the program tells the kernel ahead of time which part of a file will be accessed and in what pattern. This lets the kernel speed up data access.

fallocate() - The disk space of a specified file is manipulated by this kernel call. Since every filesystem type (XFS, EXT4, NTFS, tmpfs, etc.) is different, this syscall does not work on all filesystems. However, fallocate() does work on some pseudo/virtual filesystems like tmpfs.

fchmod() - This syscall is the same thing as chmod(). The difference lies in the fact chmod() accepts a path name and fchmod() accepts a file descriptor instead.

fchmodat() - This system call changes a file's permissions, and the file is specified using a path that is interpreted relative to a directory file descriptor. Otherwise, fchmodat() is like chmod(). fchmod() and fchmodat() work a little differently from each other: fchmod() takes a file descriptor for the file itself.

fchown() - Just like chown(), the owner of the specified file is changed. However, fchown() knows the file by its file descriptor, not its path.

fchownat() - This syscall is like chown(), but a relative path is interpreted relative to a directory file descriptor.

fcntl() - Manipulate the specified file descriptor.

fgetxattr() - This syscall gets the value of the specified extended file attribute.

finit_module() - Using a file descriptor, an ELF-image is loaded into the kernel space.

flistxattr() - Given a file descriptor, this syscall will list the extended attributes owned by the specified file.

flock() - Create or remove an advisory lock on the specified file (the file must be open).

fork() - Child processes are commonly created using this kernel call. With this call, the child process gets its own PID and its own copy of the parent's memory space. Most attributes are inherited, but some (such as pending signals, memory locks, and timers) are not.
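
The separate memory space is easy to see with a short sketch. The function name fork_demo() and the numbers 1 and 42 are arbitrary choices for this example: the child changes its copy of a variable, and the parent's copy stays the same.

```c
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid(), WIFEXITED(), WEXITSTATUS() */
#include <unistd.h>     /* fork(), _exit() */

/* Hypothetical helper: forks a child that changes its own copy of "value"
 * and exits with it. The child's exit code is stored in *child_status and
 * the parent's (unchanged) value is returned, or -1 on error. */
int fork_demo(int *child_status)
{
    int value = 1;
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {          /* child: works on its own copy of memory */
        value = 42;
        _exit(value);
    }

    int status = 0;          /* parent: wait for the child to finish */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    *child_status = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return value;            /* still 1 in the parent */
}
```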

free_hugepages() - Free huge pages (large chunks of memory).

fremovexattr() - When using a file descriptor, this syscall can remove an extended attribute.

fsetxattr() - With a known file descriptor, an extended attribute can be set.

fstat() - The status of a file can be read with this syscall when given a file descriptor.

fstatat() - With a directory file descriptor, a file's status can be read.

fstatfs() - The statistics of a filesystem can be retrieved with this kernel call. This system call is directed to the filesystem in question by using a file descriptor of any given file on that filesystem.

ftruncate() - With a given file descriptor, this kernel call will truncate the specified file to a desired length. This may mean cutting the file, thus losing data, or enlarging the file by adding null bytes. Null bytes are designated as a backslash zero (\0).
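
Both directions (cutting and enlarging) can be tried with a short sketch. It assumes that /tmp is writable, and the function name ftruncate_demo() is made up for the example.

```c
#define _GNU_SOURCE
#include <stdlib.h>     /* mkstemp() */
#include <sys/stat.h>   /* fstat() */
#include <unistd.h>     /* write(), ftruncate(), pread(), close(), unlink() */

/* Hypothetical helper: shrinks a 5-byte temporary file to 2 bytes, grows it
 * back to 4 bytes, and returns 1 if the content is "he" followed by two
 * null bytes. Returns 0 on a mismatch and -1 if no temp file can be made. */
int ftruncate_demo(void)
{
    char path[] = "/tmp/ftruncate-demo-XXXXXX";   /* assumes /tmp is writable */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                  /* the file vanishes once fd is closed */

    int ok = 0;
    struct stat st;
    char buf[4] = { 'x', 'x', 'x', 'x' };
    if (write(fd, "hello", 5) == 5 &&
        ftruncate(fd, 2) == 0 &&   /* cut: "llo" is lost */
        ftruncate(fd, 4) == 0 &&   /* enlarge: padded with null bytes */
        fstat(fd, &st) == 0 && st.st_size == 4 &&
        pread(fd, buf, 4, 0) == 4)
        ok = (buf[0] == 'h' && buf[1] == 'e' &&
              buf[2] == '\0' && buf[3] == '\0');

    close(fd);
    return ok;
}
```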

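futex() - Futex stands for Fast User-space muTEX. With this syscall, threads and processes can wait for shared resources. The uncontended case is handled entirely in user space with atomic instructions, and futex() is only called when a thread must sleep or wake another thread. For this reason, code that uses futex() directly typically relies on non-portable atomic or assembly instructions, and most software uses it through a threading library instead.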

get_kernel_syms() - All of the exported module and kernel symbols can be read with this system call.

NOTE: A symbol is a function or variable.

get_mempolicy() - To get the NUMA memory policy for a process, use this syscall.

get_robust_list() - The robust-futex list can be retrieved with this kernel call.

getcpu() - This syscall reports which CPU and NUMA node the calling thread is running on. This is like the thread yelling "I am over here!".

getcwd(), getwd(), and get_current_dir_name() - GET Current Working Directory. These three give the same result, but they each function a little differently. Only getcwd() is an actual system call; the other two are library functions.

getdents() - This system call gets directory entries.

getgid() - The syscall retrieves the real GID of the calling process.

getegid() - The syscall retrieves the effective GID of the calling process.

FUN FACT: getgid() and getegid() are claimed to NEVER fail according to the man pages. Do we agree with that, or has someone found an exception?

getitimer() - This syscall gets the current value of one of the timers of a process. Each process has three timers - ITIMER_REAL (decrements in real time), ITIMER_VIRTUAL (decrements only while the process executes in user mode), and ITIMER_PROF (decrements while either the process executes or the system executes on behalf of the process). These three timers are called "interval timers".

getpeername() - The name of a connected peer socket can be retrieved using this syscall.

getpagesize() - The size of a regular page in memory can be known with this kernel call.

NOTE: In simplest terms, a page in memory is analogous to a block on a magnetic hard-drive.

getpgid() - This syscall gets the PGID of the specified process by using its PID.

getpid() - The PID of the calling process is returned.

getppid() - The PID of the calling process's parent is returned.

getpriority() - The scheduling priority of the specified program is returned by this syscall.

getresuid() - The RUID, EUID, and the SUID of the calling process are returned.

getresgid() - The RGID, EGID, and the SGID of the calling process are returned.

NOTE: GID = Group ID. UID = User ID. R = Real. E = Effective. S = Saved set.
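
As a quick sketch of how these ID calls relate to each other, the following compares the values from getresuid() and getresgid() with the ones from the single-value calls. The function name ids_are_consistent() is only for this example.

```c
#define _GNU_SOURCE      /* getresuid()/getresgid() are declared as extensions */
#include <sys/types.h>
#include <unistd.h>      /* getresuid(), getresgid(), getuid(), geteuid(), ... */

/* Hypothetical helper: returns 1 if the real and effective IDs reported by
 * getresuid()/getresgid() match getuid()/geteuid()/getgid()/getegid(),
 * 0 if they differ, and -1 if a call fails. */
int ids_are_consistent(void)
{
    uid_t ruid, euid, suid;
    gid_t rgid, egid, sgid;

    if (getresuid(&ruid, &euid, &suid) == -1)
        return -1;
    if (getresgid(&rgid, &egid, &sgid) == -1)
        return -1;

    /* The saved IDs (suid, sgid) have no single-value getter to compare to. */
    return ruid == getuid() && euid == geteuid() &&
           rgid == getgid() && egid == getegid();
}
```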

getrlimit() - This kernel call returns the resource limit of a process.
 
System Calls G-M

getrusage() - The amount of resources used by the specified process is given by this syscall.

getsid() - This returns the session ID. "sid" stands for Session ID.

getsockname() - The address (name) of the socket is returned by this kernel call.

getsockopt() - The value of an option for a specified socket is retrieved by getsockopt().

gettid() - The TID of a thread can be seen with this syscall. TID stands for thread identification.

gettimeofday() - The current time and timezone can be seen with this call.

getxattr() - With a given file path, this system call retrieves the value of an extended attribute associated with the file's inode.

NOTE: Extended attributes are name/value pairs attached to a file in addition to the normal attributes (like owner, permissions, and timestamps). Not every filesystem supports them.

init_module() - Kernel modules are loaded with this syscall. This system call loads the module into the kernel space and then performs other needed tasks to prepare the module for runtime.

inotify_add_watch() - Given a file-path, this syscall creates or modifies a watch on an existing inotify instance and returns a watch descriptor (wd).

inotify_init() - Before any watch can be added, an inotify instance must be created via inotify_init(), which returns a file descriptor for the instance.

inotify_rm_watch() - When an inotify watch is no longer needed, this syscall removes the watch specified by a watch descriptor (wd). When watches are made with inotify_add_watch(), the wd is given.

io_cancel() - Asynchronous IO tasks are canceled with this syscall.

io_destroy() - Instead of canceling individual asynchronous IO tasks, a whole asynchronous IO context can be destroyed, meaning all outstanding asynchronous IO tasks associated with the given context will be canceled.

io_getevents() - The asynchronous IO events listed in the completion queue can be seen with this syscall.

io_setup() - Asynchronous IO contexts are made using io_setup().

io_submit() - To queue asynchronous IO blocks, use this kernel call.

ioctl() - With a file descriptor for a device-file, the device's parameters can be changed.

ioperm() - The IO permissions of ports are set with this call.

iopl() - This kernel call changes the IO privilege level of the process that executed this call.

ioprio_get() - The priority and IO scheduling class of threads can be seen with this call. This call can return this information for one or multiple threads.

ioprio_set() - The priority and IO scheduling class of a thread can be set with this call.

ipc() - On some architectures, the Linux kernel uses ipc() as the common entry point for the System V IPC system calls. Not all architectures support ipc(). For instance, this call is not seen in ARM systems. So, when making cross-platform software, do not call ipc() directly.

kcmp() - With two PIDs, this system call can identify what kernel resources (if any) are shared between two processes.

kexec_load() - This syscall loads a new kernel that can be executed later by reboot(). It can also load a kernel that is executed automatically after a system crash, which is useful for running a diagnostic kernel.

keyctl() - Changes to the key management facility of the kernel are made via keyctl().

kill() - A signal (not necessarily a kill signal) is sent to the specified process. As you may have noticed, many system calls require a PID, file descriptor, or some other low-level (closer to the hardware/inner-workings) identification to know what to work on. kill() is like those other syscalls, which is why it uses a PID. Yes, users of the "kill" command in a command-line are using a system call to perform a low-level operation on the system.

lgetxattr() - This is just like getxattr() except this system call is used on links to get the extended attributes of the link itself.

link() - Hard links are created using link().

listen() - This is the system call that listens for connections on a socket.

lookup_dcookie() - Using a cookie, the full path of a directory entry can be seen. A cookie is a directory entry identifier.

lremovexattr() - The extended attributes of a symbolic link can be removed. The extended attributes are untouched in the file to which the link points.

lseek() - A file's offset is changed. This syscall identifies the file based on file descriptor (fd).

lsetxattr() - This syscall sets extended attributes on links.

lstat() - This call provides information about the specified link.

madvise() - This system call is used by applications to give the kernel advice on how the application wishes to use memory. The kernel typically maps out memory as it is needed and as the kernel sees fit. The madvise() syscall comes from an application that can probably run more efficiently if its memory usage is managed in a particular way. Notice that the syscall is advice, meaning the kernel may disregard the application's request.

mbind() - This syscall sets the NUMA memory policy.

migrate_pages() - All of the memory pages of a process that are located on the specified NUMA node(s) will be moved to other node(s).

mincore() - This kernel call checks whether the specified pages of memory (a page of memory is like a block of data on a hard-drive) are currently resident in RAM. If memory is accessed but is not resident, a page fault will result, and the process must wait while the kernel brings the page in from disk. Thanks to this syscall, a program can find out ahead of time which accesses would cause such delays.

mkdir() - This commonly known shell command is actually a system call that creates a directory. (You probably already knew that)

mknod() - This syscall can make a file, device file, or a named pipe.

mlock() - A specified portion of a process's virtual address space can be set to remain in RAM and not go to the swap area. mlockall() is used to lock all of the virtual address space, and munlock() and munlockall() can undo those syscalls.

mmap() - This maps memory for processes. munmap() unmaps the memory.
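
Here is a minimal sketch of mapping and unmapping memory. It asks for one page of anonymous memory (memory that is not backed by a file). The function name mmap_demo() is made up for this example.

```c
#define _GNU_SOURCE
#include <string.h>     /* strcpy(), strcmp() */
#include <sys/mman.h>   /* mmap(), munmap() */
#include <unistd.h>     /* sysconf() */

/* Hypothetical helper: maps one anonymous page, checks that it starts out
 * zero-filled and is writable, then unmaps it. Returns 1 on success,
 * 0 on a mismatch, and -1 if the mapping fails. */
int mmap_demo(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    char *mem = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    int zeroed = (mem[0] == 0 && mem[page - 1] == 0);   /* fresh pages are 0 */
    strcpy(mem, "hello");                               /* the page is writable */
    int ok = zeroed && strcmp(mem, "hello") == 0;

    if (munmap(mem, (size_t)page) == -1)
        return -1;
    return ok;
}
```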

mount() - Here is another shell command that is actually a syscall. As you all may know, this mounts filesystems, whether they are virtual (pseudo filesystems like tmpfs), real (ext4, fat32), network filesystems (NFS), or files (iso files like DVD images).

move_pages() - This is another syscall that moves pages, but this syscall moves the memory page-by-page rather than in bulk or whole nodes.

mprotect() - This syscall changes the protection of a region of the calling process's memory. In memory, "protection" is analogous to the permissions of files on hard-drives.

mq_notify() - Processes can "subscribe" to notifications from a message queue. This system call allows the calling process to be notified when a new message arrives on an empty message queue.

mq_open() - POSIX message queues can be made or opened using this kernel call.

mq_send() - Messages are sent to the message queue using this system call.

mq_unlink() - Message queues can be deleted using this syscall.
 
System Calls M-R

mremap() - Memory REMAP is a syscall that remaps a virtual memory address. This means the kernel call gets a section of data and changes the size and location of that data's allocated area in memory.

msync() - As many people may know, when a file is edited (for example, a plain text file), the file is loaded to memory and changes take place there. When a file has been mapped into memory with mmap(), msync() saves the changes by synchronizing the copy in RAM with the older file on the hard-drive.

nanosleep() - Like the sleep command commonly used in shell scripts, this call suspends execution of the calling thread. However, this call works with nanosecond resolution.

nfsservctl() - This is the interface for the NFS daemon.

nice() - That commonly used and known command "nice" is another syscall.

uname() - Here is yet another syscall that is sometimes used by the user in a command-line or script.

open() - This syscall opens files.

NOTE: Sometimes, the calling process is referred to as a local process and the other processes are remote. For instance, if both Firefox and Thunderbird are running on the same machine, Firefox refers to Thunderbird as a remote process as does Thunderbird to Firefox. Each process views themselves as local.

pause() - This kernel call makes the calling process pause until a signal arrives. The signal either terminates the process (like a kill signal) or causes a signal handler to run.

pciconfig_iobase() - This call is used to get information about IO regions on memory.

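perf_event_open() - A file descriptor for measuring performance information is created when this syscall is executed. The system's performance can then be monitored through that file descriptor.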

personality() - This syscall sets the process's execution domain. In computing, a personality is the way an executable behaves. This refers to the different system calls and application binary interfaces (ABI).

perfmonctl() - This kernel call is the interface for the performance monitoring unit (PMU) of IA-64 CPUs.

pipe() - This kernel call makes a pipe (|) which is a form of interprocess communication. A pipe is one-way: data is sent from one process to another, and data does not flow back to the sender.
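
To show a pipe carrying data between two processes, here is a short sketch that combines pipe() with fork(). The function name pipe_from_child() is an assumption for the example, and the caller must pass a buffer of at least one byte.

```c
#include <string.h>      /* strlen() */
#include <sys/types.h>
#include <sys/wait.h>    /* waitpid() */
#include <unistd.h>      /* pipe(), fork(), read(), write(), close(), _exit() */

/* Hypothetical helper: a child process writes "msg" into a pipe and the
 * parent reads it into "buf" (null-terminated). Returns the number of
 * bytes read, or -1 on error. */
ssize_t pipe_from_child(const char *msg, char *buf, size_t buflen)
{
    int fds[2];                     /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                 /* child: the sender */
        close(fds[0]);
        ssize_t w = write(fds[1], msg, strlen(msg));
        _exit(w == (ssize_t)strlen(msg) ? 0 : 1);
    }

    close(fds[1]);                  /* parent: the receiver; EOF needs this */
    ssize_t total = 0, n;
    while ((size_t)total < buflen - 1 &&
           (n = read(fds[0], buf + total, buflen - 1 - (size_t)total)) > 0)
        total += n;
    buf[total] = '\0';

    close(fds[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```

A shell does roughly the same thing for "a | b", except that both ends are handed to child processes.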

pivot_root() - The root filesystem can be changed using this system call. pivot_root() is commonly used to change the root from initrd.

poll() - This syscall watches file descriptors for ones that are ready for IO operations.

pread() - Starting at a given offset, pread() reads from a file descriptor (fd) without changing the file offset.

pwrite() - Starting at a given offset, pwrite() writes to a file descriptor (fd) without changing the file offset.
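
The point of these two calls is that the offset is passed in and the descriptor's own file offset is left alone. Here is a small sketch of that. It assumes /tmp is writable, and the name positional_io_demo() is only for this example.

```c
#define _GNU_SOURCE
#include <stdlib.h>    /* mkstemp() */
#include <string.h>    /* memcmp() */
#include <unistd.h>    /* write(), lseek(), pread(), pwrite(), close(), unlink() */

/* Hypothetical helper: writes "AB" at offset 4 with pwrite(), reads it back
 * with pread(), and checks that the file offset never moved from 0.
 * Returns 1 on success, 0 on a mismatch, -1 if no temp file can be made. */
int positional_io_demo(void)
{
    char path[] = "/tmp/pread-demo-XXXXXX";    /* assumes /tmp is writable */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);

    int ok = 0;
    char buf[2] = { 0, 0 };
    if (write(fd, "0123456789", 10) == 10 &&
        lseek(fd, 0, SEEK_SET) == 0 &&         /* rewind to the start */
        pwrite(fd, "AB", 2, 4) == 2 &&         /* file is now 0123AB6789 */
        pread(fd, buf, 2, 4) == 2 &&
        lseek(fd, 0, SEEK_CUR) == 0)           /* offset is still 0 */
        ok = (memcmp(buf, "AB", 2) == 0);

    close(fd);
    return ok;
}
```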

preadv() - This system call can read a file descriptor and fill many buffers. preadv() is like pread() and readv() combined.

prlimit() - This is getrlimit() and setrlimit() combined into one syscall, so this one call gets and sets a process's resource limits.

process_vm_readv() - This kernel call gets data from a specified process (by pid) and gives it to the calling process.

process_vm_writev() - The calling process uses this syscall to send data to a remote process.

pselect() - This is like poll(), watching for many file descriptors for one to be free for IO operations.

ptrace() - A process can control and monitor another process (if permissions permit). The calling process is called the tracer and the process being monitored is called the tracee.

pwritev() - This syscall has both the features of writev() and pwrite().

query_module() - The information about a module can be received with this syscall.

quotactl() - Disk quotas are managed with this syscall.

read() - This system call reads up to a requested number of bytes from the specified file descriptor and places them in the buffer.

readahead() - Files are placed in the page cache by this syscall.

readlink() - Gets the pathname stored in a symbolic link, which is the path of the file the link points towards (this may be a relative path).

reboot() - Obviously, this syscall reboots the system. When CAD is used (Ctrl+Alt+Del), this kernel call is executed.
 
System Calls R-S

There are still many more kernel calls, as you will see. Each one is important to the functioning of the kernel and system as a whole. Some of these syscalls are defined by the POSIX standard and are used by other systems (like *BSD). Some systems use a system call that has the same name and performs the same function. However, such calls may use different code even between architectures of the same operating system.

recv(), recvfrom(), recvmsg() - These three syscalls are nearly the same. They all receive messages from connected sockets, but these calls do so in a different way.

recvmmsg() - Like the three calls mentioned previously, recvmmsg() gets messages from sockets. However, this syscall can receive multiple messages at once, while the other calls get one at a time. The code used to make recvmmsg() came from recvmsg(). (Notice the number of "m"s)

remap_file_pages() - This system call creates a new mapping on memory. Specifically, remap_file_pages() sets up a nonlinear mapping, meaning the pages are not placed in order on memory.

removexattr() - The extended attributes of files are removed with this syscall. The needed parameters include the path of the file and the name of the attribute.

rename() - This syscall renames a file.

request_key() - Keys can be retrieved from the kernel's key-ring by using this system call.

restart_syscall() - Sometimes, syscalls are temporarily interrupted by a stop signal (typically SIGSTOP). The kernel uses restart_syscall() to resume such syscalls once the process continues; applications are not meant to call it themselves.

rmdir() - Empty directories can be deleted with rmdir().

rt_sigqueueinfo(), rt_tgsigqueueinfo() - A signal and data are sent to the specified process using one of these system calls. Both of these calls are the same, but they differ in the accepted parameters. rt_sigqueueinfo() needs to know the tgid (Thread Group ID) while rt_tgsigqueueinfo() needs to know both the tgid and tid (Thread ID).

sigaction() - The action taken when a signal arrives may need to be changed. This kernel call allows the calling process to view or change what happens when it receives a specific signal (for instance, run a handler function or ignore the signal).

sigpending() - This syscall allows the calling process to view pending signals.

sigprocmask() - This kernel call allows the calling process to view and change its masked signals.

NOTE: Masked signals are signals that are blocked.

sigsuspend() - This syscall temporarily replaces the signal mask of the calling process and pauses the process until a signal arrives.

sched_get_priority_max(), sched_get_priority_min() - Every scheduling policy (or scheduling algorithm) has a set priority range. These two syscalls return the maximum and minimum priority numbers (respectively) accepted by a policy.

sched_setaffinity(), sched_getaffinity() - The CPU affinity (CPU pinning) of a thread can be set or viewed with these syscalls, respectively. CPU affinity restricts a thread or process to a chosen set of processors. For instance, on systems with multiple processors, a thread can be pinned so that the scheduler does not move it between CPUs. Instead, the code stays with one CPU.
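
Viewing the affinity is harmless, so here is a small sketch of that half. It uses the glibc wrapper with the fixed-size cpu_set_t type (which covers up to 1024 CPUs; very large machines need the dynamically sized variants). The function name allowed_cpu_count() is made up for the example.

```c
#define _GNU_SOURCE   /* cpu_set_t, CPU_ZERO(), CPU_COUNT(), sched_getaffinity() */
#include <sched.h>

/* Hypothetical helper: returns the number of CPUs the calling thread is
 * allowed to run on, or -1 if the call fails. */
int allowed_cpu_count(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof set, &set) == -1)   /* 0 = calling thread */
        return -1;
    return CPU_COUNT(&set);
}
```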

sched_setparam(), sched_getparam() - These syscalls allow the scheduling parameters to be set and viewed for a process specified by its PID.

sched_setscheduler(), sched_getscheduler() - With a given PID, a processes scheduling policy (algorithm) can be set or viewed.

sched_yield() - The calling process will be placed at the end of the processor's task queue.

select() - This is another syscall used to monitor multiple file descriptors so that an IO task can be performed on the next available descriptor.
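Here is a minimal Python sketch of the idea. It watches two sockets and reports which one has data waiting. The socket pairs and the "quiet" and "busy" labels are made up for the example.

```python
import select
import socket


def wait_for_data(timeout=1.0):
    """Watch two sockets and report which of them is ready for reading."""
    quiet_reader, quiet_writer = socket.socketpair()
    busy_reader, busy_writer = socket.socketpair()
    try:
        busy_writer.sendall(b"ping")        # only the "busy" pair gets data

        # select() returns once at least one watched descriptor is readable.
        readable, _, _ = select.select([quiet_reader, busy_reader], [], [], timeout)

        ready = ["busy" if sock is busy_reader else "quiet" for sock in readable]
        data = busy_reader.recv(16) if busy_reader in readable else b""
        return ready, data
    finally:
        for sock in (quiet_reader, quiet_writer, busy_reader, busy_writer):
            sock.close()


if __name__ == "__main__":
    print(wait_for_data())
```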

send(), sendto(), and sendmsg() - These syscalls perform nearly the same task, but each works slightly differently. send() is the same as write() except that it accepts flags, which write() does not, and it only works on a connected socket. sendto() also takes a destination address. sendmsg() takes its buffers, destination, and ancillary data in a single msghdr structure.
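The differences are easiest to see side by side. This Python sketch sends three datagrams to a Unix domain socket, one with each call. The socket path lives in a temporary directory and its name is an assumption made for the example.

```python
import os
import socket
import tempfile


def send_three_ways():
    """Send one datagram each with sendto(), send(), and sendmsg()."""
    workdir = tempfile.mkdtemp()
    address = os.path.join(workdir, "receiver.sock")   # hypothetical socket path
    receiver = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        receiver.bind(address)

        sender.sendto(b"via sendto", address)    # destination given on each call
        sender.connect(address)
        sender.send(b"via send")                 # uses the connected peer
        sender.sendmsg([b"via ", b"sendmsg"])    # several buffers, one datagram

        return [receiver.recv(64) for _ in range(3)]
    finally:
        receiver.close()
        sender.close()
        if os.path.exists(address):
            os.unlink(address)
        os.rmdir(workdir)


if __name__ == "__main__":
    print(send_three_ways())
```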

sendfile() - This syscall copies data from one file descriptor to another. This is a faster way to copy files since the copy is performed within the kernel, so the data does not have to be copied into a userspace buffer and back out again.
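As a rough sketch, a file copy built on sendfile() could look like the following in Python. It assumes Linux, where the destination may be a regular file. The function name and file names are made up for the example.

```python
import os
import tempfile


def copy_with_sendfile(source_path, dest_path):
    """Copy a file with sendfile() and return the number of bytes copied."""
    with open(source_path, "rb") as source, open(dest_path, "wb") as dest:
        remaining = os.fstat(source.fileno()).st_size
        offset = 0
        # One call may transfer less than requested, so loop until done.
        while remaining > 0:
            sent = os.sendfile(dest.fileno(), source.fileno(), offset, remaining)
            if sent == 0:
                break                       # source ended early
            offset += sent
            remaining -= sent
    return offset


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    source_name = os.path.join(workdir, "source.bin")
    with open(source_name, "wb") as handle:
        handle.write(b"kernel" * 1000)
    print(copy_with_sendfile(source_name, os.path.join(workdir, "copy.bin")))
```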

FUN FACT: Are you wondering how much work your system is doing inside the kernel right now? To get an idea system wide (on all processors), use the vmstat command and look at the "sy" column. On Linux, this column is the percentage of CPU time spent running kernel code (which includes servicing system calls), not a count of system calls. For my system at the time of executing vmstat, the value was five.


sendmmsg() - More than one message can be sent down a socket using this kernel call. Most message-sending syscalls can only send one message down a socket at a time.

set_mempolicy() - This syscall is used by the calling process to change its NUMA-memory policy.

set_thread_area() - This syscall sets an entry in the local storage array of the calling thread (TLS = Thread Local Storage).

set_tid_address() - This kernel call gives the kernel a pointer for the calling thread. When the thread exits, the kernel clears the value at that address and wakes any thread waiting on it. The call returns the caller's TID (Thread ID).

setdomainname() - This syscall sets the system's NIS domain name from a character array. This value can be retrieved with getdomainname().

setfsgid() - This kernel call changes the FileSystem Group ID (FSGID), which is the GID the kernel uses for permission checks when the process accesses files. It normally matches the effective GID and was historically changed by programs such as NFS servers.

setfsuid() - setfsuid() is a lot like setfsgid() except that setfsuid() changes the FileSystem User ID (FSUID).

setgid() - The effective Group ID of the calling process is set.

setgroups() - The supplementary Group IDs are set for the calling process.

sethostname() - The hostname is set from a character array.

setns() - A thread can be moved into an existing namespace by using this syscall.

setpgid() - The process group ID (PGID) of a process is set with this kernel call. This is not the same as the GID.

setpriority() - The scheduling priority (nice value) of a process, process group, or user is set.

setreuid(), setregid() - These syscalls set the real and effective User or Group IDs.

setresuid(), setresgid() - These two syscalls are like their equivalent kernel calls above, but with the additional ability to set the Saved-User-ID (SUID) or Saved-Group-ID (SGID).

setrlimit() - A resource limit is set with this kernel call.
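As an illustration, Python's resource module wraps setrlimit() and its counterpart getrlimit(). This sketch lowers the soft limit on open files and then restores it. The value 64 is an arbitrary number picked for the demo.

```python
import resource


def lower_soft_nofile(new_soft=64):
    """Temporarily lower the soft limit on open file descriptors."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # Never go above the current soft limit; we only want to lower it.
    if soft != resource.RLIM_INFINITY and soft < new_soft:
        new_soft = soft

    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    lowered = resource.getrlimit(resource.RLIMIT_NOFILE)

    # An unprivileged process may raise its soft limit back up to the hard limit.
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
    restored = resource.getrlimit(resource.RLIMIT_NOFILE)
    return (soft, hard), lowered, restored


if __name__ == "__main__":
    print(lower_soft_nofile())
```

Each limit has a soft value (the one enforced) and a hard value (the ceiling for the soft value).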
 
System Calls S-T

setsid() - The process group ID and session ID of the calling process are set to the PID of the calling process.

setsockopt() - Options for a specified socket are set using this syscall.

settimeofday() - The timezone and time are set via settimeofday().

setuid() - The effective User ID (UID) of the calling process is set with this call.

setup() - This deprecated syscall was once used to prepare devices and filesystems on the system and mount the root filesystem.

setxattr() - Extended attributes are set using this kernel call.

shutdown() - Many of you may think this system call shuts down the system. Well, guess what? It does not. Rather, this system call closes a socket or at least part of the socket.

NOTE: reboot() is the system call that reboots or powers off the system.
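To show what closing "part of the socket" means, here is a small Python sketch. One side shuts down only its sending direction and can still receive a reply. The "client" and "server" names are just labels for the two ends of a socket pair.

```python
import socket


def half_close():
    """Shut down one direction of a socket and keep using the other."""
    client, server = socket.socketpair()
    try:
        client.sendall(b"request")
        client.shutdown(socket.SHUT_WR)     # no more sending; receiving still works

        request = server.recv(64)
        end_of_stream = server.recv(64)     # empty result: the peer stopped sending

        server.sendall(b"reply")            # the other direction is still open
        reply = client.recv(64)
        return request, end_of_stream, reply
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    print(half_close())
```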

sigaction() - The action taken when a signal arrives can be viewed and changed with this syscall.

signalfd() - Signals can be accepted by a process via a file descriptor. However, such a file descriptor must be created first using this syscall.

sigpending() - The calling thread can view pending signals coming to it using this syscall.

sigprocmask() - A process's blocked signals can be changed via sigprocmask().

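socket() - The commonly discussed sockets are created with this syscall. A socket is a communication endpoint that behaves somewhat like a two-way pipe with additional abilities. Sockets that connect processes on the same machine are called Unix Domain Sockets, while other socket types communicate over a network.

socketpair() - A pair of connected sockets is created using this system call.

splice() - Data can be moved between two file descriptors without copying it through userspace, as long as at least one of the descriptors is a pipe.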

stat() - This syscall returns the "status" of a file. The "status" is information such as the number of blocks owned by the file, the IDs of the owning group and owner, the storage device's ID, file size, number of hardlinks, and a few other pieces of information.
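For a concrete look at this "status", the Python sketch below creates a file with one extra hardlink and reads some of the fields back through os.stat(). The file names and the dictionary keys are made up for the example.

```python
import os
import tempfile


def describe_file():
    """Create a file with one extra hardlink and return some of its status."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "data.bin")          # hypothetical file name
    link = os.path.join(workdir, "data.hardlink")
    try:
        with open(path, "wb") as handle:
            handle.write(b"x" * 1000)
        os.link(path, link)                 # second name for the same file

        info = os.stat(path)
        return {
            "size": info.st_size,           # file size in bytes
            "hardlinks": info.st_nlink,     # number of names pointing at the file
            "owner_uid": info.st_uid,
            "owner_gid": info.st_gid,
            "device": info.st_dev,          # ID of the storage device
            "blocks": info.st_blocks,       # blocks allocated to the file
        }
    finally:
        for name in (link, path):
            if os.path.exists(name):
                os.remove(name)
        os.rmdir(workdir)


if __name__ == "__main__":
    print(describe_file())
```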

statfs() - Use this kernel call to get the "status" of a filesystem. Such status includes the number of free and total blocks, filesystem type, and other information pertaining to the filesystem itself.
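Python does not wrap statfs() directly, so this sketch uses os.statvfs() instead. That is the closely related POSIX interface, which the C library on Linux builds on top of statfs(). It reports the block counts but not the filesystem type.

```python
import os


def filesystem_usage(path="/"):
    """Return (total_bytes, free_bytes) for the filesystem holding path."""
    info = os.statvfs(path)
    total_bytes = info.f_blocks * info.f_frsize   # total blocks * block size
    free_bytes = info.f_bfree * info.f_frsize     # free blocks * block size
    return total_bytes, free_bytes


if __name__ == "__main__":
    print(filesystem_usage("/"))
```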

stime() - This syscall sets the system time, which is given in seconds since January 1st, 1970 (epoch).

subpage_prot() - Pages in memory can be divided into subpages. This system call allows permissions to be set on specific subpages, but only on PowerPC processors.

NOTE: Remember that blocks are to hard-drives as pages are to memory.

swapon() and swapoff() - These kernel calls turn the swap area on and off, respectively. Turning swap off may be done when the admin is changing its size on a live/active system or for various other reasons.

symlink() - Shortcuts (or soft-links) are made using this system call.
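Here is a minimal Python sketch of a soft-link in action. It also shows what happens when the target is deleted. The file names are made up for the example.

```python
import os
import tempfile


def make_symlink():
    """Create a soft-link, read through it, then delete its target."""
    workdir = tempfile.mkdtemp()
    target = os.path.join(workdir, "target.txt")   # hypothetical file name
    link = os.path.join(workdir, "shortcut")
    with open(target, "w") as handle:
        handle.write("content")

    os.symlink(target, link)                # the link only stores the target's path
    is_link = os.path.islink(link)
    points_at_target = os.readlink(link) == target
    with open(link) as handle:              # opening the link opens the target
        text = handle.read()

    os.remove(target)
    # The link itself survives, but it now points at nothing.
    dangling = os.path.islink(link) and not os.path.exists(link)

    os.remove(link)
    os.rmdir(workdir)
    return is_link, points_at_target, text, dangling


if __name__ == "__main__":
    print(make_symlink())
```

Unlike a hardlink, a soft-link does not keep the file alive, which is why the link is left dangling once the target is removed.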

sync() - When files are changed, the edits are first held in memory and written out later. The sync() call forces all pending changes to be written to the hard-drive.

syncfs() - Like sync(), syncfs() causes changes to be written to the storage unit. However, only the filesystem containing the file referred to by the given file descriptor is synchronized, while sync() tells all modifications on every filesystem to be written.

sync_file_range() - Like syncfs(), not all pending changes are written. Here, only a specified portion of a single file is synchronized with the storage device.

NOTE: When you write changes to a file (like saving a text file in Gedit), the changes are first held in the kernel's cache in memory (the page cache, historically called the buffer cache).

sysfs() - Information about the current/present filesystems can be viewed with this syscall.

sysinfo() - An overview of the system's information (system statistics or sys stats) can be viewed with sysinfo(). Specifically, this data contains various memory space info, time since booting, buffer data, and some other helpful information.

syslog() - Kernel messages are viewed by syslog() which is also the system call that gives syslogd (a daemon) the data it places into logs.

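tee() - This system call duplicates data from one pipe into another pipe without consuming it, so the same data can still be read from the first pipe. It shares its name with the "tee" command used in shells, which is a separate program that copies its input to both a file and its output.

tgkill() - Many users are aware that kill() sends a signal (often a fatal one) to a process, but few people know that a signal can be sent to an individual thread. Such a task can be done with the syscall tgkill().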

time() - The current system time will be returned by the syscall in the form of seconds since January 1st, 1970.

NOTE: January 1st, 1970 is commonly referred to as "The Epoch". Now, when you read about some software (like a syscall) returning time in seconds since the Epoch, you know that that is what it means.
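To see the Epoch in practice, this short Python sketch converts a count of seconds into a calendar date in UTC. The helper name is made up for the example.

```python
import time
from datetime import datetime, timezone


def seconds_to_utc(seconds):
    """Convert seconds since the Epoch into a UTC date and time."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    print(seconds_to_utc(0))                  # the Epoch itself
    print(seconds_to_utc(int(time.time())))   # the current time
```

Zero seconds corresponds to midnight UTC at the start of January 1st, 1970, and every later moment is just a larger number.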

timer_create() - This syscall creates a per-process timer, meaning the timer belongs to the process that created it. The new timer starts out disarmed.

timer_delete() - Timers can be deleted via timer_delete().

NOTE: Timers are identified by "timerid".

timer_getoverrun() - A timer can expire again before the notification for an earlier expiration has been handled (like a backup reminder that fires three more times before anyone looks at it). This call returns the overrun count, which is the number of extra expirations that occurred in that interval.

timer_gettime() and timer_settime() - These syscalls allow the time on timers to be set or viewed.


We are almost done discussing system calls. As you have probably noticed, system calls are very important in managing some common tasks (like time and system logs). Other system calls (like tee()) share their names with commands used in a terminal. Then, there are the system calls that only the kernel uses (like setup()). All of these kernel calls work together to give the user a functioning system.
 
