Linux Address Space

Gaurav Sarma
6 min read · Mar 2, 2018

Linux processes interact with virtual memory, not physical memory. Every process runs under the illusion that it is the only process in the system and therefore has all of the system's memory to itself.

Different processes may use the same virtual addresses without colliding, because the kernel maintains a separate virtual-to-physical mapping for each of them. A process does share its virtual address space when it spawns threads of execution.

A process does not have permission to access certain parts of the address space, such as the range reserved for the kernel; it may touch a memory address only if that address lies inside a valid memory area. Each memory area also carries permissions that the process must respect. If a process violates either rule, the kernel delivers a segmentation fault signal (SIGSEGV), whose default action kills the process.
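
As a minimal user-space illustration (a toy sketch, not kernel code), the program below deliberately writes to an address that is almost certainly outside every valid memory area of the process; the kernel responds with SIGSEGV and terminates it:

#include <stdio.h>

int main(void)
{
        int *bad = (int *)16;           /* very unlikely to fall inside any valid area */

        printf("about to write to %p\n", (void *)bad);
        *bad = 42;                      /* the kernel answers with SIGSEGV here */

        printf("never reached\n");      /* the default SIGSEGV action kills us first */
        return 0;
}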

Memory areas may have the following content (a small sketch after this list shows how to inspect them for a running process):

  • Executable file’s code, which is known as the text section
  • Executable file’s initialized global variables, which is known as the data section
  • Uninitialized global variables, known as the bss (block started by symbol) section
  • Stack
  • Heap
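
Linux exposes these areas for a live process through /proc/<pid>/maps. The small program below (a sketch using that standard interface) dumps its own map; each output line describes one memory area with its address range, permissions, offset, device, inode and backing file:

#include <stdio.h>

int main(void)
{
        FILE *maps = fopen("/proc/self/maps", "r");  /* this process's own memory map */
        char line[512];

        if (!maps) {
                perror("fopen");
                return 1;
        }
        while (fgets(line, sizeof(line), maps))      /* one line per memory area */
                fputs(line, stdout);
        fclose(maps);
        return 0;
}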

Memory descriptor

In the Linux kernel, a process's address space is described by the following data structure, the memory descriptor:

struct mm_struct {
        struct vm_area_struct *mmap;       /* list of memory areas */
        struct rb_root mm_rb;              /* red-black tree of VMAs */
        struct vm_area_struct *mmap_cache; /* last used memory area */
        unsigned long free_area_cache;     /* 1st address space hole */
        pgd_t *pgd;                        /* page global directory */
        atomic_t mm_users;                 /* address space users */
        atomic_t mm_count;                 /* primary usage counter */
        int map_count;                     /* number of memory areas */
        struct rw_semaphore mmap_sem;      /* memory area semaphore */
        spinlock_t page_table_lock;        /* page table lock */
        struct list_head mmlist;           /* list of all mm_structs */
        unsigned long start_code;          /* start address of code */
        unsigned long end_code;            /* final address of code */
        unsigned long start_data;          /* start address of data */
        unsigned long end_data;            /* final address of data */
        unsigned long start_brk;           /* start address of heap */
        unsigned long brk;                 /* final address of heap */
        unsigned long start_stack;         /* start address of stack */
        unsigned long arg_start;           /* start of arguments */
        unsigned long arg_end;             /* end of arguments */
        unsigned long env_start;           /* start of environment */
        unsigned long env_end;             /* end of environment */
        unsigned long rss;                 /* pages allocated */
        unsigned long total_vm;            /* total number of pages */
        unsigned long locked_vm;           /* number of locked pages */
        unsigned long def_flags;           /* default access flags */
        unsigned long cpu_vm_mask;         /* lazy TLB switch mask */
        unsigned long swap_address;        /* last scanned address */
        unsigned dumpable:1;               /* can this mm core dump? */
        int used_hugetlb;                  /* used hugetlb pages? */
        mm_context_t context;              /* arch-specific data */
        int core_waiters;                  /* thread core dump waiters */
        struct completion *core_startup_done; /* core start completion */
        struct completion core_done;       /* core end completion */
        rwlock_t ioctx_list_lock;          /* AIO I/O list lock */
        struct kioctx *ioctx_list;         /* AIO I/O list */
        struct kioctx default_kioctx;      /* AIO default I/O context */
};

The number of processes (or threads) using an address space is tracked by the mm_users field. The mmap and mm_rb fields both refer to the memory areas in the address space, just in different representations: mmap is a linked list, convenient for simple traversal of every area, while mm_rb is a red-black tree, convenient for quickly finding a specific area.
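
As a rough kernel-style sketch (not a standalone program, and assuming the 2.6-era fields shown above; the function name print_vmas is made up for illustration), this is the kind of loop kernel code uses to traverse every area via the mmap list while holding mmap_sem for reading:

/* Kernel-style sketch: walk every VMA in an address space via the mmap
 * linked list and print its range and flags. */
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/sched.h>

static void print_vmas(struct mm_struct *mm)
{
        struct vm_area_struct *vma;

        down_read(&mm->mmap_sem);               /* the VMA list is protected by mmap_sem */
        for (vma = mm->mmap; vma; vma = vma->vm_next)
                printk(KERN_INFO "area %#lx-%#lx flags %#lx\n",
                       vma->vm_start, vma->vm_end, vma->vm_flags);
        up_read(&mm->mmap_sem);
}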

The kernel thus represents the process address space via the memory descriptor, which is reached through the mm field of the process's task_struct:

struct task_struct {
        volatile long state;        /* -1 unrunnable, 0 runnable, >0 stopped */
        long counter;
        long priority;
        unsigned long signal;
        unsigned long blocked;      /* bitmap of masked signals */
        unsigned long flags;        /* per process flags, defined below */
        int errno;
        long debugreg[8];           /* Hardware debugging registers */
        struct exec_domain *exec_domain;
        struct linux_binfmt *binfmt;
        struct task_struct *next_task, *prev_task;
        struct task_struct *next_run, *prev_run;
        unsigned long saved_kernel_stack;
        unsigned long kernel_stack_page;
        int exit_code, exit_signal;
        unsigned long personality;
        int dumpable:1;
        int did_exec:1;
        int pid;
        int pgrp;
        int tty_old_pgrp;
        int session;
        int leader;                 /* boolean value for session group leader */
        int groups[NGROUPS];
        struct task_struct *p_opptr, *p_pptr, *p_cptr,
                           *p_ysptr, *p_osptr;
        struct wait_queue *wait_chldexit;
        unsigned short uid, euid, suid, fsuid;
        unsigned short gid, egid, sgid, fsgid;
        unsigned long timeout, policy, rt_priority;
        unsigned long it_real_value, it_prof_value, it_virt_value;
        unsigned long it_real_incr, it_prof_incr, it_virt_incr;
        struct timer_list real_timer;
        long utime, stime, cutime, cstime, start_time;
        unsigned long min_flt, maj_flt, nswap, cmin_flt, cmaj_flt, cnswap;
        int swappable:1;
        unsigned long swap_address;
        unsigned long old_maj_flt;  /* old value of maj_flt */
        unsigned long dec_flt;      /* page fault count of the last time */
        unsigned long swap_cnt;     /* number of pages to swap on next pass */
        struct rlimit rlim[RLIM_NLIMITS];
        unsigned short used_math;
        char comm[16];
        int link_count;
        struct tty_struct *tty;
        struct sem_undo *semundo;
        struct sem_queue *semsleeping;
        struct desc_struct *ldt;
        struct thread_struct tss;
        struct fs_struct *fs;
        struct files_struct *files;
        struct mm_struct *mm;
        struct signal_struct *sig;
#ifdef __SMP__
        int processor;
        int last_processor;
        int lock_depth;
#endif
};

current->mm points to the memory descriptor of the current process. During fork(), copy_mm() copies the parent's memory descriptor to the child, so each process normally receives a unique mm_struct and hence a unique address space. When an address space is shared by multiple processes, those processes are threads; they are created by calling clone() with the CLONE_VM flag set. This is why, to the Linux kernel, threads are just processes that happen to share their address space, i.e. some of their resources, with another process.
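
A small user-space sketch of the difference (the names value and thread_fn are made up for this demo; build with gcc demo.c -pthread): after fork() the child writes to its own copy of the variable, while a thread created with pthread_create(), which Linux implements on top of clone() with CLONE_VM, writes to the shared copy:

#include <pthread.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static int value = 0;                   /* one global within one address space */

static void *thread_fn(void *arg)
{
        value = 2;                      /* shared address space: visible to the creator */
        return NULL;
}

int main(void)
{
        pthread_t tid;

        if (fork() == 0) {              /* child gets its own copy of the address space */
                value = 1;              /* modifies the child's copy only */
                _exit(0);
        }
        wait(NULL);
        printf("after fork  : value = %d (parent still sees 0)\n", value);

        pthread_create(&tid, NULL, thread_fn, NULL);
        pthread_join(tid, NULL);
        printf("after thread: value = %d (now 2)\n", value);
        return 0;
}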

When a process exits, the kernel calls exit_mm(), which does some housekeeping and statistics updates and drops the process's reference on the memory descriptor; once no users of the address space remain, free_mm() is eventually called to release the mm_struct.

Virtual memory areas

Memory areas are represented in the kernel by vm_area_struct and are also called virtual memory areas, or VMAs.

struct vm_area_struct {
        struct mm_struct *vm_mm;             /* associated mm_struct */
        unsigned long vm_start;              /* VMA start, inclusive */
        unsigned long vm_end;                /* VMA end, exclusive */
        struct vm_area_struct *vm_next;      /* list of VMAs */
        pgprot_t vm_page_prot;               /* access permissions */
        unsigned long vm_flags;              /* flags */
        struct rb_node vm_rb;                /* VMA's node in the tree */
        union {          /* links to address_space->i_mmap or i_mmap_nonlinear */
                struct {
                        struct list_head list;
                        void *parent;
                        struct vm_area_struct *head;
                } vm_set;
                struct prio_tree_node prio_tree_node;
        } shared;
        struct list_head anon_vma_node;      /* anon_vma entry */
        struct anon_vma *anon_vma;           /* anonymous VMA object */
        struct vm_operations_struct *vm_ops; /* associated ops */
        unsigned long vm_pgoff;              /* offset within file */
        struct file *vm_file;                /* mapped file, if any */
        void *vm_private_data;               /* private data */
};

A vm_area_struct describes a single memory area covering a contiguous interval of addresses. Each memory area has associated permissions and flags that denote its type, for example a memory-mapped file or the process's user-space stack.

The vm_mm field points back to the mm_struct the area belongs to, which makes each VMA unique to its address space: even if two processes map the same file, each gets its own vm_area_struct.
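
As another kernel-style sketch (not a user-space program; the helper addr_is_writable is made up, while find_vma(), VM_WRITE and mmap_sem are real kernel names), kernel code typically locates the VMA containing a given address with find_vma() and then inspects its flags. Note that find_vma() returns the first area that ends after the address, so the start must still be checked:

#include <linux/mm.h>
#include <linux/sched.h>

static int addr_is_writable(struct mm_struct *mm, unsigned long addr)
{
        struct vm_area_struct *vma;
        int writable = 0;

        down_read(&mm->mmap_sem);
        vma = find_vma(mm, addr);                    /* first area with vm_end > addr */
        if (vma && vma->vm_start <= addr)            /* addr really lies inside it */
                writable = !!(vma->vm_flags & VM_WRITE);
        up_read(&mm->mmap_sem);

        return writable;
}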

Although applications operate on virtual addresses, the processor ultimately works with physical memory. Whenever an application accesses a virtual memory address, it must first be translated into the physical address where the data actually resides. This translation is performed via page tables: the virtual address is split into chunks, and each chunk is used as an index into a table whose entries point either to another table or to the physical page.

Linux maintains three levels of page tables by default. Even on architectures whose hardware does not support three levels, the kernel keeps the three-level model in software, so the same indexed, multi-level lookup scheme works everywhere.

The top page table is known as the Page Global Directory (PGD) and contains an array of unsigned long entries. Entries in the PGD point to entries in the PMD.

The second page table is known as the Page Middle Directory (PMD), whose entries in turn point to the PTEs.

The Page Table Entries (PTE) point to the actual physical pages.

Every process has its own page tables; the pgd field of the memory descriptor points to the process's page global directory.
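
The split of a virtual address into per-level indices plus a page offset can be illustrated with a small user-space program. The bit widths below (a 12-bit page offset and 9-bit indices per level) are purely illustrative and vary by architecture:

#include <stdio.h>

#define PAGE_SHIFT 12                   /* 4 KB pages: 12-bit offset */
#define LEVEL_BITS 9                    /* illustrative: 9-bit index per level */
#define LEVEL_MASK ((1UL << LEVEL_BITS) - 1)

int main(void)
{
        unsigned long vaddr = 0x12345678UL;          /* an arbitrary example address */

        unsigned long offset = vaddr & ((1UL << PAGE_SHIFT) - 1);
        unsigned long pte_i  = (vaddr >> PAGE_SHIFT) & LEVEL_MASK;
        unsigned long pmd_i  = (vaddr >> (PAGE_SHIFT + LEVEL_BITS)) & LEVEL_MASK;
        unsigned long pgd_i  =  vaddr >> (PAGE_SHIFT + 2 * LEVEL_BITS);

        printf("virtual address: %#lx\n", vaddr);
        printf("  PGD index    : %#lx\n", pgd_i);
        printf("  PMD index    : %#lx\n", pmd_i);
        printf("  PTE index    : %#lx\n", pte_i);
        printf("  page offset  : %#lx\n", offset);
        return 0;
}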

Even with three levels of page tables, each translation still requires several memory lookups, which is slow. To speed this up, most processors implement a Translation Lookaside Buffer (TLB), a hardware cache of recently used virtual-to-physical mappings. If the requested address is in the cache, the physical address comes straight from the TLB; otherwise, the page tables are walked to resolve the mapping.
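
A conceptual model of that hit-or-miss path is sketched below. All the names here (translate, walk_page_tables, tlb_entry) are hypothetical, and a real TLB is hardware, not C; the placeholder walk simply pretends virtual page n maps to physical frame n:

#include <stdbool.h>
#include <stdio.h>

#define TLB_ENTRIES 64
#define PAGE_SHIFT  12

struct tlb_entry {
        unsigned long vpn;              /* virtual page number */
        unsigned long pfn;              /* physical frame number */
        bool valid;
};

static struct tlb_entry tlb[TLB_ENTRIES];

/* Placeholder for the multi-level page-table walk described above. */
static unsigned long walk_page_tables(unsigned long vpn)
{
        return vpn;                     /* identity mapping, for illustration only */
}

static unsigned long translate(unsigned long vaddr)
{
        unsigned long vpn = vaddr >> PAGE_SHIFT;
        unsigned long off = vaddr & ((1UL << PAGE_SHIFT) - 1);
        struct tlb_entry *e = &tlb[vpn % TLB_ENTRIES];

        if (!e->valid || e->vpn != vpn) {       /* TLB miss: walk, then cache */
                e->pfn = walk_page_tables(vpn);
                e->vpn = vpn;
                e->valid = true;
        }                                       /* TLB hit: skip the walk entirely */
        return (e->pfn << PAGE_SHIFT) | off;
}

int main(void)
{
        printf("first access : %#lx\n", translate(0x12345678UL)); /* miss */
        printf("second access: %#lx\n", translate(0x12345678UL)); /* hit */
        return 0;
}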

Most of the material in this article is drawn from the book Linux Kernel Development by Robert Love, a must-read for anybody who wants to understand the inner workings of the Linux kernel.
