commit 30a97c1e2dc39f45d9deeeccc2733278fc285d5e
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Wed Aug 15 17:42:11 2018 +0200

    Linux 4.4.148

commit 8f2adf3d2118cc0822b83a7bb43475f9149a1d26
Author: Jiri Kosina <jkosina@suse.cz>
Date:   Sat Jul 14 21:56:13 2018 +0200

    x86/speculation/l1tf: Unbreak !__HAVE_ARCH_PFN_MODIFY_ALLOWED architectures
    
    commit 6c26fcd2abfe0a56bbd95271fce02df2896cfd24 upstream.
    
    pfn_modify_allowed() and arch_has_pfn_modify_check() are outside of the
    !__ASSEMBLY__ section in include/asm-generic/pgtable.h, which confuses
    assembler on archs that don't have __HAVE_ARCH_PFN_MODIFY_ALLOWED (e.g.
    ia64) and breaks build:
    
        include/asm-generic/pgtable.h: Assembler messages:
        include/asm-generic/pgtable.h:538: Error: Unknown opcode `static inline bool pfn_modify_allowed(unsigned long pfn,pgprot_t prot)'
        include/asm-generic/pgtable.h:540: Error: Unknown opcode `return true'
        include/asm-generic/pgtable.h:543: Error: Unknown opcode `static inline bool arch_has_pfn_modify_check(void)'
        include/asm-generic/pgtable.h:545: Error: Unknown opcode `return false'
        arch/ia64/kernel/entry.S:69: Error: `mov' does not fit into bundle
    
    Move those two static inlines into the !__ASSEMBLY__ section so that they
    don't confuse the asm build pass.
    
    Fixes: 42e4089c7890 ("x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings")
    Signed-off-by: Jiri Kosina <jkosina@suse.cz>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    [groeck: Context changes]
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>

commit 4b90ff885c6cc88795b678414aaf5d7b0153a5dc
Author: Vlastimil Babka <vbabka@suse.cz>
Date:   Tue Aug 14 20:50:47 2018 +0200

    x86/init: fix build with CONFIG_SWAP=n
    
    commit 792adb90fa724ce07c0171cbc96b9215af4b1045 upstream.
    
    The introduction of generic_max_swapfile_size and arch-specific versions has
    broken linking on x86 with CONFIG_SWAP=n due to undefined reference to
    'generic_max_swapfile_size'. Fix it by compiling the x86-specific
    max_swapfile_size() only with CONFIG_SWAP=y.
    
    Reported-by: Tomas Pruzina <pruzinat@gmail.com>
    Fixes: 377eeaa8e11f ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit eb993211b9d7856a9ab8c487c701c84103842713
Author: Guenter Roeck <linux@roeck-us.net>
Date:   Mon Aug 13 10:15:16 2018 -0700

    x86/speculation/l1tf: Fix up CPU feature flags
    
    In linux-4.4.y, the definition of X86_FEATURE_RETPOLINE and
    X86_FEATURE_RETPOLINE_AMD is different from the upstream
    definition. Result is an overlap with the newly introduced
    X86_FEATURE_L1TF_PTEINV. Update RETPOLINE definitions to match
    upstream definitions to improve alignment with upstream code.
    
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 6b06f36f07e2c91ad0126f17d0fc8f933c827da8
Author: Andi Kleen <ak@linux.intel.com>
Date:   Tue Aug 7 15:09:38 2018 -0700

    x86/mm/kmmio: Make the tracer robust against L1TF
    
    commit 1063711b57393c1999248cccb57bebfaf16739e7 upstream
    
    The mmio tracer sets io mapping PTEs and PMDs to non present when enabled
    without inverting the address bits, which makes the PTE entry vulnerable
    for L1TF.
    
    Make it use the right low level macros to actually invert the address bits
    to protect against L1TF.
    
    In principle this could be avoided because MMIO tracing is not likely to be
    enabled on production machines, but the fix is straigt forward and for
    consistency sake it's better to get rid of the open coded PTE manipulation.
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 02ff2769edbce2261e981effbc3c4b98fae4faf0
Author: Andi Kleen <ak@linux.intel.com>
Date:   Tue Aug 7 15:09:39 2018 -0700

    x86/mm/pat: Make set_memory_np() L1TF safe
    
    commit 958f79b9ee55dfaf00c8106ed1c22a2919e0028b upstream
    
    set_memory_np() is used to mark kernel mappings not present, but it has
    it's own open coded mechanism which does not have the L1TF protection of
    inverting the address bits.
    
    Replace the open coded PTE manipulation with the L1TF protecting low level
    PTE routines.
    
    Passes the CPA self test.
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    [ dwmw2: Pull in pud_mkhuge() from commit a00cc7d9dd, and pfn_pud() ]
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    [groeck: port to 4.4]
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 9feecdb6cb73feaa55b0135aee8777eaac848c78
Author: Andi Kleen <ak@linux.intel.com>
Date:   Tue Aug 7 15:09:37 2018 -0700

    x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert
    
    commit 0768f91530ff46683e0b372df14fd79fe8d156e5 upstream
    
    Some cases in THP like:
      - MADV_FREE
      - mprotect
      - split
    
    mark the PMD non present for temporarily to prevent races. The window for
    an L1TF attack in these contexts is very small, but it wants to be fixed
    for correctness sake.
    
    Use the proper low level functions for pmd/pud_mknotpresent() to address
    this.
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 0aae5fe8413dfcd949d0df1c7d6b835efecd5b3b
Author: Andi Kleen <ak@linux.intel.com>
Date:   Tue Aug 7 15:09:36 2018 -0700

    x86/speculation/l1tf: Invert all not present mappings
    
    commit f22cc87f6c1f771b57c407555cfefd811cdd9507 upstream
    
    For kernel mappings PAGE_PROTNONE is not necessarily set for a non present
    mapping, but the inversion logic explicitely checks for !PRESENT and
    PROT_NONE.
    
    Remove the PROT_NONE check and make the inversion unconditional for all not
    present mappings.
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 09049f022a9b96b0d09d90023d4f0a097a61a767
Author: Michal Hocko <mhocko@suse.cz>
Date:   Wed Jun 27 17:46:50 2018 +0200

    x86/speculation/l1tf: Fix up pte->pfn conversion for PAE
    
    commit e14d7dfb41f5807a0c1c26a13f2b8ef16af24935 upstream
    
    Jan has noticed that pte_pfn and co. resp. pfn_pte are incorrect for
    CONFIG_PAE because phys_addr_t is wider than unsigned long and so the
    pte_val reps. shift left would get truncated. Fix this up by using proper
    types.
    
    [dwmw2: Backport to 4.9]
    
    Fixes: 6b28baca9b1f ("x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation")
    Reported-by: Jan Beulich <JBeulich@suse.com>
    Signed-off-by: Michal Hocko <mhocko@suse.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit b55b06bd3b3c977da2c938d1a73d38674cb88086
Author: Vlastimil Babka <vbabka@suse.cz>
Date:   Fri Jun 22 17:39:33 2018 +0200

    x86/speculation/l1tf: Protect PAE swap entries against L1TF
    
    commit 0d0f6249058834ffe1ceaad0bb31464af66f6e7a upstream
    
    The PAE 3-level paging code currently doesn't mitigate L1TF by flipping the
    offset bits, and uses the high PTE word, thus bits 32-36 for type, 37-63 for
    offset. The lower word is zeroed, thus systems with less than 4GB memory are
    safe. With 4GB to 128GB the swap type selects the memory locations vulnerable
    to L1TF; with even more memory, also the swap offfset influences the address.
    This might be a problem with 32bit PAE guests running on large 64bit hosts.
    
    By continuing to keep the whole swap entry in either high or low 32bit word of
    PTE we would limit the swap size too much. Thus this patch uses the whole PAE
    PTE with the same layout as the 64bit version does. The macros just become a
    bit tricky since they assume the arch-dependent swp_entry_t to be 32bit.
    
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit dc48c1a2f45b628d3128ad4bb31d1bcd342c059d
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date:   Wed Jun 20 16:42:58 2018 -0400

    x86/cpufeatures: Add detection of L1D cache flush support.
    
    commit 11e34e64e4103955fc4568750914c75d65ea87ee upstream
    
    336996-Speculative-Execution-Side-Channel-Mitigations.pdf defines a new MSR
    (IA32_FLUSH_CMD) which is detected by CPUID.7.EDX[28]=1 bit being set.
    
    This new MSR "gives software a way to invalidate structures with finer
    granularity than other architectual methods like WBINVD."
    
    A copy of this document is available at
      https://bugzilla.kernel.org/show_bug.cgi?id=199511
    
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit df7fd6ccb358bd4aa3abc8a6ff995b1f3da1b0fb
Author: Vlastimil Babka <vbabka@suse.cz>
Date:   Thu Jun 21 12:36:29 2018 +0200

    x86/speculation/l1tf: Extend 64bit swap file size limit
    
    commit 1a7ed1ba4bba6c075d5ad61bb75e3fbc870840d6 upstream
    
    The previous patch has limited swap file size so that large offsets cannot
    clear bits above MAX_PA/2 in the pte and interfere with L1TF mitigation.
    
    It assumed that offsets are encoded starting with bit 12, same as pfn. But
    on x86_64, offsets are encoded starting with bit 9.
    
    Thus the limit can be raised by 3 bits. That means 16TB with 42bit MAX_PA
    and 256TB with 46bit MAX_PA.
    
    Fixes: 377eeaa8e11f ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit fa86c208d22d8179ef3d295f6084fc87390c8366
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date:   Wed Jun 20 16:42:57 2018 -0400

    x86/bugs: Move the l1tf function and define pr_fmt properly
    
    commit 56563f53d3066afa9e63d6c997bf67e76a8b05c0 upstream
    
    The pr_warn in l1tf_select_mitigation would have used the prior pr_fmt
    which was defined as "Spectre V2 : ".
    
    Move the function to be past SSBD and also define the pr_fmt.
    
    Fixes: 17dbca119312 ("x86/speculation/l1tf: Add sysfs reporting for l1tf")
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 685b44483f077c949bd5016fdfe734b662b74aba
Author: Andi Kleen <ak@linux.intel.com>
Date:   Wed Jun 13 15:48:28 2018 -0700

    x86/speculation/l1tf: Limit swap file size to MAX_PA/2
    
    commit 377eeaa8e11fe815b1d07c81c4a0e2843a8c15eb upstream
    
    For the L1TF workaround its necessary to limit the swap file size to below
    MAX_PA/2, so that the higher bits of the swap offset inverted never point
    to valid memory.
    
    Add a mechanism for the architecture to override the swap file size check
    in swapfile.c and add a x86 specific max swapfile check function that
    enforces that limit.
    
    The check is only enabled if the CPU is vulnerable to L1TF.
    
    In VMs with 42bit MAX_PA the typical limit is 2TB now, on a native system
    with 46bit PA it is 32TB. The limit is only per individual swap file, so
    it's always possible to exceed these limits with multiple swap files or
    partitions.
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Dave Hansen <dave.hansen@intel.com>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit d71af2dbacb5611c1dcdc16fd1d343821d61bd5e
Author: Andi Kleen <ak@linux.intel.com>
Date:   Wed Jun 13 15:48:27 2018 -0700

    x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings
    
    commit 42e4089c7890725fcd329999252dc489b72f2921 upstream
    
    For L1TF PROT_NONE mappings are protected by inverting the PFN in the page
    table entry. This sets the high bits in the CPU's address space, thus
    making sure to point to not point an unmapped entry to valid cached memory.
    
    Some server system BIOSes put the MMIO mappings high up in the physical
    address space. If such an high mapping was mapped to unprivileged users
    they could attack low memory by setting such a mapping to PROT_NONE. This
    could happen through a special device driver which is not access
    protected. Normal /dev/mem is of course access protected.
    
    To avoid this forbid PROT_NONE mappings or mprotect for high MMIO mappings.
    
    Valid page mappings are allowed because the system is then unsafe anyways.
    
    It's not expected that users commonly use PROT_NONE on MMIO. But to
    minimize any impact this is only enforced if the mapping actually refers to
    a high MMIO address (defined as the MAX_PA-1 bit being set), and also skip
    the check for root.
    
    For mmaps this is straight forward and can be handled in vm_insert_pfn and
    in remap_pfn_range().
    
    For mprotect it's a bit trickier. At the point where the actual PTEs are
    accessed a lot of state has been changed and it would be difficult to undo
    on an error. Since this is a uncommon case use a separate early page talk
    walk pass for MMIO PROT_NONE mappings that checks for this condition
    early. For non MMIO and non PROT_NONE there are no changes.
    
    [dwmw2: Backport to 4.9]
    [groeck: Backport to 4.4]
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Acked-by: Dave Hansen <dave.hansen@intel.com>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 9ac0dc7d949db7afd4116d55fa4fcf6a66d820f0
Author: Dan Williams <dan.j.williams@intel.com>
Date:   Fri Oct 7 17:00:18 2016 -0700

    mm: fix cache mode tracking in vm_insert_mixed()
    
    commit 87744ab3832b83ba71b931f86f9cfdb000d07da5 upstream
    
    vm_insert_mixed() unlike vm_insert_pfn_prot() and vmf_insert_pfn_pmd(),
    fails to check the pgprot_t it uses for the mapping against the one
    recorded in the memtype tracking tree.  Add the missing call to
    track_pfn_insert() to preclude cases where incompatible aliased mappings
    are established for a given physical address range.
    
    [groeck: Backport to v4.4.y]
    
    Link: http://lkml.kernel.org/r/147328717909.35069.14256589123570653697.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    Cc: David Airlie <airlied@linux.ie>
    Cc: Matthew Wilcox <mawilcox@microsoft.com>
    Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 0371d9c4c822fceb290a0b4cd21119534f7bae47
Author: Andy Lutomirski <luto@kernel.org>
Date:   Tue Dec 29 20:12:20 2015 -0800

    mm: Add vm_insert_pfn_prot()
    
    commit 1745cbc5d0dee0749a6bc0ea8e872c5db0074061 upstream
    
    The x86 vvar vma contains pages with differing cacheability
    flags.  x86 currently implements this by manually inserting all
    the ptes using (io_)remap_pfn_range when the vma is set up.
    
    x86 wants to move to using .fault with VM_FAULT_NOPAGE to set up
    the mappings as needed.  The correct API to use to insert a pfn
    in .fault is vm_insert_pfn(), but vm_insert_pfn() can't override the
    vma's cache mode, and the HPET page in particular needs to be
    uncached despite the fact that the rest of the VMA is cached.
    
    Add vm_insert_pfn_prot() to support varying cacheability within
    the same non-COW VMA in a more sane manner.
    
    x86 could alternatively use multiple VMAs, but that's messy,
    would break CRIU, and would create unnecessary VMAs that would
    waste memory.
    
    Signed-off-by: Andy Lutomirski <luto@kernel.org>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Acked-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Fenghua Yu <fenghua.yu@intel.com>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Link: http://lkml.kernel.org/r/d2938d1eb37be7a5e4f86182db646551f11e45aa.1451446564.git.luto@kernel.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit bf0cca01b8736a5e146a980434ba36eb036e37ac
Author: Andi Kleen <ak@linux.intel.com>
Date:   Wed Jun 13 15:48:26 2018 -0700

    x86/speculation/l1tf: Add sysfs reporting for l1tf
    
    commit 17dbca119312b4e8173d4e25ff64262119fcef38 upstream
    
    L1TF core kernel workarounds are cheap and normally always enabled, However
    they still should be reported in sysfs if the system is vulnerable or
    mitigated. Add the necessary CPU feature/bug bits.
    
    - Extend the existing checks for Meltdowns to determine if the system is
      vulnerable. All CPUs which are not vulnerable to Meltdown are also not
      vulnerable to L1TF
    
    - Check for 32bit non PAE and emit a warning as there is no practical way
      for mitigation due to the limited physical address bits
    
    - If the system has more than MAX_PA/2 physical memory the invert page
      workarounds don't protect the system against the L1TF attack anymore,
      because an inverted physical address will also point to valid
      memory. Print a warning in this case and report that the system is
      vulnerable.
    
    Add a function which returns the PFN limit for the L1TF mitigation, which
    will be used in follow up patches for sanity and range checks.
    
    [ tglx: Renamed the CPU feature bit to L1TF_PTEINV ]
    [ dwmw2: Backport to 4.9 (cpufeatures.h, E820) ]
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Acked-by: Dave Hansen <dave.hansen@intel.com>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 52dc5c9f8eee1c569974308f0bb7be64ec63565c
Author: Andi Kleen <ak@linux.intel.com>
Date:   Wed Jun 13 15:48:25 2018 -0700

    x86/speculation/l1tf: Make sure the first page is always reserved
    
    commit 10a70416e1f067f6c4efda6ffd8ea96002ac4223 upstream
    
    The L1TF workaround doesn't make any attempt to mitigate speculate accesses
    to the first physical page for zeroed PTEs. Normally it only contains some
    data from the early real mode BIOS.
    
    It's not entirely clear that the first page is reserved in all
    configurations, so add an extra reservation call to make sure it is really
    reserved. In most configurations (e.g.  with the standard reservations)
    it's likely a nop.
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Acked-by: Dave Hansen <dave.hansen@intel.com>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 9ee2d2da676c48a459a99f10f45c71ffca8761a8
Author: Andi Kleen <ak@linux.intel.com>
Date:   Wed Jun 13 15:48:24 2018 -0700

    x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation
    
    commit 6b28baca9b1f0d4a42b865da7a05b1c81424bd5c upstream
    
    When PTEs are set to PROT_NONE the kernel just clears the Present bit and
    preserves the PFN, which creates attack surface for L1TF speculation
    speculation attacks.
    
    This is important inside guests, because L1TF speculation bypasses physical
    page remapping. While the host has its own migitations preventing leaking
    data from other VMs into the guest, this would still risk leaking the wrong
    page inside the current guest.
    
    This uses the same technique as Linus' swap entry patch: while an entry is
    is in PROTNONE state invert the complete PFN part part of it. This ensures
    that the the highest bit will point to non existing memory.
    
    The invert is done by pte/pmd_modify and pfn/pmd/pud_pte for PROTNONE and
    pte/pmd/pud_pfn undo it.
    
    This assume that no code path touches the PFN part of a PTE directly
    without using these primitives.
    
    This doesn't handle the case that MMIO is on the top of the CPU physical
    memory. If such an MMIO region was exposed by an unpriviledged driver for
    mmap it would be possible to attack some real memory.  However this
    situation is all rather unlikely.
    
    For 32bit non PAE the inversion is not done because there are really not
    enough bits to protect anything.
    
    Q: Why does the guest need to be protected when the HyperVisor already has
       L1TF mitigations?
    
    A: Here's an example:
    
       Physical pages 1 2 get mapped into a guest as
       GPA 1 -> PA 2
       GPA 2 -> PA 1
       through EPT.
    
       The L1TF speculation ignores the EPT remapping.
    
       Now the guest kernel maps GPA 1 to process A and GPA 2 to process B, and
       they belong to different users and should be isolated.
    
       A sets the GPA 1 PA 2 PTE to PROT_NONE to bypass the EPT remapping and
       gets read access to the underlying physical page. Which in this case
       points to PA 2, so it can read process B's data, if it happened to be in
       L1, so isolation inside the guest is broken.
    
       There's nothing the hypervisor can do about this. This mitigation has to
       be done in the guest itself.
    
    [ tglx: Massaged changelog ]
    [ dwmw2: backported to 4.9 ]
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Dave Hansen <dave.hansen@intel.com>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 9bbdab847fc9a0b8cf23fa7354e1210f0b492821
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Jun 13 15:48:23 2018 -0700

    x86/speculation/l1tf: Protect swap entries against L1TF
    
    commit 2f22b4cd45b67b3496f4aa4c7180a1271c6452f6 upstream
    
    With L1 terminal fault the CPU speculates into unmapped PTEs, and resulting
    side effects allow to read the memory the PTE is pointing too, if its
    values are still in the L1 cache.
    
    For swapped out pages Linux uses unmapped PTEs and stores a swap entry into
    them.
    
    To protect against L1TF it must be ensured that the swap entry is not
    pointing to valid memory, which requires setting higher bits (between bit
    36 and bit 45) that are inside the CPUs physical address space, but outside
    any real memory.
    
    To do this invert the offset to make sure the higher bits are always set,
    as long as the swap file is not too big.
    
    Note there is no workaround for 32bit !PAE, or on systems which have more
    than MAX_PA/2 worth of memory. The later case is very unlikely to happen on
    real systems.
    
    [AK: updated description and minor tweaks by. Split out from the original
         patch ]
    
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Andi Kleen <ak@linux.intel.com>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Dave Hansen <dave.hansen@intel.com>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 614f5e84640e382b9916b6f606328191ed0264b3
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Jun 13 15:48:22 2018 -0700

    x86/speculation/l1tf: Change order of offset/type in swap entry
    
    commit bcd11afa7adad8d720e7ba5ef58bdcd9775cf45f upstream
    
    If pages are swapped out, the swap entry is stored in the corresponding
    PTE, which has the Present bit cleared. CPUs vulnerable to L1TF speculate
    on PTE entries which have the present bit set and would treat the swap
    entry as phsyical address (PFN). To mitigate that the upper bits of the PTE
    must be set so the PTE points to non existent memory.
    
    The swap entry stores the type and the offset of a swapped out page in the
    PTE. type is stored in bit 9-13 and offset in bit 14-63. The hardware
    ignores the bits beyond the phsyical address space limit, so to make the
    mitigation effective its required to start 'offset' at the lowest possible
    bit so that even large swap offsets do not reach into the physical address
    space limit bits.
    
    Move offset to bit 9-58 and type to bit 59-63 which are the bits that
    hardware generally doesn't care about.
    
    That, in turn, means that if you on desktop chip with only 40 bits of
    physical addressing, now that the offset starts at bit 9, there needs to be
    30 bits of offset actually *in use* until bit 39 ends up being set, which
    means when inverted it will again point into existing memory.
    
    So that's 4 terabyte of swap space (because the offset is counted in pages,
    so 30 bits of offset is 42 bits of actual coverage). With bigger physical
    addressing, that obviously grows further, until the limit of the offset is
    hit (at 50 bits of offset - 62 bits of actual swap file coverage).
    
    This is a preparatory change for the actual swap entry inversion to protect
    against L1TF.
    
    [ AK: Updated description and minor tweaks. Split into two parts ]
    [ tglx: Massaged changelog ]
    
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Andi Kleen <ak@linux.intel.com>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Dave Hansen <dave.hansen@intel.com>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 86b0948d7c546feb01cd2d7ac2bfb15476e6e974
Author: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Date:   Fri Sep 8 16:10:46 2017 -0700

    mm: x86: move _PAGE_SWP_SOFT_DIRTY from bit 7 to bit 1
    
    commit eee4818baac0f2b37848fdf90e4b16430dc536ac upstream
    
    _PAGE_PSE is used to distinguish between a truly non-present
    (_PAGE_PRESENT=0) PMD, and a PMD which is undergoing a THP split and
    should be treated as present.
    
    But _PAGE_SWP_SOFT_DIRTY currently uses the _PAGE_PSE bit, which would
    cause confusion between one of those PMDs undergoing a THP split, and a
    soft-dirty PMD.  Dropping _PAGE_PSE check in pmd_present() does not work
    well, because it can hurt optimization of tlb handling in thp split.
    
    Thus, we need to move the bit.
    
    In the current kernel, bits 1-4 are not used in non-present format since
    commit 00839ee3b299 ("x86/mm: Move swap offset/type up in PTE to work
    around erratum").  So let's move _PAGE_SWP_SOFT_DIRTY to bit 1.  Bit 7
    is used as reserved (always clear), so please don't use it for other
    purpose.
    
    [dwmw2: Pulled in to 4.9 backport to support L1TF changes]
    
    Link: http://lkml.kernel.org/r/20170717193955.20207-3-zi.yan@sent.com
    Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
    Acked-by: Dave Hansen <dave.hansen@intel.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
    Cc: David Nellans <dnellans@nvidia.com>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit f487cf69cf1456ceb34857a50474373aae42dd8a
Author: Dave Hansen <dave.hansen@linux.intel.com>
Date:   Wed Aug 10 10:23:25 2016 -0700

    x86/mm: Fix swap entry comment and macro
    
    commit ace7fab7a6cdd363a615ec537f2aa94dbc761ee2 upstream
    
    A recent patch changed the format of a swap PTE.
    
    The comment explaining the format of the swap PTE is wrong about
    the bits used for the swap type field.  Amusingly, the ASCII art
    and the patch description are correct, but the comment itself
    is wrong.
    
    As I was looking at this, I also noticed that the
    SWP_OFFSET_FIRST_BIT has an off-by-one error.  This does not
    really hurt anything.  It just wasted a bit of space in the PTE,
    giving us 2^59 bytes of addressable space in our swapfiles
    instead of 2^60.  But, it doesn't match with the comments, and it
    wastes a bit of space, so fix it.
    
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Brian Gerst <brgerst@gmail.com>
    Cc: Dave Hansen <dave@sr71.net>
    Cc: Denys Vlasenko <dvlasenk@redhat.com>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Josh Poimboeuf <jpoimboe@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Luis R. Rodriguez <mcgrof@suse.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Toshi Kani <toshi.kani@hp.com>
    Fixes: 00839ee3b299 ("x86/mm: Move swap offset/type up in PTE to work around erratum")
    Link: http://lkml.kernel.org/r/20160810172325.E56AD7DA@viggo.jf.intel.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 0a5deacaac102f451bf8c1fb9d007047fcd712f6
Author: Dave Hansen <dave.hansen@linux.intel.com>
Date:   Thu Jul 7 17:19:11 2016 -0700

    x86/mm: Move swap offset/type up in PTE to work around erratum
    
    commit 00839ee3b299303c6a5e26a0a2485427a3afcbbf upstream
    
    This erratum can result in Accessed/Dirty getting set by the hardware
    when we do not expect them to be (on !Present PTEs).
    
    Instead of trying to fix them up after this happens, we just
    allow the bits to get set and try to ignore them.  We do this by
    shifting the layout of the bits we use for swap offset/type in
    our 64-bit PTEs.
    
    It looks like this:
    
     bitnrs: |     ...            | 11| 10|  9|8|7|6|5| 4| 3|2|1|0|
     names:  |     ...            |SW3|SW2|SW1|G|L|D|A|CD|WT|U|W|P|
     before: |         OFFSET (9-63)          |0|X|X| TYPE(1-5) |0|
      after: | OFFSET (14-63)  |  TYPE (9-13) |0|X|X|X| X| X|X|X|0|
    
    Note that D was already a don't care (X) even before.  We just
    move TYPE up and turn its old spot (which could be hit by the
    A bit) into all don't cares.
    
    We take 5 bits away from the offset, but that still leaves us
    with 50 bits which lets us index into a 62-bit swapfile (4 EiB).
    I think that's probably fine for the moment.  We could
    theoretically reclaim 5 of the bits (1, 2, 3, 4, 7) but it
    doesn't gain us anything.
    
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Brian Gerst <brgerst@gmail.com>
    Cc: Dave Hansen <dave@sr71.net>
    Cc: Denys Vlasenko <dvlasenk@redhat.com>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Josh Poimboeuf <jpoimboe@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Luis R. Rodriguez <mcgrof@suse.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Toshi Kani <toshi.kani@hp.com>
    Cc: dave.hansen@intel.com
    Cc: linux-mm@kvack.org
    Cc: mhocko@suse.com
    Link: http://lkml.kernel.org/r/20160708001911.9A3FD2B6@viggo.jf.intel.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 90a231c63cc28d896ab353b027011a949e9884d3
Author: Andi Kleen <ak@linux.intel.com>
Date:   Wed Jun 13 15:48:21 2018 -0700

    x86/speculation/l1tf: Increase 32bit PAE __PHYSICAL_PAGE_SHIFT
    
    commit 50896e180c6aa3a9c61a26ced99e15d602666a4c upstream
    
    L1 Terminal Fault (L1TF) is a speculation related vulnerability. The CPU
    speculates on PTE entries which do not have the PRESENT bit set, if the
    content of the resulting physical address is available in the L1D cache.
    
    The OS side mitigation makes sure that a !PRESENT PTE entry points to a
    physical address outside the actually existing and cachable memory
    space. This is achieved by inverting the upper bits of the PTE. Due to the
    address space limitations this only works for 64bit and 32bit PAE kernels,
    but not for 32bit non PAE.
    
    This mitigation applies to both host and guest kernels, but in case of a
    64bit host (hypervisor) and a 32bit PAE guest, inverting the upper bits of
    the PAE address space (44bit) is not enough if the host has more than 43
    bits of populated memory address space, because the speculation treats the
    PTE content as a physical host address bypassing EPT.
    
    The host (hypervisor) protects itself against the guest by flushing L1D as
    needed, but pages inside the guest are not protected against attacks from
    other processes inside the same guest.
    
    For the guest the inverted PTE mask has to match the host to provide the
    full protection for all pages the host could possibly map into the
    guest. The hosts populated address space is not known to the guest, so the
    mask must cover the possible maximal host address space, i.e. 52 bit.
    
    On 32bit PAE the maximum PTE mask is currently set to 44 bit because that
    is the limit imposed by 32bit unsigned long PFNs in the VMs. This limits
    the mask to be below what the host could possible use for physical pages.
    
    The L1TF PROT_NONE protection code uses the PTE masks to determine which
    bits to invert to make sure the higher bits are set for unmapped entries to
    prevent L1TF speculation attacks against EPT inside guests.
    
    In order to invert all bits that could be used by the host, increase
    __PHYSICAL_PAGE_SHIFT to 52 to match 64bit.
    
    The real limit for a 32bit PAE kernel is still 44 bits because all Linux
    PTEs are created from unsigned long PFNs, so they cannot be higher than 44
    bits on a 32bit kernel. So these extra PFN bits should be never set. The
    only users of this macro are using it to look at PTEs, so it's safe.
    
    [ tglx: Massaged changelog ]
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Dave Hansen <dave.hansen@intel.com>
    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit ec5aa64fec7206537442a2f3cb67decabad252f4
Author: Nick Desaulniers <ndesaulniers@google.com>
Date:   Fri Aug 3 10:05:50 2018 -0700

    x86/irqflags: Provide a declaration for native_save_fl
    
    commit 208cbb32558907f68b3b2a081ca2337ac3744794 upstream.
    
    It was reported that the commit d0a8d9378d16 is causing users of gcc < 4.9
    to observe -Werror=missing-prototypes errors.
    
    Indeed, it seems that:
    extern inline unsigned long native_save_fl(void) { return 0; }
    
    compiled with -Werror=missing-prototypes produces this warning in gcc <
    4.9, but not gcc >= 4.9.
    
    Fixes: d0a8d9378d16 ("x86/paravirt: Make native_save_fl() extern inline").
    Reported-by: David Laight <david.laight@aculab.com>
    Reported-by: Jean Delvare <jdelvare@suse.de>
    Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: hpa@zytor.com
    Cc: jgross@suse.com
    Cc: kstewart@linuxfoundation.org
    Cc: gregkh@linuxfoundation.org
    Cc: boris.ostrovsky@oracle.com
    Cc: astrachan@google.com
    Cc: mka@chromium.org
    Cc: arnd@arndb.de
    Cc: tstellar@redhat.com
    Cc: sedat.dilek@gmail.com
    Cc: David.Laight@aculab.com
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20180803170550.164688-1-ndesaulniers@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 866234c373a0f34774d5dcb3886a6c982397bbc9
Author: Masami Hiramatsu <mhiramat@kernel.org>
Date:   Sat Apr 28 21:37:03 2018 +0900

    kprobes/x86: Fix %p uses in error messages
    
    commit 0ea063306eecf300fcf06d2f5917474b580f666f upstream.
    
    Remove all %p uses in error messages in kprobes/x86.
    
    Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
    Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: David Howells <dhowells@redhat.com>
    Cc: David S . Miller <davem@davemloft.net>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Jon Medhurst <tixy@linaro.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Thomas Richter <tmricht@linux.ibm.com>
    Cc: Tobin C . Harding <me@tobin.cc>
    Cc: Will Deacon <will.deacon@arm.com>
    Cc: acme@kernel.org
    Cc: akpm@linux-foundation.org
    Cc: brueckner@linux.vnet.ibm.com
    Cc: linux-arch@vger.kernel.org
    Cc: rostedt@goodmis.org
    Cc: schwidefsky@de.ibm.com
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/lkml/152491902310.9916.13355297638917767319.stgit@devbox
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 7744abbe29a59db367f59b0c9890356732f25a3b
Author: Jiri Kosina <jkosina@suse.cz>
Date:   Thu Jul 26 13:14:55 2018 +0200

    x86/speculation: Protect against userspace-userspace spectreRSB
    
    commit fdf82a7856b32d905c39afc85e34364491e46346 upstream.
    
    The article "Spectre Returns! Speculation Attacks using the Return Stack
    Buffer" [1] describes two new (sub-)variants of spectrev2-like attacks,
    making use solely of the RSB contents even on CPUs that don't fallback to
    BTB on RSB underflow (Skylake+).
    
    Mitigate userspace-userspace attacks by always unconditionally filling RSB on
    context switch when the generic spectrev2 mitigation has been enabled.
    
    [1] https://arxiv.org/pdf/1807.07940.pdf
    
    Signed-off-by: Jiri Kosina <jkosina@suse.cz>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
    Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Cc: Borislav Petkov <bp@suse.de>
    Cc: David Woodhouse <dwmw@amazon.co.uk>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1807261308190.997@cbobk.fhfr.pm
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 8dbce8a2e9cfc8e026565d75f7cb950393d04159
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Fri Aug 3 16:41:39 2018 +0200

    x86/paravirt: Fix spectre-v2 mitigations for paravirt guests
    
    commit 5800dc5c19f34e6e03b5adab1282535cb102fafd upstream.
    
    Nadav reported that on guests we're failing to rewrite the indirect
    calls to CALLEE_SAVE paravirt functions. In particular the
    pv_queued_spin_unlock() call is left unpatched and that is all over the
    place. This obviously wrecks Spectre-v2 mitigation (for paravirt
    guests) which relies on not actually having indirect calls around.
    
    The reason is an incorrect clobber test in paravirt_patch_call(); this
    function rewrites an indirect call with a direct call to the _SAME_
    function, there is no possible way the clobbers can be different
    because of this.
    
    Therefore remove this clobber check. Also put WARNs on the other patch
    failure case (not enough room for the instruction) which I've not seen
    trigger in my (limited) testing.
    
    Three live kernel image disassemblies for lock_sock_nested (as a small
    function that illustrates the problem nicely). PRE is the current
    situation for guests, POST is with this patch applied and NATIVE is with
    or without the patch for !guests.
    
    PRE:
    
    (gdb) disassemble lock_sock_nested
    Dump of assembler code for function lock_sock_nested:
       0xffffffff817be970 <+0>:     push   %rbp
       0xffffffff817be971 <+1>:     mov    %rdi,%rbp
       0xffffffff817be974 <+4>:     push   %rbx
       0xffffffff817be975 <+5>:     lea    0x88(%rbp),%rbx
       0xffffffff817be97c <+12>:    callq  0xffffffff819f7160 <_cond_resched>
       0xffffffff817be981 <+17>:    mov    %rbx,%rdi
       0xffffffff817be984 <+20>:    callq  0xffffffff819fbb00 <_raw_spin_lock_bh>
       0xffffffff817be989 <+25>:    mov    0x8c(%rbp),%eax
       0xffffffff817be98f <+31>:    test   %eax,%eax
       0xffffffff817be991 <+33>:    jne    0xffffffff817be9ba <lock_sock_nested+74>
       0xffffffff817be993 <+35>:    movl   $0x1,0x8c(%rbp)
       0xffffffff817be99d <+45>:    mov    %rbx,%rdi
       0xffffffff817be9a0 <+48>:    callq  *0xffffffff822299e8
       0xffffffff817be9a7 <+55>:    pop    %rbx
       0xffffffff817be9a8 <+56>:    pop    %rbp
       0xffffffff817be9a9 <+57>:    mov    $0x200,%esi
       0xffffffff817be9ae <+62>:    mov    $0xffffffff817be993,%rdi
       0xffffffff817be9b5 <+69>:    jmpq   0xffffffff81063ae0 <__local_bh_enable_ip>
       0xffffffff817be9ba <+74>:    mov    %rbp,%rdi
       0xffffffff817be9bd <+77>:    callq  0xffffffff817be8c0 <__lock_sock>
       0xffffffff817be9c2 <+82>:    jmp    0xffffffff817be993 <lock_sock_nested+35>
    End of assembler dump.
    
    POST:
    
    (gdb) disassemble lock_sock_nested
    Dump of assembler code for function lock_sock_nested:
       0xffffffff817be970 <+0>:     push   %rbp
       0xffffffff817be971 <+1>:     mov    %rdi,%rbp
       0xffffffff817be974 <+4>:     push   %rbx
       0xffffffff817be975 <+5>:     lea    0x88(%rbp),%rbx
       0xffffffff817be97c <+12>:    callq  0xffffffff819f7160 <_cond_resched>
       0xffffffff817be981 <+17>:    mov    %rbx,%rdi
       0xffffffff817be984 <+20>:    callq  0xffffffff819fbb00 <_raw_spin_lock_bh>
       0xffffffff817be989 <+25>:    mov    0x8c(%rbp),%eax
       0xffffffff817be98f <+31>:    test   %eax,%eax
       0xffffffff817be991 <+33>:    jne    0xffffffff817be9ba <lock_sock_nested+74>
       0xffffffff817be993 <+35>:    movl   $0x1,0x8c(%rbp)
       0xffffffff817be99d <+45>:    mov    %rbx,%rdi
       0xffffffff817be9a0 <+48>:    callq  0xffffffff810a0c20 <__raw_callee_save___pv_queued_spin_unlock>
       0xffffffff817be9a5 <+53>:    xchg   %ax,%ax
       0xffffffff817be9a7 <+55>:    pop    %rbx
       0xffffffff817be9a8 <+56>:    pop    %rbp
       0xffffffff817be9a9 <+57>:    mov    $0x200,%esi
       0xffffffff817be9ae <+62>:    mov    $0xffffffff817be993,%rdi
       0xffffffff817be9b5 <+69>:    jmpq   0xffffffff81063aa0 <__local_bh_enable_ip>
       0xffffffff817be9ba <+74>:    mov    %rbp,%rdi
       0xffffffff817be9bd <+77>:    callq  0xffffffff817be8c0 <__lock_sock>
       0xffffffff817be9c2 <+82>:    jmp    0xffffffff817be993 <lock_sock_nested+35>
    End of assembler dump.
    
    NATIVE:
    
    (gdb) disassemble lock_sock_nested
    Dump of assembler code for function lock_sock_nested:
       0xffffffff817be970 <+0>:     push   %rbp
       0xffffffff817be971 <+1>:     mov    %rdi,%rbp
       0xffffffff817be974 <+4>:     push   %rbx
       0xffffffff817be975 <+5>:     lea    0x88(%rbp),%rbx
       0xffffffff817be97c <+12>:    callq  0xffffffff819f7160 <_cond_resched>
       0xffffffff817be981 <+17>:    mov    %rbx,%rdi
       0xffffffff817be984 <+20>:    callq  0xffffffff819fbb00 <_raw_spin_lock_bh>
       0xffffffff817be989 <+25>:    mov    0x8c(%rbp),%eax
       0xffffffff817be98f <+31>:    test   %eax,%eax
       0xffffffff817be991 <+33>:    jne    0xffffffff817be9ba <lock_sock_nested+74>
       0xffffffff817be993 <+35>:    movl   $0x1,0x8c(%rbp)
       0xffffffff817be99d <+45>:    mov    %rbx,%rdi
       0xffffffff817be9a0 <+48>:    movb   $0x0,(%rdi)
       0xffffffff817be9a3 <+51>:    nopl   0x0(%rax)
       0xffffffff817be9a7 <+55>:    pop    %rbx
       0xffffffff817be9a8 <+56>:    pop    %rbp
       0xffffffff817be9a9 <+57>:    mov    $0x200,%esi
       0xffffffff817be9ae <+62>:    mov    $0xffffffff817be993,%rdi
       0xffffffff817be9b5 <+69>:    jmpq   0xffffffff81063ae0 <__local_bh_enable_ip>
       0xffffffff817be9ba <+74>:    mov    %rbp,%rdi
       0xffffffff817be9bd <+77>:    callq  0xffffffff817be8c0 <__lock_sock>
       0xffffffff817be9c2 <+82>:    jmp    0xffffffff817be993 <lock_sock_nested+35>
    End of assembler dump.
    
    
    Fixes: 63f70270ccd9 ("[PATCH] i386: PARAVIRT: add common patching machinery")
    Fixes: 3010a0663fd9 ("x86/paravirt, objtool: Annotate indirect calls")
    Reported-by: Nadav Amit <namit@vmware.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Juergen Gross <jgross@suse.com>
    Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Cc: David Woodhouse <dwmw2@infradead.org>
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 916a57896e00d4f92318c8ff5a7b8ca07e4e95a7
Author: Oleksij Rempel <o.rempel@pengutronix.de>
Date:   Fri Jun 15 09:41:29 2018 +0200

    ARM: dts: imx6sx: fix irq for pcie bridge
    
    commit 1bcfe0564044be578841744faea1c2f46adc8178 upstream.
    
    Use the correct IRQ line for the MSI controller in the PCIe host
    controller. Apparently a different IRQ line is used compared to other
    i.MX6 variants. Without this change MSI IRQs aren't properly propagated
    to the upstream interrupt controller.
    
    Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
    Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
    Fixes: b1d17f68e5c5 ("ARM: dts: imx: add initial imx6sx device tree source")
    Signed-off-by: Shawn Guo <shawnguo@kernel.org>
    Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 45c679be34ac44ad24bc7abf60193b9f43a83490
Author: Michael Mera <dev@michaelmera.com>
Date:   Mon May 1 15:41:16 2017 +0900

    IB/ocrdma: fix out of bounds access to local buffer
    
    commit 062d0f22a30c39840ea49b72cfcfc1aa4cc538fa upstream.
    
    In write to debugfs file 'resource_stats' the local buffer 'tmp_str' is
    written at index 'count-1' where 'count' is the size of the write, so
    potentially 0.
    
    This patch filters odd values for the write size/position to avoid this
    type of problem.
    
    Signed-off-by: Michael Mera <dev@michaelmera.com>
    Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
    Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit d803aa2fe665f2dca0e46cefca982ad5c537ca7e
Author: Jack Morgenstein <jackm@dev.mellanox.co.il>
Date:   Wed May 23 15:30:31 2018 +0300

    IB/mlx4: Mark user MR as writable if actual virtual memory is writable
    
    commit d8f9cc328c8888369880e2527e9186d745f2bbf6 upstream.
    
    To allow rereg_user_mr to modify the MR from read-only to writable without
    using get_user_pages again, we needed to define the initial MR as writable.
    However, this was originally done unconditionally, without taking into
    account the writability of the underlying virtual memory.
    
    As a result, any attempt to register a read-only MR over read-only
    virtual memory failed.
    
    To fix this, do not add the writable flag bit when the user virtual memory
    is not writable (e.g. const memory).
    
    However, when the underlying memory is NOT writable (and we therefore
    do not define the initial MR as writable), the IB core adds a
    "force writable" flag to its user-pages request. If this succeeds,
    the reg_user_mr caller gets a writable copy of the original pages.
    
    If the user-space caller then does a rereg_user_mr operation to enable
    writability, this will succeed. This should not be allowed, since
    the original virtual memory was not writable.
    
    Cc: <stable@vger.kernel.org>
    Fixes: 9376932d0c26 ("IB/mlx4_ib: Add support for user MR re-registration")
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 01b377d3f0d286d071f46c30586cb261c79559f7
Author: Jack Morgenstein <jackm@dev.mellanox.co.il>
Date:   Wed May 23 15:30:30 2018 +0300

    IB/core: Make testing MR flags for writability a static inline function
    
    commit 08bb558ac11ab944e0539e78619d7b4c356278bd upstream.
    
    Make the MR writability flags check, which is performed in umem.c,
    a static inline function in file ib_verbs.h
    
    This allows the function to be used by low-level infiniband drivers.
    
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit b9341f5aebd89f46d2cda7dd9c39aabc0a559bdb
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Thu Aug 9 17:51:32 2018 -0400

    fix __legitimize_mnt()/mntput() race
    
    commit 119e1ef80ecfe0d1deb6378d4ab41f5b71519de1 upstream.
    
    __legitimize_mnt() has two problems - one is that in case of success
    the check of mount_lock is not ordered wrt preceding increment of
    refcount, making it possible to have successful __legitimize_mnt()
    on one CPU just before the otherwise final mntpu() on another,
    with __legitimize_mnt() not seeing mntput() taking the lock and
    mntput() not seeing the increment done by __legitimize_mnt().
    Solved by a pair of barriers.
    
    Another is that failure of __legitimize_mnt() on the second
    read_seqretry() leaves us with reference that'll need to be
    dropped by caller; however, if that races with final mntput()
    we can end up with caller dropping rcu_read_lock() and doing
    mntput() to release that reference - with the first mntput()
    having freed the damn thing just as rcu_read_lock() had been
    dropped.  Solution: in "do mntput() yourself" failure case
    grab mount_lock, check if MNT_DOOMED has been set by racing
    final mntput() that has missed our increment and if it has -
    undo the increment and treat that as "failure, caller doesn't
    need to drop anything" case.
    
    It's not easy to hit - the final mntput() has to come right
    after the first read_seqretry() in __legitimize_mnt() *and*
    manage to miss the increment done by __legitimize_mnt() before
    the second read_seqretry() in there.  The things that are almost
    impossible to hit on bare hardware are not impossible on SMP
    KVM, though...
    
    Reported-by: Oleg Nesterov <oleg@redhat.com>
    Fixes: 48a066e72d97 ("RCU'd vsfmounts")
    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit a3ababd599e72b9b92420c159564684fcbfa489f
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Thu Aug 9 17:21:17 2018 -0400

    fix mntput/mntput race
    
    commit 9ea0a46ca2c318fcc449c1e6b62a7230a17888f1 upstream.
    
    mntput_no_expire() does the calculation of total refcount under mount_lock;
    unfortunately, the decrement (as well as all increments) are done outside
    of it, leading to false positives in the "are we dropping the last reference"
    test.  Consider the following situation:
            * mnt is a lazy-umounted mount, kept alive by two opened files.  One
    of those files gets closed.  Total refcount of mnt is 2.  On CPU 42
    mntput(mnt) (called from __fput()) drops one reference, decrementing component
            * After it has looked at component #0, the process on CPU 0 does
    mntget(), incrementing component #0, gets preempted and gets to run again -
    on CPU 69.  There it does mntput(), which drops the reference (component #69)
    and proceeds to spin on mount_lock.
            * On CPU 42 our first mntput() finishes counting.  It observes the
    decrement of component #69, but not the increment of component #0.  As the
    result, the total it gets is not 1 as it should've been - it's 0.  At which
    point we decide that vfsmount needs to be killed and proceed to free it and
    shut the filesystem down.  However, there's still another opened file
    on that filesystem, with reference to (now freed) vfsmount, etc. and we are
    screwed.
    
    It's not a wide race, but it can be reproduced with artificial slowdown of
    the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups.
    
    Fix consists of moving the refcount decrement under mount_lock; the tricky
    part is that we want (and can) keep the fast case (i.e. mount that still
    has non-NULL ->mnt_ns) entirely out of mount_lock.  All places that zero
    mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu()
    before that mntput().  IOW, if mntput() observes (under rcu_read_lock())
    a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to
    be dropped.
    
    Reported-by: Jann Horn <jannh@google.com>
    Tested-by: Jann Horn <jannh@google.com>
    Fixes: 48a066e72d97 ("RCU'd vsfmounts")
    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit ba744147871e7c6d3b6b60eede06f74a1a7abcd9
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Mon Aug 6 09:03:58 2018 -0400

    root dentries need RCU-delayed freeing
    
    commit 90bad5e05bcdb0308cfa3d3a60f5c0b9c8e2efb3 upstream.
    
    Since mountpoint crossing can happen without leaving lazy mode,
    root dentries do need the same protection against having their
    memory freed without RCU delay as everything else in the tree.
    
    It's partially hidden by RCU delay between detaching from the
    mount tree and dropping the vfsmount reference, but the starting
    point of pathwalk can be on an already detached mount, in which
    case umount-caused RCU delay has already passed by the time the
    lazy pathwalk grabs rcu_read_lock().  If the starting point
    happens to be at the root of that vfsmount *and* that vfsmount
    covers the entire filesystem, we get trouble.
    
    Fixes: 48a066e72d97 ("RCU'd vsfmounts")
    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 6aef4c4a1690b0b371d88babc41a8a314d0fd3f9
Author: Bart Van Assche <bart.vanassche@wdc.com>
Date:   Thu Aug 2 10:44:42 2018 -0700

    scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management enabled
    
    commit 1214fd7b497400d200e3f4e64e2338b303a20949 upstream.
    
    Surround scsi_execute() calls with scsi_autopm_get_device() and
    scsi_autopm_put_device(). Note: removing sr_mutex protection from the
    scsi_cd_get() and scsi_cd_put() calls is safe because the purpose of
    sr_mutex is to serialize cdrom_*() calls.
    
    This patch avoids that complaints similar to the following appear in the
    kernel log if runtime power management is enabled:
    
    INFO: task systemd-udevd:650 blocked for more than 120 seconds.
         Not tainted 4.18.0-rc7-dbg+ #1
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    systemd-udevd   D28176   650    513 0x00000104
    Call Trace:
    __schedule+0x444/0xfe0
    schedule+0x4e/0xe0
    schedule_preempt_disabled+0x18/0x30
    __mutex_lock+0x41c/0xc70
    mutex_lock_nested+0x1b/0x20
    __blkdev_get+0x106/0x970
    blkdev_get+0x22c/0x5a0
    blkdev_open+0xe9/0x100
    do_dentry_open.isra.19+0x33e/0x570
    vfs_open+0x7c/0xd0
    path_openat+0x6e3/0x1120
    do_filp_open+0x11c/0x1c0
    do_sys_open+0x208/0x2d0
    __x64_sys_openat+0x59/0x70
    do_syscall_64+0x77/0x230
    entry_SYSCALL_64_after_hwframe+0x49/0xbe
    
    Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
    Cc: Maurizio Lombardi <mlombard@redhat.com>
    Cc: Johannes Thumshirn <jthumshirn@suse.de>
    Cc: Alan Stern <stern@rowland.harvard.edu>
    Cc: <stable@vger.kernel.org>
    Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
    Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 277131baccf9c96e01d5ffdb0c6447770b634eae
Author: Hans de Goede <hdegoede@redhat.com>
Date:   Thu Apr 26 14:10:24 2018 +0200

    ACPI / LPSS: Add missing prv_offset setting for byt/cht PWM devices
    
    commit fdcb613d49321b5bf5d5a1bd0fba8e7c241dcc70 upstream.
    
    The LPSS PWM device on on Bay Trail and Cherry Trail devices has a set
    of private registers at offset 0x800, the current lpss_device_desc for
    them already sets the LPSS_SAVE_CTX flag to have these saved/restored
    over device-suspend, but the current lpss_device_desc was not setting
    the prv_offset field, leading to the regular device registers getting
    saved/restored instead.
    
    This is causing the PWM controller to no longer work, resulting in a black
    screen,  after a suspend/resume on systems where the firmware clears the
    APB clock and reset bits at offset 0x804.
    
    This commit fixes this by properly setting prv_offset to 0x800 for
    the PWM devices.
    
    Cc: stable@vger.kernel.org
    Fixes: e1c748179754 ("ACPI / LPSS: Add Intel BayTrail ACPI mode PWM")
    Fixes: 1bfbd8eb8a7f ("ACPI / LPSS: Add ACPI IDs for Intel Braswell")
    Signed-off-by: Hans de Goede <hdegoede@redhat.com>
    Acked-by: Rafael J . Wysocki <rjw@rjwysocki.net>
    Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
    Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 6b1f6243b39c4f49d44bafa9e4639be4f124577f
Author: Juergen Gross <jgross@suse.com>
Date:   Thu Aug 9 16:42:16 2018 +0200

    xen/netfront: don't cache skb_shinfo()
    
    commit d472b3a6cf63cd31cae1ed61930f07e6cd6671b5 upstream.
    
    skb_shinfo() can change when calling __pskb_pull_tail(): Don't cache
    its return value.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Juergen Gross <jgross@suse.com>
    Reviewed-by: Wei Liu <wei.liu2@citrix.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 277b161b1a1d339985b4c24e796e86eae9511382
Author: John David Anglin <dave.anglin@bell.net>
Date:   Sun Aug 5 13:30:31 2018 -0400

    parisc: Define mb() and add memory barriers to assembler unlock sequences
    
    commit fedb8da96355f5f64353625bf96dc69423ad1826 upstream.
    
    For years I thought all parisc machines executed loads and stores in
    order. However, Jeff Law recently indicated on gcc-patches that this is
    not correct. There are various degrees of out-of-order execution all the
    way back to the PA7xxx processor series (hit-under-miss). The PA8xxx
    series has full out-of-order execution for both integer operations, and
    loads and stores.
    
    This is described in the following article:
    http://web.archive.org/web/20040214092531/http://www.cpus.hp.com/technical_references/advperf.shtml
    
    For this reason, we need to define mb() and to insert a memory barrier
    before the store unlocking spinlocks. This ensures that all memory
    accesses are complete prior to unlocking. The ldcw instruction performs
    the same function on entry.
    
    Signed-off-by: John David Anglin <dave.anglin@bell.net>
    Cc: stable@vger.kernel.org # 4.0+
    Signed-off-by: Helge Deller <deller@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit a9252a70174362912fee1556f8c3a25d66cd7637
Author: Helge Deller <deller@gmx.de>
Date:   Sat Jul 28 11:47:17 2018 +0200

    parisc: Enable CONFIG_MLONGCALLS by default
    
    commit 66509a276c8c1d19ee3f661a41b418d101c57d29 upstream.
    
    Enable the -mlong-calls compiler option by default, because otherwise in most
    cases linking the vmlinux binary fails due to truncations of R_PARISC_PCREL22F
    relocations. This fixes building the 64-bit defconfig.
    
    Cc: stable@vger.kernel.org # 4.0+
    Signed-off-by: Helge Deller <deller@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 1e4006421429ab672c62ab25afb3c39e6f4aa94f
Author: Kees Cook <keescook@chromium.org>
Date:   Fri Apr 20 14:55:31 2018 -0700

    fork: unconditionally clear stack on fork
    
    commit e01e80634ecdde1dd113ac43b3adad21b47f3957 upstream.
    
    One of the classes of kernel stack content leaks[1] is exposing the
    contents of prior heap or stack contents when a new process stack is
    allocated.  Normally, those stacks are not zeroed, and the old contents
    remain in place.  In the face of stack content exposure flaws, those
    contents can leak to userspace.
    
    Fixing this will make the kernel no longer vulnerable to these flaws, as
    the stack will be wiped each time a stack is assigned to a new process.
    There's not a meaningful change in runtime performance; it almost looks
    like it provides a benefit.
    
    Performing back-to-back kernel builds before:
            Run times: 157.86 157.09 158.90 160.94 160.80
            Mean: 159.12
            Std Dev: 1.54
    
    and after:
            Run times: 159.31 157.34 156.71 158.15 160.81
            Mean: 158.46
            Std Dev: 1.46
    
    Instead of making this a build or runtime config, Andy Lutomirski
    recommended this just be enabled by default.
    
    [1] A noisy search for many kinds of stack content leaks can be seen here:
    https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=linux+kernel+stack+leak
    
    I did some more with perf and cycle counts on running 100,000 execs of
    /bin/true.
    
    before:
    Cycles: 218858861551 218853036130 214727610969 227656844122 224980542841
    Mean:  221015379122.60
    Std Dev: 4662486552.47
    
    after:
    Cycles: 213868945060 213119275204 211820169456 224426673259 225489986348
    Mean:  217745009865.40
    Std Dev: 5935559279.99
    
    It continues to look like it's faster, though the deviation is rather
    wide, but I'm not sure what I could do that would be less noisy.  I'm
    open to ideas!
    
    Link: http://lkml.kernel.org/r/20180221021659.GA37073@beast
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Laura Abbott <labbott@redhat.com>
    Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    [ Srivatsa: Backported to 4.4.y ]
    Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu>
    Reviewed-by: Srinidhi Rao <srinidhir@vmware.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit e424bee248c38266c6057d43f3e350072fc41c5d
Author: Thomas Egerer <hakke_007@gmx.de>
Date:   Mon Jan 25 12:58:44 2016 +0100

    ipv4+ipv6: Make INET*_ESP select CRYPTO_ECHAINIV
    
    commit 32b6170ca59ccf07d0e394561e54b2cd9726038c upstream.
    
    The ESP algorithms using CBC mode require echainiv. Hence INET*_ESP have
    to select CRYPTO_ECHAINIV in order to work properly. This solves the
    issues caused by a misconfiguration as described in [1].
    The original approach, patching crypto/Kconfig was turned down by
    Herbert Xu [2].
    
    [1] https://lists.strongswan.org/pipermail/users/2015-December/009074.html
    [2] http://marc.info/?l=linux-crypto-vger&m=145224655809562&w=2
    
    Signed-off-by: Thomas Egerer <hakke_007@gmx.de>
    Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Cc: Yongqin Liu <yongqin.liu@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 215f36e128f2b476cd3bfe91339a5e12b79d010c
Author: Tadeusz Struk <tadeusz.struk@intel.com>
Date:   Tue May 22 14:37:18 2018 -0700

    tpm: fix race condition in tpm_common_write()
    
    commit 3ab2011ea368ec3433ad49e1b9e1c7b70d2e65df upstream.
    
    There is a race condition in tpm_common_write function allowing
    two threads on the same /dev/tpm<N>, or two different applications
    on the same /dev/tpmrm<N> to overwrite each other commands/responses.
    Fixed this by taking the priv->buffer_mutex early in the function.
    
    Also converted the priv->data_pending from atomic to a regular size_t
    type. There is no need for it to be atomic since it is only touched
    under the protection of the priv->buffer_mutex.
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Cc: stable@vger.kernel.org
    Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
    Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
    Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 7736fcede789b412ae1c5c2f12f9bef58903319c
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Sat Jul 28 08:12:04 2018 -0400

    ext4: fix check to prevent initializing reserved inodes
    
    commit 5012284700775a4e6e3fbe7eac4c543c4874b559 upstream.
    
    Commit 8844618d8aa7: "ext4: only look at the bg_flags field if it is
    valid" will complain if block group zero does not have the
    EXT4_BG_INODE_ZEROED flag set.  Unfortunately, this is not correct,
    since a freshly created file system has this flag cleared.  It gets
    almost immediately after the file system is mounted read-write --- but
    the following somewhat unlikely sequence will end up triggering a
    false positive report of a corrupted file system:
    
       mkfs.ext4 /dev/vdc
       mount -o ro /dev/vdc /vdc
       mount -o remount,rw /dev/vdc
    
    Instead, when initializing the inode table for block group zero, test
    to make sure that itable_unused count is not too large, since that is
    the case that will result in some or all of the reserved inodes
    getting cleared.
    
    This fixes the failures reported by Eric Whiteney when running
    generic/230 and generic/231 in the the nojournal test case.
    
    Fixes: 8844618d8aa7 ("ext4: only look at the bg_flags field if it is valid")
    Reported-by: Eric Whitney <enwlinux@gmail.com>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>