commit de53befa79cfd74c01fbbdeb45c700b3e9e13011 Author: Greg Kroah-Hartman Date: Sat Feb 13 13:55:19 2021 +0100 Linux 5.10.16 Tested-by: Jason Self Tested-by: Linux Kernel Functional Testing Tested-by: Shuah Khan Tested-by: Guenter Roeck Tested-by: Florian Fainelli Tested-by: Pavel Machek (CIP) Tested-by: Ross Schmidt Link: https://lore.kernel.org/r/20210211150152.885701259@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman commit bddcce15cd1fb9675ddd46a76d8fe2d0a571313b Author: Phillip Lougher Date: Tue Feb 9 13:42:00 2021 -0800 squashfs: add more sanity checks in xattr id lookup commit 506220d2ba21791314af569211ffd8870b8208fa upstream. Sysbot has reported a warning where a kmalloc() attempt exceeds the maximum limit. This has been identified as corruption of the xattr_ids count when reading the xattr id lookup table. This patch adds a number of additional sanity checks to detect this corruption and others. 1. It checks for a corrupted xattr index read from the inode. This could be because the metadata block is uncompressed, or because the "compression" bit has been corrupted (turning a compressed block into an uncompressed block). This would cause an out of bounds read. 2. It checks against corruption of the xattr_ids count. This can either lead to the above kmalloc failure, or a smaller than expected table to be read. 3. It checks the contents of the index table for corruption. [phillip@squashfs.org.uk: fix checkpatch issue] Link: https://lkml.kernel.org/r/270245655.754655.1612770082682@webmail.123-reg.co.uk Link: https://lkml.kernel.org/r/20210204130249.4495-5-phillip@squashfs.org.uk Signed-off-by: Phillip Lougher Reported-by: syzbot+2ccea6339d368360800d@syzkaller.appspotmail.com Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 5e22b39b377e45ba05142e04056920820d277a5c Author: Phillip Lougher Date: Tue Feb 9 13:41:56 2021 -0800 squashfs: add more sanity checks in inode lookup commit eabac19e40c095543def79cb6ffeb3a8588aaff4 upstream. Sysbot has reported an "slab-out-of-bounds read" error which has been identified as being caused by a corrupted "ino_num" value read from the inode. This could be because the metadata block is uncompressed, or because the "compression" bit has been corrupted (turning a compressed block into an uncompressed block). This patch adds additional sanity checks to detect this, and the following corruption. 1. It checks against corruption of the inodes count. This can either lead to a larger table to be read, or a smaller than expected table to be read. In the case of a too large inodes count, this would often have been trapped by the existing sanity checks, but this patch introduces a more exact check, which can identify too small values. 2. It checks the contents of the index table for corruption. [phillip@squashfs.org.uk: fix checkpatch issue] Link: https://lkml.kernel.org/r/527909353.754618.1612769948607@webmail.123-reg.co.uk Link: https://lkml.kernel.org/r/20210204130249.4495-4-phillip@squashfs.org.uk Signed-off-by: Phillip Lougher Reported-by: syzbot+04419e3ff19d2970ea28@syzkaller.appspotmail.com Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 6634147f5128d9055dcc5519fad3c3fcd2adeb44 Author: Phillip Lougher Date: Tue Feb 9 13:41:53 2021 -0800 squashfs: add more sanity checks in id lookup commit f37aa4c7366e23f91b81d00bafd6a7ab54e4a381 upstream. Sysbot has reported a number of "slab-out-of-bounds reads" and "use-after-free read" errors which has been identified as being caused by a corrupted index value read from the inode. This could be because the metadata block is uncompressed, or because the "compression" bit has been corrupted (turning a compressed block into an uncompressed block). This patch adds additional sanity checks to detect this, and the following corruption. 1. It checks against corruption of the ids count. This can either lead to a larger table to be read, or a smaller than expected table to be read. In the case of a too large ids count, this would often have been trapped by the existing sanity checks, but this patch introduces a more exact check, which can identify too small values. 2. It checks the contents of the index table for corruption. Link: https://lkml.kernel.org/r/20210204130249.4495-3-phillip@squashfs.org.uk Signed-off-by: Phillip Lougher Reported-by: syzbot+b06d57ba83f604522af2@syzkaller.appspotmail.com Reported-by: syzbot+c021ba012da41ee9807c@syzkaller.appspotmail.com Reported-by: syzbot+5024636e8b5fd19f0f19@syzkaller.appspotmail.com Reported-by: syzbot+bcbc661df46657d0fa4f@syzkaller.appspotmail.com Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit ff3a75bda722b4a488ae095939e610bd315b371f Author: Phillip Lougher Date: Tue Feb 9 13:41:50 2021 -0800 squashfs: avoid out of bounds writes in decompressors commit e812cbbbbbb15adbbbee176baa1e8bda53059bf0 upstream. Patch series "Squashfs: fix BIO migration regression and add sanity checks". Patch [1/4] fixes a regression introduced by the "migrate from ll_rw_block usage to BIO" patch, which has produced a number of Sysbot/Syzkaller reports. Patches [2/4], [3/4], and [4/4] fix a number of filesystem corruption issues which have produced Sysbot reports in the id, inode and xattr lookup code. Each patch has been tested against the Sysbot reproducers using the given kernel configuration. They have the appropriate "Reported-by:" lines added. Additionally, all of the reproducer filesystems are indirectly fixed by patch [4/4] due to the fact they all have xattr corruption which is now detected there. Additional testing with other configurations and architectures (32bit, big endian), and normal filesystems has also been done to trap any inadvertent regressions caused by the additional sanity checks. This patch (of 4): This is a regression introduced by the patch "migrate from ll_rw_block usage to BIO". Sysbot/Syskaller has reported a number of "out of bounds writes" and "unable to handle kernel paging request in squashfs_decompress" errors which have been identified as a regression introduced by the above patch. Specifically, the patch removed the following sanity check if (length < 0 || length > output->length || (index + length) > msblk->bytes_used) This check did two things: 1. It ensured any reads were not beyond the end of the filesystem 2. It ensured that the "length" field read from the filesystem was within the expected maximum length. Without this any corrupted values can over-run allocated buffers. Link: https://lkml.kernel.org/r/20210204130249.4495-1-phillip@squashfs.org.uk Link: https://lkml.kernel.org/r/20210204130249.4495-2-phillip@squashfs.org.uk Fixes: 93e72b3c612adc ("squashfs: migrate from ll_rw_block usage to BIO") Reported-by: syzbot+6fba78f99b9afd4b5634@syzkaller.appspotmail.com Signed-off-by: Phillip Lougher Cc: Philippe Liard Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit dd0a41bc17bb9e934e401246ab2f8d269a49c6cf Author: Johannes Weiner Date: Tue Feb 9 13:42:28 2021 -0800 Revert "mm: memcontrol: avoid workload stalls when lowering memory.high" commit e82553c10b0899994153f9bf0af333c0a1550fd7 upstream. This reverts commit 536d3bf261a2fc3b05b3e91e7eef7383443015cf, as it can cause writers to memory.high to get stuck in the kernel forever, performing page reclaim and consuming excessive amounts of CPU cycles. Before the patch, a write to memory.high would first put the new limit in place for the workload, and then reclaim the requested delta. After the patch, the kernel tries to reclaim the delta before putting the new limit into place, in order to not overwhelm the workload with a sudden, large excess over the limit. However, if reclaim is actively racing with new allocations from the uncurbed workload, it can keep the write() working inside the kernel indefinitely. This is causing problems in Facebook production. A privileged system-level daemon that adjusts memory.high for various workloads running on a host can get unexpectedly stuck in the kernel and essentially turn into a sort of involuntary kswapd for one of the workloads. We've observed that daemon busy-spin in a write() for minutes at a time, neglecting its other duties on the system, and expending privileged system resources on behalf of a workload. To remedy this, we have first considered changing the reclaim logic to break out after a couple of loops - whether the workload has converged to the new limit or not - and bound the write() call this way. However, the root cause that inspired the sequence change in the first place has been fixed through other means, and so a revert back to the proven limit-setting sequence, also used by memory.max, is preferable. The sequence was changed to avoid extreme latencies in the workload when the limit was lowered: the sudden, large excess created by the limit lowering would erroneously trigger the penalty sleeping code that is meant to throttle excessive growth from below. Allocating threads could end up sleeping long after the write() had already reclaimed the delta for which they were being punished. However, erroneous throttling also caused problems in other scenarios at around the same time. This resulted in commit b3ff92916af3 ("mm, memcg: reclaim more aggressively before high allocator throttling"), included in the same release as the offending commit. When allocating threads now encounter large excess caused by a racing write() to memory.high, instead of entering punitive sleeps, they will simply be tasked with helping reclaim down the excess, and will be held no longer than it takes to accomplish that. This is in line with regular limit enforcement - i.e. if the workload allocates up against or over an otherwise unchanged limit from below. With the patch breaking userspace, and the root cause addressed by other means already, revert it again. Link: https://lkml.kernel.org/r/20210122184341.292461-1-hannes@cmpxchg.org Fixes: 536d3bf261a2 ("mm: memcontrol: avoid workload stalls when lowering memory.high") Signed-off-by: Johannes Weiner Reported-by: Tejun Heo Acked-by: Chris Down Acked-by: Michal Hocko Cc: Roman Gushchin Cc: Shakeel Butt Cc: Michal Koutný Cc: [5.8+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 237ee28818a9cce147651f7dfd88e297ab9e778c Author: Joachim Henke Date: Tue Feb 9 13:42:36 2021 -0800 nilfs2: make splice write available again commit a35d8f016e0b68634035217d06d1c53863456b50 upstream. Since 5.10, splice() or sendfile() to NILFS2 return EINVAL. This was caused by commit 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops"). This patch initializes the splice_write field in file_operations, like most file systems do, to restore the functionality. Link: https://lkml.kernel.org/r/1612784101-14353-1-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Joachim Henke Signed-off-by: Ryusuke Konishi Tested-by: Ryusuke Konishi Cc: [5.10+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 4e78c33874e541979383a761cf9e3de0a24c710c Author: Ville Syrjälä Date: Thu Jan 28 17:59:44 2021 +0200 drm/i915: Skip vswing programming for TBT commit eaf5bfe37db871031232d2bf2535b6ca92afbad8 upstream. In thunderbolt mode the PHY is owned by the thunderbolt controller. We are not supposed to touch it. So skip the vswing programming as well (we already skipped the other steps not applicable to TBT). Touching this stuff could supposedly interfere with the PHY programming done by the thunderbolt controller. Cc: stable@vger.kernel.org Signed-off-by: Ville Syrjälä Link: https://patchwork.freedesktop.org/patch/msgid/20210128155948.13678-1-ville.syrjala@linux.intel.com Reviewed-by: Imre Deak (cherry picked from commit f8c6b615b921d8a1bcd74870f9105e62b0bceff3) Signed-off-by: Jani Nikula Signed-off-by: Greg Kroah-Hartman commit 43f39b85e9bdc8c8cd42e0916a06b0bb4aaf5165 Author: Ville Syrjälä Date: Mon Dec 7 22:35:11 2020 +0200 drm/i915: Fix ICL MG PHY vswing handling commit a2a5f5628e5494ca9353f761f7fe783dfa82fb9a upstream. The MH PHY vswing table does have all the entries these days. Get rid of the old hacks in the code which claim otherwise. This hack was totally bogus anyway. The correct way to handle the lack of those two entries would have been to declare our max vswing and pre-emph to both be level 2. Cc: José Roberto de Souza Cc: Clinton Taylor Fixes: 9f7ffa297978 ("drm/i915/tc/icl: Update TC vswing tables") Signed-off-by: Ville Syrjälä Link: https://patchwork.freedesktop.org/patch/msgid/20201207203512.1718-1-ville.syrjala@linux.intel.com Reviewed-by: Imre Deak Reviewed-by: José Roberto de Souza (cherry picked from commit 5ec346476e795089b7dac8ab9dcee30c8d80ad84) Signed-off-by: Jani Nikula Signed-off-by: Greg Kroah-Hartman commit 67afdc7d95b90aaf3ba3b2c7bccd6d73bda53265 Author: Daniel Borkmann Date: Fri Feb 5 17:20:14 2021 +0100 bpf: Fix verifier jsgt branch analysis on max bound commit ee114dd64c0071500345439fc79dd5e0f9d106ed upstream. Fix incorrect is_branch{32,64}_taken() analysis for the jsgt case. The return code for both will tell the caller whether a given conditional jump is taken or not, e.g. 1 means branch will be taken [for the involved registers] and the goto target will be executed, 0 means branch will not be taken and instead we fall-through to the next insn, and last but not least a -1 denotes that it is not known at verification time whether a branch will be taken or not. Now while the jsgt has the branch-taken case correct with reg->s32_min_value > sval, the branch-not-taken case is off-by-one when testing for reg->s32_max_value < sval since the branch will also be taken for reg->s32_max_value == sval. The jgt branch analysis, for example, gets this right. Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking") Fixes: 4f7b3e82589e ("bpf: improve verifier branch analysis") Signed-off-by: Daniel Borkmann Reviewed-by: John Fastabend Acked-by: Alexei Starovoitov Signed-off-by: Greg Kroah-Hartman commit 1d16cc210fabd0a7ebf52d3025f81c2bde054a90 Author: Daniel Borkmann Date: Tue Feb 9 18:46:10 2021 +0000 bpf: Fix 32 bit src register truncation on div/mod commit e88b2c6e5a4d9ce30d75391e4d950da74bb2bd90 upstream. While reviewing a different fix, John and I noticed an oddity in one of the BPF program dumps that stood out, for example: # bpftool p d x i 13 0: (b7) r0 = 808464450 1: (b4) w4 = 808464432 2: (bc) w0 = w0 3: (15) if r0 == 0x0 goto pc+1 4: (9c) w4 %= w0 [...] In line 2 we noticed that the mov32 would 32 bit truncate the original src register for the div/mod operation. While for the two operations the dst register is typically marked unknown e.g. from adjust_scalar_min_max_vals() the src register is not, and thus verifier keeps tracking original bounds, simplified: 0: R1=ctx(id=0,off=0,imm=0) R10=fp0 0: (b7) r0 = -1 1: R0_w=invP-1 R1=ctx(id=0,off=0,imm=0) R10=fp0 1: (b7) r1 = -1 2: R0_w=invP-1 R1_w=invP-1 R10=fp0 2: (3c) w0 /= w1 3: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP-1 R10=fp0 3: (77) r1 >>= 32 4: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP4294967295 R10=fp0 4: (bf) r0 = r1 5: R0_w=invP4294967295 R1_w=invP4294967295 R10=fp0 5: (95) exit processed 6 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 Runtime result of r0 at exit is 0 instead of expected -1. Remove the verifier mov32 src rewrite in div/mod and replace it with a jmp32 test instead. After the fix, we result in the following code generation when having dividend r1 and divisor r6: div, 64 bit: div, 32 bit: 0: (b7) r6 = 8 0: (b7) r6 = 8 1: (b7) r1 = 8 1: (b7) r1 = 8 2: (55) if r6 != 0x0 goto pc+2 2: (56) if w6 != 0x0 goto pc+2 3: (ac) w1 ^= w1 3: (ac) w1 ^= w1 4: (05) goto pc+1 4: (05) goto pc+1 5: (3f) r1 /= r6 5: (3c) w1 /= w6 6: (b7) r0 = 0 6: (b7) r0 = 0 7: (95) exit 7: (95) exit mod, 64 bit: mod, 32 bit: 0: (b7) r6 = 8 0: (b7) r6 = 8 1: (b7) r1 = 8 1: (b7) r1 = 8 2: (15) if r6 == 0x0 goto pc+1 2: (16) if w6 == 0x0 goto pc+1 3: (9f) r1 %= r6 3: (9c) w1 %= w6 4: (b7) r0 = 0 4: (b7) r0 = 0 5: (95) exit 5: (95) exit x86 in particular can throw a 'divide error' exception for div instruction not only for divisor being zero, but also for the case when the quotient is too large for the designated register. For the edx:eax and rdx:rax dividend pair it is not an issue in x86 BPF JIT since we always zero edx (rdx). Hence really the only protection needed is against divisor being zero. Fixes: 68fda450a7df ("bpf: fix 32-bit divide by zero") Co-developed-by: John Fastabend Signed-off-by: John Fastabend Signed-off-by: Daniel Borkmann Acked-by: Alexei Starovoitov Signed-off-by: Greg Kroah-Hartman commit 569033c0825e4d90f7e824696dd334d239adc997 Author: Daniel Borkmann Date: Fri Feb 5 20:48:21 2021 +0100 bpf: Fix verifier jmp32 pruning decision logic commit fd675184fc7abfd1e1c52d23e8e900676b5a1c1a upstream. Anatoly has been fuzzing with kBdysch harness and reported a hang in one of the outcomes: func#0 @0 0: R1=ctx(id=0,off=0,imm=0) R10=fp0 0: (b7) r0 = 808464450 1: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R10=fp0 1: (b4) w4 = 808464432 2: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP808464432 R10=fp0 2: (9c) w4 %= w0 3: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0 3: (66) if w4 s> 0x30303030 goto pc+0 R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0 4: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0 4: (7f) r0 >>= r0 5: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0 5: (9c) w4 %= w0 6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 6: (66) if w0 s> 0x3030 goto pc+0 R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 7: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0 7: (d6) if w0 s<= 0x303030 goto pc+1 9: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0 9: (95) exit propagating r0 from 6 to 7: safe 4: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umin_value=808464433,umax_value=2147483647,var_off=(0x0; 0x7fffffff)) R10=fp0 4: (7f) r0 >>= r0 5: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umin_value=808464433,umax_value=2147483647,var_off=(0x0; 0x7fffffff)) R10=fp0 5: (9c) w4 %= w0 6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 6: (66) if w0 s> 0x3030 goto pc+0 R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 propagating r0 7: safe propagating r0 from 6 to 7: safe processed 15 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1 The underlying program was xlated as follows: # bpftool p d x i 10 0: (b7) r0 = 808464450 1: (b4) w4 = 808464432 2: (bc) w0 = w0 3: (15) if r0 == 0x0 goto pc+1 4: (9c) w4 %= w0 5: (66) if w4 s> 0x30303030 goto pc+0 6: (7f) r0 >>= r0 7: (bc) w0 = w0 8: (15) if r0 == 0x0 goto pc+1 9: (9c) w4 %= w0 10: (66) if w0 s> 0x3030 goto pc+0 11: (d6) if w0 s<= 0x303030 goto pc+1 12: (05) goto pc-1 13: (95) exit The verifier rewrote original instructions it recognized as dead code with 'goto pc-1', but reality differs from verifier simulation in that we are actually able to trigger a hang due to hitting the 'goto pc-1' instructions. Taking a closer look at the verifier analysis, the reason is that it misjudges its pruning decision at the first 'from 6 to 7: safe' occasion. What happens is that while both old/cur registers are marked as precise, they get misjudged for the jmp32 case as range_within() yields true, meaning that the prior verification path with a wider register bound could be verified successfully and therefore the current path with a narrower register bound is deemed safe as well whereas in reality it's not. R0 old/cur path's bounds compare as follows: old: smin_value=0x8000000000000000,smax_value=0x7fffffffffffffff,umin_value=0x0,umax_value=0xffffffffffffffff,var_off=(0x0; 0xffffffffffffffff) cur: smin_value=0x8000000000000000,smax_value=0x7fffffff7fffffff,umin_value=0x0,umax_value=0xffffffff7fffffff,var_off=(0x0; 0xffffffff7fffffff) old: s32_min_value=0x80000000,s32_max_value=0x00003030,u32_min_value=0x00000000,u32_max_value=0xffffffff cur: s32_min_value=0x00003031,s32_max_value=0x7fffffff,u32_min_value=0x00003031,u32_max_value=0x7fffffff The 64 bit bounds generally look okay and while the information that got propagated from 32 to 64 bit looks correct as well, it's not precise enough for judging a conditional jmp32. Given the latter only operates on subregisters we also need to take these into account as well for a range_within() probe in order to be able to prune paths. Extending the range_within() constraint to both bounds will be able to tell us that the old signed 32 bit bounds are not wider than the cur signed 32 bit bounds. With the fix in place, the program will now verify the 'goto' branch case as it should have been: [...] 6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 6: (66) if w0 s> 0x3030 goto pc+0 R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 7: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0 7: (d6) if w0 s<= 0x303030 goto pc+1 9: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0 9: (95) exit 7: R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=12337,u32_min_value=12337,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 7: (d6) if w0 s<= 0x303030 goto pc+1 R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=3158065,u32_min_value=3158065,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 8: R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=3158065,u32_min_value=3158065,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 8: (30) r0 = *(u8 *)skb[808464432] BPF_LD_[ABS|IND] uses reserved fields processed 11 insns (limit 1000000) max_states_per_insn 1 total_states 1 peak_states 1 mark_read 1 The bug is quite subtle in the sense that when verifier would determine that a given branch is dead code, it would (here: wrongly) remove these instructions from the program and hard-wire the taken branch for privileged programs instead of the 'goto pc-1' rewrites which will cause hard to debug problems. Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking") Reported-by: Anatoly Trosinenko Signed-off-by: Daniel Borkmann Reviewed-by: John Fastabend Acked-by: Alexei Starovoitov Signed-off-by: Greg Kroah-Hartman commit bf9e4307920ffc4fc03ede270f93f2b26275df5b Author: Mark Brown Date: Fri Jan 22 13:20:42 2021 +0000 regulator: Fix lockdep warning resolving supplies [ Upstream commit 14a71d509ac809dcf56d7e3ca376b15d17bd0ddd ] With commit eaa7995c529b54 (regulator: core: avoid regulator_resolve_supply() race condition) we started holding the rdev lock while resolving supplies, an operation that requires holding the regulator_list_mutex. This results in lockdep warnings since in other places we take the list mutex then the mutex on an individual rdev. Since the goal is to make sure that we don't call set_supply() twice rather than a concern about the cost of resolution pull the rdev lock and check for duplicate resolution down to immediately before we do the set_supply() and drop it again once the allocation is done. Fixes: eaa7995c529b54 (regulator: core: avoid regulator_resolve_supply() race condition) Reported-by: Marek Szyprowski Tested-by: Marek Szyprowski Signed-off-by: Mark Brown Link: https://lore.kernel.org/r/20210122132042.10306-1-broonie@kernel.org Signed-off-by: Mark Brown Signed-off-by: Sasha Levin commit fb8f9b2f7d229a8bc74db1cfa53814a0d6b42b7f Author: Baolin Wang Date: Thu Jan 28 13:58:15 2021 +0800 blk-cgroup: Use cond_resched() when destroy blkgs [ Upstream commit 6c635caef410aa757befbd8857c1eadde5cc22ed ] On !PREEMPT kernel, we can get below softlockup when doing stress testing with creating and destroying block cgroup repeatly. The reason is it may take a long time to acquire the queue's lock in the loop of blkcg_destroy_blkgs(), or the system can accumulate a huge number of blkgs in pathological cases. We can add a need_resched() check on each loop and release locks and do cond_resched() if true to avoid this issue, since the blkcg_destroy_blkgs() is not called from atomic contexts. [ 4757.010308] watchdog: BUG: soft lockup - CPU#11 stuck for 94s! [ 4757.010698] Call trace: [ 4757.010700]  blkcg_destroy_blkgs+0x68/0x150 [ 4757.010701]  cgwb_release_workfn+0x104/0x158 [ 4757.010702]  process_one_work+0x1bc/0x3f0 [ 4757.010704]  worker_thread+0x164/0x468 [ 4757.010705]  kthread+0x108/0x138 Suggested-by: Tejun Heo Signed-off-by: Baolin Wang Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin commit 4d00f1bade7859c6b884f267441155cbf8446f6c Author: Qii Wang Date: Sat Jan 9 16:29:50 2021 +0800 i2c: mediatek: Move suspend and resume handling to NOIRQ phase [ Upstream commit de96c3943f591018727b862f51953c1b6c55bcc3 ] Some i2c device driver indirectly uses I2C driver when it is now being suspended. The i2c devices driver is suspended during the NOIRQ phase and this cannot be changed due to other dependencies. Therefore, we also need to move the suspend handling for the I2C controller driver to the NOIRQ phase as well. Signed-off-by: Qii Wang Signed-off-by: Wolfram Sang Signed-off-by: Sasha Levin commit 518416a75c22c055ed3eab44883f4eb24be8eaaf Author: Dave Wysochanski Date: Thu Jan 21 16:17:24 2021 -0500 SUNRPC: Handle 0 length opaque XDR object data properly [ Upstream commit e4a7d1f7707eb44fd953a31dd59eff82009d879c ] When handling an auth_gss downcall, it's possible to get 0-length opaque object for the acceptor. In the case of a 0-length XDR object, make sure simple_get_netobj() fills in dest->data = NULL, and does not continue to kmemdup() which will set dest->data = ZERO_SIZE_PTR for the acceptor. The trace event code can handle NULL but not ZERO_SIZE_PTR for a string, and so without this patch the rpcgss_context trace event will crash the kernel as follows: [ 162.887992] BUG: kernel NULL pointer dereference, address: 0000000000000010 [ 162.898693] #PF: supervisor read access in kernel mode [ 162.900830] #PF: error_code(0x0000) - not-present page [ 162.902940] PGD 0 P4D 0 [ 162.904027] Oops: 0000 [#1] SMP PTI [ 162.905493] CPU: 4 PID: 4321 Comm: rpc.gssd Kdump: loaded Not tainted 5.10.0 #133 [ 162.908548] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 162.910978] RIP: 0010:strlen+0x0/0x20 [ 162.912505] Code: 48 89 f9 74 09 48 83 c1 01 80 39 00 75 f7 31 d2 44 0f b6 04 16 44 88 04 11 48 83 c2 01 45 84 c0 75 ee c3 0f 1f 80 00 00 00 00 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31 [ 162.920101] RSP: 0018:ffffaec900c77d90 EFLAGS: 00010202 [ 162.922263] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000fffde697 [ 162.925158] RDX: 000000000000002f RSI: 0000000000000080 RDI: 0000000000000010 [ 162.928073] RBP: 0000000000000010 R08: 0000000000000e10 R09: 0000000000000000 [ 162.930976] R10: ffff8e698a590cb8 R11: 0000000000000001 R12: 0000000000000e10 [ 162.933883] R13: 00000000fffde697 R14: 000000010034d517 R15: 0000000000070028 [ 162.936777] FS: 00007f1e1eb93700(0000) GS:ffff8e6ab7d00000(0000) knlGS:0000000000000000 [ 162.940067] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 162.942417] CR2: 0000000000000010 CR3: 0000000104eba000 CR4: 00000000000406e0 [ 162.945300] Call Trace: [ 162.946428] trace_event_raw_event_rpcgss_context+0x84/0x140 [auth_rpcgss] [ 162.949308] ? __kmalloc_track_caller+0x35/0x5a0 [ 162.951224] ? gss_pipe_downcall+0x3a3/0x6a0 [auth_rpcgss] [ 162.953484] gss_pipe_downcall+0x585/0x6a0 [auth_rpcgss] [ 162.955953] rpc_pipe_write+0x58/0x70 [sunrpc] [ 162.957849] vfs_write+0xcb/0x2c0 [ 162.959264] ksys_write+0x68/0xe0 [ 162.960706] do_syscall_64+0x33/0x40 [ 162.962238] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 162.964346] RIP: 0033:0x7f1e1f1e57df Signed-off-by: Dave Wysochanski Signed-off-by: Trond Myklebust Signed-off-by: Sasha Levin commit eda725f8cfe0f2fd79d8c42ce9d618d3e30ce43c Author: Dave Wysochanski Date: Thu Jan 21 16:17:23 2021 -0500 SUNRPC: Move simple_get_bytes and simple_get_netobj into private header [ Upstream commit ba6dfce47c4d002d96cd02a304132fca76981172 ] Remove duplicated helper functions to parse opaque XDR objects and place inside new file net/sunrpc/auth_gss/auth_gss_internal.h. In the new file carry the license and copyright from the source file net/sunrpc/auth_gss/auth_gss.c. Finally, update the comment inside include/linux/sunrpc/xdr.h since lockd is not the only user of struct xdr_netobj. Signed-off-by: Dave Wysochanski Signed-off-by: Trond Myklebust Signed-off-by: Sasha Levin commit 6fb6d5410e415b8b4ea6020fd2f8539fe7527b61 Author: Johannes Berg Date: Fri Jan 22 14:52:42 2021 +0200 iwlwifi: queue: bail out on invalid freeing [ Upstream commit 0bed6a2a14afaae240cc431e49c260568488b51c ] If we find an entry without an SKB, we currently continue, but that will just result in an infinite loop since we won't increment the read pointer, and will try the same thing over and over again. Fix this. Signed-off-by: Johannes Berg Signed-off-by: Luca Coelho Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/iwlwifi.20210122144849.abe2dedcc3ac.Ia6b03f9eeb617fd819e56dd5376f4bb8edc7b98a@changeid Signed-off-by: Sasha Levin commit 38da9b033becfb61e8d045a5f3d2ff4fc9f90f9b Author: Johannes Berg Date: Fri Jan 22 14:52:41 2021 +0200 iwlwifi: mvm: guard against device removal in reprobe [ Upstream commit 7a21b1d4a728a483f07c638ccd8610d4b4f12684 ] If we get into a problem severe enough to attempt a reprobe, we schedule a worker to do that. However, if the problem gets more severe and the device is actually destroyed before this worker has a chance to run, we use a free device. Bump up the reference count of the device until the worker runs to avoid this situation. Signed-off-by: Johannes Berg Signed-off-by: Luca Coelho Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/iwlwifi.20210122144849.871f0892e4b2.I94819e11afd68d875f3e242b98bef724b8236f1e@changeid Signed-off-by: Sasha Levin commit 2262294d4258e790123fc1277587ceda88b44016 Author: Luca Coelho Date: Fri Jan 22 14:52:38 2021 +0200 iwlwifi: pcie: add rules to match Qu with Hr2 [ Upstream commit 16062c12edb8ed2dfb15e6a914ff4edf858ab9e0 ] Until now we have been relying on matching the PCI ID and subsystem device ID in order to recognize Qu devices with Hr2. Add rules to match these devices, so that we don't have to add a new rule for every new ID we get. Signed-off-by: Luca Coelho Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/iwlwifi.20210122144849.591ce253ddd8.Ia4b9cc2c535625890c6d6b560db97ee9f2d5ca3b@changeid Signed-off-by: Sasha Levin commit 492f762b9c16d06a915f845630c689d4fd95faef Author: Gregory Greenman Date: Fri Jan 22 14:52:37 2021 +0200 iwlwifi: mvm: invalidate IDs of internal stations at mvm start [ Upstream commit e223e42aac30bf81f9302c676cdf58cf2bf36950 ] Having sta_id not set for aux_sta and snif_sta can potentially lead to a hard to debug issue in case remove station is called without an add. In this case sta_id 0, an unrelated regular station, will be removed. In fact, we do have a FW assert that occures rarely and from the debug data analysis it looks like sta_id 0 is removed by mistake, though it's hard to pinpoint the exact flow. The WARN_ON in this patch should help to find it. Signed-off-by: Gregory Greenman Signed-off-by: Luca Coelho Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/iwlwifi.20210122144849.5dc6dd9b22d5.I2add1b5ad24d0d0a221de79d439c09f88fcaf15d@changeid Signed-off-by: Sasha Levin commit 05132a72cc1d1be8f4b20ba769bad9210d5df513 Author: Johannes Berg Date: Fri Jan 15 13:05:56 2021 +0200 iwlwifi: pcie: fix context info memory leak [ Upstream commit 2d6bc752cc2806366d9a4fd577b3f6c1f7a7e04e ] If the image loader allocation fails, we leak all the previously allocated memory. Fix this. Signed-off-by: Johannes Berg Signed-off-by: Luca Coelho Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/iwlwifi.20210115130252.97172cbaa67c.I3473233d0ad01a71aa9400832fb2b9f494d88a11@changeid Signed-off-by: Sasha Levin commit fbdf0bf97cb0e8d221ea5fc79f547b634a09e148 Author: Emmanuel Grumbach Date: Fri Jan 15 13:05:55 2021 +0200 iwlwifi: pcie: add a NULL check in iwl_pcie_txq_unmap [ Upstream commit 98c7d21f957b10d9c07a3a60a3a5a8f326a197e5 ] I hit a NULL pointer exception in this function when the init flow went really bad. Signed-off-by: Emmanuel Grumbach Signed-off-by: Luca Coelho Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/iwlwifi.20210115130252.2e8da9f2c132.I0234d4b8ddaf70aaa5028a20c863255e05bc1f84@changeid Signed-off-by: Sasha Levin commit cc1d805aa544673a26c26eb57b8e4700e53f71ad Author: Johannes Berg Date: Fri Jan 15 13:05:48 2021 +0200 iwlwifi: mvm: take mutex for calling iwl_mvm_get_sync_time() [ Upstream commit 5c56d862c749669d45c256f581eac4244be00d4d ] We need to take the mutex to call iwl_mvm_get_sync_time(), do it. Signed-off-by: Johannes Berg Signed-off-by: Luca Coelho Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/iwlwifi.20210115130252.4bb5ccf881a6.I62973cbb081e80aa5b0447a5c3b9c3251a65cf6b@changeid Signed-off-by: Sasha Levin commit a90e8588f7eb615527a49ee5451314fe65be5270 Author: Sara Sharon Date: Fri Jan 15 13:05:47 2021 +0200 iwlwifi: mvm: skip power command when unbinding vif during CSA [ Upstream commit bf544e9aa570034e094a8a40d5f9e1e2c4916d18 ] In the new CSA flow, we remain associated during CSA, but still do a unbind-bind to the vif. However, sending the power command right after when vif is unbound but still associated causes FW to assert (0x3400) since it cannot tell the LMAC id. Just skip this command, we will send it again in a bit, when assigning the new context. Signed-off-by: Sara Sharon Signed-off-by: Luca Coelho Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/iwlwifi.20210115130252.64a2254ac5c3.Iaa3a9050bf3d7c9cd5beaf561e932e6defc12ec3@changeid Signed-off-by: Sasha Levin commit 428831e8e9aa3fae461b8a85f1fbec505e47a7fd Author: Libin Yang Date: Mon Jan 25 10:11:17 2021 +0200 ASoC: Intel: sof_sdw: set proper flags for Dell TGL-H SKU 0A5E [ Upstream commit 9ad9bc59dde106e56dd59ce2bec7c1b08e1f0eb4 ] Add flag "SOF_RT711_JD_SRC_JD2", flag "SOF_RT715_DAI_ID_FIX" and "SOF_SDW_FOUR_SPK" to the Dell TGL-H based SKU "0A5E". Signed-off-by: Libin Yang Co-developed-by: Hui Wang Signed-off-by: Hui Wang Reviewed-by: Bard Liao Reviewed-by: Pierre-Louis Bossart Signed-off-by: Kai Vehmanen Link: https://lore.kernel.org/r/20210125081117.814488-1-kai.vehmanen@linux.intel.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin commit b579c572d4cf032a7c5a98ff9d043dfebe85bd43 Author: Eliot Blennerhassett Date: Fri Jan 22 21:27:08 2021 +1300 ASoC: ak4458: correct reset polarity [ Upstream commit e953daeb68b1abd8a7d44902786349fdeef5c297 ] Reset (aka power off) happens when the reset gpio is made active. Change function name to ak4458_reset to match devicetree property "reset-gpios" Signed-off-by: Eliot Blennerhassett Reviewed-by: Linus Walleij Link: https://lore.kernel.org/r/ce650f47-4ff6-e486-7846-cc3d033f3601@blennerhassett.gen.nz Signed-off-by: Mark Brown Signed-off-by: Sasha Levin commit f0e3c36a524418603d9493966817f4c6d8b7b34e Author: Bard Liao Date: Mon Jan 25 10:30:51 2021 +0200 ALSA: hda: intel-dsp-config: add PCI id for TGL-H [ Upstream commit c5b5ff607d6fe5f4284acabd07066f96ecf96ac4 ] Adding PCI id for TGL-H. Like for other TGL platforms, SOF is used if Soundwire codecs or PCH-DMIC is detected. Signed-off-by: Bard Liao Reviewed-by: Xiuli Pan Reviewed-by: Libin Yang Signed-off-by: Kai Vehmanen Link: https://lore.kernel.org/r/20210125083051.828205-1-kai.vehmanen@linux.intel.com Signed-off-by: Takashi Iwai Signed-off-by: Sasha Levin commit ff557bf971ad6ab34817a82209c0396b8820c9fe Author: Trond Myklebust Date: Fri Jan 22 10:05:51 2021 -0500 pNFS/NFSv4: Improve rejection of out-of-order layouts [ Upstream commit d29b468da4f940bd2bff2628ba8d2d652671d244 ] If a layoutget ends up being reordered w.r.t. a layoutreturn, e.g. due to a layoutget-on-open not knowing a priori which file to lock, then we must assume the layout is no longer being considered valid state by the server. Incrementally improve our ability to reject such states by using the cached old stateid in conjunction with the plh_barrier to try to identify them. Signed-off-by: Trond Myklebust Signed-off-by: Sasha Levin commit 386b142945d3a8842c615421158f6d8565ef86e7 Author: Trond Myklebust Date: Thu Jan 21 17:11:42 2021 -0500 pNFS/NFSv4: Try to return invalid layout in pnfs_layout_process() [ Upstream commit 08bd8dbe88825760e953759d7ec212903a026c75 ] If the server returns a new stateid that does not match the one in our cache, then try to return the one we hold instead of just invalidating it on the client side. This ensures that both client and server will agree that the stateid is invalid. Signed-off-by: Trond Myklebust Signed-off-by: Sasha Levin commit 8007199fe372a7de7a3bc4dc0dca364f2129923a Author: Pan Bian Date: Thu Jan 21 06:57:38 2021 -0800 chtls: Fix potential resource leak [ Upstream commit b6011966ac6f402847eb5326beee8da3a80405c7 ] The dst entry should be released if no neighbour is found. Goto label free_dst to fix the issue. Besides, the check of ndev against NULL is redundant. Signed-off-by: Pan Bian Link: https://lore.kernel.org/r/20210121145738.51091-1-bianpan2016@163.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin commit 439ac48a33c5d2bc0103b1f6a867d64ce0f54868 Author: Ricardo Ribalda Date: Thu Jan 21 18:16:44 2021 +0100 ASoC: Intel: Skylake: Zero snd_ctl_elem_value [ Upstream commit 1d8fe0648e118fd495a2cb393a34eb8d428e7808 ] Clear struct snd_ctl_elem_value before calling ->put() to avoid any data leak. Signed-off-by: Ricardo Ribalda Reviewed-by: Cezary Rojewski Reviewed-by: Andy Shevchenko Link: https://lore.kernel.org/r/20210121171644.131059-2-ribalda@chromium.org Signed-off-by: Mark Brown Signed-off-by: Sasha Levin commit 4618aea344486e956e4a34ce2b1ad7a8d5992bcf Author: Shay Bar Date: Tue Dec 22 08:47:14 2020 +0200 mac80211: 160MHz with extended NSS BW in CSA [ Upstream commit dcf3c8fb32ddbfa3b8227db38aa6746405bd4527 ] Upon receiving CSA with 160MHz extended NSS BW from associated AP, STA should set the HT operation_mode based on new_center_freq_seg1 because it is later used as ccfs2 in ieee80211_chandef_vht_oper(). Signed-off-by: Aviad Brikman Signed-off-by: Shay Bar Link: https://lore.kernel.org/r/20201222064714.24888-1-shay.bar@celeno.com Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin commit 676575b93ddf5bd420453ca5b942a1a7a85121ba Author: Ben Skeggs Date: Tue Jan 19 15:53:35 2021 +1000 drm/nouveau/nvif: fix method count when pushing an array [ Upstream commit d502297008142645edf5c791af424ed321e5da84 ] Reported-by: Lyude Paul Signed-off-by: Ben Skeggs Signed-off-by: Sasha Levin commit 4b877845e388964afed4de987763e3149f8dc375 Author: James Schulman Date: Fri Jan 15 14:11:05 2021 -0600 ASoC: wm_adsp: Fix control name parsing for multi-fw [ Upstream commit a8939f2e138e418c2b059056ff5b501eaf2eae54 ] When switching between firmware types, the wrong control can be selected when requesting control in kernel API. Use the currently selected DSP firwmare type to select the proper mixer control. Signed-off-by: James Schulman Acked-by: Charles Keepax Link: https://lore.kernel.org/r/20210115201105.14075-1-james.schulman@cirrus.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin commit 61e97f32fded4d6d76a4996486304c5af091a91f Author: David Collins Date: Thu Jan 7 17:16:02 2021 -0800 regulator: core: avoid regulator_resolve_supply() race condition [ Upstream commit eaa7995c529b54d68d97a30f6344cc6ca2f214a7 ] The final step in regulator_register() is to call regulator_resolve_supply() for each registered regulator (including the one in the process of being registered). The regulator_resolve_supply() function first checks if rdev->supply is NULL, then it performs various steps to try to find the supply. If successful, rdev->supply is set inside of set_supply(). This procedure can encounter a race condition if two concurrent tasks call regulator_register() near to each other on separate CPUs and one of the regulators has rdev->supply_name specified. There is currently nothing guaranteeing atomicity between the rdev->supply check and set steps. Thus, both tasks can observe rdev->supply==NULL in their regulator_resolve_supply() calls. This then results in both creating a struct regulator for the supply. One ends up actually stored in rdev->supply and the other is lost (though still present in the supply's consumer_list). Here is a kernel log snippet showing the issue: [ 12.421768] gpu_cc_gx_gdsc: supplied by pm8350_s5_level [ 12.425854] gpu_cc_gx_gdsc: supplied by pm8350_s5_level [ 12.429064] debugfs: Directory 'regulator.4-SUPPLY' with parent '17a00000.rsc:rpmh-regulator-gfxlvl-pm8350_s5_level' already present! Avoid this race condition by holding the rdev->mutex lock inside of regulator_resolve_supply() while checking and setting rdev->supply. Signed-off-by: David Collins Link: https://lore.kernel.org/r/1610068562-4410-1-git-send-email-collinsd@codeaurora.org Signed-off-by: Mark Brown Signed-off-by: Sasha Levin commit 1c19d6ae581b883cdbd20deee16ff3da7ced1559 Author: Cong Wang Date: Sat Dec 26 16:50:20 2020 -0800 af_key: relax availability checks for skb size calculation [ Upstream commit afbc293add6466f8f3f0c3d944d85f53709c170f ] xfrm_probe_algs() probes kernel crypto modules and changes the availability of struct xfrm_algo_desc. But there is a small window where ealg->available and aalg->available get changed between count_ah_combs()/count_esp_combs() and dump_ah_combs()/dump_esp_combs(), in this case we may allocate a smaller skb but later put a larger amount of data and trigger the panic in skb_put(). Fix this by relaxing the checks when counting the size, that is, skipping the test of ->available. We may waste some memory for a few of sizeof(struct sadb_comb), but it is still much better than a panic. Reported-by: syzbot+b2bf2652983d23734c5c@syzkaller.appspotmail.com Cc: Steffen Klassert Cc: Herbert Xu Signed-off-by: Cong Wang Signed-off-by: Steffen Klassert Signed-off-by: Sasha Levin commit 7f546959b378f97bed2c1894b74f2c4bb84e0530 Author: Raoni Fassina Firmino Date: Mon Feb 1 17:05:05 2021 -0300 powerpc/64/signal: Fix regression in __kernel_sigtramp_rt64() semantics commit 24321ac668e452a4942598533d267805f291fdc9 upstream. Commit 0138ba5783ae ("powerpc/64/signal: Balance return predictor stack in signal trampoline") changed __kernel_sigtramp_rt64() VDSO and trampoline code, and introduced a regression in the way glibc's backtrace()[1] detects the signal-handler stack frame. Apart from the practical implications, __kernel_sigtramp_rt64() was a VDSO function with the semantics that it is a function you can call from userspace to end a signal handling. Now this semantics are no longer valid. I believe the aforementioned change affects all releases since 5.9. This patch tries to fix both the semantics and practical aspect of __kernel_sigtramp_rt64() returning it to the previous code, whilst keeping the intended behaviour of 0138ba5783ae by adding a new symbol to serve as the jump target from the kernel to the trampoline. Now the trampoline has two parts, a new entry point and the old return point. [1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-January/223194.html Fixes: 0138ba5783ae ("powerpc/64/signal: Balance return predictor stack in signal trampoline") Cc: stable@vger.kernel.org # v5.9+ Signed-off-by: Raoni Fassina Firmino Acked-by: Nicholas Piggin [mpe: Minor tweaks to change log formatting, add stable tag] Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20210201200505.iz46ubcizipnkcxe@work-tp Signed-off-by: Greg Kroah-Hartman commit 3cb8393c4143a82bea94fefd8b6eb7b97f214931 Author: Kent Gibson Date: Thu Jan 21 22:10:38 2021 +0800 gpiolib: cdev: clear debounce period if line set to output commit 03a58ea5905fdbd93ff9e52e670d802600ba38cd upstream. When set_config changes a line from input to output debounce is implicitly disabled, as debounce makes no sense for outputs, but the debounce period is not being cleared and is still reported in the line info. So clear the debounce period when the debouncer is stopped in edge_detector_stop(). Fixes: 65cff7046406 ("gpiolib: cdev: support setting debounce") Cc: stable@vger.kernel.org Signed-off-by: Kent Gibson Reviewed-by: Linus Walleij Signed-off-by: Bartosz Golaszewski Signed-off-by: Greg Kroah-Hartman commit 5592eae7846ca1279591624ecf89513dfb5840bb Author: Pavel Begunkov Date: Tue Feb 9 04:47:50 2021 +0000 io_uring: drop mm/files between task_work_submit [ Upstream commit aec18a57edad562d620f7d19016de1fc0cc2208c ] Since SQPOLL task can be shared and so task_work entries can be a mix of them, we need to drop mm and files before trying to issue next request. Cc: stable@vger.kernel.org # 5.10+ Signed-off-by: Pavel Begunkov Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit 88dbd085a51ec78c83dde79ad63bca8aa4272a9d Author: Pavel Begunkov Date: Tue Feb 9 04:47:49 2021 +0000 io_uring: reinforce cancel on flush during exit [ Upstream commit 3a7efd1ad269ccaf9c1423364d97c9661ba6dafa ] What 84965ff8a84f0 ("io_uring: if we see flush on exit, cancel related tasks") really wants is to cancel all relevant REQ_F_INFLIGHT requests reliably. That can be achieved by io_uring_cancel_files(), but we'll miss it calling io_uring_cancel_task_requests(files=NULL) from io_uring_flush(), because it will go through __io_uring_cancel_task_requests(). Just always call io_uring_cancel_files() during cancel, it's good enough for now. Cc: stable@vger.kernel.org # 5.9+ Signed-off-by: Pavel Begunkov Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit aa435155d396ccccef1af1ba7b1f0b650e1c80e0 Author: Pavel Begunkov Date: Tue Feb 9 04:47:48 2021 +0000 io_uring: fix sqo ownership false positive warning [ Upstream commit 70b2c60d3797bffe182dddb9bb55975b9be5889a ] WARNING: CPU: 0 PID: 21359 at fs/io_uring.c:9042 io_uring_cancel_task_requests+0xe55/0x10c0 fs/io_uring.c:9042 Call Trace: io_uring_flush+0x47b/0x6e0 fs/io_uring.c:9227 filp_close+0xb4/0x170 fs/open.c:1295 close_files fs/file.c:403 [inline] put_files_struct fs/file.c:418 [inline] put_files_struct+0x1cc/0x350 fs/file.c:415 exit_files+0x7e/0xa0 fs/file.c:435 do_exit+0xc22/0x2ae0 kernel/exit.c:820 do_group_exit+0x125/0x310 kernel/exit.c:922 get_signal+0x427/0x20f0 kernel/signal.c:2773 arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811 handle_signal_work kernel/entry/common.c:147 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:201 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline] syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Now io_uring_cancel_task_requests() can be called not through file notes but directly, remove a WARN_ONCE() there that give us false positives. That check is not very important and we catch it in other places. Fixes: 84965ff8a84f0 ("io_uring: if we see flush on exit, cancel related tasks") Cc: stable@vger.kernel.org # 5.9+ Reported-by: syzbot+3e3d9bd0c6ce9efbc3ef@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit 8c7febfc919a370b502714958e88b186df5538c4 Author: Pavel Begunkov Date: Tue Feb 9 04:47:47 2021 +0000 io_uring: fix list corruption for splice file_get [ Upstream commit f609cbb8911e40e15f9055e8f945f926ac906924 ] kernel BUG at lib/list_debug.c:29! Call Trace: __list_add include/linux/list.h:67 [inline] list_add include/linux/list.h:86 [inline] io_file_get+0x8cc/0xdb0 fs/io_uring.c:6466 __io_splice_prep+0x1bc/0x530 fs/io_uring.c:3866 io_splice_prep fs/io_uring.c:3920 [inline] io_req_prep+0x3546/0x4e80 fs/io_uring.c:6081 io_queue_sqe+0x609/0x10d0 fs/io_uring.c:6628 io_submit_sqe fs/io_uring.c:6705 [inline] io_submit_sqes+0x1495/0x2720 fs/io_uring.c:6953 __do_sys_io_uring_enter+0x107d/0x1f30 fs/io_uring.c:9353 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 io_file_get() may be called from splice, and so REQ_F_INFLIGHT may already be set. Fixes: 02a13674fa0e8 ("io_uring: account io_uring internal files as REQ_F_INFLIGHT") Cc: stable@vger.kernel.org # 5.9+ Reported-by: syzbot+6879187cf57845801267@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit 7250f333ce03a41791c00b0975b64799838751a6 Author: Hao Xu Date: Tue Feb 9 04:47:46 2021 +0000 io_uring: fix flush cqring overflow list while TASK_INTERRUPTIBLE [ Upstream commit 6195ba09822c87cad09189bbf550d0fbe714687a ] Abaci reported the follow warning: [ 27.073425] do not call blocking ops when !TASK_RUNNING; state=1 set at [] prepare_to_wait_exclusive+0x3a/0xc0 [ 27.075805] WARNING: CPU: 0 PID: 951 at kernel/sched/core.c:7853 __might_sleep+0x80/0xa0 [ 27.077604] Modules linked in: [ 27.078379] CPU: 0 PID: 951 Comm: a.out Not tainted 5.11.0-rc3+ #1 [ 27.079637] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 27.080852] RIP: 0010:__might_sleep+0x80/0xa0 [ 27.081835] Code: 65 48 8b 04 25 80 71 01 00 48 8b 90 c0 15 00 00 48 8b 70 18 48 c7 c7 08 39 95 82 c6 05 f9 5f de 08 01 48 89 d1 e8 00 c6 fa ff 0b eb bf 41 0f b6 f5 48 c7 c7 40 23 c9 82 e8 f3 48 ec 00 eb a7 [ 27.084521] RSP: 0018:ffffc90000fe3ce8 EFLAGS: 00010286 [ 27.085350] RAX: 0000000000000000 RBX: ffffffff82956083 RCX: 0000000000000000 [ 27.086348] RDX: ffff8881057a0000 RSI: ffffffff8118cc9e RDI: ffff88813bc28570 [ 27.087598] RBP: 00000000000003a7 R08: 0000000000000001 R09: 0000000000000001 [ 27.088819] R10: ffffc90000fe3e00 R11: 00000000fffef9f0 R12: 0000000000000000 [ 27.089819] R13: 0000000000000000 R14: ffff88810576eb80 R15: ffff88810576e800 [ 27.091058] FS: 00007f7b144cf740(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000 [ 27.092775] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 27.093796] CR2: 00000000022da7b8 CR3: 000000010b928002 CR4: 00000000003706f0 [ 27.094778] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 27.095780] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 27.097011] Call Trace: [ 27.097685] __mutex_lock+0x5d/0xa30 [ 27.098565] ? prepare_to_wait_exclusive+0x71/0xc0 [ 27.099412] ? io_cqring_overflow_flush.part.101+0x6d/0x70 [ 27.100441] ? lockdep_hardirqs_on_prepare+0xe9/0x1c0 [ 27.101537] ? _raw_spin_unlock_irqrestore+0x2d/0x40 [ 27.102656] ? trace_hardirqs_on+0x46/0x110 [ 27.103459] ? io_cqring_overflow_flush.part.101+0x6d/0x70 [ 27.104317] io_cqring_overflow_flush.part.101+0x6d/0x70 [ 27.105113] io_cqring_wait+0x36e/0x4d0 [ 27.105770] ? find_held_lock+0x28/0xb0 [ 27.106370] ? io_uring_remove_task_files+0xa0/0xa0 [ 27.107076] __x64_sys_io_uring_enter+0x4fb/0x640 [ 27.107801] ? rcu_read_lock_sched_held+0x59/0xa0 [ 27.108562] ? lockdep_hardirqs_on_prepare+0xe9/0x1c0 [ 27.109684] ? syscall_enter_from_user_mode+0x26/0x70 [ 27.110731] do_syscall_64+0x2d/0x40 [ 27.111296] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 27.112056] RIP: 0033:0x7f7b13dc8239 [ 27.112663] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 ec 2c 00 f7 d8 64 89 01 48 [ 27.115113] RSP: 002b:00007ffd6d7f5c88 EFLAGS: 00000286 ORIG_RAX: 00000000000001aa [ 27.116562] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7b13dc8239 [ 27.117961] RDX: 000000000000478e RSI: 0000000000000000 RDI: 0000000000000003 [ 27.118925] RBP: 00007ffd6d7f5cb0 R08: 0000000020000040 R09: 0000000000000008 [ 27.119773] R10: 0000000000000001 R11: 0000000000000286 R12: 0000000000400480 [ 27.120614] R13: 00007ffd6d7f5d90 R14: 0000000000000000 R15: 0000000000000000 [ 27.121490] irq event stamp: 5635 [ 27.121946] hardirqs last enabled at (5643): [] console_unlock+0x5c4/0x740 [ 27.123476] hardirqs last disabled at (5652): [] console_unlock+0x4e7/0x740 [ 27.125192] softirqs last enabled at (5272): [] __do_softirq+0x3c5/0x5aa [ 27.126430] softirqs last disabled at (5267): [] asm_call_irq_on_stack+0xf/0x20 [ 27.127634] ---[ end trace 289d7e28fa60f928 ]--- This is caused by calling io_cqring_overflow_flush() which may sleep after calling prepare_to_wait_exclusive() which set task state to TASK_INTERRUPTIBLE Reported-by: Abaci Fixes: 6c503150ae33 ("io_uring: patch up IOPOLL overflow_flush sync") Reviewed-by: Pavel Begunkov Signed-off-by: Hao Xu Signed-off-by: Jens Axboe Signed-off-by: Pavel Begunkov Signed-off-by: Greg Kroah-Hartman commit d300d03a93a221e8665ae4a26985b07f7bdc375b Author: Pavel Begunkov Date: Tue Feb 9 04:47:45 2021 +0000 io_uring: fix cancellation taking mutex while TASK_UNINTERRUPTIBLE [ Upstream commit ca70f00bed6cb255b7a9b91aa18a2717c9217f70 ] do not call blocking ops when !TASK_RUNNING; state=2 set at [<00000000ced9dbfc>] prepare_to_wait+0x1f4/0x3b0 kernel/sched/wait.c:262 WARNING: CPU: 1 PID: 19888 at kernel/sched/core.c:7853 __might_sleep+0xed/0x100 kernel/sched/core.c:7848 RIP: 0010:__might_sleep+0xed/0x100 kernel/sched/core.c:7848 Call Trace: __mutex_lock_common+0xc4/0x2ef0 kernel/locking/mutex.c:935 __mutex_lock kernel/locking/mutex.c:1103 [inline] mutex_lock_nested+0x1a/0x20 kernel/locking/mutex.c:1118 io_wq_submit_work+0x39a/0x720 fs/io_uring.c:6411 io_run_cancel fs/io-wq.c:856 [inline] io_wqe_cancel_pending_work fs/io-wq.c:990 [inline] io_wq_cancel_cb+0x614/0xcb0 fs/io-wq.c:1027 io_uring_cancel_files fs/io_uring.c:8874 [inline] io_uring_cancel_task_requests fs/io_uring.c:8952 [inline] __io_uring_files_cancel+0x115d/0x19e0 fs/io_uring.c:9038 io_uring_files_cancel include/linux/io_uring.h:51 [inline] do_exit+0x2e6/0x2490 kernel/exit.c:780 do_group_exit+0x168/0x2d0 kernel/exit.c:922 get_signal+0x16b5/0x2030 kernel/signal.c:2770 arch_do_signal_or_restart+0x8e/0x6a0 arch/x86/kernel/signal.c:811 handle_signal_work kernel/entry/common.c:147 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0xac/0x1e0 kernel/entry/common.c:201 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline] syscall_exit_to_user_mode+0x48/0x190 kernel/entry/common.c:302 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Rewrite io_uring_cancel_files() to mimic __io_uring_task_cancel()'s counting scheme, so it does all the heavy work before setting TASK_UNINTERRUPTIBLE. Cc: stable@vger.kernel.org # 5.9+ Reported-by: syzbot+f655445043a26a7cfab8@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov [axboe: fix inverted task check] Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit 52382df81d292607f0dfabf6daa32b7d5e56b633 Author: Pavel Begunkov Date: Tue Feb 9 04:47:44 2021 +0000 io_uring: replace inflight_wait with tctx->wait [ Upstream commit c98de08c990e190fc7cc3aaf8079b4a0674c6425 ] As tasks now cancel only theirs requests, and inflight_wait is awaited only in io_uring_cancel_files(), which should be called with ->in_idle set, instead of keeping a separate inflight_wait use tctx->wait. That will add some spurious wakeups but actually is safer from point of not hanging the task. e.g. task1 | IRQ | *start* io_complete_rw_common(link) | link: req1 -> req2 -> req3(with files) *cancel_files() | io_wq_cancel(), etc. | | put_req(link), adds to io-wq req2 schedule() | So, task1 will never try to cancel req2 or req3. If req2 is long-standing (e.g. read(empty_pipe)), this may hang. Signed-off-by: Pavel Begunkov Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit b462a7beab3fb9fdec832bcf064077828125abf0 Author: Pavel Begunkov Date: Tue Feb 9 04:47:43 2021 +0000 io_uring: fix __io_uring_files_cancel() with TASK_UNINTERRUPTIBLE [ Upstream commit a1bb3cd58913338e1b627ea6b8c03c2ae82d293f ] If the tctx inflight number haven't changed because of cancellation, __io_uring_task_cancel() will continue leaving the task in TASK_UNINTERRUPTIBLE state, that's not expected by __io_uring_files_cancel(). Ensure we always call finish_wait() before retrying. Cc: stable@vger.kernel.org # 5.9+ Signed-off-by: Pavel Begunkov Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit f0ff1a95bfa873e9b5e5883cc07d37fc4ae6bbca Author: Jens Axboe Date: Tue Feb 9 04:47:42 2021 +0000 io_uring: if we see flush on exit, cancel related tasks [ Upstream commit 84965ff8a84f0368b154c9b367b62e59c1193f30 ] Ensure we match tasks that belong to a dead or dying task as well, as we need to reap those in addition to those belonging to the exiting task. Cc: stable@vger.kernel.org # 5.9+ Reported-by: Josef Grieb Signed-off-by: Jens Axboe Signed-off-by: Pavel Begunkov Signed-off-by: Greg Kroah-Hartman commit d16692a34e8e60c76e0064ee7805bd5db1b0ef3b Author: Jens Axboe Date: Tue Feb 9 04:47:41 2021 +0000 io_uring: account io_uring internal files as REQ_F_INFLIGHT [ Upstream commit 02a13674fa0e8dd326de8b9f4514b41b03d99003 ] We need to actively cancel anything that introduces a potential circular loop, where io_uring holds a reference to itself. If the file in question is an io_uring file, then add the request to the inflight list. Cc: stable@vger.kernel.org # 5.9+ Signed-off-by: Jens Axboe Signed-off-by: Pavel Begunkov Signed-off-by: Greg Kroah-Hartman commit 1e7eb063a0f084cbed2cd8db39e9644642130ff0 Author: Pavel Begunkov Date: Tue Feb 9 04:47:40 2021 +0000 io_uring: fix files cancellation [ Upstream commit bee749b187ac57d1faf00b2ab356ff322230fce8 ] io_uring_cancel_files()'s task check condition mistakenly got flipped. 1. There can't be a request in the inflight list without IO_WQ_WORK_FILES, kill this check to keep the whole condition simpler. 2. Also, don't call the function for files==NULL to not do such a check, all that staff is already handled well by its counter part, __io_uring_cancel_task_requests(). With that just flip the task check. Also, it iowq-cancels all request of current task there, don't forget to set right ->files into struct io_task_cancel. Fixes: c1973b38bf639 ("io_uring: cancel only requests of current task") Reported-by: syzbot+c0d52d0b3c0c3ffb9525@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit dbdcde4422dfb1a3da6d41abffc546c74190c25a Author: Pavel Begunkov Date: Tue Feb 9 04:47:39 2021 +0000 io_uring: always batch cancel in *cancel_files() [ Upstream commit f6edbabb8359798c541b0776616c5eab3a840d3d ] Instead of iterating over each request and cancelling it individually in io_uring_cancel_files(), try to cancel all matching requests and use ->inflight_list only to check if there anything left. In many cases it should be faster, and we can reuse a lot of code from task cancellation. Signed-off-by: Pavel Begunkov Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit f8fbdbb6079314f5f4076303cb0552f815a47aa0 Author: Pavel Begunkov Date: Tue Feb 9 04:47:38 2021 +0000 io_uring: pass files into kill timeouts/poll [ Upstream commit 6b81928d4ca8668513251f9c04cdcb9d38ef51c7 ] Make io_poll_remove_all() and io_kill_timeouts() to match against files as well. A preparation patch, effectively not used by now. Signed-off-by: Pavel Begunkov Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit 49250f33bb436a29387f80cc64d1f40eba1ae19e Author: Pavel Begunkov Date: Tue Feb 9 04:47:37 2021 +0000 io_uring: don't iterate io_uring_cancel_files() [ Upstream commit b52fda00dd9df8b4a6de5784df94f9617f6133a1 ] io_uring_cancel_files() guarantees to cancel all matching requests, that's not necessary to do that in a loop. Move it up in the callchain into io_uring_cancel_task_requests(). Signed-off-by: Pavel Begunkov Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit f6d93f855553b96d7b53ceddc0438d28de5b94df Author: Pavel Begunkov Date: Tue Feb 9 04:47:36 2021 +0000 io_uring: add a {task,files} pair matching helper [ Upstream commit 08d23634643c239ddae706758f54d3a8e0c24962 ] Add io_match_task() that matches both task and files. Signed-off-by: Pavel Begunkov Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit fe9334186a50166f4d5f1e9bfedd257d22e6c4a9 Author: Pavel Begunkov Date: Tue Feb 9 04:47:35 2021 +0000 io_uring: simplify io_task_match() [ Upstream commit 06de5f5973c641c7ae033f133ecfaaf64fe633a6 ] If IORING_SETUP_SQPOLL is set all requests belong to the corresponding SQPOLL task, so skip task checking in that case and always match. Signed-off-by: Pavel Begunkov Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman