summaryrefslogtreecommitdiffstats
path: root/include (follow)
Commit message (Collapse)AuthorAgeFilesLines
* mm, vmscan: move LRU lists to nodeMel Gorman2016-07-298-56/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This moves the LRU lists from the zone to the node and related data such as counters, tracing, congestion tracking and writeback tracking. Unfortunately, due to reclaim and compaction retry logic, it is necessary to account for the number of LRU pages on both zone and node logic. Most reclaim logic is based on the node counters but the retry logic uses the zone counters which do not distinguish inactive and active sizes. It would be possible to leave the LRU counters on a per-zone basis but it's a heavier calculation across multiple cache lines that is much more frequent than the retry checks. Other than the LRU counters, this is mostly a mechanical patch but note that it introduces a number of anomalies. For example, the scans are per-zone but using per-node counters. We also mark a node as congested when a zone is congested. This causes weird problems that are fixed later but is easier to review. In the event that there is excessive overhead on 32-bit systems due to the nodes being on LRU then there are two potential solutions 1. Long-term isolation of highmem pages when reclaim is lowmem When pages are skipped, they are immediately added back onto the LRU list. If lowmem reclaim persisted for long periods of time, the same highmem pages get continually scanned. The idea would be that lowmem keeps those pages on a separate list until a reclaim for highmem pages arrives that splices the highmem pages back onto the LRU. It potentially could be implemented similar to the UNEVICTABLE list. That would reduce the skip rate with the potential corner case is that highmem pages have to be scanned and reclaimed to free lowmem slab pages. 2. Linear scan lowmem pages if the initial LRU shrink fails This will break LRU ordering but may be preferable and faster during memory pressure than skipping LRU pages. Link: http://lkml.kernel.org/r/1467970510-21195-4-git-send-email-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, vmscan: move lru_lock to the nodeMel Gorman2016-07-292-3/+9
| | | | | | | | | | | | | | | | | | | | | Node-based reclaim requires node-based LRUs and locking. This is a preparation patch that just moves the lru_lock to the node so later patches are easier to review. It is a mechanical change but note this patch makes contention worse because the LRU lock is hotter and direct reclaim and kswapd can contend on the same lock even when reclaiming from different zones. Link: http://lkml.kernel.org/r/1467970510-21195-3-git-send-email-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Minchan Kim <minchan@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, vmstat: add infrastructure for per-node vmstatsMel Gorman2016-07-293-12/+98
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patchset: "Move LRU page reclaim from zones to nodes v9" This series moves LRUs from the zones to the node. While this is a current rebase, the test results were based on mmotm as of June 23rd. Conceptually, this series is simple but there are a lot of details. Some of the broad motivations for this are; 1. The residency of a page partially depends on what zone the page was allocated from. This is partially combatted by the fair zone allocation policy but that is a partial solution that introduces overhead in the page allocator paths. 2. Currently, reclaim on node 0 behaves slightly different to node 1. For example, direct reclaim scans in zonelist order and reclaims even if the zone is over the high watermark regardless of the age of pages in that LRU. Kswapd on the other hand starts reclaim on the highest unbalanced zone. A difference in distribution of file/anon pages due to when they were allocated results can result in a difference in again. While the fair zone allocation policy mitigates some of the problems here, the page reclaim results on a multi-zone node will always be different to a single-zone node. it was scheduled on as a result. 3. kswapd and the page allocator scan zones in the opposite order to avoid interfering with each other but it's sensitive to timing. This mitigates the page allocator using pages that were allocated very recently in the ideal case but it's sensitive to timing. When kswapd is allocating from lower zones then it's great but during the rebalancing of the highest zone, the page allocator and kswapd interfere with each other. It's worse if the highest zone is small and difficult to balance. 4. slab shrinkers are node-based which makes it harder to identify the exact relationship between slab reclaim and LRU reclaim. The reason we have zone-based reclaim is that we used to have large highmem zones in common configurations and it was necessary to quickly find ZONE_NORMAL pages for reclaim. Today, this is much less of a concern as machines with lots of memory will (or should) use 64-bit kernels. Combinations of 32-bit hardware and 64-bit hardware are rare. Machines that do use highmem should have relatively low highmem:lowmem ratios than we worried about in the past. Conceptually, moving to node LRUs should be easier to understand. The page allocator plays fewer tricks to game reclaim and reclaim behaves similarly on all nodes. The series has been tested on a 16 core UMA machine and a 2-socket 48 core NUMA machine. The UMA results are presented in most cases as the NUMA machine behaved similarly. pagealloc --------- This is a microbenchmark that shows the benefit of removing the fair zone allocation policy. It was tested uip to order-4 but only orders 0 and 1 are shown as the other orders were comparable. 4.7.0-rc4 4.7.0-rc4 mmotm-20160623 nodelru-v9 Min total-odr0-1 490.00 ( 0.00%) 457.00 ( 6.73%) Min total-odr0-2 347.00 ( 0.00%) 329.00 ( 5.19%) Min total-odr0-4 288.00 ( 0.00%) 273.00 ( 5.21%) Min total-odr0-8 251.00 ( 0.00%) 239.00 ( 4.78%) Min total-odr0-16 234.00 ( 0.00%) 222.00 ( 5.13%) Min total-odr0-32 223.00 ( 0.00%) 211.00 ( 5.38%) Min total-odr0-64 217.00 ( 0.00%) 208.00 ( 4.15%) Min total-odr0-128 214.00 ( 0.00%) 204.00 ( 4.67%) Min total-odr0-256 250.00 ( 0.00%) 230.00 ( 8.00%) Min total-odr0-512 271.00 ( 0.00%) 269.00 ( 0.74%) Min total-odr0-1024 291.00 ( 0.00%) 282.00 ( 3.09%) Min total-odr0-2048 303.00 ( 0.00%) 296.00 ( 2.31%) Min total-odr0-4096 311.00 ( 0.00%) 309.00 ( 0.64%) Min total-odr0-8192 316.00 ( 0.00%) 314.00 ( 0.63%) Min total-odr0-16384 317.00 ( 0.00%) 315.00 ( 0.63%) Min total-odr1-1 742.00 ( 0.00%) 712.00 ( 4.04%) Min total-odr1-2 562.00 ( 0.00%) 530.00 ( 5.69%) Min total-odr1-4 457.00 ( 0.00%) 433.00 ( 5.25%) Min total-odr1-8 411.00 ( 0.00%) 381.00 ( 7.30%) Min total-odr1-16 381.00 ( 0.00%) 356.00 ( 6.56%) Min total-odr1-32 372.00 ( 0.00%) 346.00 ( 6.99%) Min total-odr1-64 372.00 ( 0.00%) 343.00 ( 7.80%) Min total-odr1-128 375.00 ( 0.00%) 351.00 ( 6.40%) Min total-odr1-256 379.00 ( 0.00%) 351.00 ( 7.39%) Min total-odr1-512 385.00 ( 0.00%) 355.00 ( 7.79%) Min total-odr1-1024 386.00 ( 0.00%) 358.00 ( 7.25%) Min total-odr1-2048 390.00 ( 0.00%) 362.00 ( 7.18%) Min total-odr1-4096 390.00 ( 0.00%) 362.00 ( 7.18%) Min total-odr1-8192 388.00 ( 0.00%) 363.00 ( 6.44%) This shows a steady improvement throughout. The primary benefit is from reduced system CPU usage which is obvious from the overall times; 4.7.0-rc4 4.7.0-rc4 mmotm-20160623nodelru-v8 User 189.19 191.80 System 2604.45 2533.56 Elapsed 2855.30 2786.39 The vmstats also showed that the fair zone allocation policy was definitely removed as can be seen here; 4.7.0-rc3 4.7.0-rc3 mmotm-20160623 nodelru-v8 DMA32 allocs 28794729769 0 Normal allocs 48432501431 77227309877 Movable allocs 0 0 tiobench on ext4 ---------------- tiobench is a benchmark that artifically benefits if old pages remain resident while new pages get reclaimed. The fair zone allocation policy mitigates this problem so pages age fairly. While the benchmark has problems, it is important that tiobench performance remains constant as it implies that page aging problems that the fair zone allocation policy fixes are not re-introduced. 4.7.0-rc4 4.7.0-rc4 mmotm-20160623 nodelru-v9 Min PotentialReadSpeed 89.65 ( 0.00%) 90.21 ( 0.62%) Min SeqRead-MB/sec-1 82.68 ( 0.00%) 82.01 ( -0.81%) Min SeqRead-MB/sec-2 72.76 ( 0.00%) 72.07 ( -0.95%) Min SeqRead-MB/sec-4 75.13 ( 0.00%) 74.92 ( -0.28%) Min SeqRead-MB/sec-8 64.91 ( 0.00%) 65.19 ( 0.43%) Min SeqRead-MB/sec-16 62.24 ( 0.00%) 62.22 ( -0.03%) Min RandRead-MB/sec-1 0.88 ( 0.00%) 0.88 ( 0.00%) Min RandRead-MB/sec-2 0.95 ( 0.00%) 0.92 ( -3.16%) Min RandRead-MB/sec-4 1.43 ( 0.00%) 1.34 ( -6.29%) Min RandRead-MB/sec-8 1.61 ( 0.00%) 1.60 ( -0.62%) Min RandRead-MB/sec-16 1.80 ( 0.00%) 1.90 ( 5.56%) Min SeqWrite-MB/sec-1 76.41 ( 0.00%) 76.85 ( 0.58%) Min SeqWrite-MB/sec-2 74.11 ( 0.00%) 73.54 ( -0.77%) Min SeqWrite-MB/sec-4 80.05 ( 0.00%) 80.13 ( 0.10%) Min SeqWrite-MB/sec-8 72.88 ( 0.00%) 73.20 ( 0.44%) Min SeqWrite-MB/sec-16 75.91 ( 0.00%) 76.44 ( 0.70%) Min RandWrite-MB/sec-1 1.18 ( 0.00%) 1.14 ( -3.39%) Min RandWrite-MB/sec-2 1.02 ( 0.00%) 1.03 ( 0.98%) Min RandWrite-MB/sec-4 1.05 ( 0.00%) 0.98 ( -6.67%) Min RandWrite-MB/sec-8 0.89 ( 0.00%) 0.92 ( 3.37%) Min RandWrite-MB/sec-16 0.92 ( 0.00%) 0.93 ( 1.09%) 4.7.0-rc4 4.7.0-rc4 mmotm-20160623 approx-v9 User 645.72 525.90 System 403.85 331.75 Elapsed 6795.36 6783.67 This shows that the series has little or not impact on tiobench which is desirable and a reduction in system CPU usage. It indicates that the fair zone allocation policy was removed in a manner that didn't reintroduce one class of page aging bug. There were only minor differences in overall reclaim activity 4.7.0-rc4 4.7.0-rc4 mmotm-20160623nodelru-v8 Minor Faults 645838 647465 Major Faults 573 640 Swap Ins 0 0 Swap Outs 0 0 DMA allocs 0 0 DMA32 allocs 46041453 44190646 Normal allocs 78053072 79887245 Movable allocs 0 0 Allocation stalls 24 67 Stall zone DMA 0 0 Stall zone DMA32 0 0 Stall zone Normal 0 2 Stall zone HighMem 0 0 Stall zone Movable 0 65 Direct pages scanned 10969 30609 Kswapd pages scanned 93375144 93492094 Kswapd pages reclaimed 93372243 93489370 Direct pages reclaimed 10969 30609 Kswapd efficiency 99% 99% Kswapd velocity 13741.015 13781.934 Direct efficiency 100% 100% Direct velocity 1.614 4.512 Percentage direct scans 0% 0% kswapd activity was roughly comparable. There were differences in direct reclaim activity but negligible in the context of the overall workload (velocity of 4 pages per second with the patches applied, 1.6 pages per second in the baseline kernel). pgbench read-only large configuration on ext4 --------------------------------------------- pgbench is a database benchmark that can be sensitive to page reclaim decisions. This also checks if removing the fair zone allocation policy is safe pgbench Transactions 4.7.0-rc4 4.7.0-rc4 mmotm-20160623 nodelru-v8 Hmean 1 188.26 ( 0.00%) 189.78 ( 0.81%) Hmean 5 330.66 ( 0.00%) 328.69 ( -0.59%) Hmean 12 370.32 ( 0.00%) 380.72 ( 2.81%) Hmean 21 368.89 ( 0.00%) 369.00 ( 0.03%) Hmean 30 382.14 ( 0.00%) 360.89 ( -5.56%) Hmean 32 428.87 ( 0.00%) 432.96 ( 0.95%) Negligible differences again. As with tiobench, overall reclaim activity was comparable. bonnie++ on ext4 ---------------- No interesting performance difference, negligible differences on reclaim stats. paralleldd on ext4 ------------------ This workload uses varying numbers of dd instances to read large amounts of data from disk. 4.7.0-rc3 4.7.0-rc3 mmotm-20160623 nodelru-v9 Amean Elapsd-1 186.04 ( 0.00%) 189.41 ( -1.82%) Amean Elapsd-3 192.27 ( 0.00%) 191.38 ( 0.46%) Amean Elapsd-5 185.21 ( 0.00%) 182.75 ( 1.33%) Amean Elapsd-7 183.71 ( 0.00%) 182.11 ( 0.87%) Amean Elapsd-12 180.96 ( 0.00%) 181.58 ( -0.35%) Amean Elapsd-16 181.36 ( 0.00%) 183.72 ( -1.30%) 4.7.0-rc4 4.7.0-rc4 mmotm-20160623 nodelru-v9 User 1548.01 1552.44 System 8609.71 8515.08 Elapsed 3587.10 3594.54 There is little or no change in performance but some drop in system CPU usage. 4.7.0-rc3 4.7.0-rc3 mmotm-20160623 nodelru-v9 Minor Faults 362662 367360 Major Faults 1204 1143 Swap Ins 22 0 Swap Outs 2855 1029 DMA allocs 0 0 DMA32 allocs 31409797 28837521 Normal allocs 46611853 49231282 Movable allocs 0 0 Direct pages scanned 0 0 Kswapd pages scanned 40845270 40869088 Kswapd pages reclaimed 40830976 40855294 Direct pages reclaimed 0 0 Kswapd efficiency 99% 99% Kswapd velocity 11386.711 11369.769 Direct efficiency 100% 100% Direct velocity 0.000 0.000 Percentage direct scans 0% 0% Page writes by reclaim 2855 1029 Page writes file 0 0 Page writes anon 2855 1029 Page reclaim immediate 771 1628 Sector Reads 293312636 293536360 Sector Writes 18213568 18186480 Page rescued immediate 0 0 Slabs scanned 128257 132747 Direct inode steals 181 56 Kswapd inode steals 59 1131 It basically shows that kswapd was active at roughly the same rate in both kernels. There was also comparable slab scanning activity and direct reclaim was avoided in both cases. There appears to be a large difference in numbers of inodes reclaimed but the workload has few active inodes and is likely a timing artifact. stutter ------- stutter simulates a simple workload. One part uses a lot of anonymous memory, a second measures mmap latency and a third copies a large file. The primary metric is checking for mmap latency. stutter 4.7.0-rc4 4.7.0-rc4 mmotm-20160623 nodelru-v8 Min mmap 16.6283 ( 0.00%) 13.4258 ( 19.26%) 1st-qrtle mmap 54.7570 ( 0.00%) 34.9121 ( 36.24%) 2nd-qrtle mmap 57.3163 ( 0.00%) 46.1147 ( 19.54%) 3rd-qrtle mmap 58.9976 ( 0.00%) 47.1882 ( 20.02%) Max-90% mmap 59.7433 ( 0.00%) 47.4453 ( 20.58%) Max-93% mmap 60.1298 ( 0.00%) 47.6037 ( 20.83%) Max-95% mmap 73.4112 ( 0.00%) 82.8719 (-12.89%) Max-99% mmap 92.8542 ( 0.00%) 88.8870 ( 4.27%) Max mmap 1440.6569 ( 0.00%) 121.4201 ( 91.57%) Mean mmap 59.3493 ( 0.00%) 42.2991 ( 28.73%) Best99%Mean mmap 57.2121 ( 0.00%) 41.8207 ( 26.90%) Best95%Mean mmap 55.9113 ( 0.00%) 39.9620 ( 28.53%) Best90%Mean mmap 55.6199 ( 0.00%) 39.3124 ( 29.32%) Best50%Mean mmap 53.2183 ( 0.00%) 33.1307 ( 37.75%) Best10%Mean mmap 45.9842 ( 0.00%) 20.4040 ( 55.63%) Best5%Mean mmap 43.2256 ( 0.00%) 17.9654 ( 58.44%) Best1%Mean mmap 32.9388 ( 0.00%) 16.6875 ( 49.34%) This shows a number of improvements with the worst-case outlier greatly improved. Some of the vmstats are interesting 4.7.0-rc4 4.7.0-rc4 mmotm-20160623nodelru-v8 Swap Ins 163 502 Swap Outs 0 0 DMA allocs 0 0 DMA32 allocs 618719206 1381662383 Normal allocs 891235743 564138421 Movable allocs 0 0 Allocation stalls 2603 1 Direct pages scanned 216787 2 Kswapd pages scanned 50719775 41778378 Kswapd pages reclaimed 41541765 41777639 Direct pages reclaimed 209159 0 Kswapd efficiency 81% 99% Kswapd velocity 16859.554 14329.059 Direct efficiency 96% 0% Direct velocity 72.061 0.001 Percentage direct scans 0% 0% Page writes by reclaim 6215049 0 Page writes file 6215049 0 Page writes anon 0 0 Page reclaim immediate 70673 90 Sector Reads 81940800 81680456 Sector Writes 100158984 98816036 Page rescued immediate 0 0 Slabs scanned 1366954 22683 While this is not guaranteed in all cases, this particular test showed a large reduction in direct reclaim activity. It's also worth noting that no page writes were issued from reclaim context. This series is not without its hazards. There are at least three areas that I'm concerned with even though I could not reproduce any problems in that area. 1. Reclaim/compaction is going to be affected because the amount of reclaim is no longer targetted at a specific zone. Compaction works on a per-zone basis so there is no guarantee that reclaiming a few THP's worth page pages will have a positive impact on compaction success rates. 2. The Slab/LRU reclaim ratio is affected because the frequency the shrinkers are called is now different. This may or may not be a problem but if it is, it'll be because shrinkers are not called enough and some balancing is required. 3. The anon/file reclaim ratio may be affected. Pages about to be dirtied are distributed between zones and the fair zone allocation policy used to do something very similar for anon. The distribution is now different but not necessarily in any way that matters but it's still worth bearing in mind. VM statistic counters for reclaim decisions are zone-based. If the kernel is to reclaim on a per-node basis then we need to track per-node statistics but there is no infrastructure for that. The most notable change is that the old node_page_state is renamed to sum_zone_node_page_state. The new node_page_state takes a pglist_data and uses per-node stats but none exist yet. There is some renaming such as vm_stat to vm_zone_stat and the addition of vm_node_stat and the renaming of mod_state to mod_zone_state. Otherwise, this is mostly a mechanical patch with no functional change. There is a lot of similarity between the node and zone helpers which is unfortunate but there was no obvious way of reusing the code and maintaining type safety. Link: http://lkml.kernel.org/r/1467970510-21195-2-git-send-email-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Rik van Riel <riel@surriel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: fix vm-scalability regression in cgroup-aware workingset codeJohannes Weiner2016-07-292-1/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 23047a96d7cf ("mm: workingset: per-cgroup cache thrash detection") added a page->mem_cgroup lookup to the cache eviction, refault, and activation paths, as well as locking to the activation path, and the vm-scalability tests showed a regression of -23%. While the test in question is an artificial worst-case scenario that doesn't occur in real workloads - reading two sparse files in parallel at full CPU speed just to hammer the LRU paths - there is still some optimizations that can be done in those paths. Inline the lookup functions to eliminate calls. Also, page->mem_cgroup doesn't need to be stabilized when counting an activation; we merely need to hold the RCU lock to prevent the memcg from being freed. This cuts down on overhead quite a bit: 23047a96d7cfcfca 063f6715e77a7be5770d6081fe ---------------- -------------------------- %stddev %change %stddev \ | \ 21621405 +- 0% +11.3% 24069657 +- 2% vm-scalability.throughput [linux@roeck-us.net: drop unnecessary include file] [hannes@cmpxchg.org: add WARN_ON_ONCE()s] Link: http://lkml.kernel.org/r/20160707194024.GA26580@cmpxchg.org Link: http://lkml.kernel.org/r/20160624175101.GA3024@cmpxchg.org Reported-by: Ye Xiaolong <xiaolong.ye@intel.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, oom_reaper: do not attempt to reap a task more than twiceMichal Hocko2016-07-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | oom_reaper relies on the mmap_sem for read to do its job. Many places which might block readers have been converted to use down_write_killable and that has reduced chances of the contention a lot. Some paths where the mmap_sem is held for write can take other locks and they might either be not prepared to fail due to fatal signal pending or too impractical to be changed. This patch introduces MMF_OOM_NOT_REAPABLE flag which gets set after the first attempt to reap a task's mm fails. If the flag is present after the failure then we set MMF_OOM_REAPED to hide this mm from the oom killer completely so it can go and chose another victim. As a result a risk of OOM deadlock when the oom victim would be blocked indefinetly and so the oom killer cannot make any progress should be mitigated considerably while we still try really hard to perform all reclaim attempts and stay predictable in the behavior. Link: http://lkml.kernel.org/r/1466426628-15074-10-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: David Rientjes <rientjes@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, oom: fortify task_will_free_mem()Michal Hocko2016-07-291-23/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | task_will_free_mem is rather weak. It doesn't really tell whether the task has chance to drop its mm. 98748bd72200 ("oom: consider multi-threaded tasks in task_will_free_mem") made a first step into making it more robust for multi-threaded applications so now we know that the whole process is going down and probably drop the mm. This patch builds on top for more complex scenarios where mm is shared between different processes - CLONE_VM without CLONE_SIGHAND, or in kernel use_mm(). Make sure that all processes sharing the mm are killed or exiting. This will allow us to replace try_oom_reaper by wake_oom_reaper because task_will_free_mem implies the task is reapable now. Therefore all paths which bypass the oom killer are now reapable and so they shouldn't lock up the oom killer. Link: http://lkml.kernel.org/r/1466426628-15074-8-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: David Rientjes <rientjes@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, oom: skip vforked tasks from being selectedMichal Hocko2016-07-291-0/+26
| | | | | | | | | | | | | | | | | | | | | | vforked tasks are not really sitting on any memory. They are sharing the mm with parent until they exec into a new code. Until then it is just pinning the address space. OOM killer will kill the vforked task along with its parent but we still can end up selecting vforked task when the parent wouldn't be selected. E.g. init doing vfork to launch a task or vforked being a child of oom unkillable task with an updated oom_score_adj to be killable. Add a new helper to check whether a task is in the vfork sharing memory with its parent and use it in oom_badness to skip over these tasks. Link: http://lkml.kernel.org/r/1466426628-15074-6-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: David Rientjes <rientjes@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, oom_adj: make sure processes sharing mm have same view of oom_score_adjMichal Hocko2016-07-291-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | oom_score_adj is shared for the thread groups (via struct signal) but this is not sufficient to cover processes sharing mm (CLONE_VM without CLONE_SIGHAND) and so we can easily end up in a situation when some processes update their oom_score_adj and confuse the oom killer. In the worst case some of those processes might hide from the oom killer altogether via OOM_SCORE_ADJ_MIN while others are eligible. OOM killer would then pick up those eligible but won't be allowed to kill others sharing the same mm so the mm wouldn't release the mm and so the memory. It would be ideal to have the oom_score_adj per mm_struct because that is the natural entity OOM killer considers. But this will not work because some programs are doing vfork() set_oom_adj() exec() We can achieve the same though. oom_score_adj write handler can set the oom_score_adj for all processes sharing the same mm if the task is not in the middle of vfork. As a result all the processes will share the same oom_score_adj. The current implementation is rather pessimistic and checks all the existing processes by default if there is more than 1 holder of the mm but we do not have any reliable way to check for external users yet. Link: http://lkml.kernel.org/r/1466426628-15074-5-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: David Rientjes <rientjes@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'hsi-for-4.8' of ↵Linus Torvalds2016-07-281-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi Pull HSI updates from Sebastian Reichel: - proper runtime pm support for omap-ssi and ssi-protocol - misc fixes * tag 'hsi-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi: (24 commits) HSI: omap_ssi: drop pm_runtime_irq_safe HSI: omap_ssi_port: use rpm autosuspend API HSI: omap_ssi: call msg->complete() from process context HSI: omap_ssi_port: ensure clocks are kept enabled during transfer HSI: omap_ssi_port: replace pm_runtime_put_sync with non-sync variant HSI: omap_ssi_port: avoid calling runtime_pm_*_sync inside spinlock HSI: omap_ssi_port: avoid pm_runtime_get_sync in ssi_start_dma and ssi_start_pio HSI: omap_ssi_port: switch to threaded pio irq HSI: omap_ssi_core: remove pm_runtime_get_sync call from tasklet HSI: omap_ssi_core: use pm_runtime_put instead of pm_runtime_put_sync HSI: omap_ssi_port: prepare start_tx/stop_tx for blocking pm_runtime calls HSI: core: switch port event notifier from atomic to blocking HSI: omap_ssi_port: replace wkin_cken with atomic bitmap operations HSI: omap_ssi: convert cawake irq handler to thread HSI: ssi_protocol: fix ssip_xmit invocation HSI: ssi_protocol: replace spin_lock with spin_lock_bh HSI: ssi_protocol: avoid ssi_waketest call with held spinlock HSI: omap_ssi: do not reset module HSI: omap_ssi_port: remove useless newline hsi: Only descend into hsi directory when CONFIG_HSI is set ...
| * HSI: core: switch port event notifier from atomic to blockingSebastian Reichel2016-06-281-1/+1
| | | | | | | | | | | | | | | | port events should be sent from process context after irq_safe runtime pm flag is removed in omap-ssi. Signed-off-By: Sebastian Reichel <sre@kernel.org> Tested-by: Pavel Machek <pavel@ucw.cz>
* | Merge tag 'random_for_linus' of ↵Linus Torvalds2016-07-281-0/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random Pull random driver updates from Ted Ts'o: "A number of improvements for the /dev/random driver; the most important is the use of a ChaCha20-based CRNG for /dev/urandom, which is faster, more efficient, and easier to make scalable for silly/abusive userspace programs that want to read from /dev/urandom in a tight loop on NUMA systems. This set of patches also improves entropy gathering on VM's running on Microsoft Azure, and will take advantage of a hw random number generator (if present) to initialize the /dev/urandom pool" (It turns out that the random tree hadn't been in linux-next this time around, because it had been dropped earlier as being too quiet. Oh well). * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: random: strengthen input validation for RNDADDTOENTCNT random: add backtracking protection to the CRNG random: make /dev/urandom scalable for silly userspace programs random: replace non-blocking pool with a Chacha20-based CRNG random: properly align get_random_int_hash random: add interrupt callback to VMBus IRQ handler random: print a warning for the first ten uninitialized random users random: initialize the non-blocking pool via add_hwgenerator_randomness()
| * | random: replace non-blocking pool with a Chacha20-based CRNGTheodore Ts'o2016-07-031-0/+1
| |/ | | | | | | | | | | | | The CRNG is faster, and we don't pretend to track entropy usage in the CRNG any more. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | Merge tag 'media/v4.8-4' of ↵Linus Torvalds2016-07-2720-806/+1913
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media documentation updates from Mauro Carvalho Chehab: "This patch series does the conversion of all media documentation stuff to Restrutured Text markup format and add them to the Documentation/index.rst file. The media documentation was grouped into 4 books: - media uAPI - media kAPI - V4L driver-specific documentation - DVB driver-specific documentation It also contains several documentation improvements and one fixup patch for a core issue with cropcap. PS. After this patch series, the media DocBook is deprecated and should be removed. I'll add such patch on a future pull request" * tag 'media/v4.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (322 commits) [media] cx23885-cardlist.rst: add a new card [media] doc-rst: add some needed escape codes [media] doc-rst: kapi: use :c:func: instead of :cpp:func doc-rst: kernel-doc: fix a change introduced by mistake [media] v4l2-ioctl.h add debug info for struct v4l2_ioctl_ops [media] dvb_ringbuffer.h: some documentation improvements [media] v4l2-ctrls.h: fully document the header file [media] doc-rst: Fix some typedef ugly warnings [media] doc-rst: reorganize the kAPI v4l2 chapters [media] rename v4l2-framework.rst to v4l2-intro.rst [media] move V4L2 clocks to a separate .rst file [media] v4l2-fh.rst: add cross references and markups [media] v4l2-fh.rst: add fh contents from v4l2-framework.rst [media] v4l2-fh.h: add documentation for it [media] v4l2-event.rst: add cross-references and markups [media] v4l2-event.h: document all functions [media] v4l2-event.rst: add text from v4l2-framework.rst [media] v4l2-framework.rst: remove videobuf quick chapter [media] v4l2-dev: add cross-references and improve markup [media] doc-rst: move v4l2-dev doc to a separate file ...
| * \ Merge branch 'topic/docs-next' into v4l_for_linusMauro Carvalho Chehab2016-07-2720-806/+1913
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * topic/docs-next: (322 commits) [media] cx23885-cardlist.rst: add a new card [media] doc-rst: add some needed escape codes [media] doc-rst: kapi: use :c:func: instead of :cpp:func doc-rst: kernel-doc: fix a change introduced by mistake [media] v4l2-ioctl.h add debug info for struct v4l2_ioctl_ops [media] dvb_ringbuffer.h: some documentation improvements [media] v4l2-ctrls.h: fully document the header file [media] doc-rst: Fix some typedef ugly warnings [media] doc-rst: reorganize the kAPI v4l2 chapters [media] rename v4l2-framework.rst to v4l2-intro.rst [media] move V4L2 clocks to a separate .rst file [media] v4l2-fh.rst: add cross references and markups [media] v4l2-fh.rst: add fh contents from v4l2-framework.rst [media] v4l2-fh.h: add documentation for it [media] v4l2-event.rst: add cross-references and markups [media] v4l2-event.h: document all functions [media] v4l2-event.rst: add text from v4l2-framework.rst [media] v4l2-framework.rst: remove videobuf quick chapter [media] v4l2-dev: add cross-references and improve markup [media] doc-rst: move v4l2-dev doc to a separate file ...
| | * | [media] doc-rst: add some needed escape codesMauro Carvalho Chehab2016-07-233-28/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Some extra escape codes are needed to avoid Sphinx to not identify the tags. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
| | * | [media] v4l2-ioctl.h add debug info for struct v4l2_ioctl_opsMauro Carvalho Chehab2016-07-231-0/+266
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This struct is mentioned at the kAPI docbook. So, let's document it. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
| | * | [media] v4l2-ctrls.h: fully document the header fileMauro Carvalho Chehab2016-07-231-86/+279
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are lots of undocumented stuff on this header. Document them. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
| | * | [media] doc-rst: Fix some typedef ugly warningsMauro Carvalho Chehab2016-07-232-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Sphinx can't handle well typedefs. Change two typedef occurrences, in order to cleanup some of such warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
| | * | [media] v4l2-fh.h: add documentation for itMauro Carvalho Chehab2016-07-231-32/+96
| | | | | | | | | | | | | | | | | | | | | | | | This header file was undocumented. Add documentation for it. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
| | * | [media] v4l2-event.h: document all functionsMauro Carvalho Chehab2016-07-231-12/+113
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several functions weren't documented. Document them all. While here, makes checkpatch.pl happy. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
| | * | [media] doc-rst: document v4l2-dev.hMauro Carvalho Chehab2016-07-231-49/+315
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Add documentation for v4l2-dev.h, and put it at v4l2-framework.rst, where struct video_device is currently documented. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
| | * | [media] v4l2-common.h: Add documentation for other functionsMauro Carvalho Chehab2016-07-231-5/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Not all functions at v4l2-common.h are documented. Add documentation for some other ones. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
| | * | [media] v4l2-common.h: document the subdev functionsMauro Carvalho Chehab2016-07-231-7/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are some subdev-specific functions at v4l2-common.h that are mentioned at v4l2-subdev.rst. Document them. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
| | * | [media] v4l2-async: document the remaining stuffMauro Carvalho Chehab2016-07-231-0/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are one enum and 4 functions undocumented there. Document them. That will fix the broken links at the v4l2-subdev.rst file. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
| | * | [media] v4l2-subdev.h: Improve documentationMauro Carvalho Chehab2016-07-231-189/+383
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This header were poorly documented, and weren't using the kernel-doc format. Document everything but the macros using the right format. While here, also fix the other comments to match the Linux CodingStyle. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
| | * | [media] v4l2-device.h: document functionsMauro Carvalho Chehab2016-07-231-55/+143
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The functions at v4l2-device.h are not using the proper markups. Add it, and include at the v4l2-core.rst. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
| | * | [media] media-entry.h: Fix a note markupMauro Carvalho Chehab2016-07-231-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Sphinx note markup for media_remove_intf_links() is wrong: there's a missing space. While here, let's auto-numerate the two notes. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
| | * | [media] doc-rst: Fix some Sphinx warningsMauro Carvalho Chehab2016-07-231-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix all remaining media warnings with ReST that are fixable without changing at the Sphinx code. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
| | * | Merge branch 'patchwork' into topic/docs-nextMauro Carvalho Chehab2016-07-2393-358/+708
| | |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * patchwork: (1492 commits) [media] cec: always check all_device_types and features [media] cec: poll should check if there is room in the tx queue [media] vivid: support monitor all mode [media] cec: fix test for unconfigured adapter in main message loop [media] cec: limit the size of the transmit queue [media] cec: zero unused msg part after msg->len [media] cec: don't set fh to NULL in CEC_TRANSMIT [media] cec: clear all status fields before transmit and always fill in sequence [media] cec: CEC_RECEIVE overwrote the timeout field [media] cxd2841er: Reading SNR for DVB-C added [media] cxd2841er: Reading BER and UCB for DVB-C added [media] cxd2841er: fix switch-case for DVB-C [media] cxd2841er: fix signal strength scale for ISDB-T [media] cxd2841er: adjust the dB scale for DVB-C [media] cxd2841er: provide signal strength for DVB-C [media] cxd2841er: fix BER report via DVBv5 stats API [media] mb86a20s: apply mask to val after checking for read failure [media] airspy: fix error logic during device register [media] s5p-cec/TODO: add TODO item [media] cec/TODO: drop comment about sphinx documentation ... Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
| | * | | [media] doc-rst: Fix conversion for MC core functionsMauro Carvalho Chehab2016-07-172-49/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There were lots of issues at the media controller side, after the conversion: - Some documentation at the header files weren't using the kernel-doc start block; - Now, the C files with the exported symbols also need to be added. So, all headers need to be included twice: one to get the structs/enums/.. and another one for the functions; - Notes should use the ReST tag, as kernel-doc doesn't recognizes it anymore; - Identation needs to be fixed, as ReST uses it to identify when a format "tag" ends. - Fix the cross-references at the media controller description. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
| | * | | [media] doc-rst: Fix conversion for v4l2 core functionsMauro Carvalho Chehab2016-07-175-24/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The conversion from DocBook lead into some conversion issues, basically due to the lack of proper support at kernel-doc. So, address them: - Now, the C files with the exported symbols also need to be added. So, all headers need to be included twice: one to get the structs/enums/.. and another one for the functions; - Notes should use the ReST tag, as kernel-doc doesn't recognizes it anymore; - Identation needs to be fixed, as ReST uses it to identify when a format "tag" ends. - kernel-doc doesn't escape things like *pointer, so we need to manually add a escape char before it. - On some cases, kernel-doc conversion requires violating the 80-cols, as otherwise it won't properly parse the source code. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
| | * | | [media] doc-rst: Fix issues with RC documentationMauro Carvalho Chehab2016-07-173-4/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel-doc script is now broken if it doesn't find all exported symbols documented. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
| | * | | [media] doc-rst: Convert media API to rstMauro Carvalho Chehab2016-07-171-231/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the contents of the media section at DocBooks/DocBook/device-drivers.tmpl to a new ReST book. For now, the contents is kept as-is. Next patches will fix the warnings and add cross-references that were removed due to the conversion. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
| | * | | [media] lirc.h: remove several unused ioctlsMauro Carvalho Chehab2016-07-111-37/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While reviewing the documentation gaps on LIRC, it was noticed that several ioctls aren't used by any LIRC drivers (nor at staging or mainstream). It doesn't make sense to document them, as they're not used anywhere. So, let's remove those from the lirc header. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
| | * | | Merge branch 'topic/cec' into topic/docs-nextMauro Carvalho Chehab2016-07-098-6/+3247
| | |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * topic/cec: [media] DocBook/media: add CEC documentation [media] s5p_cec: get rid of an unused var [media] move s5p-cec to staging [media] vivid: add CEC emulation [media] cec: s5p-cec: Add s5p-cec driver [media] cec: adv7511: add cec support [media] cec: adv7842: add cec support [media] cec: adv7604: add cec support [media] cec: add compat32 ioctl support [media] cec/TODO: add TODO file so we know why this is still in staging [media] cec: add HDMI CEC framework (api) [media] cec: add HDMI CEC framework (adapter) [media] cec: add HDMI CEC framework (core) [media] cec-funcs.h: static inlines to pack/unpack CEC messages [media] cec.h: add cec header [media] cec-edid: add module for EDID CEC helper functions [media] cec.txt: add CEC framework documentation [media] rc: Add HDMI CEC protocol handling Input: add HDMI CEC specific keycodes Input: add BUS_CEC type
* | | \ \ \ Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds2016-07-275-18/+36
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull SCSI updates from James Bottomley: "This update includes the usual round of driver updates (fcoe, lpfc, ufs, qla2xxx, hisi_sas). The most important other change is removing the flag to allow non-blk_mq on a per host basis (it's unused); there is still a global module parameter for all of SCSI just in case. The rest are an assortment of minor fixes and typo updates" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (101 commits) scsi:libsas: fix oops caused by assigning a freed task to ->lldd_task fnic: pci_dma_mapping_error() doesn't return an error code scsi: lpfc: avoid harmless comparison warning fcoe: implement FIP VLAN responder fcoe: Rename 'fip_frame' to 'fip_vn2vn_notify_frame' lpfc: call lpfc_sli_validate_fcp_iocb() with the hbalock held scsi: ufs: remove unnecessary goto label hpsa: change hpsa_passthru_ioctl timeout hpsa: correct skipping masked peripherals qla2xxx: Update driver version to 8.07.00.38-k qla2xxx: Fix BBCR offset qla2xxx: Fix duplicate message id. qla2xxx: Disable the adapter and skip error recovery in case of register disconnect. qla2xxx: Separate ISP type bits out from device type. qla2xxx: Correction to function qla26xx_dport_diagnostics(). qla2xxx: Add support to handle Loop Init error Asynchronus event. qla2xxx: Let DPORT be enabled purely by nvram. qla2xxx: Add bsg interface to support statistics counter reset. qla2xxx: Add bsg interface to support D_Port Diagnostics. qla2xxx: Check for device state before unloading the driver. ...
| * | | | | | fcoe: implement FIP VLAN responderHannes Reinecke2016-07-212-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running in VN2VN mode there is no central instance which would send out any FIP VLAN discovery notifications. So this patch adds a new sysfs attribute 'fip_vlan_responder' which will activate a FIP VLAN discovery responder. Signed-off-by: Hannes Reinecke <hare@suse.com> Acked-by: Johannes Thumshirn <jth@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | | | | scsi: remove the disable_blk_mq host flagChristoph Hellwig2016-07-151-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've had scsi-mq for 2.5 years now, so we can remove the unused flag to disable the code on a per-host basis that was put in for unexpected emergencies during bringup. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | | | | scsi: remove current_cmnd field from struct scsi_deviceChristoph Hellwig2016-07-141-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The field is only used by the 53c700 driver, so move it into the driver-private device data instead of having it in the common structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | | | | fcoe: use enum for fip_modeHannes Reinecke2016-07-141-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The FIP mode is independent on the FIP state machine, so use a separate enum for that instead of overloading it with state machine values. Signed-off-by: Hannes Reinecke <hare@suse.com> Acked-by: Johannes Thumshirn <jth@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | | | | fc_fip: Update to latest FC-BB-6 draftHannes Reinecke2016-07-141-4/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update to latest FC-BB-6 draft to include FIP VN2VN VLAN notifications and additional flags. Signed-off-by: Hannes Reinecke <hare@suse.com> Acked-by: Johannes Thumshirn <jth@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | | | | fcoe: convert to kworkerSebastian Andrzej Siewior2016-07-131-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The driver creates its own per-CPU threads which are updated based on CPU hotplug events. It is also possible to use kworkers and remove some of the kthread infrastrucure. The code checked ->thread to decide if there is an active per-CPU thread. By using the kworker infrastructure this is no longer possible (or required). The thread pointer is saved in `kthread' instead of `thread' so anything trying to use thread is caught by the compiler. Currently only the bnx2fc driver is using struct fcoe_percpu_s and the kthread member. After a CPU went offline, we may still enqueue items on the "offline" CPU. This isn't much of a problem. The work will be done on a random CPU. The allocated crc_eof_page page won't be cleaned up. It is probably expected that the CPU comes up at some point so it should not be a problem. The crc_eof_page memory is released of course once the module is removed. This patch was only compile-tested due to -ENODEV. Cc: Vasu Dev <vasu.dev@intel.com> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: fcoe-devel@open-fcoe.org Cc: linux-scsi@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2016-07-272-2/+21
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry Torokhov: "Updates for the input subsystem. This contains the following new drivers promised in the last merge window: - driver for touchscreen controller found in Surface 3 - driver for Pegasus Notetaker tablet - driver for Atmel Captouch Buttons - driver for Raydium I2C touchscreen controllers - powerkey driver for HISI 65xx SoC plus a few fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (40 commits) Input: tty/vt/keyboard - use memdup_user() Input: pegasus_notetaker - set device mode in reset_resume() if in use Input: pegasus_notetaker - cancel workqueue's work in suspend() Input: pegasus_notetaker - fix usb_autopm calls to be balanced Input: pegasus_notetaker - handle usb control msg errors Input: wacom_w8001 - handle errors from input_mt_init_slots() Input: wacom_w8001 - resolution wasn't set for ABS_MT_POSITION_X/Y Input: pixcir_ts - add support for axis inversion / swapping Input: icn8318 - use of_touchscreen helpers for inverting / swapping axes Input: edt-ft5x06 - add support for inverting / swapping axes Input: of_touchscreen - add support for inverted / swapped axes Input: synaptics-rmi4 - use the RMI_F11_REL_BYTES define in rmi_f11_rel_pos_report Input: synaptics-rmi4 - remove unneeded variable Input: synaptics-rmi4 - remove pointer to rmi_function in f12_data Input: synaptics-rmi4 - support regulator supplies Input: raydium_i2c_ts - check CRC of incoming packets Input: xen-kbdfront - prefer xenbus_write() over xenbus_printf() where possible Input: fix a double word "is is" in include/linux/input.h Input: add powerkey driver for HISI 65xx SoC Input: apanel - spelling mistake - "skiping" -> "skipping" ...
| * \ \ \ \ \ \ Merge branch 'for-linus' into nextDmitry Torokhov2016-07-19604-3883/+17284
| |\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | Sync up to bring in wacom_w8001 changes to avoid merge conflicts later.
| * | | | | | | | Input: of_touchscreen - add support for inverted / swapped axesHans de Goede2016-07-151-1/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend touchscreen_parse_properties() with support for the touchscreen-inverted-x/y and touchscreen-swapped-x-y properties and add touchscreen_set_mt_pos() and touchscreen_report_pos() helper functions for storing coordinates into a input_mt_pos struct, or directly reporting them, taking these properties into account. This commit also modifies the existing callers of touchscreen_parse_properties() to pass in NULL for the new third argument, keeping the existing behavior. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
| * | | | | | | | Input: fix a double word "is is" in include/linux/input.hMasanari Iida2016-07-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fix a double word "is is" found in in Documentation/DocBook/device-drivers.xml. It is because the file was created from comments in sources, so I have to fix the double words in include/linux/input.h Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
* | | | | | | | | Merge branch 'i2c/for-4.8' of ↵Linus Torvalds2016-07-274-5/+51
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c updates from Wolfram Sang: "Here is the I2C pull request for 4.8: - the core and i801 driver gained support for SMBus Host Notify - core support for more than one address in DT - i2c_add_adapter() has now better error messages. We can remove all error messages from drivers calling it as a next step. - bigger updates to rk3x driver to support rk3399 SoC - the at24 eeprom driver got refactored and can now read special variants with unique serials or fixed MAC addresses. The rest is regular driver updates and bugfixes" * 'i2c/for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (66 commits) i2c: i801: use IS_ENABLED() instead of checking for built-in or module Documentation: i2c: slave: give proper example for pm usage Documentation: i2c: slave: describe buffer problems a bit better i2c: bcm2835: Don't complain on -EPROBE_DEFER from getting our clock i2c: i2c-smbus: drop useless stubs i2c: efm32: fix a failure path in efm32_i2c_probe() Revert "i2c: core: Cleanup I2C ACPI namespace" Revert "i2c: core: Add function for finding the bus speed from ACPI" i2c: Update the description of I2C_SMBUS i2c: i2c-smbus: fix i2c_handle_smbus_host_notify documentation eeprom: at24: tweak the loop_until_timeout() macro eeprom: at24: add support for at24mac series eeprom: at24: support reading the serial number for 24csxx eeprom: at24: platform_data: use BIT() macro eeprom: at24: split at24_eeprom_write() into specialized functions eeprom: at24: split at24_eeprom_read() into specialized functions eeprom: at24: hide the read/write loop behind a macro eeprom: at24: call read/write functions via function pointers eeprom: at24: coding style fixes eeprom: at24: move at24_read() below at24_eeprom_write() ...
| * | | | | | | | | i2c: i2c-smbus: drop useless stubsJean Delvare2016-07-221-15/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Drivers which use the SMBus extensions select I2C_SMBUS, so the stubs are not needed. Signed-off-by: Jean Delvare <jdelvare@suse.de> Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
| * | | | | | | | | Revert "i2c: core: Add function for finding the bus speed from ACPI"Wolfram Sang2016-07-191-9/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 55d38d060e999ca1a3ea6eb132105a0301e4cd04. There were too heavy merge conflicts and the driver code making use of this was not ready yet anyhow. So, we wait one cycle. Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
| * | | | | | | | | eeprom: at24: add support for at24mac seriesBartosz Golaszewski2016-07-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new read function to the at24 driver allowing to retrieve the factory-programmed mac address embedded in chips from the at24mac family. These chips can be instantiated similarily to the at24cs family, except that there's no way of having access to both the serial number and the mac address at the same time - the user must instantiate either an at24cs or at24mac device as both special memory areas are accessible on the same slave address. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>