summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* ata: sata_dwc_460ex: fix crash on offline links without an attached driveChristian Lamparter2016-05-101-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes Machine Check "Data Write PLB Error" which happens when libata-sff's ata_sff_dev_select is trying to write into the device_addr in order to select a drive. However, SATA has no master or slave devices like the old ATA Bus, therefore selecting a different drive is kind of pointless. Data Write PLB Error Oops: Machine check, sig: 7 [#1] PowerPC 44x Platform Modules linked in: CPU: 0 PID: 508 Comm: scsi_eh_0 Not tainted 4.6.0-rc3-next-20160412+ #10 [...] NIP [c027e820] ata_sff_dev_select+0x3c/0x44 LR [c027e810] ata_sff_dev_select+0x2c/0x44 Call Trace: [cec31cd0] [c027da00] ata_sff_postreset+0x40/0xb4 (unreliable) [cec31ce0] [c027a03c] ata_eh_reset+0x5cc/0x928 [cec31d60] [c027a840] ata_eh_recover+0x330/0x10bc [cec31df0] [c027bae0] ata_do_eh+0x4c/0xa4 [...] Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* ata: sata_dwc_460ex: remove incorrect lockingMans Rullgard2016-05-101-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This lock is already taken in ata_scsi_queuecmd() a few levels up the call stack so attempting to take it here is an error. Moreover, it is pointless in the first place since it only protects a single, atomic assignment. Enabling lock debugging gives the following output: ============================================= [ INFO: possible recursive locking detected ] 4.4.0-rc5+ #189 Not tainted --------------------------------------------- kworker/u2:3/37 is trying to acquire lock: (&(&host->lock)->rlock){-.-...}, at: [<90283294>] sata_dwc_exec_command_by_tag.constprop.14+0x44/0x8c but task is already holding lock: (&(&host->lock)->rlock){-.-...}, at: [<902761ac>] ata_scsi_queuecmd+0x2c/0x330 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&host->lock)->rlock); lock(&(&host->lock)->rlock); *** DEADLOCK *** May be due to missing lock nesting notation 4 locks held by kworker/u2:3/37: #0: ("events_unbound"){.+.+.+}, at: [<9003a0a4>] process_one_work+0x12c/0x430 #1: ((&entry->work)){+.+.+.}, at: [<9003a0a4>] process_one_work+0x12c/0x430 #2: (&bdev->bd_mutex){+.+.+.}, at: [<9011fd54>] __blkdev_get+0x50/0x380 #3: (&(&host->lock)->rlock){-.-...}, at: [<902761ac>] ata_scsi_queuecmd+0x2c/0x330 stack backtrace: CPU: 0 PID: 37 Comm: kworker/u2:3 Not tainted 4.4.0-rc5+ #189 Workqueue: events_unbound async_run_entry_fn Stack : 90b38e30 00000021 00000003 9b2a6040 00000000 9005f3f0 904fc8dc 00000025 906b96e4 00000000 90528648 9b3336c4 904fc8dc 9009bf18 00000002 00000004 00000000 00000000 9b3336c4 9b3336e4 904fc8dc 9003d074 00000000 90500000 9005e738 00000000 00000000 00000000 00000000 00000000 00000000 00000000 6e657665 755f7374 756f626e 0000646e 00000000 00000000 9b00ca00 9b025000 ... Call Trace: [<90009d6c>] show_stack+0x88/0xa4 [<90057744>] __lock_acquire+0x1ce8/0x2154 [<900583e4>] lock_acquire+0x64/0x8c [<9045ff10>] _raw_spin_lock_irqsave+0x54/0x78 [<90283294>] sata_dwc_exec_command_by_tag.constprop.14+0x44/0x8c [<90283484>] sata_dwc_qc_issue+0x1a8/0x24c [<9026b39c>] ata_qc_issue+0x1f0/0x410 [<90273c6c>] ata_scsi_translate+0xb4/0x200 [<90276234>] ata_scsi_queuecmd+0xb4/0x330 [<9025800c>] scsi_dispatch_cmd+0xd0/0x128 [<90259934>] scsi_request_fn+0x58c/0x638 [<901a3e50>] __blk_run_queue+0x40/0x5c [<901a83d4>] blk_queue_bio+0x27c/0x28c [<901a5914>] generic_make_request+0xf0/0x188 [<901a5a54>] submit_bio+0xa8/0x194 [<9011adcc>] submit_bh_wbc.isra.23+0x15c/0x17c [<9011c908>] block_read_full_page+0x3e4/0x428 [<9009e2e0>] do_read_cache_page+0xac/0x210 [<9009fd90>] read_cache_page+0x18/0x24 [<901bbd18>] read_dev_sector+0x38/0xb0 [<901bd174>] msdos_partition+0xb4/0x5c0 [<901bcb8c>] check_partition+0x140/0x274 [<901bba60>] rescan_partitions+0xa0/0x2b0 [<9011ff68>] __blkdev_get+0x264/0x380 [<901201ac>] blkdev_get+0x128/0x36c [<901b9378>] add_disk+0x3c0/0x4bc [<90268268>] sd_probe_async+0x100/0x224 [<90043a44>] async_run_entry_fn+0x50/0x124 [<9003a11c>] process_one_work+0x1a4/0x430 [<9003a4f4>] worker_thread+0x14c/0x4fc [<900408f4>] kthread+0xd0/0xe8 [<90004338>] ret_from_kernel_thread+0x14/0x1c Fixes: 62936009f35a ("[libata] Add 460EX on-chip SATA driver, sata_dwc_460ex") Tested-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: Mans Rullgard <mans@mansr.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* dmaengine: dw: pass platform data via struct dw_dma_chipAndy Shevchenko2016-05-026-11/+17
| | | | | | | | | | | We pass struct dw_dma_chip to dw_dma_probe() anyway, thus we may use it to pass a platform data as well. While here, constify the source of the platform data. Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
* dmaengine: dw: keep entire platform data in struct dw_dmaAndy Shevchenko2016-05-024-21/+20
| | | | | | | | | Keep the entire platform data in the struct dw_dma. It makes the driver a bit cleaner. Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
* dmaengine: dw: revisit data_width propertyAndy Shevchenko2016-05-026-38/+24
| | | | | | | | | | | | | | | | | | | | There several changes are done here: - Convert the property to be in bytes Besides that this is a common practice for such property, the use of a value in bytes much more convenient than handling the encoded one. - Rename data_width to data-width in the device tree bindings The change leaves the support for the old format as well just in case someone will use a newer kernel with an old device tree blob. - While here, replace dwc_fast_ffs() by __ffs() Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
* dmaengine: dw: platform: check nr_masters to be non-zeroAndy Shevchenko2016-05-021-10/+10
| | | | | | | | | | | The value of nr_masters equal to 0 is invalid since this DMA controller has to have at least one master. Check this before we proceed with the rest of properties. Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
* dmaengine: dw: lazy allocation of dma descriptorsChristian Lamparter2016-04-192-119/+48
| | | | | | | | | | | | | This patch changes the driver to allocate DMA descriptors when needed. This stops memory resources to be wasted and letting them sit idle in the free_list structure when the device doesn't need it... This also solves the problem, that a driver has to guess the number of how many descriptors it needs to allocate in advance. Currently, the dma engine will just fail when put under load by sata_dwc_460ex. Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
* dmaengine: dw: set cdesc to NULL when free cyclic transfersAndy Shevchenko2016-04-131-0/+2
| | | | | | | | To be sure we have the cyclic transfers already gone we set cdesc to NULL. It will prevent the double free. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
* dmaengine: dw: move residue to a descriptorAndy Shevchenko2016-04-132-21/+41
| | | | | | | | | Residue is a property of any active descriptor. So, any descriptor may be in different state but residue is a feature of active descriptor. Check if the asked descriptor is active and return proper residue value for it. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
* dmaengine: dw: move dwc->initialized to dwc->flagsAndy Shevchenko2016-04-132-5/+5
| | | | | | | | We have already dedicated variable for flags, therefore no need to create an additional storage for that. Covert dwc->initialized to use dwc->flags. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
* dmaengine: dw: move dwc->paused to dwc->flagsAndy Shevchenko2016-04-132-8/+6
| | | | | | | | We have already dedicated variable for flags, therefore no need to create an additional storage for that. Convert dwc->paused to use dwc->flags. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
* dmaengine: dw: define counter variables as unsigned intAndy Shevchenko2016-04-131-5/+5
| | | | | | | | | | | | | | | | | | | | The code is fixed to satisfy a compiler otherwise we have drivers/dma/dw/core.c: In function ‘dwc_handle_cyclic’: drivers/dma/dw/core.c:568: warning: comparison between signed and unsigned drivers/dma/dw/core.c: In function ‘dw_dma_tasklet’: drivers/dma/dw/core.c:590: warning: comparison between signed and unsigned drivers/dma/dw/core.c: In function ‘dw_dma_off’: drivers/dma/dw/core.c:1103: warning: comparison between signed and unsigned drivers/dma/dw/core.c: In function ‘dw_dma_cyclic_free’: drivers/dma/dw/core.c:1469: warning: comparison between signed and unsigned drivers/dma/dw/core.c: In function ‘dw_dma_probe’: drivers/dma/dw/core.c:1574: warning: comparison between signed and unsigned There is no functional change. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
* dmaengine: dw: substitute dma_read_byaddr by dma_readl_nativeAndy Shevchenko2016-04-132-9/+3
| | | | | | | | Since struct dw_dma is allocated and regs member is assigned properly we can use standard IO accessors to the DMA registers. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
* dmaengine: dw: clear LLP_[SD]_EN bits in last descriptor of a chainMans Rullgard2016-04-131-0/+2
| | | | | | | | | | The datasheet requires that the LLP_[SD]_EN bits be cleared whenever LLP.LOC is zero, i.e. in the last descriptor of a multi-block chain. Make the driver do this. Signed-off-by: Mans Rullgard <mans@mansr.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
* dmaengine: dw: set LMS field in descriptorsMans Rullgard2016-04-132-12/+18
| | | | | | | | | | The LMS field indicates from which master the descriptor is to be read. This patch assumes this is always the same as the memory side in a peripheral transfer which is true for all known systems. Signed-off-by: Mans Rullgard <mans@mansr.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
* dmaengine: dw: fix byte order of hw descriptor fieldsMans Rullgard2016-04-132-68/+87
| | | | | | | | | | | | | If the DMA controller uses a different byte order than the host CPU, the hardware linked list descriptor fields need to be byte-swapped. This patch makes the driver write these fields using the same byte order it uses for mmio accesses to the DMA engine. I do not know if this is guaranteed to always be correct. Signed-off-by: Mans Rullgard <mans@mansr.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
* dmaengine: dw: set src and dst master select according to xfer directionMans Rullgard2016-04-131-2/+6
| | | | | | | | | | | | | On some architectures the DMA controller can have two masters connected to different buses and thus access to memory is possible only through one and to peripheral through the other. This patch changes the src and dst master setting to match the direction of the transfer. Signed-off-by: Mans Rullgard <mans@mansr.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
* dmaengine: dw: rename masters to reflect actual topologyAndy Shevchenko2016-04-139-42/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | The source and destination masters are reflecting buses or their layers to where the different devices can be connected. The patch changes the master names to reflect which one is related to which independently on the transfer direction. The outcome of the change is that the memory data width is now always limited by a data width of the master which is dedicated to communicate to memory. The patch will not break anything since all current users have the same data width for all masters. Though it would be nice to revisit avr32 platforms to check what is the actual hardware topology in use there. It seems that it has one bus and two masters on it as stated by Table 8-2, that's why everything works independently on the master in use. The purpose of the sequential patch is to fix the driver for configuration of more than one bus. The change is done in the assumption that src_master and dst_master are reflecting a connection to the memory and peripheral correspondently on avr32 and otherwise on the rest. Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no> Acked-by: Mark Brown <broonie@kernel.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
* dmaengine: dw: fix master selectionAndy Shevchenko2016-04-131-15/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The commit 895005202987 ("dmaengine: dw: apply both HS interfaces and remove slave_id usage") cleaned up the code to avoid usage of depricated slave_id member of generic slave configuration. Meanwhile it broke the master selection by removing important call to dwc_set_masters() in ->device_alloc_chan_resources() which copied masters from custom slave configuration to the internal channel structure. Everything works until now since there is no customized connection of DesignWare DMA IP to the bus, i.e. one bus and one or more masters are in use. The configurations where 2 masters are connected to the different masters are not working anymore. We are expecting one user of such configuration and need to select masters properly. Besides that it is obviously a performance regression since only one master is in use in multi-master configuration. Select masters in accordance with what user asked for. Keep this patch in a form more suitable for back porting. We are safe to take necessary data in ->device_alloc_chan_resources() because we don't support generic slave configuration embedded into custom one, and thus the only way to provide such is to use the parameter to a filter function which is called exactly before channel resource allocation. While here, replase BUG_ON to less noisy dev_warn() and prevent channel allocation in case of error. Fixes: 895005202987 ("dmaengine: dw: apply both HS interfaces and remove slave_id usage") Cc: stable@vger.kernel.org Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
* Linux 4.6-rc1v4.6-rc1Linus Torvalds2016-03-271-2/+2
|
* Merge branch 'for-linus' of ↵Linus Torvalds2016-03-2622-519/+811
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "There is quite a bit here, including some overdue refactoring and cleanup on the mon_client and osd_client code from Ilya, scattered writeback support for CephFS and a pile of bug fixes from Zheng, and a few random cleanups and fixes from others" [ I already decided not to pull this because of it having been rebased recently, but ended up changing my mind after all. Next time I'll really hold people to it. Oh well. - Linus ] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (34 commits) libceph: use KMEM_CACHE macro ceph: use kmem_cache_zalloc rbd: use KMEM_CACHE macro ceph: use lookup request to revalidate dentry ceph: kill ceph_get_dentry_parent_inode() ceph: fix security xattr deadlock ceph: don't request vxattrs from MDS ceph: fix mounting same fs multiple times ceph: remove unnecessary NULL check ceph: avoid updating directory inode's i_size accidentally ceph: fix race during filling readdir cache libceph: use sizeof_footer() more ceph: kill ceph_empty_snapc ceph: fix a wrong comparison ceph: replace CURRENT_TIME by current_fs_time() ceph: scattered page writeback libceph: add helper that duplicates last extent operation libceph: enable large, variable-sized OSD requests libceph: osdc->req_mempool should be backed by a slab pool libceph: make r_request msg_size calculation clearer ...
| * libceph: use KMEM_CACHE macroGeliang Tang2016-03-251-8/+2
| | | | | | | | | | | | | | Use KMEM_CACHE() instead of kmem_cache_create() to simplify the code. Signed-off-by: Geliang Tang <geliangtang@163.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: use kmem_cache_zallocGeliang Tang2016-03-252-2/+2
| | | | | | | | | | | | | | Use kmem_cache_zalloc() instead of kmem_cache_alloc() with flag GFP_ZERO. Signed-off-by: Geliang Tang <geliangtang@163.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * rbd: use KMEM_CACHE macroGeliang Tang2016-03-251-8/+2
| | | | | | | | | | | | | | Use KMEM_CACHE() instead of kmem_cache_create() to simplify the code. Signed-off-by: Geliang Tang <geliangtang@163.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: use lookup request to revalidate dentryYan, Zheng2016-03-252-0/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | If dentry has no lease, ceph_d_revalidate() previously return 0. This causes VFS to invalidate the dentry and create a new dentry for later lookup. Invalidating a dentry also detach any underneath mount points. So mount point inside cephfs can disapear mystically (even the mount point is not modified by other hosts). The fix is using lookup request to revalidate dentry without lease. This can partly solve the mount points disapear issue (as long as the mount point is not modified by other hosts) Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: kill ceph_get_dentry_parent_inode()Yan, Zheng2016-03-252-20/+5
| | | | | | | | | | | | use vfs helper dget_parent() instead Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: fix security xattr deadlockYan, Zheng2016-03-258-11/+125
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When security is enabled, security module can call filesystem's getxattr/setxattr callbacks during d_instantiate(). For cephfs, d_instantiate() is usually called by MDS' dispatch thread, while handling MDS reply. If the MDS reply does not include xattrs and corresponding caps, getxattr/setxattr need to send a new request to MDS and waits for the reply. This makes MDS' dispatch sleep, nobody handles later MDS replies. The fix is make sure lookup/atomic_open reply include xattrs and corresponding caps. So getxattr can be handled by cached xattrs. This requires some modification to both MDS and request message. (Client tells MDS what caps it wants; MDS encodes proper caps in the reply) Smack security module may call setxattr during d_instantiate(). Unlike getxattr, we can't force MDS to issue CEPH_CAP_XATTR_EXCL to us. So just make setxattr return error when called by MDS' dispatch thread. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: don't request vxattrs from MDSYan, Zheng2016-03-251-2/+4
| | | | | | | | | | | | It's uselese because MDS reply does not carry any vxattr. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: fix mounting same fs multiple timesYan, Zheng2016-03-251-18/+15
| | | | | | | | | | | | | | Now __ceph_open_session() only accepts closed client. An opened client will tigger BUG_ON(). Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: remove unnecessary NULL checkYan, Zheng2016-03-251-2/+2
| | | | | | | | | | | | | | | | If page->mapping is NULL, releasepage() callback does not get called. Remove the unnecessary NULL check to make static code analysis tool happy Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: avoid updating directory inode's i_size accidentallyYan, Zheng2016-03-251-0/+4
| | | | | | | | | | | | Directory inode's i_size is used by readdir cache. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: fix race during filling readdir cacheYan, Zheng2016-03-251-2/+7
| | | | | | | | | | | | | | | | | | Readdir cache uses page cache to save dentry pointers. When adding dentry pointers to middle of a page, we need to make sure the page already exists. Otherwise the beginning part of the page will be invalid pointers. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * libceph: use sizeof_footer() moreIlya Dryomov2016-03-251-16/+3
| | | | | | | | | | | | | | | | | | Don't open-code sizeof_footer() in read_partial_message() and ceph_msg_revoke(). Also, after switching to sizeof_footer(), it's now possible to use con_out_kvec_add() in prepare_write_message_footer(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
| * ceph: kill ceph_empty_snapcIlya Dryomov2016-03-254-34/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ceph_empty_snapc->num_snaps == 0 at all times. Passing such a snapc to ceph_osdc_alloc_request() (possibly through ceph_osdc_new_request()) is equivalent to passing NULL, as ceph_osdc_alloc_request() uses it only for sizing the request message. Further, in all four cases the subsequent ceph_osdc_build_request() is passed NULL for snapc, meaning that 0 is encoded for seq and num_snaps and making ceph_empty_snapc entirely useless. The two cases where it actually mattered were removed in commits 860560904962 ("ceph: avoid sending unnessesary FLUSHSNAP message") and 23078637e054 ("ceph: fix queuing inode to mdsdir's snaprealm"). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
| * ceph: fix a wrong comparisonAnton Protopopov2016-03-251-1/+1
| | | | | | | | | | | | | | | | A negative value rc compared to the positive value ENOENT in the finish_read() function. Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: replace CURRENT_TIME by current_fs_time()Deepa Dinamani2016-03-254-6/+6
| | | | | | | | | | | | | | | | | | CURRENT_TIME macro is not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Use current_fs_time() instead. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: scattered page writebackYan, Zheng2016-03-251-109/+196
| | | | | | | | | | | | | | | | | | | | This patch makes ceph_writepages_start() try using single OSD request to write all dirty pages within a strip unit. When a nonconsecutive dirty page is found, ceph_writepages_start() tries starting a new write operation to existing OSD request. If it succeeds, it uses the new operation to writeback the dirty page. Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * libceph: add helper that duplicates last extent operationYan, Zheng2016-03-252-0/+24
| | | | | | | | | | | | | | | | | | | | This helper duplicates last extent operation in OSD request, then adjusts the new extent operation's offset and length. The helper is for scatterd page writeback, which adds nonconsecutive dirty pages to single OSD request. Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * libceph: enable large, variable-sized OSD requestsIlya Dryomov2016-03-253-19/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Turn r_ops into a flexible array member to enable large, consisting of up to 16 ops, OSD requests. The use case is scattered writeback in cephfs and, as far as the kernel client is concerned, 16 is just a made up number. r_ops had size 3 for copyup+hint+write, but copyup is really a special case - it can only happen once. ceph_osd_request_cache is therefore stuffed with num_ops=2 requests, anything bigger than that is allocated with kmalloc(). req_mempool is backed by ceph_osd_request_cache, which means either num_ops=1 or num_ops=2 for use_mempool=true - all existing users (ceph_writepages_start(), ceph_osdc_writepages()) are fine with that. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * libceph: osdc->req_mempool should be backed by a slab poolIlya Dryomov2016-03-251-2/+2
| | | | | | | | | | | | | | | | ceph_osd_request_cache was introduced a long time ago. Also, osd_req is about to get a flexible array member, which ceph_osd_request_cache is going to be aware of. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * libceph: make r_request msg_size calculation clearerIlya Dryomov2016-03-251-10/+11
| | | | | | | | | | | | | | | | | | | | Although msg_size is calculated correctly, the terms are grouped in a misleading way - snaps appears to not have room for a u32 length. Move calculation closer to its use and regroup terms. No functional change. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * libceph: move r_reply_op_{len,result} into struct ceph_osd_req_opYan, Zheng2016-03-253-5/+6
| | | | | | | | | | | | | | | | This avoids defining large array of r_reply_op_{len,result} in in struct ceph_osd_request. Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * libceph: rename ceph_osd_req_op::payload_len to indata_lenIlya Dryomov2016-03-252-7/+7
| | | | | | | | | | | | | | Follow userspace nomenclature on this - the next commit adds outdata_len. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: remove useless BUG_ONYan, Zheng2016-03-251-2/+0
| | | | | | | | | | | | ceph_osdc_start_request() never return -EOLDSNAP Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: don't enable rbytes mount option by defaultYan, Zheng2016-03-252-4/+3
| | | | | | | | | | | | | | | | When rbytes mount option is enabled, directory size is recursive size. Recursive size is not updated instantly. This can cause directory size to change between successive stat(1) Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: encode ctime in cap messageYan, Zheng2016-03-251-4/+7
| | | | | | | | Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * libceph: behave in mon_fault() if cur_mon < 0Ilya Dryomov2016-03-251-14/+9
| | | | | | | | | | | | | | | | | | | | | | | | This can happen if __close_session() in ceph_monc_stop() races with a connection reset. We need to ignore such faults, otherwise it's likely we would take !hunting, call __schedule_delayed() and end up with delayed_work() executing on invalid memory, among other things. The (two!) con->private tests are useless, as nothing ever clears con->private. Nuke them. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * libceph: reschedule tick in mon_fault()Ilya Dryomov2016-03-251-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Doing __schedule_delayed() in the hunting branch is pointless, as the tick will have already been scheduled by then. What we need to do instead is *reschedule* it in the !hunting branch, after reopen_session() changes hunt_mult, which affects the delay. This helps with spacing out connection attempts and avoiding things like two back-to-back attempts followed by a longer period of waiting around. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * libceph: introduce and switch to reopen_session()Ilya Dryomov2016-03-251-17/+16
| | | | | | | | | | | | | | | | hunting is now set in __open_session() and cleared in finish_hunting(), instead of all around. The "session lost" message is printed not only on connection resets, but also on keepalive timeouts. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * libceph: monc hunt rate is 3s with backoff up to 30sIlya Dryomov2016-03-253-9/+22
| | | | | | | | | | | | | | | | | | | | | | Unless we are in the process of setting up a client (i.e. connecting to the monitor cluster for the first time), apply a backoff: every time we want to reopen a session, increase our timeout by a multiple (currently 2); when we complete the connection, reduce that multipler by 50%. Mirrors ceph.git commit 794c86fd289bd62a35ed14368fa096c46736e9a2. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>