summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* cgroup: rename ->create/post_create/pre_destroy/destroy() to ↵Tejun Heo2012-11-1913-121/+132
| | | | | | | | | | ->css_alloc/online/offline/free() Rename cgroup_subsys css lifetime related callbacks to better describe what their roles are. Also, update documentation. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: allow ->post_create() to failTejun Heo2012-11-193-12/+23
| | | | | | | | | | | There could be cases where controllers want to do initialization operations which may fail from ->post_create(). This patch makes ->post_create() return -errno to indicate failure and online_css() relay such failures. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Glauber Costa <glommer@parallels.com>
* cgroup: update cgroup_create() failure pathTejun Heo2012-11-191-7/+14
| | | | | | | | | | | | cgroup_create() was ignoring failure of cgroupfs files. Update it such that, if file creation fails, it rolls back by calling cgroup_destroy_locked() and returns failure. Note that error out goto labels are renamed. The labels are a bit confusing but will become better w/ later cgroup operation renames. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: use mutex_trylock() when grabbing i_mutex of a new cgroup directoryTejun Heo2012-11-191-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | All cgroup directory i_mutexes nest outside cgroup_mutex; however, new directory creation is a special case. A new cgroup directory is created while holding cgroup_mutex. Populating the new directory requires both the new directory's i_mutex and cgroup_mutex. Because all directory i_mutexes nest outside cgroup_mutex, grabbing both requires releasing cgroup_mutex first, which isn't a good idea as the new cgroup isn't yet ready to be manipulated by other cgroup opreations. This is worked around by grabbing the new directory's i_mutex while holding cgroup_mutex before making it visible. As there's no other user at that point, grabbing the i_mutex under cgroup_mutex can't lead to deadlock. cgroup_create_file() was using I_MUTEX_CHILD to tell lockdep not to worry about the reverse locking order; however, this creates pseudo locking dependency cgroup_mutex -> I_MUTEX_CHILD, which isn't true - all directory i_mutexes are still nested outside cgroup_mutex. This pseudo locking dependency can lead to spurious lockdep warnings. Use mutex_trylock() instead. This will always succeed and lockdep doesn't create any locking dependency for it. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: simplify cgroup_load_subsys() failure pathTejun Heo2012-11-191-11/+10
| | | | | | | | | | | | | | | | Now that cgroup_unload_subsys() can tell whether the root css is online or not, we can safely call cgroup_unload_subsys() after idr init failure in cgroup_load_subsys(). Replace the manual unrolling and invoke cgroup_unload_subsys() on failure. This drops cgroup_mutex inbetween but should be safe as the subsystem will fail try_module_get() and thus can't be mounted inbetween. As this means that cgroup_unload_subsys() can be called before css_sets are rehashed, remove BUG_ON() on %NULL css_set->subsys[] from cgroup_unload_subsys(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: introduce CSS_ONLINE flag and on/offline_css() helpersTejun Heo2012-11-192-23/+43
| | | | | | | | | | | | | | | New helpers on/offline_css() respectively wrap ->post_create() and ->pre_destroy() invocations. online_css() sets CSS_ONLINE after ->post_create() is complete and offline_css() invokes ->pre_destroy() iff CSS_ONLINE is set and clears it while also handling the temporary dropping of cgroup_mutex. This patch doesn't introduce any behavior change at the moment but will be used to improve cgroup_create() failure path and allow ->post_create() to fail. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: separate out cgroup_destroy_locked()Tejun Heo2012-11-191-15/+25
| | | | | | | | | | | | | Separate out cgroup_destroy_locked() from cgroup_destroy(). This will be later used in cgroup_create() failure path. While at it, add lockdep asserts on i_mutex and cgroup_mutex, and move @d and @parent assignments to their declarations. This patch doesn't introduce any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: fix harmless bugs in cgroup_load_subsys() fail path and ↵Tejun Heo2012-11-191-1/+14
| | | | | | | | | | | | | | | | | cgroup_unload_subsys() * If idr init fails, cgroup_load_subsys() cleared dummytop->subsys[] before calilng ->destroy() making CSS inaccessible to the callback, and didn't unlink ss->sibling. As no modular controller uses ->use_id, this doesn't cause any actual problems. * cgroup_unload_subsys() was forgetting to free idr, call ->pre_destroy() and clear ->active. As there currently is no modular controller which uses ->use_id, ->pre_destroy() or ->active, this doesn't cause any actual problems. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: lock cgroup_mutex in cgroup_init_subsys()Tejun Heo2012-11-191-0/+4
| | | | | | | | | | | Make cgroup_init_subsys() grab cgroup_mutex while initializing a subsystem so that all helpers and callbacks are called under the context they expect. This isn't strictly necessary as cgroup_init_subsys() doesn't race with anybody but will allow adding lockdep assertions. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: trivial cleanup for cgroup_init/load_subsys()Tejun Heo2012-11-191-3/+3
| | | | | | | | Consistently use @css and @dummytop in these two functions instead of referring to them indirectly. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: make CSS_* flags bit masks instead of bit positionsTejun Heo2012-11-192-5/+5
| | | | | | | | | | | Currently, CSS_* flags are defined as bit positions and manipulated using atomic bitops. There's no reason to use atomic bitops for them and bit positions are clunkier to deal with than bit masks. Make CSS_* bit masks instead and use the usual C bitwise operators to access them. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: cgroup->dentry isn't a RCU pointerTejun Heo2012-11-192-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | cgroup->dentry is marked and used as a RCU pointer; however, it isn't one - the final dentry put doesn't go through call_rcu(). cgroup and dentry share the same RCU freeing rule via synchronize_rcu() in cgroup_diput() (kfree_rcu() used on cgrp is unnecessary). If cgrp is accessible under RCU read lock, so is its dentry and dereferencing cgrp->dentry doesn't need any further RCU protection or annotation. While not being accurate, before the previous patch, the RCU accessors served a purpose as memory barriers - cgroup->dentry used to be assigned after the cgroup was made visible to cgroup_path(), so the assignment and dereferencing in cgroup_path() needed the memory barrier pair. Now that list_add_tail_rcu() happens after cgroup->dentry is assigned, this no longer is necessary. Remove the now unnecessary and misleading RCU annotations from cgroup->dentry. To make up for the removal of rcu_dereference_check() in cgroup_path(), add an explicit rcu_lockdep_assert(), which asserts the dereference rule of @cgrp, not cgrp->dentry. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: create directory before linking while creating a new cgroupTejun Heo2012-11-191-18/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | While creating a new cgroup, cgroup_create() links the newly allocated cgroup into various places before trying to create its directory. Because cgroup life-cycle is tied to the vfs objects, this makes it impossible to use cgroup_rmdir() for rolling back creation - the removal logic depends on having full vfs objects. This patch moves directory creation above linking and collect linking operations to one place. This allows directory creation failure to share error exit path with css allocation failures and any failure sites afterwards (to be added later) can use cgroup_rmdir() logic to undo creation. Note that this also makes the memory barriers around cgroup->dentry, which currently is misleadingly using RCU operations, unnecessary. This will be handled in the next patch. While at it, locking BUG_ON() on i_mutex is converted to lockdep_assert_held(). v2: Patch originally removed %NULL dentry check in cgroup_path(); however, Li pointed out that this patch doesn't make it unnecessary as ->create() may call cgroup_path(). Drop the change for now. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: open-code cgroup_create_dir()Tejun Heo2012-11-191-25/+5
| | | | | | | | | | | | The operation order of cgroup creation is about to change and cgroup_create_dir() is more of a hindrance than a proper abstraction. Open-code it by moving the parent nlink adjustment next to self nlink adjustment in cgroup_create_file() and the rest to cgroup_create(). This patch doesn't introduce any behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: initialize cgrp->allcg_node in init_cgroup_housekeeping()Tejun Heo2012-11-191-0/+1
| | | | | | | | Not strictly necessary but it's annoying to have uninitialized list_head around. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: remove incorrect dget/dput() pair in cgroup_create_dir()Tejun Heo2012-11-191-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cgroup_create_dir() does weird dancing with dentry refcnt. On success, it gets and then puts it achieving nothing. On failure, it puts but there isn't no matching get anywhere leading to the following oops if cgroup_create_file() fails for whatever reason. ------------[ cut here ]------------ kernel BUG at /work/os/work/fs/dcache.c:552! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: CPU 2 Pid: 697, comm: mkdir Not tainted 3.7.0-rc4-work+ #3 Bochs Bochs RIP: 0010:[<ffffffff811d9c0c>] [<ffffffff811d9c0c>] dput+0x1dc/0x1e0 RSP: 0018:ffff88001a3ebef8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88000e5b1ef8 RCX: 0000000000000403 RDX: 0000000000000303 RSI: 2000000000000000 RDI: ffff88000e5b1f58 RBP: ffff88001a3ebf18 R08: ffffffff82c76960 R09: 0000000000000001 R10: ffff880015022080 R11: ffd9bed70f48a041 R12: 00000000ffffffea R13: 0000000000000001 R14: ffff88000e5b1f58 R15: 00007fff57656d60 FS: 00007ff05fcb3800(0000) GS:ffff88001fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004046f0 CR3: 000000001315f000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process mkdir (pid: 697, threadinfo ffff88001a3ea000, task ffff880015022080) Stack: ffff88001a3ebf48 00000000ffffffea 0000000000000001 0000000000000000 ffff88001a3ebf38 ffffffff811cc889 0000000000000001 ffff88000e5b1ef8 ffff88001a3ebf68 ffffffff811d1fc9 ffff8800198d7f18 ffff880019106ef8 Call Trace: [<ffffffff811cc889>] done_path_create+0x19/0x50 [<ffffffff811d1fc9>] sys_mkdirat+0x59/0x80 [<ffffffff811d2009>] sys_mkdir+0x19/0x20 [<ffffffff81be1e02>] system_call_fastpath+0x16/0x1b Code: 00 48 8d 90 18 01 00 00 48 89 93 c0 00 00 00 4c 89 a0 18 01 00 00 48 8b 83 a0 00 00 00 83 80 28 01 00 00 01 e8 e6 6f a0 00 eb 92 <0f> 0b 66 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41 RIP [<ffffffff811d9c0c>] dput+0x1dc/0x1e0 RSP <ffff88001a3ebef8> ---[ end trace 1277bcfd9561ddb0 ]--- Fix it by dropping the unnecessary dget/dput() pair. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Cc: stable@vger.kernel.org
* cgroup_freezer: implement proper hierarchy supportTejun Heo2012-11-092-59/+165
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Up until now, cgroup_freezer didn't implement hierarchy properly. cgroups could be arranged in hierarchy but it didn't make any difference in how each cgroup_freezer behaved. They all operated separately. This patch implements proper hierarchy support. If a cgroup is frozen, all its descendants are frozen. A cgroup is thawed iff it and all its ancestors are THAWED. freezer.self_freezing shows the current freezing state for the cgroup itself. freezer.parent_freezing shows whether the cgroup is freezing because any of its ancestors is freezing. freezer_post_create() locks the parent and new cgroup and inherits the parent's state and freezer_change_state() applies new state top-down using cgroup_for_each_descendant_pre() which guarantees that no child can escape its parent's state. update_if_frozen() uses cgroup_for_each_descendant_post() to propagate frozen states bottom-up. Synchronization could be coarser and easier by using a single mutex to protect all hierarchy operations. Finer grained approach was used because it wasn't too difficult for cgroup_freezer and I think it's beneficial to have an example implementation and cgroup_freezer is rather simple and can serve a good one. As this makes cgroup_freezer properly hierarchical, freezer_subsys.broken_hierarchy marking is removed. Note that this patch changes userland visible behavior - freezing a cgroup now freezes all its descendants too. This behavior change is intended and has been warned via .broken_hierarchy. v2: Michal spotted a bug in freezer_change_state() - descendants were inheriting from the wrong ancestor. Fixed. v3: Documentation/cgroups/freezer-subsystem.txt updated. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Michal Hocko <mhocko@suse.cz>
* cgroup_freezer: add ->post_create() and ->pre_destroy() and track online stateTejun Heo2012-11-091-2/+40
| | | | | | | | | | | | | | | | | | | | | | A cgroup is online and visible to iteration between ->post_create() and ->pre_destroy(). This patch introduces CGROUP_FREEZER_ONLINE and toggles it from the newly added freezer_post_create() and freezer_pre_destroy() while holding freezer->lock such that a cgroup_freezer can be reilably distinguished to be online. This will be used by full hierarchy support. ONLINE test is added to freezer_apply_state() but it currently doesn't make any difference as freezer_write() can only be called for an online cgroup. Adjusting system_freezing_cnt on destruction is moved from freezer_destroy() to the new freezer_pre_destroy() for consistency. This patch doesn't introduce any noticeable behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz>
* cgroup_freezer: introduce CGROUP_FREEZING_[SELF|PARENT]Tejun Heo2012-11-091-8/+47
| | | | | | | | | | | | | | | | Introduce FREEZING_SELF and FREEZING_PARENT and make FREEZING OR of the two flags. This is to prepare for full hierarchy support. freezer_apply_date() is updated such that it can handle setting and clearing of both flags. The two flags are also exposed to userland via read-only files self_freezing and parent_freezing. Other than the added cgroupfs files, this patch doesn't introduce any behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz>
* cgroup_freezer: make freezer->state mask of flagsTejun Heo2012-11-091-33/+27
| | | | | | | | | | | | | | | | freezer->state was an enum value - one of THAWED, FREEZING and FROZEN. As the scheduled full hierarchy support requires more than one freezing condition, switch it to mask of flags. If FREEZING is not set, it's thawed. FREEZING is set if freezing or frozen. If frozen, both FREEZING and FROZEN are set. Now that tasks can be attached to an already frozen cgroup, this also makes freezing condition checks more natural. This patch doesn't introduce any behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz>
* cgroup_freezer: prepare freezer_change_state() for full hierarchy supportTejun Heo2012-11-091-18/+30
| | | | | | | | | | | | | | | * Make freezer_change_state() take bool @freeze instead of enum freezer_state. * Separate out freezer_apply_state() out of freezer_change_state(). This makes freezer_change_state() a rather silly thin wrapper. It will be filled with hierarchy handling later on. This patch doesn't introduce any behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz>
* cgroup_freezer: trivial cleanupsTejun Heo2012-11-091-22/+19
| | | | | | | | | | | | * Clean-up indentation and line-breaks. Drop the invalid comment about freezer->lock. * Make all internal functions take @freezer instead of both @cgroup and @freezer. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz>
* cgroup: implement generic child / descendant walk macrosTejun Heo2012-11-092-0/+180
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, cgroup doesn't provide any generic helper for walking a given cgroup's children or descendants. This patch adds the following three macros. * cgroup_for_each_child() - walk immediate children of a cgroup. * cgroup_for_each_descendant_pre() - visit all descendants of a cgroup in pre-order tree traversal. * cgroup_for_each_descendant_post() - visit all descendants of a cgroup in post-order tree traversal. All three only require the user to hold RCU read lock during traversal. Verifying that each iterated cgroup is online is the responsibility of the user. When used with proper synchronization, cgroup_for_each_descendant_pre() can be used to propagate state updates to descendants in reliable way. See comments for details. v2: s/config/state/ in commit message and comments per Michal. More documentation on synchronization rules. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujisu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: use rculist ops for cgroup->childrenTejun Heo2012-11-092-5/+4
| | | | | | | | | | | | | | | Use RCU safe list operations for cgroup->children. This will be used to implement cgroup children / descendant walking which can be used by controllers. Note that cgroup_create() now puts a new cgroup at the end of the ->children list instead of head. This isn't strictly necessary but is done so that the iteration order is more conventional. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: add cgroup_subsys->post_create()Tejun Heo2012-11-092-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, there's no way for a controller to find out whether a new cgroup finished all ->create() allocatinos successfully and is considered "live" by cgroup. This becomes a problem later when we add generic descendants walking to cgroup which can be used by controllers as controllers don't have a synchronization point where it can synchronize against new cgroups appearing in such walks. This patch adds ->post_create(). It's called after all ->create() succeeded and the cgroup is linked into the generic cgroup hierarchy. This plays the counterpart of ->pre_destroy(). When used in combination with the to-be-added generic descendant iterators, ->post_create() can be used to implement reliable state inheritance. It will be explained with the descendant iterators. v2: Added a paragraph about its future use w/ descendant iterators per Michal. v3: Forgot to add ->post_create() invocation to cgroup_load_subsys(). Fixed. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Glauber Costa <glommer@parallels.com>
* cgroup: set 'start' with the right value in cgroup_path.Tao Ma2012-11-081-2/+2
| | | | | | | | | 'start' is set to buf + buflen and do the '--' immediately. Just set it to 'buf + buflen - 1' directly. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com>
* device_cgroup: add lockdep assertsTejun Heo2012-11-061-0/+12
| | | | | | | | | | | | device_cgroup uses RCU safe ->exceptions list which is write-protected by devcgroup_mutex and has had some issues using locking correctly. Add lockdep asserts to utility functions so that future errors can be easily detected. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Li Zefan <lizefan@huawei.com>
* Merge branch 'cgroup/for-3.7-fixes' into cgroup/for-3.8Tejun Heo2012-11-061187-12717/+16301
|\ | | | | | | | | | | | | This is to receive device_cgroup fixes so that further device_cgroup changes can be made in cgroup/for-3.8. Signed-off-by: Tejun Heo <tj@kernel.org>
| * device_cgroup: fix RCU usageTejun Heo2012-11-061-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dev_cgroup->exceptions is protected with devcgroup_mutex for writes and RCU for reads; however, RCU usage isn't correct. * dev_exception_clean() doesn't use RCU variant of list_del() and kfree(). The function can race with may_access() and may_access() may end up dereferencing already freed memory. Use list_del_rcu() and kfree_rcu() instead. * may_access() may be called only with RCU read locked but doesn't use RCU safe traversal over ->exceptions. Use list_for_each_entry_rcu(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Cc: stable@vger.kernel.org Cc: Aristeu Rozanski <aris@redhat.com> Cc: Li Zefan <lizefan@huawei.com>
| * device_cgroup: fix unchecked cgroup parent usageAristeu Rozanski2012-11-061-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In 4cef7299b478687 ("device_cgroup: add proper checking when changing default behavior") the cgroup parent usage is unchecked. root will not have a parent and trying to use device.{allow,deny} will cause problems. For some reason my stressing scripts didn't test the root directory so I didn't catch it on my regular tests. Signed-off-by: Aristeu Rozanski <aris@redhat.com> Cc: Li Zefan <lizefan@huawei.com> Cc: James Morris <jmorris@namei.org> Cc: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>
| * Linux 3.7-rc4v3.7-rc4Linus Torvalds2012-11-041-1/+1
| |
| * Merge tag 'nfs-for-3.7-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2012-11-0311-35/+110
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client bugfixes from Trond Myklebust: - Fix a bunch of deadlock situations: * State recovery can deadlock if we fail to release sequence ids before scheduling the recovery thread. * Calling deactivate_super() from an RPC workqueue thread can deadlock because of the call to rpc_shutdown_client. - Display the device name correctly in /proc/*/mounts - Fix a number of incorrect error return values: * When NFSv3 mounts fail due to a timeout. * On NFSv4.1 backchannel setup failure * On NFSv4 open access checks - pnfs_find_alloc_layout() must check the layout pointer for NULL - Fix a regression in the legacy DNS resolved * tag 'nfs-for-3.7-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS4: nfs4_opendata_access should return errno NFSv4: Initialise the NFSv4.1 slot table highest_used_slotid correctly SUNRPC: return proper errno from backchannel_rqst NFS: add nfs_sb_deactive_async to avoid deadlock nfs: Show original device name verbatim in /proc/*/mount{s,info} nfsv3: Make v3 mounts fail with ETIMEDOUTs instead EIO on mountd timeouts nfs: Check whether a layout pointer is NULL before free it NFS: fix bug in legacy DNS resolver. NFSv4: nfs4_locku_done must release the sequence id NFSv4.1: We must release the sequence id when we fail to get a session slot NFS: Wait for session recovery to finish before returning
| | * NFS4: nfs4_opendata_access should return errnoWeston Andros Adamson2012-11-021-1/+1
| | | | | | | | | | | | | | | | | | | | | Return errno - not an NFS4ERR_. This worked because NFS4ERR_ACCESS == EACCES. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * NFSv4: Initialise the NFSv4.1 slot table highest_used_slotid correctlyTrond Myklebust2012-11-011-1/+1
| | | | | | | | | | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * SUNRPC: return proper errno from backchannel_rqstWeston Andros Adamson2012-11-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | The one and only caller (in fs/nfs/nfs4client.c) uses the result as an errno and would have interpreted an error as EPERM. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * NFS: add nfs_sb_deactive_async to avoid deadlockWeston Andros Adamson2012-10-315-3/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use nfs_sb_deactive_async instead of nfs_sb_deactive when in a workqueue context. This avoids a deadlock where rpc_shutdown_client loops forever in a workqueue kworker context, trying to kill all RPC tasks associated with the client, while one or more of these tasks have already been assigned to the same kworker (and will never run rpc_exit_task). This approach is needed because RPC tasks that have already been assigned to a kworker by queue_work cannot be canceled, as explained in the comment for workqueue.c:insert_wq_barrier. Signed-off-by: Weston Andros Adamson <dros@netapp.com> [Trond: add module_get/put.] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * nfs: Show original device name verbatim in /proc/*/mount{s,info}Ben Hutchings2012-10-314-9/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit c7f404b ('vfs: new superblock methods to override /proc/*/mount{s,info}'), nfs_path() is used to generate the mounted device name reported back to userland. nfs_path() always generates a trailing slash when the given dentry is the root of an NFS mount, but userland may expect the original device name to be returned verbatim (as it used to be). Make this canonicalisation optional and change the callers accordingly. [jrnieder@gmail.com: use flag instead of bool argument] Reported-and-tested-by: Chris Hiestand <chiestand@salk.edu> Reference: http://bugs.debian.org/669314 Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: <stable@vger.kernel.org> # v2.6.39+ Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * nfsv3: Make v3 mounts fail with ETIMEDOUTs instead EIO on mountd timeoutsScott Mayhew2012-10-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In very busy v3 environment, rpc.mountd can respond to the NULL procedure but not the MNT procedure in a timely manner causing the MNT procedure to time out. The problem is the mount system call returns EIO which causes the mount to fail, instead of ETIMEDOUT, which would cause the mount to be retried. This patch sets the RPC_TASK_SOFT|RPC_TASK_TIMEOUT flags to the rpc_call_sync() call in nfs_mount() which causes ETIMEDOUT to be returned on timed out connections. Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
| | * nfs: Check whether a layout pointer is NULL before free itYanchuan Nian2012-10-311-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | The new layout pointer in pnfs_find_alloc_layout() may be NULL because of out of memory. we must do some check work, otherwise pnfs_free_layout_hdr() will go wrong because it can not deal with a NULL pointer. Signed-off-by: Yanchuan Nian <ycnian@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * NFS: fix bug in legacy DNS resolver.NeilBrown2012-10-311-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The DNS resolver's use of the sunrpc cache involves a 'ttl' number (relative) rather that a timeout (absolute). This confused me when I wrote commit c5b29f885afe890f953f7f23424045cdad31d3e4 "sunrpc: use seconds since boot in expiry cache" and I managed to break it. The effect is that any TTL is interpreted as 0, and nothing useful gets into the cache. This patch removes the use of get_expiry() - which really expects an expiry time - and uses get_uint() instead, treating the int correctly as a ttl. This fixes a regression that has been present since 2.6.37, causing certain NFS accesses in certain environments to incorrectly fail. Reported-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| | * NFSv4: nfs4_locku_done must release the sequence idTrond Myklebust2012-10-311-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | If the state recovery machinery is triggered by the call to nfs4_async_handle_error() then we can deadlock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
| | * NFSv4.1: We must release the sequence id when we fail to get a session slotTrond Myklebust2012-10-311-13/+23
| | | | | | | | | | | | | | | | | | | | | | | | If we do not release the sequence id in cases where we fail to get a session slot, then we can deadlock if we hit a recovery scenario. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
| | * NFS: Wait for session recovery to finish before returningBryan Schumaker2012-10-311-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, we will schedule session recovery and then return to the caller of nfs4_handle_exception. This works for most cases, but causes a hang on the following test case: Client Server ------ ------ Open file over NFS v4.1 Write to file Expire client Try to lock file The server will return NFS4ERR_BADSESSION, prompting the client to schedule recovery. However, the client will continue placing lock attempts and the open recovery never seems to be scheduled. The simplest solution is to wait for session recovery to run before retrying the lock. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
| * | Merge branch 'release' of ↵Linus Torvalds2012-11-033-6/+9
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux Pull thermal management & ACPI update from Zhang Rui, Ho humm. Normally these things go through Len. But it's just three small fixes, I guess I can pull directly too. * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: exynos4_tmu_driver_ids should be exynos_tmu_driver_ids. ACPI video: Ignore errors after _DOD evaluation. thermal: solve compilation errors in rcar_thermal
| | * | exynos4_tmu_driver_ids should be exynos_tmu_driver_ids.Jonghwan Choi2012-11-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Jonghwan Choi <jhbird.choi@samsung.com> Reviewed-by: Amit Daniel Kachhap <amit.kachhap@linaro.org> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
| | * | ACPI video: Ignore errors after _DOD evaluation.Igor Murzov2012-11-031-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are systems where video module known to work fine regardless of broken _DOD and ignoring returned value here doesn't cause any issues later. This should fix brightness controls on some laptops. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=47861 Signed-off-by: Igor Murzov <e-mail@date.by> Reviewed-by: Sergey V <sftp.mtuci@gmail.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
| | * | thermal: solve compilation errors in rcar_thermalDevendra Naga2012-11-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | following were the errors reported drivers/thermal/rcar_thermal.c: In function ‘rcar_thermal_probe’: drivers/thermal/rcar_thermal.c:214:10: warning: passing argument 3 of ‘thermal_zone_device_register’ makes integer from pointer without a cast [enabled by default] include/linux/thermal.h:166:29: note: expected ‘int’ but argument is of type ‘struct rcar_thermal_priv *’ drivers/thermal/rcar_thermal.c:214:10: error: too few arguments to function ‘thermal_zone_device_register’ include/linux/thermal.h:166:29: note: declared here make[1]: *** [drivers/thermal/rcar_thermal.o] Error 1 make: *** [drivers/thermal/rcar_thermal.o] Error 2 with gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) Signed-off-by: Devendra Naga <develkernel412222@gmail.com> Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
| * | | Merge branch 'i2c-embedded/for-current' of ↵Linus Torvalds2012-11-033-175/+22
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.pengutronix.de/git/wsa/linux Pull i2c embedded fixes from Wolfram Sang: "Two patches are usual stuff. The bigger patch is needed to correct a wrong decision made in this merge window. We hoped to get the PIOQUEUE mode in the mxs driver working with DMA, but it turned out to be too broken (leading to data loss), so we now think it is best to remove it entirely and work only with DMA now. The patch should be in 3.7. IMO, so users never get the chance to use both modes in parallel." * 'i2c-embedded/for-current' of git://git.pengutronix.de/git/wsa/linux: i2c: tegra: set irq name as device name i2c-nomadik: Fixup clock handling i2c: mxs: remove broken PIOQUEUE support
| | * | | i2c: tegra: set irq name as device nameLaxman Dewangan2012-11-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When watching the irqs name of tegra i2c, all instances irq name shows as tegra_i2c. Passing the device name properly to have the irq names with instance like tegra-i2c.0, tegra-i2c.1 etc. Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com> Acked-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
| | * | | i2c-nomadik: Fixup clock handlingPhilippe Begnic2012-11-021-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make sure to clk_prepare as well as clk_enable. Signed-off-by: Philippe Begnic <philippe.begnic@stericsson.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>